[SRU][Bionic][PATCH 0/1] block: do not use interruptible wait anywhere

[SRU][Bionic][PATCH 0/1] block: do not use interruptible wait anywhere

Joseph Salisbury-3
BugLink: http://bugs.launchpad.net/bugs/1776887

== SRU Justification ==
This upstream bug has been confirmed to affect Ubuntu users[1]. As per the
fix commit (below), the most frequent symptom is a crash of Xorg/Xwayland,
i.e. killing the entire GUI, when a laptop is woken from system sleep.
Frequency of the bug is described as once every few days[2].

[1] E.g. this user confirms the bug & very specific workaround:
https://bugs.launchpad.net/ubuntu/+source/xorg-server/+bug/1760450/comments/11
[2] E.g. this log of crashes: https://bugzilla.redhat.com/show_bug.cgi?id=1553979#c23

This is a bug in blk-core.c. It is not specific to any one hardware driver.
Technically the suspend bug is triggered by the SCSI core - which is used by *all SATA devices*.

== Fix ==
1dc3039bc87a ("block: do not use interruptible wait anywhere")

== Regression Potential ==
Low.  This patch has been sent to stable, so it has had additional
upstream review.

== Test Case ==
A test kernel was built with this patch and tested by the original bug reporter.
The bug reporter states the test kernel resolved the bug.

Alan Jenkins (1):
  block: do not use interruptible wait anywhere

 block/blk-core.c | 11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)

--
2.7.4


--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team

[SRU][Bionic][PATCH 1/1] block: do not use interruptible wait anywhere

Joseph Salisbury-3
From: Alan Jenkins <[hidden email]>

BugLink: http://bugs.launchpad.net/bugs/1776887

When blk_queue_enter() waits for a queue to unfreeze, or unset the
PREEMPT_ONLY flag, do not allow it to be interrupted by a signal.

The PREEMPT_ONLY flag was introduced later in commit 3a0a529971ec
("block, scsi: Make SCSI quiesce and resume work reliably").  Note the SCSI
device is resumed asynchronously, i.e. after un-freezing userspace tasks.

So that commit exposed the bug as a regression in v4.15.  A mysterious
SIGBUS (or -EIO) sometimes happened during the time the device was being
resumed.  Most frequently, there was no kernel log message, and we saw Xorg
or Xwayland killed by SIGBUS.[1]

[1] E.g. https://bugzilla.redhat.com/show_bug.cgi?id=1553979

Without this fix, I get an IO error in this test:

  dd if=/dev/sda of=/dev/null iflag=direct & \
  while killall -SIGUSR1 dd; do sleep 0.1; done & \
  echo mem > /sys/power/state ; \
  sleep 5; killall dd  # stop after 5 seconds

The interruptible wait was added to blk_queue_enter in
commit 3ef28e83ab15 ("block: generic request_queue reference counting").
Before then, the interruptible wait was only in blk-mq, but I don't think
it could ever have been correct.

Reviewed-by: Bart Van Assche <[hidden email]>
Cc: [hidden email]
Signed-off-by: Alan Jenkins <[hidden email]>
Signed-off-by: Jens Axboe <[hidden email]>
(cherry picked from commit 1dc3039bc87ae7d19a990c3ee71cfd8a9068f428)
Signed-off-by: Joseph Salisbury <[hidden email]>
---
 block/blk-core.c | 11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 322c47f..246ea84 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -821,7 +821,6 @@ int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags)
 
 	while (true) {
 		bool success = false;
-		int ret;
 
 		rcu_read_lock();
 		if (percpu_ref_tryget_live(&q->q_usage_counter)) {
@@ -853,14 +852,12 @@ int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags)
 		 */
 		smp_rmb();
 
-		ret = wait_event_interruptible(q->mq_freeze_wq,
-				(atomic_read(&q->mq_freeze_depth) == 0 &&
-				 (preempt || !blk_queue_preempt_only(q))) ||
-				blk_queue_dying(q));
+		wait_event(q->mq_freeze_wq,
+			   (atomic_read(&q->mq_freeze_depth) == 0 &&
+			    (preempt || !blk_queue_preempt_only(q))) ||
+			   blk_queue_dying(q));
 		if (blk_queue_dying(q))
 			return -ENODEV;
-		if (ret)
-			return ret;
 	}
 }
 
--
2.7.4
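
For readers who want to see the behavioural change in the hunk above in
isolation, here is a minimal userspace sketch. It is not kernel code and not
part of the patch: struct model_queue, model_tick() and the model_enter_*()
functions are hypothetical stand-ins, and the frozen queue is simulated with a
simple countdown instead of a real wait queue. Only the control flow is meant
to match: the old variant gives up with -ERESTARTSYS as soon as a signal is
pending, while the new variant keeps waiting until the queue is usable or
dying.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Kernel-internal error code; userspace <errno.h> does not define it. */
#define MODEL_ERESTARTSYS 512

/* Hypothetical stand-in for the few request_queue fields the wait inspects. */
struct model_queue {
	int freeze_depth;     /* > 0 while the queue is frozen */
	bool dying;           /* queue is being torn down */
	bool signal_pending;  /* a signal is pending for the waiting task */
	int thaw_after;       /* simulated wakeups until the queue unfreezes */
};

/* Simplified wait condition: queue unfrozen, or queue dying. */
static bool model_wait_done(const struct model_queue *q)
{
	return q->freeze_depth == 0 || q->dying;
}

/* One simulated wakeup; the queue unfreezes when the countdown hits zero. */
static void model_tick(struct model_queue *q)
{
	if (q->thaw_after > 0 && --q->thaw_after == 0)
		q->freeze_depth = 0;
}

/* Old behaviour: modelled on wait_event_interruptible(). */
int model_enter_interruptible(struct model_queue *q)
{
	assert(q != NULL);
	while (!model_wait_done(q)) {
		if (q->signal_pending)
			return -MODEL_ERESTARTSYS; /* caller sees a failure */
		model_tick(q);
	}
	return q->dying ? -ENODEV : 0;
}

/*
 * New behaviour: modelled on wait_event(). Signals are ignored, so the
 * only error left is -ENODEV. Like the real wait, this never returns if
 * the queue stays frozen forever.
 */
int model_enter_uninterruptible(struct model_queue *q)
{
	assert(q != NULL);
	while (!model_wait_done(q))
		model_tick(q);
	return q->dying ? -ENODEV : 0;
}
```

With a frozen queue and a pending signal (the situation the dd/killall test
provokes during resume), model_enter_interruptible() fails immediately, which
corresponds to the error that the commit message reports surfacing as -EIO or
SIGBUS. model_enter_uninterruptible() instead returns 0 once the simulated
queue thaws.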



ACK: [SRU][Bionic][PATCH 0/1] block: do not use interruptible wait anywhere

Seth Forshee
In reply to this post by Joseph Salisbury-3
On Fri, Jun 29, 2018 at 09:39:33AM -0400, Joseph Salisbury wrote:

> BugLink: http://bugs.launchpad.net/bugs/1776887
>
> == SRU Justification ==
> This upstream bug has been confirmed to affect Ubuntu users[1]. As per the
> fix commit (below), the most frequent symptom is a crash of Xorg/Xwayland,
> i.e. killing the entire GUI, when a laptop is woken from system sleep.
> Frequency of the bug is described as once every few days[2].
>
> [1] E.g. this user confirms the bug & very specific workaround:
> https://bugs.launchpad.net/ubuntu/+source/xorg-server/+bug/1760450/comments/11
> [2] E.g. this log of crashes: https://bugzilla.redhat.com/show_bug.cgi?id=1553979#c23
>
> This is a bug in blk-core.c. It is not specific to any one hardware driver.
> Technically the suspend bug is triggered by the SCSI core - which is used by *all SATA devices*.
>
> == Fix ==
> 1dc3039bc87a ("block: do not use interruptible wait anywhere")
>
> == Regression Potential ==
> Low.  This patch has been sent to stable, so it has had additional
> upstream review.
>
> == Test Case ==
> A test kernel was built with this patch and tested by the original bug reporter.
> The bug reporter states the test kernel resolved the bug.

Acked-by: Seth Forshee <[hidden email]>

Note that this commit is already in unstable as it was merged in
v4.17-rc3.


ACK: [SRU][Bionic][PATCH 0/1] block: do not use interruptible wait anywhere

Kleber Sacilotto de Souza
In reply to this post by Joseph Salisbury-3
On 06/29/18 15:39, Joseph Salisbury wrote:

> BugLink: http://bugs.launchpad.net/bugs/1776887
>
> == SRU Justification ==
> This upstream bug has been confirmed to affect Ubuntu users[1]. As per the
> fix commit (below), the most frequent symptom is a crash of Xorg/Xwayland,
> i.e. killing the entire GUI, when a laptop is woken from system sleep.
> Frequency of the bug is described as once every few days[2].
>
> [1] E.g. this user confirms the bug & very specific workaround:
> https://bugs.launchpad.net/ubuntu/+source/xorg-server/+bug/1760450/comments/11
> [2] E.g. this log of crashes: https://bugzilla.redhat.com/show_bug.cgi?id=1553979#c23
>
> This is a bug in blk-core.c. It is not specific to any one hardware driver.
> Technically the suspend bug is triggered by the SCSI core - which is used by *all SATA devices*.
>
> == Fix ==
> 1dc3039bc87a ("block: do not use interruptible wait anywhere")
>
> == Regression Potential ==
> Low.  This patch has been sent to stable, so it has had additional
> upstream review.
>
> == Test Case ==
> A test kernel was built with this patch and tested by the original bug reporter.
> The bug reporter states the test kernel resolved the bug.
>
> Alan Jenkins (1):
>   block: do not use interruptible wait anywhere
>
>  block/blk-core.c | 11 ++++-------
>  1 file changed, 4 insertions(+), 7 deletions(-)
>

Acked-by: Kleber Sacilotto de Souza <[hidden email]>


APPLIED: [SRU][Bionic][PATCH 0/1] block: do not use interruptible wait anywhere

Khaled Elmously
In reply to this post by Joseph Salisbury-3
This commit already exists in Bionic as part of "upstream stable patchset 2018-06-22".

On 2018-06-29 09:39:33, Joseph Salisbury wrote:

> BugLink: http://bugs.launchpad.net/bugs/1776887
>
> == SRU Justification ==
> This upstream bug has been confirmed to affect Ubuntu users[1]. As per the
> fix commit (below), the most frequent symptom is a crash of Xorg/Xwayland,
> i.e. killing the entire GUI, when a laptop is woken from system sleep.
> Frequency of the bug is described as once every few days[2].
>
> [1] E.g. this user confirms the bug & very specific workaround:
> https://bugs.launchpad.net/ubuntu/+source/xorg-server/+bug/1760450/comments/11
> [2] E.g. this log of crashes: https://bugzilla.redhat.com/show_bug.cgi?id=1553979#c23
>
> This is a bug in blk-core.c. It is not specific to any one hardware driver.
> Technically the suspend bug is triggered by the SCSI core - which is used by *all SATA devices*.
>
> == Fix ==
> 1dc3039bc87a ("block: do not use interruptible wait anywhere")
>
> == Regression Potential ==
> Low.  This patch has been sent to stable, so it has had additional
> upstream review.
>
> == Test Case ==
> A test kernel was built with this patch and tested by the original bug reporter.
> The bug reporter states the test kernel resolved the bug.
>
> Alan Jenkins (1):
>   block: do not use interruptible wait anywhere
>
>  block/blk-core.c | 11 ++++-------
>  1 file changed, 4 insertions(+), 7 deletions(-)
