[SRU][Xenial][PATCH 0/1] Mounting LVM snapshots with xfs can hit kernel BUG in nvme driver

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

[SRU][Xenial][PATCH 0/1] Mounting LVM snapshots with xfs can hit kernel BUG in nvme driver

Heitor Alves de Siqueira
BugLink: https://bugs.launchpad.net/bugs/1869229

[Impact]
When mounting LVM snapshots using xfs, it's possible to hit a BUG_ON()
in nvme driver.

Upstream commit 729204ef49ec ("block: relax check on sg gap") introduced
a way to merge bios if they are physically contiguous. This can lead to
issues if one rq starts with a non-aligned buffer, as it can cause the
merged segment to end in an unaligned virtual boundary. In some AWS
instances, it's possible to craft such a request when attempting to
mount LVM snapshots using xfs. This will then cause a kernel spew due to
a BUG_ON in nvme_setup_prps(), which checks if dma_len is aligned to the
page size.

[Fix]
Upstream commit 5a8d75a1b8c9 ("block: fix bio_will_gap() for first bvec
with offset") prevents requests that begin with an unaligned buffer
from being merged.

[Test Case]
This has been verified on AWS with c5d.large instances.

1) Prepare the LVM device + snapshot:
$ sudo vgcreate vg0 /dev/nvme1n1
$ sudo lvcreate -L5G -n data0 vg0
$ sudo mkfs.xfs /dev/vg0/data0
$ sudo mount /dev/vg0/data0 /mnt
$ sudo touch /mnt/test
$ sudo touch /mnt/test2
$ sudo ls /mnt
$ sudo umount /mnt
$ sudo lvcreate -l100%FREE -s /dev/vg0/data0 -n data0_snap

2) Attempting to mount the snapshot results in the Oops:
$ sudo mount /dev/vg0/data0_snap /mnt
Segmentation fault (core dumped)

[Regression Potential]
The fix prevents some bios from being merged, so it can have a
performance impact in certain scenarios. The patch only targets
misaligned segments, so the impact should be less noticeable in the
general case.  The commit is also present in mainline kernels since
4.13, and hasn't been changed significantly, so potential for other
regressions should be low.

Ming Lei (1):
  block: fix bio_will_gap() for first bvec with offset

 include/linux/blkdev.h | 32 ++++++++++++++++++++++++++++----
 1 file changed, 28 insertions(+), 4 deletions(-)

--
2.26.0


--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team
Reply | Threaded
Open this post in threaded view
|

[SRU][Xenial][PATCH 1/1] block: fix bio_will_gap() for first bvec with offset

Heitor Alves de Siqueira
From: Ming Lei <[hidden email]>

BugLink: https://bugs.launchpad.net/bugs/1869229

Commit 729204ef49ec("block: relax check on sg gap") allows us to merge
bios, if both are physically contiguous.  This change can merge a huge
number of small bios, through mkfs for example, mkfs.ntfs running time
can be decreased to ~1/10.

But if one rq starts with a non-aligned buffer (the 1st bvec's bv_offset
is non-zero) and if we allow the merge, it is quite difficult to respect
sg gap limit, especially the max segment size, or we risk having an
unaligned virtual boundary.  This patch tries to avoid the issue by
disallowing a merge, if the req starts with an unaligned buffer.

Also add comments to explain why the merged segment can't end in
unaligned virt boundary.

Fixes: 729204ef49ec ("block: relax check on sg gap")
Tested-by: Johannes Thumshirn <[hidden email]>
Reviewed-by: Johannes Thumshirn <[hidden email]>
Signed-off-by: Ming Lei <[hidden email]>

Rewrote parts of the commit message and comments.

Signed-off-by: Jens Axboe <[hidden email]>
(cherry picked from commit 5a8d75a1b8c99bdc926ba69b7b7dbe4fae81a5af)
Signed-off-by: Matthew Ruffell <[hidden email]>
Signed-off-by: Heitor Alves de Siqueira <[hidden email]>
---
 include/linux/blkdev.h | 32 ++++++++++++++++++++++++++++----
 1 file changed, 28 insertions(+), 4 deletions(-)

diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 7548f332121a..01a696b0a4d3 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1672,12 +1672,36 @@ static inline bool bios_segs_mergeable(struct request_queue *q,
  return true;
 }
 
-static inline bool bio_will_gap(struct request_queue *q, struct bio *prev,
- struct bio *next)
+static inline bool bio_will_gap(struct request_queue *q,
+ struct request *prev_rq,
+ struct bio *prev,
+ struct bio *next)
 {
  if (bio_has_data(prev) && queue_virt_boundary(q)) {
  struct bio_vec pb, nb;
 
+ /*
+ * don't merge if the 1st bio starts with non-zero
+ * offset, otherwise it is quite difficult to respect
+ * sg gap limit. We work hard to merge a huge number of small
+ * single bios in case of mkfs.
+ */
+ if (prev_rq)
+ bio_get_first_bvec(prev_rq->bio, &pb);
+ else
+ bio_get_first_bvec(prev, &pb);
+ if (pb.bv_offset)
+ return true;
+
+ /*
+ * We don't need to worry about the situation that the
+ * merged segment ends in unaligned virt boundary:
+ *
+ * - if 'pb' ends aligned, the merged segment ends aligned
+ * - if 'pb' ends unaligned, the next bio must include
+ *   one single bvec of 'nb', otherwise the 'nb' can't
+ *   merge with 'pb'
+ */
  bio_get_last_bvec(prev, &pb);
  bio_get_first_bvec(next, &nb);
 
@@ -1690,12 +1714,12 @@ static inline bool bio_will_gap(struct request_queue *q, struct bio *prev,
 
 static inline bool req_gap_back_merge(struct request *req, struct bio *bio)
 {
- return bio_will_gap(req->q, req->biotail, bio);
+ return bio_will_gap(req->q, req, req->biotail, bio);
 }
 
 static inline bool req_gap_front_merge(struct request *req, struct bio *bio)
 {
- return bio_will_gap(req->q, bio, req->bio);
+ return bio_will_gap(req->q, NULL, bio, req->bio);
 }
 
 int kblockd_schedule_work(struct work_struct *work);
--
2.26.0


--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team
Reply | Threaded
Open this post in threaded view
|

ACK: [SRU][Xenial][PATCH 0/1] Mounting LVM snapshots with xfs can hit kernel BUG in nvme driver

Marcelo Henrique Cerri
In reply to this post by Heitor Alves de Siqueira
Reasonable change, clean cherry-pick with positive test results:

Acked-by: Marcelo Henrique Cerri <[hidden email]>

--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team

signature.asc (673 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

ACK: [SRU][Xenial][PATCH 1/1] block: fix bio_will_gap() for first bvec with offset

Sultan Alsawaf
In reply to this post by Heitor Alves de Siqueira
Looks good and doesn't have a subsequent 'Fixes' commit in upstream.

Acked-by: Sultan Alsawaf <[hidden email]>

On Thu, Mar 26, 2020 at 03:27:36PM -0300, Heitor Alves de Siqueira wrote:

> From: Ming Lei <[hidden email]>
>
> BugLink: https://bugs.launchpad.net/bugs/1869229
>
> Commit 729204ef49ec("block: relax check on sg gap") allows us to merge
> bios, if both are physically contiguous.  This change can merge a huge
> number of small bios, through mkfs for example, mkfs.ntfs running time
> can be decreased to ~1/10.
>
> But if one rq starts with a non-aligned buffer (the 1st bvec's bv_offset
> is non-zero) and if we allow the merge, it is quite difficult to respect
> sg gap limit, especially the max segment size, or we risk having an
> unaligned virtual boundary.  This patch tries to avoid the issue by
> disallowing a merge, if the req starts with an unaligned buffer.
>
> Also add comments to explain why the merged segment can't end in
> unaligned virt boundary.
>
> Fixes: 729204ef49ec ("block: relax check on sg gap")
> Tested-by: Johannes Thumshirn <[hidden email]>
> Reviewed-by: Johannes Thumshirn <[hidden email]>
> Signed-off-by: Ming Lei <[hidden email]>
>
> Rewrote parts of the commit message and comments.
>
> Signed-off-by: Jens Axboe <[hidden email]>
> (cherry picked from commit 5a8d75a1b8c99bdc926ba69b7b7dbe4fae81a5af)
> Signed-off-by: Matthew Ruffell <[hidden email]>
> Signed-off-by: Heitor Alves de Siqueira <[hidden email]>
> ---
>  include/linux/blkdev.h | 32 ++++++++++++++++++++++++++++----
>  1 file changed, 28 insertions(+), 4 deletions(-)
>
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 7548f332121a..01a696b0a4d3 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -1672,12 +1672,36 @@ static inline bool bios_segs_mergeable(struct request_queue *q,
>   return true;
>  }
>  
> -static inline bool bio_will_gap(struct request_queue *q, struct bio *prev,
> - struct bio *next)
> +static inline bool bio_will_gap(struct request_queue *q,
> + struct request *prev_rq,
> + struct bio *prev,
> + struct bio *next)
>  {
>   if (bio_has_data(prev) && queue_virt_boundary(q)) {
>   struct bio_vec pb, nb;
>  
> + /*
> + * don't merge if the 1st bio starts with non-zero
> + * offset, otherwise it is quite difficult to respect
> + * sg gap limit. We work hard to merge a huge number of small
> + * single bios in case of mkfs.
> + */
> + if (prev_rq)
> + bio_get_first_bvec(prev_rq->bio, &pb);
> + else
> + bio_get_first_bvec(prev, &pb);
> + if (pb.bv_offset)
> + return true;
> +
> + /*
> + * We don't need to worry about the situation that the
> + * merged segment ends in unaligned virt boundary:
> + *
> + * - if 'pb' ends aligned, the merged segment ends aligned
> + * - if 'pb' ends unaligned, the next bio must include
> + *   one single bvec of 'nb', otherwise the 'nb' can't
> + *   merge with 'pb'
> + */
>   bio_get_last_bvec(prev, &pb);
>   bio_get_first_bvec(next, &nb);
>  
> @@ -1690,12 +1714,12 @@ static inline bool bio_will_gap(struct request_queue *q, struct bio *prev,
>  
>  static inline bool req_gap_back_merge(struct request *req, struct bio *bio)
>  {
> - return bio_will_gap(req->q, req->biotail, bio);
> + return bio_will_gap(req->q, req, req->biotail, bio);
>  }
>  
>  static inline bool req_gap_front_merge(struct request *req, struct bio *bio)
>  {
> - return bio_will_gap(req->q, bio, req->bio);
> + return bio_will_gap(req->q, NULL, bio, req->bio);
>  }
>  
>  int kblockd_schedule_work(struct work_struct *work);
> --
> 2.26.0
>
>
> --
> kernel-team mailing list
> [hidden email]
> https://lists.ubuntu.com/mailman/listinfo/kernel-team

--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team
Reply | Threaded
Open this post in threaded view
|

APPLIED: [SRU][Xenial][PATCH 0/1] Mounting LVM snapshots with xfs can hit kernel BUG in nvme driver

Kelsey Skunberg
In reply to this post by Heitor Alves de Siqueira
On 2020-03-26 15:27:35 , Heitor Alves de Siqueira wrote:

> BugLink: https://bugs.launchpad.net/bugs/1869229
>
> [Impact]
> When mounting LVM snapshots using xfs, it's possible to hit a BUG_ON()
> in nvme driver.
>
> Upstream commit 729204ef49ec ("block: relax check on sg gap") introduced
> a way to merge bios if they are physically contiguous. This can lead to
> issues if one rq starts with a non-aligned buffer, as it can cause the
> merged segment to end in an unaligned virtual boundary. In some AWS
> instances, it's possible to craft such a request when attempting to
> mount LVM snapshots using xfs. This will then cause a kernel spew due to
> a BUG_ON in nvme_setup_prps(), which checks if dma_len is aligned to the
> page size.
>
> [Fix]
> Upstream commit 5a8d75a1b8c9 ("block: fix bio_will_gap() for first bvec
> with offset") prevents requests that begin with an unaligned buffer
> from being merged.
>
> [Test Case]
> This has been verified on AWS with c5d.large instances.
>
> 1) Prepare the LVM device + snapshot:
> $ sudo vgcreate vg0 /dev/nvme1n1
> $ sudo lvcreate -L5G -n data0 vg0
> $ sudo mkfs.xfs /dev/vg0/data0
> $ sudo mount /dev/vg0/data0 /mnt
> $ sudo touch /mnt/test
> $ sudo touch /mnt/test2
> $ sudo ls /mnt
> $ sudo umount /mnt
> $ sudo lvcreate -l100%FREE -s /dev/vg0/data0 -n data0_snap
>
> 2) Attempting to mount the snapshot results in the Oops:
> $ sudo mount /dev/vg0/data0_snap /mnt
> Segmentation fault (core dumped)
>
> [Regression Potential]
> The fix prevents some bios from being merged, so it can have a
> performance impact in certain scenarios. The patch only targets
> misaligned segments, so the impact should be less noticeable in the
> general case.  The commit is also present in mainline kernels since
> 4.13, and hasn't been changed significantly, so potential for other
> regressions should be low.
>
> Ming Lei (1):
>   block: fix bio_will_gap() for first bvec with offset
>
>  include/linux/blkdev.h | 32 ++++++++++++++++++++++++++++----
>  1 file changed, 28 insertions(+), 4 deletions(-)
>

Applied to xenial/master-next, thank you!

-Kelsey
 
> --
> 2.26.0
>
>
> --
> kernel-team mailing list
> [hidden email]
> https://lists.ubuntu.com/mailman/listinfo/kernel-team

--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team