[SRU][E][PATCH v2 0/1] bcache: fix hung task timeout in bch_bucket_alloc()

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

[SRU][E][PATCH v2 0/1] bcache: fix hung task timeout in bch_bucket_alloc()

Andrea Righi
BugLink: https://bugs.launchpad.net/bugs/1784665

[Impact]

bcache_allocator can call the following:

 bch_allocator_thread()
  -> bch_prio_write()
     -> bch_bucket_alloc()
        -> wait on &ca->set->bucket_wait

But the wake up event on bucket_wait is supposed to come from
bch_allocator_thread() itself causing a deadlock.

[Test Case]

This is a simple script that can easily trigger the deadlock condition:
https://launchpadlibrarian.net/381282009/bcache-basic-repro.sh

A better test case has been also provided in LP: #1796292:
https://bugs.launchpad.net/curtin/+bug/1796292/+attachment/5280353/+files/curtin-nvme.sh

[Fix]

Fix by making the call to bch_prio_write() non-blocking, so that
bch_allocator_thread() never waits on itself. Moreover, make sure to
wake up the garbage collector thread when bch_prio_write() is failing to
allocate buckets to increase the chance of freeing up more buckets.

[Regression Potential]

Even if this patch is not yet applied upstream (posted to the LKML), it
seems to reliably fix/prevent the specific deadlock problem reported in
this bug, so it should be considered safe to apply it as it is for now,
to prevent potential hung task timeout conditions.

Changes in v2:
 - fix potential buckets leak

----------------------------------------------------------------
Andrea Righi (1):
      UBUNTU: SAUCE: bcache: fix deadlock in bcache_allocator

 drivers/md/bcache/alloc.c  |  5 ++++-
 drivers/md/bcache/bcache.h |  2 +-
 drivers/md/bcache/super.c  | 27 +++++++++++++++++++++------
 3 files changed, 26 insertions(+), 8 deletions(-)


--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team
Reply | Threaded
Open this post in threaded view
|

[PATCH] UBUNTU: SAUCE: bcache: fix deadlock in bcache_allocator

Andrea Righi
bcache_allocator() can call the following:

 bch_allocator_thread()
  -> bch_prio_write()
     -> bch_bucket_alloc()
        -> wait on &ca->set->bucket_wait

But the wake up event on bucket_wait is supposed to come from
bch_allocator_thread() itself => deadlock:

[ 1158.490744] INFO: task bcache_allocato:15861 blocked for more than 10 seconds.
[ 1158.495929]       Not tainted 5.3.0-050300rc3-generic #201908042232
[ 1158.500653] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1158.504413] bcache_allocato D    0 15861      2 0x80004000
[ 1158.504419] Call Trace:
[ 1158.504429]  __schedule+0x2a8/0x670
[ 1158.504432]  schedule+0x2d/0x90
[ 1158.504448]  bch_bucket_alloc+0xe5/0x370 [bcache]
[ 1158.504453]  ? wait_woken+0x80/0x80
[ 1158.504466]  bch_prio_write+0x1dc/0x390 [bcache]
[ 1158.504476]  bch_allocator_thread+0x233/0x490 [bcache]
[ 1158.504491]  kthread+0x121/0x140
[ 1158.504503]  ? invalidate_buckets+0x890/0x890 [bcache]
[ 1158.504506]  ? kthread_park+0xb0/0xb0
[ 1158.504510]  ret_from_fork+0x35/0x40

Fix by making the call to bch_prio_write() non-blocking, so that
bch_allocator_thread() never waits on itself.

Moreover, make sure to wake up the garbage collector thread when
bch_prio_write() is failing to allocate buckets.

BugLink: https://bugs.launchpad.net/bugs/1784665
BugLink: https://bugs.launchpad.net/bugs/1796292
Signed-off-by: Andrea Righi <[hidden email]>
---
 drivers/md/bcache/alloc.c  |  5 ++++-
 drivers/md/bcache/bcache.h |  2 +-
 drivers/md/bcache/super.c  | 27 +++++++++++++++++++++------
 3 files changed, 26 insertions(+), 8 deletions(-)

diff --git a/drivers/md/bcache/alloc.c b/drivers/md/bcache/alloc.c
index 6f776823b9ba..a1df0d95151c 100644
--- a/drivers/md/bcache/alloc.c
+++ b/drivers/md/bcache/alloc.c
@@ -377,7 +377,10 @@ static int bch_allocator_thread(void *arg)
  if (!fifo_full(&ca->free_inc))
  goto retry_invalidate;
 
- bch_prio_write(ca);
+ if (bch_prio_write(ca, false) < 0) {
+ ca->invalidate_needs_gc = 1;
+ wake_up_gc(ca->set);
+ }
  }
  }
 out:
diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
index c71bd2062683..577e9d5dfdd0 100644
--- a/drivers/md/bcache/bcache.h
+++ b/drivers/md/bcache/bcache.h
@@ -977,7 +977,7 @@ bool bch_cached_dev_error(struct cached_dev *dc);
 __printf(2, 3)
 bool bch_cache_set_error(struct cache_set *c, const char *fmt, ...);
 
-void bch_prio_write(struct cache *ca);
+int bch_prio_write(struct cache *ca, bool wait);
 void bch_write_bdev_super(struct cached_dev *dc, struct closure *parent);
 
 extern struct workqueue_struct *bcache_wq;
diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index dec03eb7a14f..7b7e1e121474 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -527,12 +527,29 @@ static void prio_io(struct cache *ca, uint64_t bucket, int op,
  closure_sync(cl);
 }
 
-void bch_prio_write(struct cache *ca)
+int bch_prio_write(struct cache *ca, bool wait)
 {
  int i;
  struct bucket *b;
  struct closure cl;
 
+ pr_debug("free_prio=%zu, free_none=%zu, free_inc=%zu",
+ fifo_used(&ca->free[RESERVE_PRIO]),
+ fifo_used(&ca->free[RESERVE_NONE]),
+ fifo_used(&ca->free_inc));
+
+ /*
+ * Pre-check if there are enough free buckets. In the non-blocking
+ * scenario it's better to fail early rather than starting to allocate
+ * buckets and do a cleanup later in case of failure.
+ */
+ if (!wait) {
+ size_t avail = fifo_used(&ca->free[RESERVE_PRIO]) +
+       fifo_used(&ca->free[RESERVE_NONE]);
+ if (prio_buckets(ca) > avail)
+ return -ENOMEM;
+ }
+
  closure_init_stack(&cl);
 
  lockdep_assert_held(&ca->set->bucket_lock);
@@ -542,9 +559,6 @@ void bch_prio_write(struct cache *ca)
  atomic_long_add(ca->sb.bucket_size * prio_buckets(ca),
  &ca->meta_sectors_written);
 
- //pr_debug("free %zu, free_inc %zu, unused %zu", fifo_used(&ca->free),
- // fifo_used(&ca->free_inc), fifo_used(&ca->unused));
-
  for (i = prio_buckets(ca) - 1; i >= 0; --i) {
  long bucket;
  struct prio_set *p = ca->disk_buckets;
@@ -562,7 +576,7 @@ void bch_prio_write(struct cache *ca)
  p->magic = pset_magic(&ca->sb);
  p->csum = bch_crc64(&p->magic, bucket_bytes(ca) - 8);
 
- bucket = bch_bucket_alloc(ca, RESERVE_PRIO, true);
+ bucket = bch_bucket_alloc(ca, RESERVE_PRIO, wait);
  BUG_ON(bucket == -1);
 
  mutex_unlock(&ca->set->bucket_lock);
@@ -591,6 +605,7 @@ void bch_prio_write(struct cache *ca)
 
  ca->prio_last_buckets[i] = ca->prio_buckets[i];
  }
+ return 0;
 }
 
 static void prio_read(struct cache *ca, uint64_t bucket)
@@ -1891,7 +1906,7 @@ static int run_cache_set(struct cache_set *c)
 
  mutex_lock(&c->bucket_lock);
  for_each_cache(ca, c, i)
- bch_prio_write(ca);
+ bch_prio_write(ca, true);
  mutex_unlock(&c->bucket_lock);
 
  err = "cannot allocate new UUID bucket";
--
2.20.1


--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team
Reply | Threaded
Open this post in threaded view
|

Re: [SRU][E][PATCH v2 0/1] bcache: fix hung task timeout in bch_bucket_alloc()

Seth Forshee
In reply to this post by Andrea Righi
On Wed, Aug 07, 2019 at 02:58:46PM +0200, Andrea Righi wrote:

> BugLink: https://bugs.launchpad.net/bugs/1784665
>
> [Impact]
>
> bcache_allocator can call the following:
>
>  bch_allocator_thread()
>   -> bch_prio_write()
>      -> bch_bucket_alloc()
>         -> wait on &ca->set->bucket_wait
>
> But the wake up event on bucket_wait is supposed to come from
> bch_allocator_thread() itself causing a deadlock.
>
> [Test Case]
>
> This is a simple script that can easily trigger the deadlock condition:
> https://launchpadlibrarian.net/381282009/bcache-basic-repro.sh
>
> A better test case has been also provided in LP: #1796292:
> https://bugs.launchpad.net/curtin/+bug/1796292/+attachment/5280353/+files/curtin-nvme.sh

Since I don't see it stated anywhere, I take this to mean that the patch
has been tested against this script. Was any other regression testing
done?

Thanks,
Seth

--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team
Reply | Threaded
Open this post in threaded view
|

Re: [SRU][E][PATCH v2 0/1] bcache: fix hung task timeout in bch_bucket_alloc()

Andrea Righi
On Thu, Aug 08, 2019 at 10:30:08AM -0500, Seth Forshee wrote:

> On Wed, Aug 07, 2019 at 02:58:46PM +0200, Andrea Righi wrote:
> > BugLink: https://bugs.launchpad.net/bugs/1784665
> >
> > [Impact]
> >
> > bcache_allocator can call the following:
> >
> >  bch_allocator_thread()
> >   -> bch_prio_write()
> >      -> bch_bucket_alloc()
> >         -> wait on &ca->set->bucket_wait
> >
> > But the wake up event on bucket_wait is supposed to come from
> > bch_allocator_thread() itself causing a deadlock.
> >
> > [Test Case]
> >
> > This is a simple script that can easily trigger the deadlock condition:
> > https://launchpadlibrarian.net/381282009/bcache-basic-repro.sh
> >
> > A better test case has been also provided in LP: #1796292:
> > https://bugs.launchpad.net/curtin/+bug/1796292/+attachment/5280353/+files/curtin-nvme.sh
>
> Since I don't see it stated anywhere, I take this to mean that the patch
> has been tested against this script. Was any other regression testing
> done?

Correct, sorry I wasn't clear, curtin-nvme.sh was used to replicate the
bug and verify that the bug was fixed.

-Andrea

--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team
Reply | Threaded
Open this post in threaded view
|

APPLIED: [SRU][E][PATCH v2 0/1] bcache: fix hung task timeout in bch_bucket_alloc()

Seth Forshee
In reply to this post by Andrea Righi
On Wed, Aug 07, 2019 at 02:58:46PM +0200, Andrea Righi wrote:

> BugLink: https://bugs.launchpad.net/bugs/1784665
>
> [Impact]
>
> bcache_allocator can call the following:
>
>  bch_allocator_thread()
>   -> bch_prio_write()
>      -> bch_bucket_alloc()
>         -> wait on &ca->set->bucket_wait
>
> But the wake up event on bucket_wait is supposed to come from
> bch_allocator_thread() itself causing a deadlock.
>
> [Test Case]
>
> This is a simple script that can easily trigger the deadlock condition:
> https://launchpadlibrarian.net/381282009/bcache-basic-repro.sh
>
> A better test case has been also provided in LP: #1796292:
> https://bugs.launchpad.net/curtin/+bug/1796292/+attachment/5280353/+files/curtin-nvme.sh
>
> [Fix]
>
> Fix by making the call to bch_prio_write() non-blocking, so that
> bch_allocator_thread() never waits on itself. Moreover, make sure to
> wake up the garbage collector thread when bch_prio_write() is failing to
> allocate buckets to increase the chance of freeing up more buckets.
>
> [Regression Potential]
>
> Even if this patch is not yet applied upstream (posted to the LKML), it
> seems to reliably fix/prevent the specific deadlock problem reported in
> this bug, so it should be considered safe to apply it as it is for now,
> to prevent potential hung task timeout conditions.
>
> Changes in v2:
>  - fix potential buckets leak

Applied to eoan/master-next, thanks!

--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team