[SRU X] [PATCH 0/1] NVMe polling on timeout

[SRU X] [PATCH 0/1] NVMe polling on timeout

Guilherme Piccoli
BugLink: https://launchpad.net/bugs/1807393


[Impact]

* NVMe controllers could potentially fail to send an interrupt, especially
due to bugs in virtual devices (which are common these days - qemu has its
own NVMe virtual device, and so does AWS). This is a difficult situation
to debug, because the NVMe driver only reports the request timeout,
not the reason.

* The upstream patch proposed to SRU here was designed to provide more
information in these cases, by pro-actively polling the CQEs on request
timeouts, to check if the specific request was completed and some issue
(probably a missed interrupt) prevented the driver from noticing, or if the
request really wasn't completed, which indicates more severe issues.

* Besides being quite useful for debugging, this patch could help to mitigate
issues in cloud environments like AWS, in case there is jitter in
request completion and the I/O timeout is set to a low value, or even
in case of atypical bugs in the virtual NVMe controller. With this patch,
if polling succeeds the NVMe driver will continue working instead of
attempting a controller reset procedure, which may lead to failures in the
rootfs - refer to https://launchpad.net/bugs/1788035. A toy model of the
mechanism is sketched below.
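
To make the mechanism concrete, here is a toy, self-contained C model of
the idea. It is illustrative only, not kernel code: poll_cq(),
handle_timeout() and the queue layout are invented names. The "device"
posts a completion-queue entry but its interrupt is lost, and the timeout
path recognizes the pending entry by its phase bit, the same check
nvme_cqe_valid() performs in the real driver, so the request completes
without a controller reset.

#include <stdbool.h>
#include <stdio.h>

/* Toy model only: names and layout are invented for illustration. */

#define CQ_DEPTH 4

struct cqe {
	unsigned int tag;
	unsigned char phase;	/* flips each time the queue wraps */
};

static struct cqe cq[CQ_DEPTH];		/* completion queue, starts zeroed */
static unsigned int cq_head;		/* driver's consumer index */
static unsigned char cq_phase = 1;	/* phase value the driver expects next */
static unsigned int cq_tail;		/* device's producer index */
static unsigned char dev_phase = 1;	/* phase value the device writes next */

/* "Device" posts a completion but, simulating the bug, raises no interrupt. */
static void device_complete_without_irq(unsigned int tag)
{
	cq[cq_tail].tag = tag;
	cq[cq_tail].phase = dev_phase;
	cq_tail = (cq_tail + 1) % CQ_DEPTH;
	if (cq_tail == 0)
		dev_phase ^= 1;
	/* a healthy device would assert its IRQ here; we "lose" it */
}

/* Driver-side poll: consume any valid CQEs, report whether 'tag' was seen. */
static bool poll_cq(unsigned int tag)
{
	bool found = false;

	while (cq[cq_head].phase == cq_phase) {	/* entry is valid */
		if (cq[cq_head].tag == tag)
			found = true;
		cq_head = (cq_head + 1) % CQ_DEPTH;
		if (cq_head == 0)
			cq_phase ^= 1;		/* wrapped: flip expected phase */
	}
	return found;
}

/* Timeout handler: poll before assuming the controller needs a reset. */
static void handle_timeout(unsigned int tag)
{
	if (poll_cq(tag))
		printf("I/O %u timeout, completion polled: keep going\n", tag);
	else
		printf("I/O %u timeout, nothing posted: escalate to reset\n", tag);
}

int main(void)
{
	device_complete_without_irq(7);	/* CQE posted, interrupt lost */
	handle_timeout(7);		/* polling finds it, no reset needed */
	handle_timeout(8);		/* never completed, reset path taken */
	return 0;
}

Compiled and run, the first timeout reports a polled completion while the
second takes the reset path; the real patch does the equivalent inside
nvme_timeout().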


[Test Case]

* It's a bit tricky to artificially create a missed-interrupt situation;
one idea was to implement a small hack in the NVMe qemu virtual device
that, given a trigger in the guest kernel, induces the virtual device to
skip an interrupt. The hack patch is present in a Launchpad comment, along
with instructions to reproduce the issue. A minimal sketch of the idea
follows.
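
For reference, the shape of such a hack could look like the sketch below.
This is only an illustration under assumptions: the real patch lives in
the Launchpad comment, and every identifier here (VirtNvmeCtrl,
raise_completion_irq(), debug_trigger()) is a stand-in rather than an
actual QEMU symbol. The idea is simply that a guest-visible trigger arms a
flag and the virtual device then swallows exactly one completion
interrupt.

#include <stdbool.h>
#include <stdio.h>

/*
 * Stand-in types and names, not real QEMU code; the actual hack patch is
 * attached to the Launchpad bug.
 */
struct VirtNvmeCtrl {
	bool skip_next_irq;	/* armed by the guest-side trigger */
};

/* Where the device model would normally assert the completion interrupt. */
static void raise_completion_irq(struct VirtNvmeCtrl *n)
{
	if (n->skip_next_irq) {
		n->skip_next_irq = false;
		printf("CQE posted, interrupt deliberately skipped\n");
		return;	/* the CQE is in the queue, but no IRQ reaches the guest */
	}
	printf("CQE posted, interrupt delivered as usual\n");
}

/* Stand-in for the trigger (e.g. a magic register write from the guest). */
static void debug_trigger(struct VirtNvmeCtrl *n)
{
	n->skip_next_irq = true;
}

int main(void)
{
	struct VirtNvmeCtrl n = { .skip_next_irq = false };

	raise_completion_irq(&n);	/* normal completion */
	debug_trigger(&n);		/* guest arms the hack */
	raise_completion_irq(&n);	/* this one loses its interrupt */
	raise_completion_irq(&n);	/* back to normal */
	return 0;
}

With the interrupt swallowed, the guest's I/O timer expires and, with the
patch applied, dmesg should show the "completion polled" warning instead
of a controller reset.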


[Regression Potential]

* There are no clear risks in adding such a polling mechanism to the NVMe
driver; one bad scenario that was never reported but could happen with this
patch is that the device is in a bad state IRQ-wise that a reset would fix,
yet the patch causes all requests to be completed via polling, which
prevents the adapter reset. This is however a very hypothetical situation,
which would also happen in the mainline kernel (since it has the patch).


Keith Busch (1):
  nvme/pci: Poll CQ on timeout

 drivers/nvme/host/pci.c | 21 ++++++++++++++++++---
 1 file changed, 18 insertions(+), 3 deletions(-)

--
2.19.2


--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team

[SRU X] [PATCH 1/1] nvme/pci: Poll CQ on timeout

Guilherme Piccoli
From: Keith Busch <[hidden email]>

BugLink: https://launchpad.net/bugs/1807393

If an IO timeout occurs, it's helpful to know if the controller did not
post a completion or the driver missed an interrupt. While we never expect
the latter, this patch will make it possible to tell the difference so
we don't have to guess.

Signed-off-by: Keith Busch <[hidden email]>
Signed-off-by: Christoph Hellwig <[hidden email]>
Tested-by: Johannes Thumshirn <[hidden email]>
Reviewed-by: Johannes Thumshirn <[hidden email]>
(backported from 7776db1ccc123d5944a8c170c9c45f7e91d49643 upstream)
[gpiccoli: context adjustment, fixed struct member access that changed]
Signed-off-by: Guilherme G. Piccoli <[hidden email]>
---
 drivers/nvme/host/pci.c | 21 ++++++++++++++++++---
 1 file changed, 18 insertions(+), 3 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 3752052ae20a..5d2a7ee2f922 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -929,10 +929,8 @@ static irqreturn_t nvme_irq_check(int irq, void *data)
 	return IRQ_NONE;
 }
 
-static int nvme_poll(struct blk_mq_hw_ctx *hctx, unsigned int tag)
+static int __nvme_poll(struct nvme_queue *nvmeq, unsigned int tag)
 {
-	struct nvme_queue *nvmeq = hctx->driver_data;
-
 	if (nvme_cqe_valid(nvmeq, nvmeq->cq_head, nvmeq->cq_phase)) {
 		spin_lock_irq(&nvmeq->q_lock);
 		__nvme_process_cq(nvmeq, &tag);
@@ -945,6 +943,13 @@ static int nvme_poll(struct blk_mq_hw_ctx *hctx, unsigned int tag)
 	return 0;
 }
 
+static int nvme_poll(struct blk_mq_hw_ctx *hctx, unsigned int tag)
+{
+	struct nvme_queue *nvmeq = hctx->driver_data;
+
+	return __nvme_poll(nvmeq, tag);
+}
+
 static void nvme_async_event_work(struct work_struct *work)
 {
 	struct nvme_dev *dev = container_of(work, struct nvme_dev, async_work);
@@ -1045,6 +1050,16 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req, bool reserved)
 	struct request *abort_req;
 	struct nvme_command cmd;
 
+	/*
+	 * Did we miss an interrupt?
+	 */
+	if (__nvme_poll(nvmeq, req->tag)) {
+		dev_warn(dev->dev,
+			 "I/O %d QID %d timeout, completion polled\n",
+			 req->tag, nvmeq->qid);
+		return BLK_EH_HANDLED;
+	}
+
 	/*
 	 * Shutdown immediately if controller times out while starting. The
 	 * reset work will see the pci device disabled when it gets the forced
--
2.19.2


--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team

Re: [SRU X] [PATCH 0/1] NVMe polling on timeout

Kleber Souza
In reply to this post by Guilherme Piccoli
On 12/7/18 10:28 PM, Guilherme G. Piccoli wrote:

> BugLink: https://launchpad.net/bugs/1807393
>
>
> [Impact]
>
> * NVMe controllers could potentially fail to send an interrupt, especially
> due to bugs in virtual devices (which are common these days - qemu has its
> own NVMe virtual device, and so does AWS). This is a difficult situation
> to debug, because the NVMe driver only reports the request timeout,
> not the reason.
>
> * The upstream patch proposed to SRU here was designed to provide more
> information in these cases, by pro-actively polling the CQEs on request
> timeouts, to check if the specific request was completed and some issue
> (probably a missed interrupt) prevented the driver from noticing, or if the
> request really wasn't completed, which indicates more severe issues.
>
> * Besides being quite useful for debugging, this patch could help to mitigate
> issues in cloud environments like AWS, in case there is jitter in
> request completion and the I/O timeout is set to a low value, or even
> in case of atypical bugs in the virtual NVMe controller. With this patch,
> if polling succeeds the NVMe driver will continue working instead of
> attempting a controller reset procedure, which may lead to failures in the
> rootfs - refer to https://launchpad.net/bugs/1788035.
>
>
> [Test Case]
>
> * It's a bit tricky to artificially create a missed-interrupt situation;
> one idea was to implement a small hack in the NVMe qemu virtual device
> that, given a trigger in the guest kernel, induces the virtual device to
> skip an interrupt. The hack patch is present in a Launchpad comment,
> along with instructions to reproduce the issue.
>
>
> [Regression Potential]
>
> * There are no clear risks in adding such a polling mechanism to the NVMe
> driver; one bad scenario that was never reported but could happen with this
> patch is that the device is in a bad state IRQ-wise that a reset would fix,
> yet the patch causes all requests to be completed via polling, which
> prevents the adapter reset. This is however a very hypothetical situation,
> which would also happen in the mainline kernel (since it has the patch).

Hi Guilherme,

Have we run any I/O stress test to make sure it didn't introduce at
least an easily triggered regression?


Thanks,

Kleber


>
>
> Keith Busch (1):
>   nvme/pci: Poll CQ on timeout
>
>  drivers/nvme/host/pci.c | 21 ++++++++++++++++++---
>  1 file changed, 18 insertions(+), 3 deletions(-)
>


--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team

Re: [SRU X] [PATCH 0/1] NVMe polling on timeout

Guilherme Piccoli
On 21/12/2018 11:59, Kleber Souza wrote:
>
> Hi Guilherme,
>
> Have we run any I/O stress test to make sure it didn't introduce at
> least an easily triggered regression?
>

Sure Kleber, we have exercised this patch in an AWS instance for a while -
it does not show any issues, and if an Abort happens, it may prevent
more serious issues (like a RO remount of the rootfs). It's like a safety
measure introduced by the nvme maintainer.

Thanks,


Guilherme


>
> Thanks,
>
> Kleber
>
>

--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team

ACK: [SRU X] [PATCH 1/1] nvme/pci: Poll CQ on timeout

Kleber Souza
In reply to this post by Guilherme Piccoli
On 12/7/18 10:28 PM, Guilherme G. Piccoli wrote:

> From: Keith Busch <[hidden email]>
>
> BugLink: https://launchpad.net/bugs/1807393
>
> If an IO timeout occurs, it's helpful to know if the controller did not
> post a completion or the driver missed an interrupt. While we never expect
> the latter, this patch will make it possible to tell the difference so
> we don't have to guess.
>
> Signed-off-by: Keith Busch <[hidden email]>
> Signed-off-by: Christoph Hellwig <[hidden email]>
> Tested-by: Johannes Thumshirn <[hidden email]>
> Reviewed-by: Johannes Thumshirn <[hidden email]>
> (backported from 7776db1ccc123d5944a8c170c9c45f7e91d49643 upstream)
> [gpiccoli: context adjustment, fixed struct member access that changed]
> Signed-off-by: Guilherme G. Piccoli <[hidden email]>

Thanks for the additional info, Guilherme.


Upstream fix with low regression potential:

Acked-by: Kleber Sacilotto de Souza <[hidden email]>


> ---
>  drivers/nvme/host/pci.c | 21 ++++++++++++++++++---
>  1 file changed, 18 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index 3752052ae20a..5d2a7ee2f922 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -929,10 +929,8 @@ static irqreturn_t nvme_irq_check(int irq, void *data)
>   return IRQ_NONE;
>  }
>  
> -static int nvme_poll(struct blk_mq_hw_ctx *hctx, unsigned int tag)
> +static int __nvme_poll(struct nvme_queue *nvmeq, unsigned int tag)
>  {
> - struct nvme_queue *nvmeq = hctx->driver_data;
> -
>   if (nvme_cqe_valid(nvmeq, nvmeq->cq_head, nvmeq->cq_phase)) {
>   spin_lock_irq(&nvmeq->q_lock);
>   __nvme_process_cq(nvmeq, &tag);
> @@ -945,6 +943,13 @@ static int nvme_poll(struct blk_mq_hw_ctx *hctx, unsigned int tag)
>   return 0;
>  }
>  
> +static int nvme_poll(struct blk_mq_hw_ctx *hctx, unsigned int tag)
> +{
> + struct nvme_queue *nvmeq = hctx->driver_data;
> +
> + return __nvme_poll(nvmeq, tag);
> +}
> +
>  static void nvme_async_event_work(struct work_struct *work)
>  {
>   struct nvme_dev *dev = container_of(work, struct nvme_dev, async_work);
> @@ -1045,6 +1050,16 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req, bool reserved)
>   struct request *abort_req;
>   struct nvme_command cmd;
>  
> + /*
> + * Did we miss an interrupt?
> + */
> + if (__nvme_poll(nvmeq, req->tag)) {
> + dev_warn(dev->dev,
> + "I/O %d QID %d timeout, completion polled\n",
> + req->tag, nvmeq->qid);
> + return BLK_EH_HANDLED;
> + }
> +
>   /*
>   * Shutdown immediately if controller times out while starting. The
>   * reset work will see the pci device disabled when it gets the forced



--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team

ACK: [SRU X] [PATCH 1/1] nvme/pci: Poll CQ on timeout

Stefan Bader
In reply to this post by Guilherme Piccoli
On 07.12.18 22:28, Guilherme G. Piccoli wrote:

> From: Keith Busch <[hidden email]>
>
> BugLink: https://launchpad.net/bugs/1807393
>
> If an IO timeout occurs, it's helpful to know if the controller did not
> post a completion or the driver missed an interrupt. While we never expect
> the latter, this patch will make it possible to tell the difference so
> we don't have to guess.
>
> Signed-off-by: Keith Busch <[hidden email]>
> Signed-off-by: Christoph Hellwig <[hidden email]>
> Tested-by: Johannes Thumshirn <[hidden email]>
> Reviewed-by: Johannes Thumshirn <[hidden email]>
> (backported from 7776db1ccc123d5944a8c170c9c45f7e91d49643 upstream)
> [gpiccoli: context adjustment, fixed struct member access that changed]
> Signed-off-by: Guilherme G. Piccoli <[hidden email]>
Acked-by: Stefan Bader <[hidden email]>

> ---
>  drivers/nvme/host/pci.c | 21 ++++++++++++++++++---
>  1 file changed, 18 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index 3752052ae20a..5d2a7ee2f922 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -929,10 +929,8 @@ static irqreturn_t nvme_irq_check(int irq, void *data)
>   return IRQ_NONE;
>  }
>  
> -static int nvme_poll(struct blk_mq_hw_ctx *hctx, unsigned int tag)
> +static int __nvme_poll(struct nvme_queue *nvmeq, unsigned int tag)
>  {
> - struct nvme_queue *nvmeq = hctx->driver_data;
> -
>   if (nvme_cqe_valid(nvmeq, nvmeq->cq_head, nvmeq->cq_phase)) {
>   spin_lock_irq(&nvmeq->q_lock);
>   __nvme_process_cq(nvmeq, &tag);
> @@ -945,6 +943,13 @@ static int nvme_poll(struct blk_mq_hw_ctx *hctx, unsigned int tag)
>   return 0;
>  }
>  
> +static int nvme_poll(struct blk_mq_hw_ctx *hctx, unsigned int tag)
> +{
> + struct nvme_queue *nvmeq = hctx->driver_data;
> +
> + return __nvme_poll(nvmeq, tag);
> +}
> +
>  static void nvme_async_event_work(struct work_struct *work)
>  {
>   struct nvme_dev *dev = container_of(work, struct nvme_dev, async_work);
> @@ -1045,6 +1050,16 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req, bool reserved)
>   struct request *abort_req;
>   struct nvme_command cmd;
>  
> + /*
> + * Did we miss an interrupt?
> + */
> + if (__nvme_poll(nvmeq, req->tag)) {
> + dev_warn(dev->dev,
> + "I/O %d QID %d timeout, completion polled\n",
> + req->tag, nvmeq->qid);
> + return BLK_EH_HANDLED;
> + }
> +
>   /*
>   * Shutdown immediately if controller times out while starting. The
>   * reset work will see the pci device disabled when it gets the forced
>


--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team


APPLIED/cmt: [SRU X] [PATCH 0/1] NVMe polling on timeout

Khaled Elmously
In reply to this post by Guilherme Piccoli
Changed
    (backported from 7776db1ccc123d5944a8c170c9c45f7e91d49643 upstream)
to
    (backported from commit 7776db1ccc123d5944a8c170c9c45f7e91d49643)

..which is how we normally do it


On 2018-12-07 19:28:10 , Guilherme G. Piccoli wrote:

> BugLink: https://launchpad.net/bugs/1807393
>
>
> [Impact]
>
> * NVMe controllers could potentially fail to send an interrupt, especially
> due to bugs in virtual devices (which are common these days - qemu has its
> own NVMe virtual device, and so does AWS). This is a difficult situation
> to debug, because the NVMe driver only reports the request timeout,
> not the reason.
>
> * The upstream patch proposed to SRU here was designed to provide more
> information in these cases, by pro-actively polling the CQEs on request
> timeouts, to check if the specific request was completed and some issue
> (probably a missed interrupt) prevented the driver from noticing, or if the
> request really wasn't completed, which indicates more severe issues.
>
> * Besides being quite useful for debugging, this patch could help to mitigate
> issues in cloud environments like AWS, in case there is jitter in
> request completion and the I/O timeout is set to a low value, or even
> in case of atypical bugs in the virtual NVMe controller. With this patch,
> if polling succeeds the NVMe driver will continue working instead of
> attempting a controller reset procedure, which may lead to failures in the
> rootfs - refer to https://launchpad.net/bugs/1788035.
>
>
> [Test Case]
>
> * It's a bit tricky to artificially create a missed-interrupt situation;
> one idea was to implement a small hack in the NVMe qemu virtual device
> that, given a trigger in the guest kernel, induces the virtual device to
> skip an interrupt. The hack patch is present in a Launchpad comment,
> along with instructions to reproduce the issue.
>
>
> [Regression Potential]
>
> * There are no clear risks in adding such a polling mechanism to the NVMe
> driver; one bad scenario that was never reported but could happen with this
> patch is that the device is in a bad state IRQ-wise that a reset would fix,
> yet the patch causes all requests to be completed via polling, which
> prevents the adapter reset. This is however a very hypothetical situation,
> which would also happen in the mainline kernel (since it has the patch).
>
>
> Keith Busch (1):
>   nvme/pci: Poll CQ on timeout
>
>  drivers/nvme/host/pci.c | 21 ++++++++++++++++++---
>  1 file changed, 18 insertions(+), 3 deletions(-)
>
> --
> 2.19.2
>
>
> --
> kernel-team mailing list
> [hidden email]
> https://lists.ubuntu.com/mailman/listinfo/kernel-team

--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team

Re: APPLIED/cmt: [SRU X] [PATCH 0/1] NVMe polling on timeout

Kleber Souza
On 1/9/19 10:28 AM, Khaled Elmously wrote:
> Changed
>     (backported from 7776db1ccc123d5944a8c170c9c45f7e91d49643 upstream)
> to
>     (backported from commit 7776db1ccc123d5944a8c170c9c45f7e91d49643)
>
> ..which is how we normally do it

Good catch, thanks!


>
>
> On 2018-12-07 19:28:10 , Guilherme G. Piccoli wrote:
>> BugLink: https://launchpad.net/bugs/1807393
>>
>>
>> [Impact]
>>
>> * NVMe controllers could potentially fail to send an interrupt, especially
>> due to bugs in virtual devices (which are common these days - qemu has its
>> own NVMe virtual device, and so does AWS). This is a difficult situation
>> to debug, because the NVMe driver only reports the request timeout,
>> not the reason.
>>
>> * The upstream patch proposed to SRU here was designed to provide more
>> information in these cases, by pro-actively polling the CQEs on request
>> timeouts, to check if the specific request was completed and some issue
>> (probably a missed interrupt) prevented the driver from noticing, or if the
>> request really wasn't completed, which indicates more severe issues.
>>
>> * Besides being quite useful for debugging, this patch could help to mitigate
>> issues in cloud environments like AWS, in case there is jitter in
>> request completion and the I/O timeout is set to a low value, or even
>> in case of atypical bugs in the virtual NVMe controller. With this patch,
>> if polling succeeds the NVMe driver will continue working instead of
>> attempting a controller reset procedure, which may lead to failures in the
>> rootfs - refer to https://launchpad.net/bugs/1788035.
>>
>>
>> [Test Case]
>>
>> * It's a bit tricky to artificially create a missed-interrupt situation;
>> one idea was to implement a small hack in the NVMe qemu virtual device
>> that, given a trigger in the guest kernel, induces the virtual device to
>> skip an interrupt. The hack patch is present in a Launchpad comment,
>> along with instructions to reproduce the issue.
>>
>>
>> [Regression Potential]
>>
>> * There are no clear risks in adding such a polling mechanism to the NVMe
>> driver; one bad scenario that was never reported but could happen with this
>> patch is that the device is in a bad state IRQ-wise that a reset would fix,
>> yet the patch causes all requests to be completed via polling, which
>> prevents the adapter reset. This is however a very hypothetical situation,
>> which would also happen in the mainline kernel (since it has the patch).
>>
>>
>> Keith Busch (1):
>>   nvme/pci: Poll CQ on timeout
>>
>>  drivers/nvme/host/pci.c | 21 ++++++++++++++++++---
>>  1 file changed, 18 insertions(+), 3 deletions(-)
>>
>> --
>> 2.19.2
>>
>>
>> --
>> kernel-team mailing list
>> [hidden email]
>> https://lists.ubuntu.com/mailman/listinfo/kernel-team



--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team

Re: APPLIED/cmt: [SRU X] [PATCH 0/1] NVMe polling on timeout

Guilherme Piccoli
In reply to this post by Khaled Elmously
On Wed, Jan 9, 2019 at 7:28 AM Khaled Elmously
<[hidden email]> wrote:
>
> Changed
>     (backported from 7776db1ccc123d5944a8c170c9c45f7e91d49643 upstream)
> to
>     (backported from commit 7776db1ccc123d5944a8c170c9c45f7e91d49643)
>
> ..which is how we normally do it

Thanks Khaled, seems I've missed this here too - sorry!

--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team