[xenial][PATCH] UBUNTU: SAUCE: nvme-pci: fix memory barrier at nvme_ext_write_doorbell

[xenial][PATCH] UBUNTU: SAUCE: nvme-pci: fix memory barrier at nvme_ext_write_doorbell

Marcelo Henrique Cerri
BugLink: https://bugs.launchpad.net/bugs/1788222

That's a custom backport for Xenial of the commit 07e860202d18
("nvme-pci: add a memory barrier to nvme_dbbuf_update_and_check_event"
from git://git.infradead.org/nvme.git).

Original description:

In many architectures loads may be reordered with older stores to
different locations.  In the nvme driver the following two operations
could be reordered:

    - Write shadow doorbell (dbbuf_db) into memory.
    - Read EventIdx (dbbuf_ei) from memory.

This can result in a potential race condition between the driver and the
VM host processing requests (if the given virtual NVMe controller has
support for the shadow doorbell).  If that occurs, the NVMe controller
may decide to wait for an MMIO doorbell from the guest operating system,
and the guest driver may decide not to issue an MMIO doorbell on any of
the subsequent commands.

This issue is a purely timing-dependent one, so there is no easy way to
reproduce it. Currently the easiest known approach is to run "Oracle IO
Numbers" (orion), which is shipped with Oracle DB:

orion -run advanced -num_large 0 -size_small 8 -type rand -simulate \
        concat -write 40 -duration 120 -matrix row -testname nvme_test

Where nvme_test is a .lun file that contains a list of NVMe block
devices to run the test against. Limiting the number of vCPUs assigned
to a given VM instance seems to increase the chances of this bug
occurring. On a test environment with a VM that had 4 NVMe drives and
1 vCPU assigned, the virtual NVMe controller hang could be observed
within 10-20 minutes. That corresponds to about 400-500k IO operations
processed (or about 100GB of IO reads/writes).
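For reference, an orion .lun file is just a newline-separated list of block
device paths (a hypothetical example; the actual device names depend on the
VM configuration):

```text
/dev/nvme0n1
/dev/nvme1n1
/dev/nvme2n1
/dev/nvme3n1
```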

The Orion tool was used for validation and set to run in a loop for 36
hours (equivalent to pushing 550M IO operations). No issues were
observed, which suggests that the patch fixes the issue.

Fixes: 4a4037f212ae ("UBUNTU: SAUCE: nvme: improve performance for virtual NVMe devices")
Signed-off-by: Marcelo Henrique Cerri <[hidden email]>
---
 drivers/nvme/host/pci.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 5e5f065bf729..3752052ae20a 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -375,7 +375,13 @@ static void nvme_ext_write_doorbell(u16 value, u32 __iomem* q_db,
  old_value = *db_addr;
  *db_addr = value;
 
- rmb();
+ /*
+ * Ensure that the doorbell is updated before reading the event
+ * index from memory.  The controller needs to provide similar
+ * ordering to ensure the event index is updated before reading
+ * the doorbell.
+ */
+ mb();
  if (!nvme_ext_need_event(*event_idx, value, old_value))
  goto no_doorbell;
 
--
2.17.1


--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team

ACK: [xenial][PATCH] UBUNTU: SAUCE: nvme-pci: fix memory barrier at nvme_ext_write_doorbell

Colin Ian King-2
On 11/09/18 20:50, Marcelo Henrique Cerri wrote:

> BugLink: https://bugs.launchpad.net/bugs/1788222
>
> That's a custom backport for Xenial of the commit 07e860202d18
> ("nvme-pci: add a memory barrier to nvme_dbbuf_update_and_check_event"
> from git://git.infradead.org/nvme.git).
>
> Original description:
>
> In many architectures loads may be reordered with older stores to
> different locations.  In the nvme driver the following two operations
> could be reordered:
>
>     - Write shadow doorbell (dbbuf_db) into memory.
>     - Read EventIdx (dbbuf_ei) from memory.
>
> This can result in a potential race condition between driver and VM host
> processing requests (if given virtual NVMe controller has a support for
> shadow doorbell).  If that occurs, then the NVMe controller may decide to
> wait for MMIO doorbell from guest operating system, and guest driver may
> decide not to issue MMIO doorbell on any of subsequent commands.
>
> This issue is purely timing-dependent one, so there is no easy way to
> reproduce it. Currently the easiest known approach is to run "Oracle IO
> Numbers" (orion) that is shipped with Oracle DB:
>
> orion -run advanced -num_large 0 -size_small 8 -type rand -simulate \
> concat -write 40 -duration 120 -matrix row -testname nvme_test
>
> Where nvme_test is a .lun file that contains a list of NVMe block
> devices to run test against. Limiting number of vCPUs assigned to given
> VM instance seems to increase chances for this bug to occur. On test
> environment with VM that got 4 NVMe drives and 1 vCPU assigned the
> virtual NVMe controller hang could be observed within 10-20 minutes.
> That correspond to about 400-500k IO operations processed (or about
> 100GB of IO read/writes).
>
> Orion tool was used as a validation and set to run in a loop for 36
> hours (equivalent of pushing 550M IO operations). No issues were
> observed. That suggest that the patch fixes the issue.
>
> Fixes: 4a4037f212ae ("UBUNTU: SAUCE: nvme: improve performance for virtual NVMe devices")
> Signed-off-by: Marcelo Henrique Cerri <[hidden email]>

Positive test results, so..

Acked-by: Colin Ian King <[hidden email]>


NACK: [xenial][PATCH] UBUNTU: SAUCE: nvme-pci: fix memory barrier at nvme_ext_write_doorbell

Stefan Bader-2
In reply to this post by Marcelo Henrique Cerri
On 11.09.2018 21:50, Marcelo Henrique Cerri wrote:
> BugLink: https://bugs.launchpad.net/bugs/1788222
>
--- cut --
> That's a custom backport for Xenial of the commit 07e860202d18
> ("nvme-pci: add a memory barrier to nvme_dbbuf_update_and_check_event"
> from git://git.infradead.org/nvme.git).
>
> Original description:
>
-- cut end --

> In many architectures loads may be reordered with older stores to
> different locations.  In the nvme driver the following two operations
> could be reordered:
>
>     - Write shadow doorbell (dbbuf_db) into memory.
>     - Read EventIdx (dbbuf_ei) from memory.
>
> This can result in a potential race condition between driver and VM host
> processing requests (if given virtual NVMe controller has a support for
> shadow doorbell).  If that occurs, then the NVMe controller may decide to
> wait for MMIO doorbell from guest operating system, and guest driver may
> decide not to issue MMIO doorbell on any of subsequent commands.
>
> This issue is purely timing-dependent one, so there is no easy way to
> reproduce it. Currently the easiest known approach is to run "Oracle IO
> Numbers" (orion) that is shipped with Oracle DB:
>
> orion -run advanced -num_large 0 -size_small 8 -type rand -simulate \
> concat -write 40 -duration 120 -matrix row -testname nvme_test
>
> Where nvme_test is a .lun file that contains a list of NVMe block
> devices to run test against. Limiting number of vCPUs assigned to given
> VM instance seems to increase chances for this bug to occur. On test
> environment with VM that got 4 NVMe drives and 1 vCPU assigned the
> virtual NVMe controller hang could be observed within 10-20 minutes.
> That correspond to about 400-500k IO operations processed (or about
> 100GB of IO read/writes).
>
> Orion tool was used as a validation and set to run in a loop for 36
> hours (equivalent of pushing 550M IO operations). No issues were
> observed. That suggest that the patch fixes the issue.
>
> Fixes: 4a4037f212ae ("UBUNTU: SAUCE: nvme: improve performance for virtual NVMe devices")
(backported from commit ??? git://git.infradead.org/nvme.git)
[mhcerry: <rough description of what had to be changed>]
> Signed-off-by: Marcelo Henrique Cerri <[hidden email]>
> ---

I cannot find the commit you refer to via the git web interface. Also, even
reading through the comments, I am not sure this really is a backport of some
upstream patch (in which case I would expect the commit message to be the same
as the source of the backport).
But the way it is now, I have no clue where this comes from or what was done to
it before submission.

-Stefan




Re: NACK: [xenial][PATCH] UBUNTU: SAUCE: nvme-pci: fix memory barrier at nvme_ext_write_doorbell

Marcelo Henrique Cerri
On Wed, Sep 12, 2018 at 10:26:29AM +0200, Stefan Bader wrote:

> On 11.09.2018 21:50, Marcelo Henrique Cerri wrote:
> > BugLink: https://bugs.launchpad.net/bugs/1788222
> >
> --- cut --
> > That's a custom backport for Xenial of the commit 07e860202d18
> > ("nvme-pci: add a memory barrier to nvme_dbbuf_update_and_check_event"
> > from git://git.infradead.org/nvme.git).
> >
> > Original description:
> >
> -- cut end --
> > In many architectures loads may be reordered with older stores to
> > different locations.  In the nvme driver the following two operations
> > could be reordered:
> >
> >     - Write shadow doorbell (dbbuf_db) into memory.
> >     - Read EventIdx (dbbuf_ei) from memory.
> >
> > This can result in a potential race condition between driver and VM host
> > processing requests (if given virtual NVMe controller has a support for
> > shadow doorbell).  If that occurs, then the NVMe controller may decide to
> > wait for MMIO doorbell from guest operating system, and guest driver may
> > decide not to issue MMIO doorbell on any of subsequent commands.
> >
> > This issue is purely timing-dependent one, so there is no easy way to
> > reproduce it. Currently the easiest known approach is to run "Oracle IO
> > Numbers" (orion) that is shipped with Oracle DB:
> >
> > orion -run advanced -num_large 0 -size_small 8 -type rand -simulate \
> > concat -write 40 -duration 120 -matrix row -testname nvme_test
> >
> > Where nvme_test is a .lun file that contains a list of NVMe block
> > devices to run test against. Limiting number of vCPUs assigned to given
> > VM instance seems to increase chances for this bug to occur. On test
> > environment with VM that got 4 NVMe drives and 1 vCPU assigned the
> > virtual NVMe controller hang could be observed within 10-20 minutes.
> > That correspond to about 400-500k IO operations processed (or about
> > 100GB of IO read/writes).
> >
> > Orion tool was used as a validation and set to run in a loop for 36
> > hours (equivalent of pushing 550M IO operations). No issues were
> > observed. That suggest that the patch fixes the issue.
> >
> > Fixes: 4a4037f212ae ("UBUNTU: SAUCE: nvme: improve performance for virtual NVMe devices")
> (backported from commit ??? git://git.infradead.org/nvme.git)
> [mhcerry: <rough description of what had to be changed>]
I can add these lines describing the origin and changes.

> > Signed-off-by: Marcelo Henrique Cerri <[hidden email]>
> > ---
>
> I cannot find the commit you refer to via the git web interface. Also, even
> reading through the comments, I am not sure this really is a backport of some
> upstream patch (in which case I would expect the commit message to be the same
> as the source of the backport.

What commit couldn't you find? That fix was crafted by me because
Xenial carries a version of the original patchset that is completely
different from upstream.

I didn't keep the original title because it refers to an upstream
function that doesn't exist in Xenial.

> But the way it is now, I have no clue where this comes from and what was done to
> it before submission.
>
> -Stefan
>

--
Regards,
Marcelo


Re: NACK: [xenial][PATCH] UBUNTU: SAUCE: nvme-pci: fix memory barrier at nvme_ext_write_doorbell

Stefan Bader-2
On 13.09.2018 16:02, Marcelo Henrique Cerri wrote:

> On Wed, Sep 12, 2018 at 10:26:29AM +0200, Stefan Bader wrote:
>> On 11.09.2018 21:50, Marcelo Henrique Cerri wrote:
>>> BugLink: https://bugs.launchpad.net/bugs/1788222
>>>
>> --- cut --
>>> That's a custom backport for Xenial of the commit 07e860202d18
>>> ("nvme-pci: add a memory barrier to nvme_dbbuf_update_and_check_event"
>>> from git://git.infradead.org/nvme.git).
>>>
>>> Original description:
>>>
>> -- cut end --
>>> In many architectures loads may be reordered with older stores to
>>> different locations.  In the nvme driver the following two operations
>>> could be reordered:
>>>
>>>     - Write shadow doorbell (dbbuf_db) into memory.
>>>     - Read EventIdx (dbbuf_ei) from memory.
>>>
>>> This can result in a potential race condition between driver and VM host
>>> processing requests (if given virtual NVMe controller has a support for
>>> shadow doorbell).  If that occurs, then the NVMe controller may decide to
>>> wait for MMIO doorbell from guest operating system, and guest driver may
>>> decide not to issue MMIO doorbell on any of subsequent commands.
>>>
>>> This issue is purely timing-dependent one, so there is no easy way to
>>> reproduce it. Currently the easiest known approach is to run "Oracle IO
>>> Numbers" (orion) that is shipped with Oracle DB:
>>>
>>> orion -run advanced -num_large 0 -size_small 8 -type rand -simulate \
>>> concat -write 40 -duration 120 -matrix row -testname nvme_test
>>>
>>> Where nvme_test is a .lun file that contains a list of NVMe block
>>> devices to run test against. Limiting number of vCPUs assigned to given
>>> VM instance seems to increase chances for this bug to occur. On test
>>> environment with VM that got 4 NVMe drives and 1 vCPU assigned the
>>> virtual NVMe controller hang could be observed within 10-20 minutes.
>>> That correspond to about 400-500k IO operations processed (or about
>>> 100GB of IO read/writes).
>>>
>>> Orion tool was used as a validation and set to run in a loop for 36
>>> hours (equivalent of pushing 550M IO operations). No issues were
>>> observed. That suggest that the patch fixes the issue.
>>>
>>> Fixes: 4a4037f212ae ("UBUNTU: SAUCE: nvme: improve performance for virtual NVMe devices")
>> (backported from commit ??? git://git.infradead.org/nvme.git)
>> [mhcerry: <rough description of what had to be changed>]
>
> I can add these lines describing the origin and changes.
>
>>> Signed-off-by: Marcelo Henrique Cerri <[hidden email]>
>>> ---
>>
>> I cannot find the commit you refer to via the git web interface. Also, even
>> reading through the comments, I am not sure this really is a backport of some
>> upstream patch (in which case I would expect the commit message to be the same
>> as the source of the backport.
>
> What commit couldn't you find? That fix was crafted by me because
> xenial carries a version of the original patchset that is completely
> different from upstream.
>
> I didn't keep the original title because it refers to the upstream
> function that doesn't exist in Xenial.
At the top you wrote:
>>> That's a custom backport for Xenial of the commit 07e860202d18
>>> ("nvme-pci: add a memory barrier to nvme_dbbuf_update_and_check_event"
>>> from git://git.infradead.org/nvme.git).

So I can only assume that this patch here is somehow related to that and
adjusted to a certain degree. Likewise, the rest of the commit message seems
to come from that source.

As a reviewer I have no history of things. I can look at the related upstream
change and maybe make an educated guess whether this looks ok. But I just don't
know what to do with this, as searching the infradead git found nothing with
that sha1 or the commit message.

-Stefan

>
>> But the way it is now, I have no clue where this comes from and what was done to
>> it before submission.
>>
>> -Stefan
>>
>
>
> --
> Regards,
> Marcelo
>



Re: NACK: [xenial][PATCH] UBUNTU: SAUCE: nvme-pci: fix memory barrier at nvme_ext_write_doorbell

Marcelo Henrique Cerri
Hi, Stefan.

I searched for the commit on the web interface and found it
(although the interface was pretty slow):

http://git.infradead.org/nvme.git/commit/07e860202d18

Let me know if you want to adjust anything.

--
Regards,
Marcelo

On Thu, Sep 13, 2018 at 05:39:23PM +0200, Stefan Bader wrote:

> On 13.09.2018 16:02, Marcelo Henrique Cerri wrote:
> > On Wed, Sep 12, 2018 at 10:26:29AM +0200, Stefan Bader wrote:
> >> On 11.09.2018 21:50, Marcelo Henrique Cerri wrote:
> >>> BugLink: https://bugs.launchpad.net/bugs/1788222
> >>>
> >> --- cut --
> >>> That's a custom backport for Xenial of the commit 07e860202d18
> >>> ("nvme-pci: add a memory barrier to nvme_dbbuf_update_and_check_event"
> >>> from git://git.infradead.org/nvme.git).
> >>>
> >>> Original description:
> >>>
> >> -- cut end --
> >>> In many architectures loads may be reordered with older stores to
> >>> different locations.  In the nvme driver the following two operations
> >>> could be reordered:
> >>>
> >>>     - Write shadow doorbell (dbbuf_db) into memory.
> >>>     - Read EventIdx (dbbuf_ei) from memory.
> >>>
> >>> This can result in a potential race condition between driver and VM host
> >>> processing requests (if given virtual NVMe controller has a support for
> >>> shadow doorbell).  If that occurs, then the NVMe controller may decide to
> >>> wait for MMIO doorbell from guest operating system, and guest driver may
> >>> decide not to issue MMIO doorbell on any of subsequent commands.
> >>>
> >>> This issue is purely timing-dependent one, so there is no easy way to
> >>> reproduce it. Currently the easiest known approach is to run "Oracle IO
> >>> Numbers" (orion) that is shipped with Oracle DB:
> >>>
> >>> orion -run advanced -num_large 0 -size_small 8 -type rand -simulate \
> >>> concat -write 40 -duration 120 -matrix row -testname nvme_test
> >>>
> >>> Where nvme_test is a .lun file that contains a list of NVMe block
> >>> devices to run test against. Limiting number of vCPUs assigned to given
> >>> VM instance seems to increase chances for this bug to occur. On test
> >>> environment with VM that got 4 NVMe drives and 1 vCPU assigned the
> >>> virtual NVMe controller hang could be observed within 10-20 minutes.
> >>> That correspond to about 400-500k IO operations processed (or about
> >>> 100GB of IO read/writes).
> >>>
> >>> Orion tool was used as a validation and set to run in a loop for 36
> >>> hours (equivalent of pushing 550M IO operations). No issues were
> >>> observed. That suggest that the patch fixes the issue.
> >>>
> >>> Fixes: 4a4037f212ae ("UBUNTU: SAUCE: nvme: improve performance for virtual NVMe devices")
> >> (backported from commit ??? git://git.infradead.org/nvme.git)
> >> [mhcerry: <rough description of what had to be changed>]
> >
> > I can add these lines describing the origin and changes.
> >
> >>> Signed-off-by: Marcelo Henrique Cerri <[hidden email]>
> >>> ---
> >>
> >> I cannot find the commit you refer to via the git web interface. Also, even
> >> reading through the comments, I am not sure this really is a backport of some
> >> upstream patch (in which case I would expect the commit message to be the same
> >> as the source of the backport.
> >
> > What commit couldn't you find? That fix was crafted by me because
> > xenial carries a version of the original patchset that is completely
> > different from upstream.
> >
> > I didn't keep the original title because it refers to the upstream
> > function that doesn't exist in Xenial.
>
> At the top you wrote:
> >>> That's a custom backport for Xenial of the commit 07e860202d18
> >>> ("nvme-pci: add a memory barrier to nvme_dbbuf_update_and_check_event"
> >>> from git://git.infradead.org/nvme.git).
>
> So I can only assume that this patch here is somehow related to that and to a
> certain degree adjusted. Like the rest of the commit message seems to come from
> that source.
>
> As a reviewer I have no history of things. I can looks at the related upstream
> change and maybe make an educated guess whether this looks ok. But I just don't
> know what to do with this as searching the infradead git did find nothing with
> that sha1 or the commit message.
>
> -Stefan
>
> >
> >> But the way it is now, I have no clue where this comes from and what was done to
> >> it before submission.
> >>
> >> -Stefan
> >>
> >
> >
> > --
> > Regards,
> > Marcelo
> >
>
>




ACK/Cmnt: [xenial][PATCH] UBUNTU: SAUCE: nvme-pci: fix memory barrier at nvme_ext_write_doorbell

Stefan Bader-2
In reply to this post by Marcelo Henrique Cerri
Subject: [Xenial SRU] nvme-pci: add a memory barrier to
nvme_dbbuf_update_and_check_event

On 11.09.2018 21:50, Marcelo Henrique Cerri wrote:
> BugLink: https://bugs.launchpad.net/bugs/1788222
>

-- cut ---
> That's a custom backport for Xenial of the commit 07e860202d18
> ("nvme-pci: add a memory barrier to nvme_dbbuf_update_and_check_event"
> from git://git.infradead.org/nvme.git).
>
> Original description:
-- cut --

>
> In many architectures loads may be reordered with older stores to
> different locations.  In the nvme driver the following two operations
> could be reordered:
>
>     - Write shadow doorbell (dbbuf_db) into memory.
>     - Read EventIdx (dbbuf_ei) from memory.
>
> This can result in a potential race condition between driver and VM host
> processing requests (if given virtual NVMe controller has a support for
> shadow doorbell).  If that occurs, then the NVMe controller may decide to
> wait for MMIO doorbell from guest operating system, and guest driver may
> decide not to issue MMIO doorbell on any of subsequent commands.
>
> This issue is purely timing-dependent one, so there is no easy way to
> reproduce it. Currently the easiest known approach is to run "Oracle IO
> Numbers" (orion) that is shipped with Oracle DB:
>
> orion -run advanced -num_large 0 -size_small 8 -type rand -simulate \
> concat -write 40 -duration 120 -matrix row -testname nvme_test
>
> Where nvme_test is a .lun file that contains a list of NVMe block
> devices to run test against. Limiting number of vCPUs assigned to given
> VM instance seems to increase chances for this bug to occur. On test
> environment with VM that got 4 NVMe drives and 1 vCPU assigned the
> virtual NVMe controller hang could be observed within 10-20 minutes.
> That correspond to about 400-500k IO operations processed (or about
> 100GB of IO read/writes).
>
> Orion tool was used as a validation and set to run in a loop for 36
> hours (equivalent of pushing 550M IO operations). No issues were
> observed. That suggest that the patch fixes the issue.
Fixes: f9f38e33389c ("nvme: improve performance for virtual NVMe devices")
Signed-off-by: Michal Wnukowski <[hidden email]>
Reviewed-by: Keith Busch <[hidden email]>
Reviewed-by: Sagi Grimberg <[hidden email]>
[hch: updated changelog and comment a bit]
Signed-off-by: Christoph Hellwig <[hidden email]>
>
> Fixes: 4a4037f212ae ("UBUNTU: SAUCE: nvme: improve performance for virtual NVMe devices")
(backported from commit 07e860202d187d35ad24f0100caa0d244d4df2af
http://git.infradead.org/nvme.git)
> Signed-off-by: Marcelo Henrique Cerri <[hidden email]>
Acked-by: Stefan Bader <[hidden email]>
> ---

Since this is at least a backport to some degree, it should preserve the
original commit subject and message, and only add the bug reference and a new
signed-off-by area, as I tried to indicate with the inline comments.




Re: ACK/Cmnt: [xenial][PATCH] UBUNTU: SAUCE: nvme-pci: fix memory barrier at nvme_ext_write_doorbell

Kleber Souza
On 09/27/18 17:15, Stefan Bader wrote:

> Subject: [Xenial SRU] nvme-pci: add a memory barrier to
> nvme_dbbuf_update_and_check_event
>
> On 11.09.2018 21:50, Marcelo Henrique Cerri wrote:
>> BugLink: https://bugs.launchpad.net/bugs/1788222
>>
>
> -- cut ---
>> That's a custom backport for Xenial of the commit 07e860202d18
>> ("nvme-pci: add a memory barrier to nvme_dbbuf_update_and_check_event"
>> from git://git.infradead.org/nvme.git).
>>
>> Original description:
> -- cut --
>>
>> In many architectures loads may be reordered with older stores to
>> different locations.  In the nvme driver the following two operations
>> could be reordered:
>>
>>     - Write shadow doorbell (dbbuf_db) into memory.
>>     - Read EventIdx (dbbuf_ei) from memory.
>>
>> This can result in a potential race condition between driver and VM host
>> processing requests (if given virtual NVMe controller has a support for
>> shadow doorbell).  If that occurs, then the NVMe controller may decide to
>> wait for MMIO doorbell from guest operating system, and guest driver may
>> decide not to issue MMIO doorbell on any of subsequent commands.
>>
>> This issue is purely timing-dependent one, so there is no easy way to
>> reproduce it. Currently the easiest known approach is to run "Oracle IO
>> Numbers" (orion) that is shipped with Oracle DB:
>>
>> orion -run advanced -num_large 0 -size_small 8 -type rand -simulate \
>> concat -write 40 -duration 120 -matrix row -testname nvme_test
>>
>> Where nvme_test is a .lun file that contains a list of NVMe block
>> devices to run test against. Limiting number of vCPUs assigned to given
>> VM instance seems to increase chances for this bug to occur. On test
>> environment with VM that got 4 NVMe drives and 1 vCPU assigned the
>> virtual NVMe controller hang could be observed within 10-20 minutes.
>> That correspond to about 400-500k IO operations processed (or about
>> 100GB of IO read/writes).
>>
>> Orion tool was used as a validation and set to run in a loop for 36
>> hours (equivalent of pushing 550M IO operations). No issues were
>> observed. That suggest that the patch fixes the issue.
>
> Fixes: f9f38e33389c ("nvme: improve performance for virtual NVMe devices")
> Signed-off-by: Michal Wnukowski <[hidden email]>
> Reviewed-by: Keith Busch <[hidden email]>
> Reviewed-by: Sagi Grimberg <[hidden email]>
> [hch: updated changelog and comment a bit]
> Signed-off-by: Christoph Hellwig <[hidden email]>
>>
>> Fixes: 4a4037f212ae ("UBUNTU: SAUCE: nvme: improve performance for virtual NVMe devices")
> (backported from commit 07e860202d187d35ad24f0100caa0d244d4df2af
> http://git.infradead.org/nvme.git)

The fix has landed upstream in v4.19-rc2 as commit
f1ed3df20d2d223e0852cc4ac1f19bba869a7e3c. We can fix that reference when
applying the patch.

>> Signed-off-by: Marcelo Henrique Cerri <[hidden email]>
> Acked-by: Stefan Bader <[hidden email]>
>> ---
>
> Since this is at least a backport to some degree, this should preserve the
> original commit subject and message and only add bug reference and a new sob
> area like I tried to comment inline.
>
>>  drivers/nvme/host/pci.c | 8 +++++++-
>>  1 file changed, 7 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
>> index 5e5f065bf729..3752052ae20a 100644
>> --- a/drivers/nvme/host/pci.c
>> +++ b/drivers/nvme/host/pci.c
>> @@ -375,7 +375,13 @@ static void nvme_ext_write_doorbell(u16 value, u32 __iomem* q_db,
>>   old_value = *db_addr;
>>   *db_addr = value;
>>  
>> - rmb();
>> + /*
>> + * Ensure that the doorbell is updated before reading the event
>> + * index from memory.  The controller needs to provide similar
>> + * ordering to ensure the event index is updated before reading
>> + * the doorbell.
>> + */
>> + mb();
>>   if (!nvme_ext_need_event(*event_idx, value, old_value))
>>   goto no_doorbell;
>>  
>>


--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team

APPLIED/cmnt: [xenial][PATCH] UBUNTU: SAUCE: nvme-pci: fix memory barrier at nvme_ext_write_doorbell

Kleber Souza
In reply to this post by Marcelo Henrique Cerri
On 09/11/18 21:50, Marcelo Henrique Cerri wrote:

> BugLink: https://bugs.launchpad.net/bugs/1788222
>
> That's a custom backport for Xenial of the commit 07e860202d18
> ("nvme-pci: add a memory barrier to nvme_dbbuf_update_and_check_event"
> from git://git.infradead.org/nvme.git).
>
> Original description:
>
> In many architectures, loads may be reordered with older stores to
> different locations.  In the nvme driver the following two operations
> could be reordered:
>
>     - Write shadow doorbell (dbbuf_db) into memory.
>     - Read EventIdx (dbbuf_ei) from memory.
>
> This can result in a potential race condition between the driver and the
> VM host processing requests (if the given virtual NVMe controller has
> support for the shadow doorbell).  If that occurs, the NVMe controller
> may decide to wait for an MMIO doorbell from the guest operating system,
> and the guest driver may decide not to issue an MMIO doorbell on any of
> the subsequent commands.
>
> This is a purely timing-dependent issue, so there is no easy way to
> reproduce it. Currently the easiest known approach is to run "Oracle IO
> Numbers" (orion), which is shipped with Oracle DB:
>
> orion -run advanced -num_large 0 -size_small 8 -type rand -simulate \
> concat -write 40 -duration 120 -matrix row -testname nvme_test
>
> where nvme_test is a .lun file that contains a list of NVMe block
> devices to run the test against. Limiting the number of vCPUs assigned
> to a given VM instance seems to increase the chances of this bug
> occurring. In a test environment with a VM that had 4 NVMe drives and
> 1 vCPU assigned, the virtual NVMe controller hang could be observed
> within 10-20 minutes. That corresponds to about 400-500k IO operations
> processed (or about 100GB of IO reads/writes).
>
> The orion tool was used for validation and set to run in a loop for 36
> hours (equivalent to pushing 550M IO operations). No issues were
> observed, which suggests that the patch fixes the issue.
>
> Fixes: 4a4037f212ae ("UBUNTU: SAUCE: nvme: improve performance for virtual NVMe devices")
> Signed-off-by: Marcelo Henrique Cerri <[hidden email]>
> ---
>  drivers/nvme/host/pci.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index 5e5f065bf729..3752052ae20a 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -375,7 +375,13 @@ static void nvme_ext_write_doorbell(u16 value, u32 __iomem* q_db,
>   old_value = *db_addr;
>   *db_addr = value;
>  
> - rmb();
> + /*
> + * Ensure that the doorbell is updated before reading the event
> + * index from memory.  The controller needs to provide similar
> + * ordering to ensure the event index is updated before reading
> + * the doorbell.
> + */
> + mb();
>   if (!nvme_ext_need_event(*event_idx, value, old_value))
>   goto no_doorbell;
>  
>

Applied to the xenial/master-next branch, with the fixes to the subject,
commit body, and SOB area.

Thanks,
Kleber
