[SRU] [Bionic/Cosmic/Unstable] [PATCH 0/1] Fix unusable NVIDIA GPU after S3

classic Classic list List threaded Threaded
8 messages Options
Reply | Threaded
Open this post in threaded view
|

[SRU] [Bionic/Cosmic/Unstable] [PATCH 0/1] Fix unusable NVIDIA GPU after S3

Kai-Heng Feng
BugLink: https://bugs.launchpad.net/bugs/1793338

[Impact]
NVIDIA GPU on a new system we have becomes unusable after S3.

[Test]
The NVIDIA GPU in question works well after applying this patch.

[Fix]
Rewrite 0 to Intel PCI bridge 'Prefetchable Base Upper 32 Bits'.
This should be done by BIOS S3 code [1], but now the issue reappears, let's
do the same in Linux PCI subsystem.

[Regression Potential]
Low.
After applying this patch I haven't noticed any change on already
working PCI devices.
Also this patch is acked by PCI and ACPI maintainers, so we can be more
confident.

[1] https://bugzilla.kernel.org/show_bug.cgi?id=116851#c23

Daniel Drake (1):
  PCI: Reprogram bridge prefetch registers on resume

 drivers/pci/pci.c | 27 +++++++++++++++++++--------
 1 file changed, 19 insertions(+), 8 deletions(-)

--
2.17.1


--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team
Reply | Threaded
Open this post in threaded view
|

[PATCH 1/1] PCI: Reprogram bridge prefetch registers on resume

Kai-Heng Feng
From: Daniel Drake <[hidden email]>

BugLink: https://bugs.launchpad.net/bugs/1793338

On 38+ Intel-based ASUS products, the NVIDIA GPU becomes unusable after S3
suspend/resume.  The affected products include multiple generations of
NVIDIA GPUs and Intel SoCs.  After resume, nouveau logs many errors such
as:

  fifo: fault 00 [READ] at 0000005555555000 engine 00 [GR] client 04
        [HUB/FE] reason 4a [] on channel -1 [007fa91000 unknown]
  DRM: failed to idle channel 0 [DRM]

Similarly, the NVIDIA proprietary driver also fails after resume (black
screen, 100% CPU usage in Xorg process).  We shipped a sample to NVIDIA for
diagnosis, and their response indicated that it's a problem with the parent
PCI bridge (on the Intel SoC), not the GPU.

Runtime suspend/resume works fine, only S3 suspend is affected.

We found a workaround: on resume, rewrite the Intel PCI bridge
'Prefetchable Base Upper 32 Bits' register (PCI_PREF_BASE_UPPER32).  In the
cases that I checked, this register has value 0 and we just have to rewrite
that value.

Linux already saves and restores PCI config space during suspend/resume,
but this register was being skipped because upon resume, it already has
value 0 (the correct, pre-suspend value).

Intel appear to have previously acknowledged this behaviour and the
requirement to rewrite this register:
https://bugzilla.kernel.org/show_bug.cgi?id=116851#c23

Based on that, rewrite the prefetch register values even when that appears
unnecessary.

We have confirmed this solution on all the affected models we have in-hands
(X542UQ, UX533FD, X530UN, V272UN).

Additionally, this solves an issue where r8169 MSI-X interrupts were broken
after S3 suspend/resume on ASUS X441UAR.  This issue was recently worked
around in commit 7bb05b85bc2d ("r8169: don't use MSI-X on RTL8106e").  It
also fixes the same issue on RTL6186evl/8111evl on an Aimfor-tech laptop
that we had not yet patched.  I suspect it will also fix the issue that was
worked around in commit 7c53a722459c ("r8169: don't use MSI-X on
RTL8168g").

Thomas Martitz reports that this change also solves an issue where the AMD
Radeon Polaris 10 GPU on the HP Zbook 14u G5 is unresponsive after S3
suspend/resume.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=201069
Signed-off-by: Daniel Drake <[hidden email]>
Signed-off-by: Bjorn Helgaas <[hidden email]>
Reviewed-by: Rafael J. Wysocki <[hidden email]>
Reviewed-By: Peter Wu <[hidden email]>
(cherry picked from commit 04cb3ae895d7efdc60f0fe17182b200a3da20f09 git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git pci/enumeration)
Signed-off-by: Kai-Heng Feng <[hidden email]>
---
 drivers/pci/pci.c | 27 +++++++++++++++++++--------
 1 file changed, 19 insertions(+), 8 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index f953042c7cdf..7e600b75d186 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -1113,12 +1113,12 @@ int pci_save_state(struct pci_dev *dev)
 EXPORT_SYMBOL(pci_save_state);
 
 static void pci_restore_config_dword(struct pci_dev *pdev, int offset,
-     u32 saved_val, int retry)
+     u32 saved_val, int retry, bool force)
 {
  u32 val;
 
  pci_read_config_dword(pdev, offset, &val);
- if (val == saved_val)
+ if (!force && val == saved_val)
  return;
 
  for (;;) {
@@ -1137,25 +1137,36 @@ static void pci_restore_config_dword(struct pci_dev *pdev, int offset,
 }
 
 static void pci_restore_config_space_range(struct pci_dev *pdev,
-   int start, int end, int retry)
+   int start, int end, int retry,
+   bool force)
 {
  int index;
 
  for (index = end; index >= start; index--)
  pci_restore_config_dword(pdev, 4 * index,
  pdev->saved_config_space[index],
- retry);
+ retry, force);
 }
 
 static void pci_restore_config_space(struct pci_dev *pdev)
 {
  if (pdev->hdr_type == PCI_HEADER_TYPE_NORMAL) {
- pci_restore_config_space_range(pdev, 10, 15, 0);
+ pci_restore_config_space_range(pdev, 10, 15, 0, false);
  /* Restore BARs before the command register. */
- pci_restore_config_space_range(pdev, 4, 9, 10);
- pci_restore_config_space_range(pdev, 0, 3, 0);
+ pci_restore_config_space_range(pdev, 4, 9, 10, false);
+ pci_restore_config_space_range(pdev, 0, 3, 0, false);
+ } else if (pdev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {
+ pci_restore_config_space_range(pdev, 12, 15, 0, false);
+
+ /*
+ * Force rewriting of prefetch registers to avoid S3 resume
+ * issues on Intel PCI bridges that occur when these
+ * registers are not explicitly written.
+ */
+ pci_restore_config_space_range(pdev, 9, 11, 0, true);
+ pci_restore_config_space_range(pdev, 0, 8, 0, false);
  } else {
- pci_restore_config_space_range(pdev, 0, 15, 0);
+ pci_restore_config_space_range(pdev, 0, 15, 0, false);
  }
 }
 
--
2.17.1


--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team
Reply | Threaded
Open this post in threaded view
|

ACK/cmnt / APPLIED[C/Unstable]: [SRU] [Bionic/Cosmic/Unstable] [PATCH 0/1] Fix unusable NVIDIA GPU after S3

Seth Forshee
In reply to this post by Kai-Heng Feng
On Wed, Sep 19, 2018 at 11:29:15PM +0800, Kai-Heng Feng wrote:

> BugLink: https://bugs.launchpad.net/bugs/1793338
>
> [Impact]
> NVIDIA GPU on a new system we have becomes unusable after S3.
>
> [Test]
> The NVIDIA GPU in question works well after applying this patch.
>
> [Fix]
> Rewrite 0 to Intel PCI bridge 'Prefetchable Base Upper 32 Bits'.
> This should be done by BIOS S3 code [1], but now the issue reappears, let's
> do the same in Linux PCI subsystem.
>
> [Regression Potential]
> Low.
> After applying this patch I haven't noticed any change on already
> working PCI devices.
> Also this patch is acked by PCI and ACPI maintainers, so we can be more
> confident.
>
> [1] https://bugzilla.kernel.org/show_bug.cgi?id=116851#c23

I think this looks safe enough, though based on the branch it's been
committed to and the discussion it's not entirely clear to me whether
this patch will go upstream or something a little different. So I think
we can apply it, but I'd ask that you keep an eye on the upstream
discussions and send an update if upstream goes a different way.

Based on the above I think this should be committed as SAUCE. With that
done:

Acked-by: Seth Forshee <[hidden email]>

Applied to cosmic/master-next and unstable/master. Thanks!

--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team
Reply | Threaded
Open this post in threaded view
|

Re: ACK/cmnt / APPLIED[C/Unstable]: [SRU] [Bionic/Cosmic/Unstable] [PATCH 0/1] Fix unusable NVIDIA GPU after S3

Kai-Heng Feng
at 15:20, Seth Forshee <[hidden email]> wrote:

> On Wed, Sep 19, 2018 at 11:29:15PM +0800, Kai-Heng Feng wrote:
>> BugLink: https://bugs.launchpad.net/bugs/1793338
>>
>> [Impact]
>> NVIDIA GPU on a new system we have becomes unusable after S3.
>>
>> [Test]
>> The NVIDIA GPU in question works well after applying this patch.
>>
>> [Fix]
>> Rewrite 0 to Intel PCI bridge 'Prefetchable Base Upper 32 Bits'.
>> This should be done by BIOS S3 code [1], but now the issue reappears,  
>> let's
>> do the same in Linux PCI subsystem.
>>
>> [Regression Potential]
>> Low.
>> After applying this patch I haven't noticed any change on already
>> working PCI devices.
>> Also this patch is acked by PCI and ACPI maintainers, so we can be more
>> confident.
>>
>> [1] https://bugzilla.kernel.org/show_bug.cgi?id=116851#c23
>
> I think this looks safe enough, though based on the branch it's been
> committed to and the discussion it's not entirely clear to me whether
> this patch will go upstream or something a little different. So I think
> we can apply it, but I'd ask that you keep an eye on the upstream
> discussions and send an update if upstream goes a different way.

 From what I can understand, this patch is meant to be non-intrusive, so it  
can easily be backported to older kernel releases.

This patch should land at v4.20. An more intrusive approach will land in  
later release.

For Cosmic I think we can backport the intrusive version, but for Bionic I  
think we should keep using this one.

Kai-Heng

>
> Based on the above I think this should be committed as SAUCE. With that
> done:
>
> Acked-by: Seth Forshee <[hidden email]>
>
> Applied to cosmic/master-next and unstable/master. Thanks!



--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team
Reply | Threaded
Open this post in threaded view
|

APPLIED/oem Re: [SRU] [Bionic/Cosmic/Unstable] [PATCH 0/1] Fix unusable NVIDIA GPU after S3

Timo Aaltonen-6
In reply to this post by Kai-Heng Feng
On 19.09.2018 18:29, Kai-Heng Feng wrote:

> BugLink: https://bugs.launchpad.net/bugs/1793338
>
> [Impact]
> NVIDIA GPU on a new system we have becomes unusable after S3.
>
> [Test]
> The NVIDIA GPU in question works well after applying this patch.
>
> [Fix]
> Rewrite 0 to Intel PCI bridge 'Prefetchable Base Upper 32 Bits'.
> This should be done by BIOS S3 code [1], but now the issue reappears, let's
> do the same in Linux PCI subsystem.
>
> [Regression Potential]
> Low.
> After applying this patch I haven't noticed any change on already
> working PCI devices.
> Also this patch is acked by PCI and ACPI maintainers, so we can be more
> confident.
>
> [1] https://bugzilla.kernel.org/show_bug.cgi?id=116851#c23
>
> Daniel Drake (1):
>   PCI: Reprogram bridge prefetch registers on resume
>
>  drivers/pci/pci.c | 27 +++++++++++++++++++--------
>  1 file changed, 19 insertions(+), 8 deletions(-)

applied to bionic linux-oem


--
t

--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 1/1] PCI: Reprogram bridge prefetch registers on resume

Stefan Bader-2
In reply to this post by Kai-Heng Feng
On 19.09.2018 17:29, Kai-Heng Feng wrote:

> From: Daniel Drake <[hidden email]>
>
> BugLink: https://bugs.launchpad.net/bugs/1793338
>
> On 38+ Intel-based ASUS products, the NVIDIA GPU becomes unusable after S3
> suspend/resume.  The affected products include multiple generations of
> NVIDIA GPUs and Intel SoCs.  After resume, nouveau logs many errors such
> as:
>
>   fifo: fault 00 [READ] at 0000005555555000 engine 00 [GR] client 04
>         [HUB/FE] reason 4a [] on channel -1 [007fa91000 unknown]
>   DRM: failed to idle channel 0 [DRM]
>
> Similarly, the NVIDIA proprietary driver also fails after resume (black
> screen, 100% CPU usage in Xorg process).  We shipped a sample to NVIDIA for
> diagnosis, and their response indicated that it's a problem with the parent
> PCI bridge (on the Intel SoC), not the GPU.
>
> Runtime suspend/resume works fine, only S3 suspend is affected.
>
> We found a workaround: on resume, rewrite the Intel PCI bridge
> 'Prefetchable Base Upper 32 Bits' register (PCI_PREF_BASE_UPPER32).  In the
> cases that I checked, this register has value 0 and we just have to rewrite
> that value.
>
> Linux already saves and restores PCI config space during suspend/resume,
> but this register was being skipped because upon resume, it already has
> value 0 (the correct, pre-suspend value).
>
> Intel appear to have previously acknowledged this behaviour and the
> requirement to rewrite this register:
> https://bugzilla.kernel.org/show_bug.cgi?id=116851#c23
>
> Based on that, rewrite the prefetch register values even when that appears
> unnecessary.
>
> We have confirmed this solution on all the affected models we have in-hands
> (X542UQ, UX533FD, X530UN, V272UN).
>
> Additionally, this solves an issue where r8169 MSI-X interrupts were broken
> after S3 suspend/resume on ASUS X441UAR.  This issue was recently worked
> around in commit 7bb05b85bc2d ("r8169: don't use MSI-X on RTL8106e").  It
> also fixes the same issue on RTL6186evl/8111evl on an Aimfor-tech laptop
> that we had not yet patched.  I suspect it will also fix the issue that was
> worked around in commit 7c53a722459c ("r8169: don't use MSI-X on
> RTL8168g").
>
> Thomas Martitz reports that this change also solves an issue where the AMD
> Radeon Polaris 10 GPU on the HP Zbook 14u G5 is unresponsive after S3
> suspend/resume.
>
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=201069
> Signed-off-by: Daniel Drake <[hidden email]>
> Signed-off-by: Bjorn Helgaas <[hidden email]>
> Reviewed-by: Rafael J. Wysocki <[hidden email]>
> Reviewed-By: Peter Wu <[hidden email]>
> (cherry picked from commit 04cb3ae895d7efdc60f0fe17182b200a3da20f09 git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git pci/enumeration)
> Signed-off-by: Kai-Heng Feng <[hidden email]>
> ---
I looks like the change effectively only changes behaviour for bridges, just not
sure whether it is ok to make this change to all bridges. Probably not a real
problem to write a value that is already there but I am not that familiar with
the magics of pci.
So just double checking before acking this.

-Stefan

>  drivers/pci/pci.c | 27 +++++++++++++++++++--------
>  1 file changed, 19 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index f953042c7cdf..7e600b75d186 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -1113,12 +1113,12 @@ int pci_save_state(struct pci_dev *dev)
>  EXPORT_SYMBOL(pci_save_state);
>  
>  static void pci_restore_config_dword(struct pci_dev *pdev, int offset,
> -     u32 saved_val, int retry)
> +     u32 saved_val, int retry, bool force)
>  {
>   u32 val;
>  
>   pci_read_config_dword(pdev, offset, &val);
> - if (val == saved_val)
> + if (!force && val == saved_val)
>   return;
>  
>   for (;;) {
> @@ -1137,25 +1137,36 @@ static void pci_restore_config_dword(struct pci_dev *pdev, int offset,
>  }
>  
>  static void pci_restore_config_space_range(struct pci_dev *pdev,
> -   int start, int end, int retry)
> +   int start, int end, int retry,
> +   bool force)
>  {
>   int index;
>  
>   for (index = end; index >= start; index--)
>   pci_restore_config_dword(pdev, 4 * index,
>   pdev->saved_config_space[index],
> - retry);
> + retry, force);
>  }
>  
>  static void pci_restore_config_space(struct pci_dev *pdev)
>  {
>   if (pdev->hdr_type == PCI_HEADER_TYPE_NORMAL) {
> - pci_restore_config_space_range(pdev, 10, 15, 0);
> + pci_restore_config_space_range(pdev, 10, 15, 0, false);
>   /* Restore BARs before the command register. */
> - pci_restore_config_space_range(pdev, 4, 9, 10);
> - pci_restore_config_space_range(pdev, 0, 3, 0);
> + pci_restore_config_space_range(pdev, 4, 9, 10, false);
> + pci_restore_config_space_range(pdev, 0, 3, 0, false);
> + } else if (pdev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {
> + pci_restore_config_space_range(pdev, 12, 15, 0, false);
> +
> + /*
> + * Force rewriting of prefetch registers to avoid S3 resume
> + * issues on Intel PCI bridges that occur when these
> + * registers are not explicitly written.
> + */
> + pci_restore_config_space_range(pdev, 9, 11, 0, true);
> + pci_restore_config_space_range(pdev, 0, 8, 0, false);
>   } else {
> - pci_restore_config_space_range(pdev, 0, 15, 0);
> + pci_restore_config_space_range(pdev, 0, 15, 0, false);
>   }
>  }
>  
>


--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team

signature.asc (836 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

ACK/cmnt: [SRU] [Bionic/Cosmic/Unstable] [PATCH 0/1] Fix unusable NVIDIA GPU after S3

Kleber Souza
In reply to this post by Kai-Heng Feng
On 09/19/18 17:29, Kai-Heng Feng wrote:

> BugLink: https://bugs.launchpad.net/bugs/1793338
>
> [Impact]
> NVIDIA GPU on a new system we have becomes unusable after S3.
>
> [Test]
> The NVIDIA GPU in question works well after applying this patch.
>
> [Fix]
> Rewrite 0 to Intel PCI bridge 'Prefetchable Base Upper 32 Bits'.
> This should be done by BIOS S3 code [1], but now the issue reappears, let's
> do the same in Linux PCI subsystem.
>
> [Regression Potential]
> Low.
> After applying this patch I haven't noticed any change on already
> working PCI devices.
> Also this patch is acked by PCI and ACPI maintainers, so we can be more
> confident.
>
> [1] https://bugzilla.kernel.org/show_bug.cgi?id=116851#c23
>
> Daniel Drake (1):
>   PCI: Reprogram bridge prefetch registers on resume
>
>  drivers/pci/pci.c | 27 +++++++++++++++++++--------
>  1 file changed, 19 insertions(+), 8 deletions(-)
>

The patch has been queued for v4.19 (instead of v4.20) by the PCI
maintainer:
https://www.spinics.net/lists/netdev/msg525724.html

So I think it's safe to include this in Bionic, even without a SAUCE tag.

Acked-by: Kleber Sacilotto de Souza <[hidden email]>

--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team
Reply | Threaded
Open this post in threaded view
|

APPLIED[B]/Cmnt: [SRU] [Bionic/Cosmic/Unstable] [PATCH 0/1] Fix unusable NVIDIA GPU after S3

Stefan Bader-2
In reply to this post by Kai-Heng Feng
On 19.09.2018 17:29, Kai-Heng Feng wrote:

> BugLink: https://bugs.launchpad.net/bugs/1793338
>
> [Impact]
> NVIDIA GPU on a new system we have becomes unusable after S3.
>
> [Test]
> The NVIDIA GPU in question works well after applying this patch.
>
> [Fix]
> Rewrite 0 to Intel PCI bridge 'Prefetchable Base Upper 32 Bits'.
> This should be done by BIOS S3 code [1], but now the issue reappears, let's
> do the same in Linux PCI subsystem.
>
> [Regression Potential]
> Low.
> After applying this patch I haven't noticed any change on already
> working PCI devices.
> Also this patch is acked by PCI and ACPI maintainers, so we can be more
> confident.
>
> [1] https://bugzilla.kernel.org/show_bug.cgi?id=116851#c23
>
> Daniel Drake (1):
>   PCI: Reprogram bridge prefetch registers on resume
>
>  drivers/pci/pci.c | 27 +++++++++++++++++++--------
>  1 file changed, 19 insertions(+), 8 deletions(-)
>
As Kleber mentioned this was queued for 4.19 and is already on linux-next. So I
replaced the pick by the one from linux-next (it appeared to be identical).

Applied to bionic/master-next. Thanks.

-Stefan


--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team

signature.asc (836 bytes) Download Attachment