[PATCH 0/2][SRU][OEM-B]System hangs after play video from thunderbolt HDD and then S3

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

[PATCH 0/2][SRU][OEM-B]System hangs after play video from thunderbolt HDD and then S3

AceLan Kao
BugLink: https://bugs.launchpad.net/bugs/1834550

[Impact]
System hangs after played video from TBT storage, and then suspend the
system. It enters suspend correctly, but hangs during waking up.

[Fix]
There are 2 commits in upstream which fix this issue.
   95c80bc6952b PCI/PME: Fix hotplug/sysfs remove deadlock in pcie_pme_remove()
   3943af9d01e9 PCI: pciehp: Ignore Link State Changes after powering off a slot

[Test]
Verified on below thunderbolt interface with thunderbolt HDD
01:00.0 PCI bridge [0604]: Intel Corporation JHL6340 Thunderbolt 3
Bridge (C step) [Alpine Ridge 2C 2016] [8086:15da] (rev 02)

[Regression Potential]
Low, both commits have been SRU to Ubuntu Disco 5.0 kernel through
stable update, and the fix on remove functions are small and reasonable.

Rafael J. Wysocki (1):
  PCI/PME: Fix hotplug/sysfs remove deadlock in pcie_pme_remove()

Sergey Miroshnichenko (1):
  PCI: pciehp: Ignore Link State Changes after powering off a slot

 drivers/pci/hotplug/pciehp_ctrl.c |  4 ++++
 drivers/pci/pcie/pme.c            | 22 +++++++++++++++-------
 2 files changed, 19 insertions(+), 7 deletions(-)

--
2.17.1


--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team
Reply | Threaded
Open this post in threaded view
|

[PATCH 1/2][SRU][OEM-B] PCI: pciehp: Ignore Link State Changes after powering off a slot

AceLan Kao
From: Sergey Miroshnichenko <[hidden email]>

BugLink: https://bugs.launchpad.net/bugs/1834550

During a safe hot remove, the OS powers off the slot, which may cause a
Data Link Layer State Changed event.  The slot has already been set to
OFF_STATE, so that event results in re-enabling the device, making it
impossible to safely remove it.

Clear out the Presence Detect Changed and Data Link Layer State Changed
events when the disabled slot has settled down.

It is still possible to re-enable the device if it remains in the slot
after pressing the Attention Button by pressing it again.

Fixes the problem that Micah reported below: an NVMe drive power button may
not actually turn off the drive.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=203237
Reported-by: Micah Parrish <[hidden email]>
Tested-by: Micah Parrish <[hidden email]>
Signed-off-by: Sergey Miroshnichenko <[hidden email]>
[bhelgaas: changelog, add bugzilla URL]
Signed-off-by: Bjorn Helgaas <[hidden email]>
Reviewed-by: Lukas Wunner <[hidden email]>
Cc: [hidden email] # v4.19+
(cherry picked from commit 3943af9d01e94330d0cfac6fccdbc829aad50c92)
Signed-off-by: AceLan Kao <[hidden email]>
---
 drivers/pci/hotplug/pciehp_ctrl.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/pci/hotplug/pciehp_ctrl.c b/drivers/pci/hotplug/pciehp_ctrl.c
index cd0541d80946..5eae29a12ce3 100644
--- a/drivers/pci/hotplug/pciehp_ctrl.c
+++ b/drivers/pci/hotplug/pciehp_ctrl.c
@@ -115,6 +115,10 @@ static void remove_board(struct controller *ctrl, bool safe_removal)
  * removed from the slot/adapter.
  */
  msleep(1000);
+
+ /* Ignore link or presence changes caused by power off */
+ atomic_and(~(PCI_EXP_SLTSTA_DLLSC | PCI_EXP_SLTSTA_PDC),
+   &ctrl->pending_events);
  }
 
  /* turn off Green LED */
--
2.17.1


--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team
Reply | Threaded
Open this post in threaded view
|

[PATCH 2/2][SRU][OEM-B] PCI/PME: Fix hotplug/sysfs remove deadlock in pcie_pme_remove()

AceLan Kao
In reply to this post by AceLan Kao
From: "Rafael J. Wysocki" <[hidden email]>

BugLink: https://bugs.launchpad.net/bugs/1834550

Dongdong reported a deadlock triggered by a hotplug event during a sysfs
"remove" operation:

  pciehp 0000:00:0c.0:pcie004: Slot(0-1): Link Up
  # echo 1 > 0000:00:0c.0/remove

  PME and hotplug share an MSI/MSI-X vector.  The sysfs "remove" side is:

    remove_store
       pci_stop_and_remove_bus_device_locked
         pci_lock_rescan_remove
         pci_stop_and_remove_bus_device
           ...
           pcie_pme_remove
             pcie_pme_suspend
               synchronize_irq        # wait for hotplug IRQ handler
         pci_unlock_rescan_remove

  The hotplug side is:

    pciehp_ist
       pciehp_handle_presence_or_link_change
         pciehp_configure_device
           pci_lock_rescan_remove     # wait for pci_unlock_rescan_remove()

  INFO: task bash:10913 blocked for more than 120 seconds.

  # ps -ax |grep D
   PID TTY      STAT   TIME COMMAND
  10913 ttyAMA0  Ds+    0:00 -bash
  14022 ?        D      0:00 [irq/745-pciehp]

  # cat /proc/14022/stack
  __switch_to+0x94/0xd8
  pci_lock_rescan_remove+0x20/0x28
  pciehp_configure_device+0x30/0x140
  pciehp_handle_presence_or_link_change+0x324/0x458
  pciehp_ist+0x1dc/0x1e0

  # cat /proc/10913/stack
  __switch_to+0x94/0xd8
  synchronize_irq+0x8c/0xc0
  pcie_pme_suspend+0xa4/0x118
  pcie_pme_remove+0x20/0x40
  pcie_port_remove_service+0x3c/0x58
  ...
  pcie_port_device_remove+0x2c/0x48
  pcie_portdrv_remove+0x68/0x78
  pci_device_remove+0x48/0x120
  ...
  pci_stop_bus_device+0x84/0xc0
  pci_stop_and_remove_bus_device_locked+0x24/0x40
  remove_store+0xa4/0xb8
  dev_attr_store+0x44/0x60
  sysfs_kf_write+0x58/0x80

It is incorrect to call pcie_pme_suspend() from pcie_pme_remove() for two
reasons.

First, pcie_pme_suspend() calls synchronize_irq(), which will wait for the
native hotplug interrupt handler as well as for the PME one, because they
share one IRQ (as per the spec).  That may deadlock if hotplug is signaled
while pcie_pme_remove() is running and the latter calls
pci_lock_rescan_remove() before the former.

Second, if pcie_pme_suspend() figures out that wakeup needs to be enabled
for the port, it will return without disabling the interrupt as expected by
pcie_pme_remove() which was overlooked by commit c7b5a4e6e8fb ("PCI / PM:
Fix native PME handling during system suspend/resume").

To fix that, rework pcie_pme_remove() to disable the PME interrupt, clear
its status and prevent the PME worker function from re-enabling it before
calling free_irq() on it, which should be sufficient.

Fixes: c7b5a4e6e8fb ("PCI / PM: Fix native PME handling during system suspend/resume")
Link: https://lore.kernel.org/linux-pci/c7697e7c-e1af-13e4-8491-0a3996e6ab5d@...
Reported-by: Dongdong Liu <[hidden email]>
Signed-off-by: Rafael J. Wysocki <[hidden email]>
[bhelgaas: add URL and deadlock details from Dongdong]
Signed-off-by: Bjorn Helgaas <[hidden email]>
(cherry picked from commit 95c80bc6952b6a5badc7b702d23e5bf14d251e7c)
Signed-off-by: AceLan Kao <[hidden email]>
---
 drivers/pci/pcie/pme.c | 22 +++++++++++++++-------
 1 file changed, 15 insertions(+), 7 deletions(-)

diff --git a/drivers/pci/pcie/pme.c b/drivers/pci/pcie/pme.c
index d5b2916f4ce3..ca4313d299d0 100644
--- a/drivers/pci/pcie/pme.c
+++ b/drivers/pci/pcie/pme.c
@@ -366,6 +366,16 @@ static bool pcie_pme_check_wakeup(struct pci_bus *bus)
  return false;
 }
 
+static void pcie_pme_disable_interrupt(struct pci_dev *port,
+       struct pcie_pme_service_data *data)
+{
+ spin_lock_irq(&data->lock);
+ pcie_pme_interrupt_enable(port, false);
+ pcie_clear_root_pme_status(port);
+ data->noirq = true;
+ spin_unlock_irq(&data->lock);
+}
+
 /**
  * pcie_pme_suspend - Suspend PCIe PME service device.
  * @srv: PCIe service device to suspend.
@@ -390,11 +400,7 @@ static int pcie_pme_suspend(struct pcie_device *srv)
  return 0;
  }
 
- spin_lock_irq(&data->lock);
- pcie_pme_interrupt_enable(port, false);
- pcie_clear_root_pme_status(port);
- data->noirq = true;
- spin_unlock_irq(&data->lock);
+ pcie_pme_disable_interrupt(port, data);
 
  synchronize_irq(srv->irq);
 
@@ -430,9 +436,11 @@ static int pcie_pme_resume(struct pcie_device *srv)
  */
 static void pcie_pme_remove(struct pcie_device *srv)
 {
- pcie_pme_suspend(srv);
+ struct pcie_pme_service_data *data = get_service_data(srv);
+
+ pcie_pme_disable_interrupt(srv->port, data);
  free_irq(srv->irq, srv);
- kfree(get_service_data(srv));
+ kfree(data);
 }
 
 static struct pcie_port_service_driver pcie_pme_driver = {
--
2.17.1


--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team
Reply | Threaded
Open this post in threaded view
|

ACK: [PATCH 0/2][SRU][OEM-B]System hangs after play video from thunderbolt HDD and then S3

Kai-Heng Feng
In reply to this post by AceLan Kao
at 10:18, AceLan Kao <[hidden email]> wrote:

> BugLink: https://bugs.launchpad.net/bugs/1834550
>
> [Impact]
> System hangs after played video from TBT storage, and then suspend the
> system. It enters suspend correctly, but hangs during waking up.
>
> [Fix]
> There are 2 commits in upstream which fix this issue.
>    95c80bc6952b PCI/PME: Fix hotplug/sysfs remove deadlock in pcie_pme_remove()
>    3943af9d01e9 PCI: pciehp: Ignore Link State Changes after powering off a slot
>
> [Test]
> Verified on below thunderbolt interface with thunderbolt HDD
> 01:00.0 PCI bridge [0604]: Intel Corporation JHL6340 Thunderbolt 3
> Bridge (C step) [Alpine Ridge 2C 2016] [8086:15da] (rev 02)
>
> [Regression Potential]
> Low, both commits have been SRU to Ubuntu Disco 5.0 kernel through
> stable update, and the fix on remove functions are small and reasonable.
>
> Rafael J. Wysocki (1):
>   PCI/PME: Fix hotplug/sysfs remove deadlock in pcie_pme_remove()
>
> Sergey Miroshnichenko (1):
>   PCI: pciehp: Ignore Link State Changes after powering off a slot

Acked-by: Kai-Heng Feng <[hidden email]>

>
>  drivers/pci/hotplug/pciehp_ctrl.c |  4 ++++
>  drivers/pci/pcie/pme.c            | 22 +++++++++++++++-------
>  2 files changed, 19 insertions(+), 7 deletions(-)
>
> --
> 2.17.1
>
>
> --
> kernel-team mailing list
> [hidden email]
> https://lists.ubuntu.com/mailman/listinfo/kernel-team



--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team
Reply | Threaded
Open this post in threaded view
|

ACK: [PATCH 0/2][SRU][OEM-B]System hangs after play video from thunderbolt HDD and then S3

Hui Wang
In reply to this post by AceLan Kao
Acked-by: Hui Wang <[hidden email]>

On 2019/6/28 上午10:18, AceLan Kao wrote:

> BugLink: https://bugs.launchpad.net/bugs/1834550
>
> [Impact]
> System hangs after played video from TBT storage, and then suspend the
> system. It enters suspend correctly, but hangs during waking up.
>
> [Fix]
> There are 2 commits in upstream which fix this issue.
>     95c80bc6952b PCI/PME: Fix hotplug/sysfs remove deadlock in pcie_pme_remove()
>     3943af9d01e9 PCI: pciehp: Ignore Link State Changes after powering off a slot
>
> [Test]
> Verified on below thunderbolt interface with thunderbolt HDD
> 01:00.0 PCI bridge [0604]: Intel Corporation JHL6340 Thunderbolt 3
> Bridge (C step) [Alpine Ridge 2C 2016] [8086:15da] (rev 02)
>
> [Regression Potential]
> Low, both commits have been SRU to Ubuntu Disco 5.0 kernel through
> stable update, and the fix on remove functions are small and reasonable.
>
> Rafael J. Wysocki (1):
>    PCI/PME: Fix hotplug/sysfs remove deadlock in pcie_pme_remove()
>
> Sergey Miroshnichenko (1):
>    PCI: pciehp: Ignore Link State Changes after powering off a slot
>
>   drivers/pci/hotplug/pciehp_ctrl.c |  4 ++++
>   drivers/pci/pcie/pme.c            | 22 +++++++++++++++-------
>   2 files changed, 19 insertions(+), 7 deletions(-)
>

--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team