[PATCH 0/4][SRU][D][E][F][OEM-B][OEM-B-OSP1] The system cannot resume from S3 if user unplugs the TB16 during suspend state

classic Classic list List threaded Threaded
15 messages Options
Reply | Threaded
Open this post in threaded view
|

[PATCH 0/4][SRU][D][E][F][OEM-B][OEM-B-OSP1] The system cannot resume from S3 if user unplugs the TB16 during suspend state

AceLan Kao
BugLink: https://bugs.launchpad.net/bugs/1849269

[Impact]
Unplug TBT dock during S3 and then wakes up the system leads to system
hangs.
This not only happens on Dell TB16, but also Dell WD19TB. It's more like
a BIOS or TBT firmware issue that TBT firmware doesn't re-create the PCIe
tunnels.

[Fix]
Mika provides 4 patches to fix this issue, but only 2 are accepted and are
included in linus' tree. We need especially the 4th commit to fix the
issue, so we keep the 3rd and 4th commits as sauce patches.

[Test]
Verified on Dell Precision 5530 + WD19TB

[Regression Potential]
Low, those patches doesn't change the code path much. The 4th commit adds
some checks to return failure state, and it should be safe for those
machines which is working well.

Mika Westerberg (4):
  bdi: Do not use freezable workqueue
  Revert "libata, freezer: avoid block device removal while system is
    frozen"
  UBUNTU: SAUCE: PCI: pciehp: Do not disable interrupt twice on suspend
  UBUNTU: SAUCE: PCI: pciehp: Prevent deadlock on disconnect

 drivers/ata/libata-scsi.c         | 21 ------------------
 drivers/pci/hotplug/pciehp.h      |  6 +++---
 drivers/pci/hotplug/pciehp_core.c | 36 ++++++++++++++++++++++++++-----
 drivers/pci/hotplug/pciehp_ctrl.c |  4 ++--
 drivers/pci/hotplug/pciehp_hpc.c  | 32 ++++++++++++++++++++-------
 kernel/freezer.c                  |  6 ------
 mm/backing-dev.c                  |  4 ++--
 7 files changed, 62 insertions(+), 47 deletions(-)

--
2.17.1


--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team
Reply | Threaded
Open this post in threaded view
|

[PATCH 1/4][SRU][OEM-B] bdi: Do not use freezable workqueue

AceLan Kao
From: Mika Westerberg <[hidden email]>

BugLink: https://bugs.launchpad.net/bugs/1849269

A removable block device, such as NVMe or SSD connected over Thunderbolt
can be hot-removed any time including when the system is suspended. When
device is hot-removed during suspend and the system gets resumed, kernel
first resumes devices and then thaws the userspace including freezable
workqueues. What happens in that case is that the NVMe driver notices
that the device is unplugged and removes it from the system. This ends
up calling bdi_unregister() for the gendisk which then schedules
wb_workfn() to be run one more time.

However, since the bdi_wq is still frozen flush_delayed_work() call in
wb_shutdown() blocks forever halting system resume process. User sees
this as hang as nothing is happening anymore.

Triggering sysrq-w reveals this:

  Workqueue: nvme-wq nvme_remove_dead_ctrl_work [nvme]
  Call Trace:
   ? __schedule+0x2c5/0x630
   ? wait_for_completion+0xa4/0x120
   schedule+0x3e/0xc0
   schedule_timeout+0x1c9/0x320
   ? resched_curr+0x1f/0xd0
   ? wait_for_completion+0xa4/0x120
   wait_for_completion+0xc3/0x120
   ? wake_up_q+0x60/0x60
   __flush_work+0x131/0x1e0
   ? flush_workqueue_prep_pwqs+0x130/0x130
   bdi_unregister+0xb9/0x130
   del_gendisk+0x2d2/0x2e0
   nvme_ns_remove+0xed/0x110 [nvme_core]
   nvme_remove_namespaces+0x96/0xd0 [nvme_core]
   nvme_remove+0x5b/0x160 [nvme]
   pci_device_remove+0x36/0x90
   device_release_driver_internal+0xdf/0x1c0
   nvme_remove_dead_ctrl_work+0x14/0x30 [nvme]
   process_one_work+0x1c2/0x3f0
   worker_thread+0x48/0x3e0
   kthread+0x100/0x140
   ? current_work+0x30/0x30
   ? kthread_park+0x80/0x80
   ret_from_fork+0x35/0x40

This is not limited to NVMes so exactly same issue can be reproduced by
hot-removing SSD (over Thunderbolt) while the system is suspended.

Prevent this from happening by removing WQ_FREEZABLE from bdi_wq.

Reported-by: AceLan Kao <[hidden email]>
Link: https://marc.info/?l=linux-kernel&m=138695698516487
Link: https://bugzilla.kernel.org/show_bug.cgi?id=204385
Link: https://lore.kernel.org/lkml/20191002122136.GD2819@.../#t
Acked-by: Rafael J. Wysocki <[hidden email]>
Signed-off-by: Mika Westerberg <[hidden email]>
Signed-off-by: Jens Axboe <[hidden email]>
(cherry picked from commit a2b90f11217790ec0964ba9c93a4abb369758c26)
Signed-off-by: AceLan Kao <[hidden email]>
---
 mm/backing-dev.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 2918c74c352a4..6babd1b7d0f66 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -260,8 +260,8 @@ static int __init default_bdi_init(void)
 {
  int err;
 
- bdi_wq = alloc_workqueue("writeback", WQ_MEM_RECLAIM | WQ_FREEZABLE |
-      WQ_UNBOUND | WQ_SYSFS, 0);
+ bdi_wq = alloc_workqueue("writeback", WQ_MEM_RECLAIM | WQ_UNBOUND |
+ WQ_SYSFS, 0);
  if (!bdi_wq)
  return -ENOMEM;
 
--
2.17.1


--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team
Reply | Threaded
Open this post in threaded view
|

[PATCH 2/4][SRU][OEM-B] Revert "libata, freezer: avoid block device removal while system is frozen"

AceLan Kao
In reply to this post by AceLan Kao
From: Mika Westerberg <[hidden email]>

BugLink: https://bugs.launchpad.net/bugs/1849269

This reverts commit 85fbd722ad0f5d64d1ad15888cd1eb2188bfb557.

The commit was added as a quick band-aid for a hang that happened when a
block device was removed during system suspend. Now that bdi_wq is not
freezable anymore the hang should not be possible and we can get rid of
this hack by reverting it.

Acked-by: Rafael J. Wysocki <[hidden email]>
Signed-off-by: Mika Westerberg <[hidden email]>
Signed-off-by: Jens Axboe <[hidden email]>
(cherry picked from commit 0e48f51cbbfbdb79149806de14dcb8bf0f01ca94)
Signed-off-by: AceLan Kao <[hidden email]>
---
 drivers/ata/libata-scsi.c | 21 ---------------------
 kernel/freezer.c          |  6 ------
 2 files changed, 27 deletions(-)

diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index 504ad9488ef74..8a6ae66e4434a 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -4811,27 +4811,6 @@ void ata_scsi_hotplug(struct work_struct *work)
  return;
  }
 
- /*
- * XXX - UGLY HACK
- *
- * The block layer suspend/resume path is fundamentally broken due
- * to freezable kthreads and workqueue and may deadlock if a block
- * device gets removed while resume is in progress.  I don't know
- * what the solution is short of removing freezable kthreads and
- * workqueues altogether.
- *
- * The following is an ugly hack to avoid kicking off device
- * removal while freezer is active.  This is a joke but does avoid
- * this particular deadlock scenario.
- *
- * https://bugzilla.kernel.org/show_bug.cgi?id=62801
- * http://marc.info/?l=linux-kernel&m=138695698516487
- */
-#ifdef CONFIG_FREEZER
- while (pm_freezing)
- msleep(10);
-#endif
-
  DPRINTK("ENTER\n");
  mutex_lock(&ap->scsi_scan_mutex);
 
diff --git a/kernel/freezer.c b/kernel/freezer.c
index 6f56a9e219fac..43860bdbca387 100644
--- a/kernel/freezer.c
+++ b/kernel/freezer.c
@@ -19,12 +19,6 @@ EXPORT_SYMBOL(system_freezing_cnt);
 bool pm_freezing;
 bool pm_nosig_freezing;
 
-/*
- * Temporary export for the deadlock workaround in ata_scsi_hotplug().
- * Remove once the hack becomes unnecessary.
- */
-EXPORT_SYMBOL_GPL(pm_freezing);
-
 /* protects freezing and frozen transitions */
 static DEFINE_SPINLOCK(freezer_lock);
 
--
2.17.1


--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team
Reply | Threaded
Open this post in threaded view
|

[PATCH 3/4][SRU][OEM-B] UBUNTU: SAUCE: PCI: pciehp: Do not disable interrupt twice on suspend

AceLan Kao
In reply to this post by AceLan Kao
From: Mika Westerberg <[hidden email]>

BugLink: https://bugs.launchpad.net/bugs/1849269

We try to keep PCIe hotplug ports runtime suspended when entering system
suspend. Due to the fact that the PCIe portdrv sets NEVER_SKIP driver PM
flag the PM core always calls system suspend/resume hooks even if the
device is left runtime suspended. Since PCIe hotplug driver re-uses the
same function for both it ends up disabling hotplug interrupt twice and
the second time following is printed:

  pciehp 0000:03:01.0:pcie204: pcie_do_write_cmd: no response from device

Prevent this from happening by checking whether the device is already
runtime suspended when system suspend hook is called.

Link: https://patchwork.kernel.org/patch/11089971/
Fixes: 9c62f0bfb832 ("PCI: pciehp: Implement runtime PM callbacks")
Reported-by: Kai-Heng Feng <[hidden email]>
Tested-by: Kai-Heng Feng <[hidden email]>
Signed-off-by: Mika Westerberg <[hidden email]>
Reviewed-by: Rafael J. Wysocki <[hidden email]>
Signed-off-by: AceLan Kao <[hidden email]>
---
 drivers/pci/hotplug/pciehp_core.c | 25 +++++++++++++++++++++++--
 1 file changed, 23 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/hotplug/pciehp_core.c b/drivers/pci/hotplug/pciehp_core.c
index 55cf9783703da..20cfa94273029 100644
--- a/drivers/pci/hotplug/pciehp_core.c
+++ b/drivers/pci/hotplug/pciehp_core.c
@@ -269,7 +269,7 @@ static bool pme_is_native(struct pcie_device *dev)
  return pcie_ports_native || host->native_pme;
 }
 
-static int pciehp_suspend(struct pcie_device *dev)
+static void pciehp_disable_interrupt(struct pcie_device *dev)
 {
  /*
  * Disable hotplug interrupt so that it does not trigger
@@ -277,7 +277,19 @@ static int pciehp_suspend(struct pcie_device *dev)
  */
  if (pme_is_native(dev))
  pcie_disable_interrupt(get_service_data(dev));
+}
 
+#ifdef CONFIG_PM_SLEEP
+static int pciehp_suspend(struct pcie_device *dev)
+{
+ /*
+ * If the port is already runtime suspended we can keep it that
+ * way.
+ */
+ if (dev_pm_smart_suspend_and_suspended(&dev->port->dev))
+ return 0;
+
+ pciehp_disable_interrupt(dev);
  return 0;
 }
 
@@ -295,6 +307,7 @@ static int pciehp_resume_noirq(struct pcie_device *dev)
 
  return 0;
 }
+#endif
 
 static int pciehp_resume(struct pcie_device *dev)
 {
@@ -308,6 +321,12 @@ static int pciehp_resume(struct pcie_device *dev)
  return 0;
 }
 
+static int pciehp_runtime_suspend(struct pcie_device *dev)
+{
+ pciehp_disable_interrupt(dev);
+ return 0;
+}
+
 static int pciehp_runtime_resume(struct pcie_device *dev)
 {
  struct controller *ctrl = get_service_data(dev);
@@ -334,10 +353,12 @@ static struct pcie_port_service_driver hpdriver_portdrv = {
  .remove = pciehp_remove,
 
 #ifdef CONFIG_PM
+#ifdef CONFIG_PM_SLEEP
  .suspend = pciehp_suspend,
  .resume_noirq = pciehp_resume_noirq,
  .resume = pciehp_resume,
- .runtime_suspend = pciehp_suspend,
+#endif
+ .runtime_suspend = pciehp_runtime_suspend,
  .runtime_resume = pciehp_runtime_resume,
 #endif /* PM */
 };
--
2.17.1


--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team
Reply | Threaded
Open this post in threaded view
|

[PATCH 4/4][SRU][OEM-B] UBUNTU: SAUCE: PCI: pciehp: Prevent deadlock on disconnect

AceLan Kao
In reply to this post by AceLan Kao
From: Mika Westerberg <[hidden email]>

BugLink: https://bugs.launchpad.net/bugs/1849269

If there are more than one PCIe switch with hotplug downstream ports
hot-removing them leads to a following deadlock:

 INFO: task irq/126-pciehp:198 blocked for more than 120 seconds.
 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 irq/126-pciehp  D    0   198      2 0x80000000
 Call Trace:
  __schedule+0x2a2/0x880
  schedule+0x2c/0x80
  schedule_timeout+0x246/0x350
  ? ttwu_do_activate+0x67/0x90
  wait_for_completion+0xb7/0x140
  ? wake_up_q+0x80/0x80
  kthread_stop+0x49/0x110
  __free_irq+0x15c/0x2a0
  free_irq+0x32/0x70
  pcie_shutdown_notification+0x2f/0x50
  pciehp_remove+0x27/0x50
  pcie_port_remove_service+0x36/0x50
  device_release_driver_internal+0x18c/0x250
  device_release_driver+0x12/0x20
  bus_remove_device+0xec/0x160
  device_del+0x13b/0x350
  ? pcie_port_find_device+0x60/0x60
  device_unregister+0x1a/0x60
  remove_iter+0x1e/0x30
  device_for_each_child+0x56/0x90
  pcie_port_device_remove+0x22/0x40
  pcie_portdrv_remove+0x20/0x60
  pci_device_remove+0x3e/0xc0
  device_release_driver_internal+0x18c/0x250
  device_release_driver+0x12/0x20
  pci_stop_bus_device+0x6f/0x90
  pci_stop_bus_device+0x31/0x90
  pci_stop_and_remove_bus_device+0x12/0x20
  pciehp_unconfigure_device+0x88/0x140
  pciehp_disable_slot+0x6a/0x110
  pciehp_handle_presence_or_link_change+0x263/0x400
  pciehp_ist+0x1c9/0x1d0
  ? irq_forced_thread_fn+0x80/0x80
  irq_thread_fn+0x24/0x60
  irq_thread+0xeb/0x190
  ? irq_thread_fn+0x60/0x60
  kthread+0x120/0x140
  ? irq_thread_check_affinity+0xf0/0xf0
  ? kthread_park+0x90/0x90
  ret_from_fork+0x35/0x40
 INFO: task irq/190-pciehp:2288 blocked for more than 120 seconds.
 irq/190-pciehp  D    0  2288      2 0x80000000
 Call Trace:
  __schedule+0x2a2/0x880
  schedule+0x2c/0x80
  schedule_preempt_disabled+0xe/0x10
  __mutex_lock.isra.9+0x2e0/0x4d0
  ? __mutex_lock_slowpath+0x13/0x20
  __mutex_lock_slowpath+0x13/0x20
  mutex_lock+0x2c/0x30
  pci_lock_rescan_remove+0x15/0x20
  pciehp_unconfigure_device+0x4d/0x140
  pciehp_disable_slot+0x6a/0x110
  pciehp_handle_presence_or_link_change+0x263/0x400
  pciehp_ist+0x1c9/0x1d0
  ? irq_forced_thread_fn+0x80/0x80
  irq_thread_fn+0x24/0x60
  irq_thread+0xeb/0x190
  ? irq_thread_fn+0x60/0x60
  kthread+0x120/0x140
  ? irq_thread_check_affinity+0xf0/0xf0
  ? kthread_park+0x90/0x90
  ret_from_fork+0x35/0x40

What happens here is that the whole hierarchy is runtime resumed and the
parent PCIe downstream port, who got the hot-remove event, starts
removing devices below it taking pci_lock_rescan_remove() lock. When the
child PCIe port is runtime resumed it calls pciehp_check_presence()
which ends up calling pciehp_card_present() and pciehp_check_link_active().
Both of these read their parts of PCIe config space by calling helper
function pcie_capability_read_word(). Now, this function notices that
the underlying device is already gone and returns PCIBIOS_DEVICE_NOT_FOUND
with the capability value set to 0. When pciehp gets this value it
thinks that its child device is also hot-removed and schedules its IRQ
thread to handle the event.

The deadlock happens when the child's IRQ thread runs and tries to
acquire pci_lock_rescan_remove() which is already taken by the parent
and the parent waits for the child's IRQ thread to finish.

We can prevent this from happening by checking the return value of
pcie_capability_read_word() and if it is PCIBIOS_DEVICE_NOT_FOUND stop
performing any hot-removal activities.

Link: https://patchwork.kernel.org/patch/11089973/
Signed-off-by: Mika Westerberg <[hidden email]>
Signed-off-by: AceLan Kao <[hidden email]>
---
 drivers/pci/hotplug/pciehp.h      |  6 +++---
 drivers/pci/hotplug/pciehp_core.c | 11 ++++++++---
 drivers/pci/hotplug/pciehp_ctrl.c |  4 ++--
 drivers/pci/hotplug/pciehp_hpc.c  | 32 +++++++++++++++++++++++--------
 4 files changed, 37 insertions(+), 16 deletions(-)

diff --git a/drivers/pci/hotplug/pciehp.h b/drivers/pci/hotplug/pciehp.h
index f45f0f01a104a..473207fca5c24 100644
--- a/drivers/pci/hotplug/pciehp.h
+++ b/drivers/pci/hotplug/pciehp.h
@@ -189,10 +189,10 @@ int pciehp_query_power_fault(struct controller *ctrl);
 void pciehp_green_led_on(struct controller *ctrl);
 void pciehp_green_led_off(struct controller *ctrl);
 void pciehp_green_led_blink(struct controller *ctrl);
-bool pciehp_card_present(struct controller *ctrl);
-bool pciehp_card_present_or_link_active(struct controller *ctrl);
+int pciehp_card_present(struct controller *ctrl);
+int pciehp_card_present_or_link_active(struct controller *ctrl);
 int pciehp_check_link_status(struct controller *ctrl);
-bool pciehp_check_link_active(struct controller *ctrl);
+int pciehp_check_link_active(struct controller *ctrl);
 void pciehp_release_ctrl(struct controller *ctrl);
 
 int pciehp_sysfs_enable_slot(struct hotplug_slot *hotplug_slot);
diff --git a/drivers/pci/hotplug/pciehp_core.c b/drivers/pci/hotplug/pciehp_core.c
index 20cfa94273029..67ac542a39184 100644
--- a/drivers/pci/hotplug/pciehp_core.c
+++ b/drivers/pci/hotplug/pciehp_core.c
@@ -155,10 +155,15 @@ static int get_adapter_status(struct hotplug_slot *hotplug_slot, u8 *value)
 {
  struct controller *ctrl = hotplug_slot->private;
  struct pci_dev *pdev = ctrl->pcie->port;
+ int ret;
 
  pci_config_pm_runtime_get(pdev);
- *value = pciehp_card_present_or_link_active(ctrl);
+ ret = pciehp_card_present_or_link_active(ctrl);
  pci_config_pm_runtime_put(pdev);
+ if (ret < 0)
+ return ret;
+
+ *value = ret;
  return 0;
 }
 
@@ -174,13 +179,13 @@ static int get_adapter_status(struct hotplug_slot *hotplug_slot, u8 *value)
  */
 static void pciehp_check_presence(struct controller *ctrl)
 {
- bool occupied;
+ int occupied;
 
  down_read(&ctrl->reset_lock);
  mutex_lock(&ctrl->lock);
 
  occupied = pciehp_card_present_or_link_active(ctrl);
- if ((occupied && (ctrl->state == OFF_STATE ||
+ if ((occupied > 0 && (ctrl->state == OFF_STATE ||
   ctrl->state == BLINKINGON_STATE)) ||
     (!occupied && (ctrl->state == ON_STATE ||
    ctrl->state == BLINKINGOFF_STATE)))
diff --git a/drivers/pci/hotplug/pciehp_ctrl.c b/drivers/pci/hotplug/pciehp_ctrl.c
index 5eae29a12ce3d..8dbf4d2bf4927 100644
--- a/drivers/pci/hotplug/pciehp_ctrl.c
+++ b/drivers/pci/hotplug/pciehp_ctrl.c
@@ -219,7 +219,7 @@ void pciehp_handle_disable_request(struct controller *ctrl)
 
 void pciehp_handle_presence_or_link_change(struct controller *ctrl, u32 events)
 {
- bool present, link_active;
+ int present, link_active;
 
  /*
  * If the slot is on and presence or link has changed, turn it off.
@@ -250,7 +250,7 @@ void pciehp_handle_presence_or_link_change(struct controller *ctrl, u32 events)
  mutex_lock(&ctrl->lock);
  present = pciehp_card_present(ctrl);
  link_active = pciehp_check_link_active(ctrl);
- if (!present && !link_active) {
+ if (present <= 0 && link_active <= 0) {
  mutex_unlock(&ctrl->lock);
  return;
  }
diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index 6874ab24f9264..7548b2c18641a 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -199,13 +199,16 @@ static void pcie_write_cmd_nowait(struct controller *ctrl, u16 cmd, u16 mask)
  pcie_do_write_cmd(ctrl, cmd, mask, false);
 }
 
-bool pciehp_check_link_active(struct controller *ctrl)
+int pciehp_check_link_active(struct controller *ctrl)
 {
  struct pci_dev *pdev = ctrl_dev(ctrl);
  u16 lnk_status;
- bool ret;
+ int ret;
+
+ ret = pcie_capability_read_word(pdev, PCI_EXP_LNKSTA, &lnk_status);
+ if (ret == PCIBIOS_DEVICE_NOT_FOUND)
+ return -ENODEV;
 
- pcie_capability_read_word(pdev, PCI_EXP_LNKSTA, &lnk_status);
  ret = !!(lnk_status & PCI_EXP_LNKSTA_DLLLA);
 
  if (ret)
@@ -387,13 +390,17 @@ void pciehp_get_latch_status(struct controller *ctrl, u8 *status)
  *status = !!(slot_status & PCI_EXP_SLTSTA_MRLSS);
 }
 
-bool pciehp_card_present(struct controller *ctrl)
+int pciehp_card_present(struct controller *ctrl)
 {
  struct pci_dev *pdev = ctrl_dev(ctrl);
  u16 slot_status;
+ int ret;
 
- pcie_capability_read_word(pdev, PCI_EXP_SLTSTA, &slot_status);
- return slot_status & PCI_EXP_SLTSTA_PDS;
+ ret = pcie_capability_read_word(pdev, PCI_EXP_SLTSTA, &slot_status);
+ if (ret == PCIBIOS_DEVICE_NOT_FOUND)
+ return -ENODEV;
+
+ return !!(slot_status & PCI_EXP_SLTSTA_PDS);
 }
 
 /**
@@ -404,10 +411,19 @@ bool pciehp_card_present(struct controller *ctrl)
  * Presence Detect State bit, this helper also returns true if the Link Active
  * bit is set.  This is a concession to broken hotplug ports which hardwire
  * Presence Detect State to zero, such as Wilocity's [1ae9:0200].
+ *
+ * Returns: %1 if the slot is occupied and %0 if it is not. If the hotplug
+ *    port is not present anymore returns %-ENODEV.
  */
-bool pciehp_card_present_or_link_active(struct controller *ctrl)
+int pciehp_card_present_or_link_active(struct controller *ctrl)
 {
- return pciehp_card_present(ctrl) || pciehp_check_link_active(ctrl);
+ int ret;
+
+ ret = pciehp_card_present(ctrl);
+ if (ret)
+ return ret;
+
+ return pciehp_check_link_active(ctrl);
 }
 
 int pciehp_query_power_fault(struct controller *ctrl)
--
2.17.1


--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team
Reply | Threaded
Open this post in threaded view
|

[PATCH 1/4][SRU][D][E][OEM-B-OSP1] bdi: Do not use freezable workqueue

AceLan Kao
In reply to this post by AceLan Kao
From: Mika Westerberg <[hidden email]>

BugLink: https://bugs.launchpad.net/bugs/1849269

A removable block device, such as NVMe or SSD connected over Thunderbolt
can be hot-removed any time including when the system is suspended. When
device is hot-removed during suspend and the system gets resumed, kernel
first resumes devices and then thaws the userspace including freezable
workqueues. What happens in that case is that the NVMe driver notices
that the device is unplugged and removes it from the system. This ends
up calling bdi_unregister() for the gendisk which then schedules
wb_workfn() to be run one more time.

However, since the bdi_wq is still frozen flush_delayed_work() call in
wb_shutdown() blocks forever halting system resume process. User sees
this as hang as nothing is happening anymore.

Triggering sysrq-w reveals this:

  Workqueue: nvme-wq nvme_remove_dead_ctrl_work [nvme]
  Call Trace:
   ? __schedule+0x2c5/0x630
   ? wait_for_completion+0xa4/0x120
   schedule+0x3e/0xc0
   schedule_timeout+0x1c9/0x320
   ? resched_curr+0x1f/0xd0
   ? wait_for_completion+0xa4/0x120
   wait_for_completion+0xc3/0x120
   ? wake_up_q+0x60/0x60
   __flush_work+0x131/0x1e0
   ? flush_workqueue_prep_pwqs+0x130/0x130
   bdi_unregister+0xb9/0x130
   del_gendisk+0x2d2/0x2e0
   nvme_ns_remove+0xed/0x110 [nvme_core]
   nvme_remove_namespaces+0x96/0xd0 [nvme_core]
   nvme_remove+0x5b/0x160 [nvme]
   pci_device_remove+0x36/0x90
   device_release_driver_internal+0xdf/0x1c0
   nvme_remove_dead_ctrl_work+0x14/0x30 [nvme]
   process_one_work+0x1c2/0x3f0
   worker_thread+0x48/0x3e0
   kthread+0x100/0x140
   ? current_work+0x30/0x30
   ? kthread_park+0x80/0x80
   ret_from_fork+0x35/0x40

This is not limited to NVMes so exactly same issue can be reproduced by
hot-removing SSD (over Thunderbolt) while the system is suspended.

Prevent this from happening by removing WQ_FREEZABLE from bdi_wq.

Reported-by: AceLan Kao <[hidden email]>
Link: https://marc.info/?l=linux-kernel&m=138695698516487
Link: https://bugzilla.kernel.org/show_bug.cgi?id=204385
Link: https://lore.kernel.org/lkml/20191002122136.GD2819@.../#t
Acked-by: Rafael J. Wysocki <[hidden email]>
Signed-off-by: Mika Westerberg <[hidden email]>
Signed-off-by: Jens Axboe <[hidden email]>
(cherry picked from commit a2b90f11217790ec0964ba9c93a4abb369758c26)
Signed-off-by: AceLan Kao <[hidden email]>
---
 mm/backing-dev.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 72e6d0c55cfa..00277c8855c6 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -249,8 +249,8 @@ static int __init default_bdi_init(void)
 {
  int err;
 
- bdi_wq = alloc_workqueue("writeback", WQ_MEM_RECLAIM | WQ_FREEZABLE |
-      WQ_UNBOUND | WQ_SYSFS, 0);
+ bdi_wq = alloc_workqueue("writeback", WQ_MEM_RECLAIM | WQ_UNBOUND |
+ WQ_SYSFS, 0);
  if (!bdi_wq)
  return -ENOMEM;
 
--
2.17.1


--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team
Reply | Threaded
Open this post in threaded view
|

[PATCH 2/4][SRU][D][E][OEM-B-OSP1] Revert "libata, freezer: avoid block device removal while system is frozen"

AceLan Kao
In reply to this post by AceLan Kao
From: Mika Westerberg <[hidden email]>

BugLink: https://bugs.launchpad.net/bugs/1849269

This reverts commit 85fbd722ad0f5d64d1ad15888cd1eb2188bfb557.

The commit was added as a quick band-aid for a hang that happened when a
block device was removed during system suspend. Now that bdi_wq is not
freezable anymore the hang should not be possible and we can get rid of
this hack by reverting it.

Acked-by: Rafael J. Wysocki <[hidden email]>
Signed-off-by: Mika Westerberg <[hidden email]>
Signed-off-by: Jens Axboe <[hidden email]>
(cherry picked from commit 0e48f51cbbfbdb79149806de14dcb8bf0f01ca94)
Signed-off-by: AceLan Kao <[hidden email]>
---
 drivers/ata/libata-scsi.c | 21 ---------------------
 kernel/freezer.c          |  6 ------
 2 files changed, 27 deletions(-)

diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index 39f401c02b43..d910a3573501 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -4808,27 +4808,6 @@ void ata_scsi_hotplug(struct work_struct *work)
  return;
  }
 
- /*
- * XXX - UGLY HACK
- *
- * The block layer suspend/resume path is fundamentally broken due
- * to freezable kthreads and workqueue and may deadlock if a block
- * device gets removed while resume is in progress.  I don't know
- * what the solution is short of removing freezable kthreads and
- * workqueues altogether.
- *
- * The following is an ugly hack to avoid kicking off device
- * removal while freezer is active.  This is a joke but does avoid
- * this particular deadlock scenario.
- *
- * https://bugzilla.kernel.org/show_bug.cgi?id=62801
- * http://marc.info/?l=linux-kernel&m=138695698516487
- */
-#ifdef CONFIG_FREEZER
- while (pm_freezing)
- msleep(10);
-#endif
-
  DPRINTK("ENTER\n");
  mutex_lock(&ap->scsi_scan_mutex);
 
diff --git a/kernel/freezer.c b/kernel/freezer.c
index b162b74611e4..bc6d55f6dce4 100644
--- a/kernel/freezer.c
+++ b/kernel/freezer.c
@@ -21,12 +21,6 @@ EXPORT_SYMBOL(system_freezing_cnt);
 bool pm_freezing;
 bool pm_nosig_freezing;
 
-/*
- * Temporary export for the deadlock workaround in ata_scsi_hotplug().
- * Remove once the hack becomes unnecessary.
- */
-EXPORT_SYMBOL_GPL(pm_freezing);
-
 /* protects freezing and frozen transitions */
 static DEFINE_SPINLOCK(freezer_lock);
 
--
2.17.1


--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team
Reply | Threaded
Open this post in threaded view
|

[PATCH 3/4][SRU][D][E][OEM-B-OSP1] UBUNTU: SAUCE: PCI: pciehp: Do not disable interrupt twice on suspend

AceLan Kao
In reply to this post by AceLan Kao
From: Mika Westerberg <[hidden email]>

BugLink: https://bugs.launchpad.net/bugs/1849269

We try to keep PCIe hotplug ports runtime suspended when entering system
suspend. Due to the fact that the PCIe portdrv sets NEVER_SKIP driver PM
flag the PM core always calls system suspend/resume hooks even if the
device is left runtime suspended. Since PCIe hotplug driver re-uses the
same function for both it ends up disabling hotplug interrupt twice and
the second time following is printed:

  pciehp 0000:03:01.0:pcie204: pcie_do_write_cmd: no response from device

Prevent this from happening by checking whether the device is already
runtime suspended when system suspend hook is called.

Link: https://patchwork.kernel.org/patch/11089971/
Fixes: 9c62f0bfb832 ("PCI: pciehp: Implement runtime PM callbacks")
Reported-by: Kai-Heng Feng <[hidden email]>
Tested-by: Kai-Heng Feng <[hidden email]>
Signed-off-by: Mika Westerberg <[hidden email]>
Reviewed-by: Rafael J. Wysocki <[hidden email]>
Signed-off-by: AceLan Kao <[hidden email]>
---
 drivers/pci/hotplug/pciehp_core.c | 25 +++++++++++++++++++++++--
 1 file changed, 23 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/hotplug/pciehp_core.c b/drivers/pci/hotplug/pciehp_core.c
index fc5366b50e95..d7af3db774a7 100644
--- a/drivers/pci/hotplug/pciehp_core.c
+++ b/drivers/pci/hotplug/pciehp_core.c
@@ -250,7 +250,7 @@ static bool pme_is_native(struct pcie_device *dev)
  return pcie_ports_native || host->native_pme;
 }
 
-static int pciehp_suspend(struct pcie_device *dev)
+static void pciehp_disable_interrupt(struct pcie_device *dev)
 {
  /*
  * Disable hotplug interrupt so that it does not trigger
@@ -258,7 +258,19 @@ static int pciehp_suspend(struct pcie_device *dev)
  */
  if (pme_is_native(dev))
  pcie_disable_interrupt(get_service_data(dev));
+}
 
+#ifdef CONFIG_PM_SLEEP
+static int pciehp_suspend(struct pcie_device *dev)
+{
+ /*
+ * If the port is already runtime suspended we can keep it that
+ * way.
+ */
+ if (dev_pm_smart_suspend_and_suspended(&dev->port->dev))
+ return 0;
+
+ pciehp_disable_interrupt(dev);
  return 0;
 }
 
@@ -276,6 +288,7 @@ static int pciehp_resume_noirq(struct pcie_device *dev)
 
  return 0;
 }
+#endif
 
 static int pciehp_resume(struct pcie_device *dev)
 {
@@ -289,6 +302,12 @@ static int pciehp_resume(struct pcie_device *dev)
  return 0;
 }
 
+static int pciehp_runtime_suspend(struct pcie_device *dev)
+{
+ pciehp_disable_interrupt(dev);
+ return 0;
+}
+
 static int pciehp_runtime_resume(struct pcie_device *dev)
 {
  struct controller *ctrl = get_service_data(dev);
@@ -315,10 +334,12 @@ static struct pcie_port_service_driver hpdriver_portdrv = {
  .remove = pciehp_remove,
 
 #ifdef CONFIG_PM
+#ifdef CONFIG_PM_SLEEP
  .suspend = pciehp_suspend,
  .resume_noirq = pciehp_resume_noirq,
  .resume = pciehp_resume,
- .runtime_suspend = pciehp_suspend,
+#endif
+ .runtime_suspend = pciehp_runtime_suspend,
  .runtime_resume = pciehp_runtime_resume,
 #endif /* PM */
 };
--
2.17.1


--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team
Reply | Threaded
Open this post in threaded view
|

[PATCH 4/4][SRU][D][E][OSM-B-OSP1] UBUNTU: SAUCE: PCI: pciehp: Prevent deadlock on disconnect

AceLan Kao
In reply to this post by AceLan Kao
From: Mika Westerberg <[hidden email]>

BugLink: https://bugs.launchpad.net/bugs/1849269

If there are more than one PCIe switch with hotplug downstream ports
hot-removing them leads to a following deadlock:

 INFO: task irq/126-pciehp:198 blocked for more than 120 seconds.
 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 irq/126-pciehp  D    0   198      2 0x80000000
 Call Trace:
  __schedule+0x2a2/0x880
  schedule+0x2c/0x80
  schedule_timeout+0x246/0x350
  ? ttwu_do_activate+0x67/0x90
  wait_for_completion+0xb7/0x140
  ? wake_up_q+0x80/0x80
  kthread_stop+0x49/0x110
  __free_irq+0x15c/0x2a0
  free_irq+0x32/0x70
  pcie_shutdown_notification+0x2f/0x50
  pciehp_remove+0x27/0x50
  pcie_port_remove_service+0x36/0x50
  device_release_driver_internal+0x18c/0x250
  device_release_driver+0x12/0x20
  bus_remove_device+0xec/0x160
  device_del+0x13b/0x350
  ? pcie_port_find_device+0x60/0x60
  device_unregister+0x1a/0x60
  remove_iter+0x1e/0x30
  device_for_each_child+0x56/0x90
  pcie_port_device_remove+0x22/0x40
  pcie_portdrv_remove+0x20/0x60
  pci_device_remove+0x3e/0xc0
  device_release_driver_internal+0x18c/0x250
  device_release_driver+0x12/0x20
  pci_stop_bus_device+0x6f/0x90
  pci_stop_bus_device+0x31/0x90
  pci_stop_and_remove_bus_device+0x12/0x20
  pciehp_unconfigure_device+0x88/0x140
  pciehp_disable_slot+0x6a/0x110
  pciehp_handle_presence_or_link_change+0x263/0x400
  pciehp_ist+0x1c9/0x1d0
  ? irq_forced_thread_fn+0x80/0x80
  irq_thread_fn+0x24/0x60
  irq_thread+0xeb/0x190
  ? irq_thread_fn+0x60/0x60
  kthread+0x120/0x140
  ? irq_thread_check_affinity+0xf0/0xf0
  ? kthread_park+0x90/0x90
  ret_from_fork+0x35/0x40
 INFO: task irq/190-pciehp:2288 blocked for more than 120 seconds.
 irq/190-pciehp  D    0  2288      2 0x80000000
 Call Trace:
  __schedule+0x2a2/0x880
  schedule+0x2c/0x80
  schedule_preempt_disabled+0xe/0x10
  __mutex_lock.isra.9+0x2e0/0x4d0
  ? __mutex_lock_slowpath+0x13/0x20
  __mutex_lock_slowpath+0x13/0x20
  mutex_lock+0x2c/0x30
  pci_lock_rescan_remove+0x15/0x20
  pciehp_unconfigure_device+0x4d/0x140
  pciehp_disable_slot+0x6a/0x110
  pciehp_handle_presence_or_link_change+0x263/0x400
  pciehp_ist+0x1c9/0x1d0
  ? irq_forced_thread_fn+0x80/0x80
  irq_thread_fn+0x24/0x60
  irq_thread+0xeb/0x190
  ? irq_thread_fn+0x60/0x60
  kthread+0x120/0x140
  ? irq_thread_check_affinity+0xf0/0xf0
  ? kthread_park+0x90/0x90
  ret_from_fork+0x35/0x40

What happens here is that the whole hierarchy is runtime resumed and the
parent PCIe downstream port, who got the hot-remove event, starts
removing devices below it taking pci_lock_rescan_remove() lock. When the
child PCIe port is runtime resumed it calls pciehp_check_presence()
which ends up calling pciehp_card_present() and pciehp_check_link_active().
Both of these read their parts of PCIe config space by calling helper
function pcie_capability_read_word(). Now, this function notices that
the underlying device is already gone and returns PCIBIOS_DEVICE_NOT_FOUND
with the capability value set to 0. When pciehp gets this value it
thinks that its child device is also hot-removed and schedules its IRQ
thread to handle the event.

The deadlock happens when the child's IRQ thread runs and tries to
acquire pci_lock_rescan_remove() which is already taken by the parent
and the parent waits for the child's IRQ thread to finish.

We can prevent this from happening by checking the return value of
pcie_capability_read_word() and if it is PCIBIOS_DEVICE_NOT_FOUND stop
performing any hot-removal activities.

Link: https://patchwork.kernel.org/patch/11089973/
Signed-off-by: Mika Westerberg <[hidden email]>
Signed-off-by: AceLan Kao <[hidden email]>
---
 drivers/pci/hotplug/pciehp.h      |  6 +++---
 drivers/pci/hotplug/pciehp_core.c | 11 ++++++++---
 drivers/pci/hotplug/pciehp_ctrl.c |  4 ++--
 drivers/pci/hotplug/pciehp_hpc.c  | 32 +++++++++++++++++++++++--------
 4 files changed, 37 insertions(+), 16 deletions(-)

diff --git a/drivers/pci/hotplug/pciehp.h b/drivers/pci/hotplug/pciehp.h
index 506e1d923a1f..f68278eae061 100644
--- a/drivers/pci/hotplug/pciehp.h
+++ b/drivers/pci/hotplug/pciehp.h
@@ -188,10 +188,10 @@ int pciehp_query_power_fault(struct controller *ctrl);
 void pciehp_green_led_on(struct controller *ctrl);
 void pciehp_green_led_off(struct controller *ctrl);
 void pciehp_green_led_blink(struct controller *ctrl);
-bool pciehp_card_present(struct controller *ctrl);
-bool pciehp_card_present_or_link_active(struct controller *ctrl);
+int pciehp_card_present(struct controller *ctrl);
+int pciehp_card_present_or_link_active(struct controller *ctrl);
 int pciehp_check_link_status(struct controller *ctrl);
-bool pciehp_check_link_active(struct controller *ctrl);
+int pciehp_check_link_active(struct controller *ctrl);
 void pciehp_release_ctrl(struct controller *ctrl);
 
 int pciehp_sysfs_enable_slot(struct hotplug_slot *hotplug_slot);
diff --git a/drivers/pci/hotplug/pciehp_core.c b/drivers/pci/hotplug/pciehp_core.c
index d7af3db774a7..07bb124bb243 100644
--- a/drivers/pci/hotplug/pciehp_core.c
+++ b/drivers/pci/hotplug/pciehp_core.c
@@ -136,10 +136,15 @@ static int get_adapter_status(struct hotplug_slot *hotplug_slot, u8 *value)
 {
  struct controller *ctrl = to_ctrl(hotplug_slot);
  struct pci_dev *pdev = ctrl->pcie->port;
+ int ret;
 
  pci_config_pm_runtime_get(pdev);
- *value = pciehp_card_present_or_link_active(ctrl);
+ ret = pciehp_card_present_or_link_active(ctrl);
  pci_config_pm_runtime_put(pdev);
+ if (ret < 0)
+ return ret;
+
+ *value = ret;
  return 0;
 }
 
@@ -155,13 +160,13 @@ static int get_adapter_status(struct hotplug_slot *hotplug_slot, u8 *value)
  */
 static void pciehp_check_presence(struct controller *ctrl)
 {
- bool occupied;
+ int occupied;
 
  down_read(&ctrl->reset_lock);
  mutex_lock(&ctrl->state_lock);
 
  occupied = pciehp_card_present_or_link_active(ctrl);
- if ((occupied && (ctrl->state == OFF_STATE ||
+ if ((occupied > 0 && (ctrl->state == OFF_STATE ||
   ctrl->state == BLINKINGON_STATE)) ||
     (!occupied && (ctrl->state == ON_STATE ||
    ctrl->state == BLINKINGOFF_STATE)))
diff --git a/drivers/pci/hotplug/pciehp_ctrl.c b/drivers/pci/hotplug/pciehp_ctrl.c
index 905282a8ddaa..d8cc6002b96f 100644
--- a/drivers/pci/hotplug/pciehp_ctrl.c
+++ b/drivers/pci/hotplug/pciehp_ctrl.c
@@ -219,7 +219,7 @@ void pciehp_handle_disable_request(struct controller *ctrl)
 
 void pciehp_handle_presence_or_link_change(struct controller *ctrl, u32 events)
 {
- bool present, link_active;
+ int present, link_active;
 
  /*
  * If the slot is on and presence or link has changed, turn it off.
@@ -250,7 +250,7 @@ void pciehp_handle_presence_or_link_change(struct controller *ctrl, u32 events)
  mutex_lock(&ctrl->state_lock);
  present = pciehp_card_present(ctrl);
  link_active = pciehp_check_link_active(ctrl);
- if (!present && !link_active) {
+ if (present <= 0 && link_active <= 0) {
  mutex_unlock(&ctrl->state_lock);
  return;
  }
diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index 8bfcb8cd0900..71ca9267eb6f 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -199,13 +199,16 @@ static void pcie_write_cmd_nowait(struct controller *ctrl, u16 cmd, u16 mask)
  pcie_do_write_cmd(ctrl, cmd, mask, false);
 }
 
-bool pciehp_check_link_active(struct controller *ctrl)
+int pciehp_check_link_active(struct controller *ctrl)
 {
  struct pci_dev *pdev = ctrl_dev(ctrl);
  u16 lnk_status;
- bool ret;
+ int ret;
+
+ ret = pcie_capability_read_word(pdev, PCI_EXP_LNKSTA, &lnk_status);
+ if (ret == PCIBIOS_DEVICE_NOT_FOUND)
+ return -ENODEV;
 
- pcie_capability_read_word(pdev, PCI_EXP_LNKSTA, &lnk_status);
  ret = !!(lnk_status & PCI_EXP_LNKSTA_DLLLA);
 
  if (ret)
@@ -371,13 +374,17 @@ void pciehp_get_latch_status(struct controller *ctrl, u8 *status)
  *status = !!(slot_status & PCI_EXP_SLTSTA_MRLSS);
 }
 
-bool pciehp_card_present(struct controller *ctrl)
+int pciehp_card_present(struct controller *ctrl)
 {
  struct pci_dev *pdev = ctrl_dev(ctrl);
  u16 slot_status;
+ int ret;
 
- pcie_capability_read_word(pdev, PCI_EXP_SLTSTA, &slot_status);
- return slot_status & PCI_EXP_SLTSTA_PDS;
+ ret = pcie_capability_read_word(pdev, PCI_EXP_SLTSTA, &slot_status);
+ if (ret == PCIBIOS_DEVICE_NOT_FOUND)
+ return -ENODEV;
+
+ return !!(slot_status & PCI_EXP_SLTSTA_PDS);
 }
 
 /**
@@ -388,10 +395,19 @@ bool pciehp_card_present(struct controller *ctrl)
  * Presence Detect State bit, this helper also returns true if the Link Active
  * bit is set.  This is a concession to broken hotplug ports which hardwire
  * Presence Detect State to zero, such as Wilocity's [1ae9:0200].
+ *
+ * Returns: %1 if the slot is occupied and %0 if it is not. If the hotplug
+ *    port is not present anymore returns %-ENODEV.
  */
-bool pciehp_card_present_or_link_active(struct controller *ctrl)
+int pciehp_card_present_or_link_active(struct controller *ctrl)
 {
- return pciehp_card_present(ctrl) || pciehp_check_link_active(ctrl);
+ int ret;
+
+ ret = pciehp_card_present(ctrl);
+ if (ret)
+ return ret;
+
+ return pciehp_check_link_active(ctrl);
 }
 
 int pciehp_query_power_fault(struct controller *ctrl)
--
2.17.1


--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team
Reply | Threaded
Open this post in threaded view
|

[PATCH 1/2][SRU][UNSTABLE] UBUNTU: SAUCE: PCI: pciehp: Do not disable interrupt twice on suspend

AceLan Kao
In reply to this post by AceLan Kao
From: Mika Westerberg <[hidden email]>

BugLink: https://bugs.launchpad.net/bugs/1849269

We try to keep PCIe hotplug ports runtime suspended when entering system
suspend. Due to the fact that the PCIe portdrv sets NEVER_SKIP driver PM
flag the PM core always calls system suspend/resume hooks even if the
device is left runtime suspended. Since PCIe hotplug driver re-uses the
same function for both it ends up disabling hotplug interrupt twice and
the second time following is printed:

  pciehp 0000:03:01.0:pcie204: pcie_do_write_cmd: no response from device

Prevent this from happening by checking whether the device is already
runtime suspended when system suspend hook is called.

Link: https://patchwork.kernel.org/patch/11089971/
Fixes: 9c62f0bfb832 ("PCI: pciehp: Implement runtime PM callbacks")
Reported-by: Kai-Heng Feng <[hidden email]>
Tested-by: Kai-Heng Feng <[hidden email]>
Signed-off-by: Mika Westerberg <[hidden email]>
Reviewed-by: Rafael J. Wysocki <[hidden email]>
Signed-off-by: AceLan Kao <[hidden email]>
---
 drivers/pci/hotplug/pciehp_core.c | 25 +++++++++++++++++++++++--
 1 file changed, 23 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/hotplug/pciehp_core.c b/drivers/pci/hotplug/pciehp_core.c
index b3122c151b80..56daad828c9e 100644
--- a/drivers/pci/hotplug/pciehp_core.c
+++ b/drivers/pci/hotplug/pciehp_core.c
@@ -253,7 +253,7 @@ static bool pme_is_native(struct pcie_device *dev)
  return pcie_ports_native || host->native_pme;
 }
 
-static int pciehp_suspend(struct pcie_device *dev)
+static void pciehp_disable_interrupt(struct pcie_device *dev)
 {
  /*
  * Disable hotplug interrupt so that it does not trigger
@@ -261,7 +261,19 @@ static int pciehp_suspend(struct pcie_device *dev)
  */
  if (pme_is_native(dev))
  pcie_disable_interrupt(get_service_data(dev));
+}
 
+#ifdef CONFIG_PM_SLEEP
+static int pciehp_suspend(struct pcie_device *dev)
+{
+ /*
+ * If the port is already runtime suspended we can keep it that
+ * way.
+ */
+ if (dev_pm_smart_suspend_and_suspended(&dev->port->dev))
+ return 0;
+
+ pciehp_disable_interrupt(dev);
  return 0;
 }
 
@@ -279,6 +291,7 @@ static int pciehp_resume_noirq(struct pcie_device *dev)
 
  return 0;
 }
+#endif
 
 static int pciehp_resume(struct pcie_device *dev)
 {
@@ -292,6 +305,12 @@ static int pciehp_resume(struct pcie_device *dev)
  return 0;
 }
 
+static int pciehp_runtime_suspend(struct pcie_device *dev)
+{
+ pciehp_disable_interrupt(dev);
+ return 0;
+}
+
 static int pciehp_runtime_resume(struct pcie_device *dev)
 {
  struct controller *ctrl = get_service_data(dev);
@@ -318,10 +337,12 @@ static struct pcie_port_service_driver hpdriver_portdrv = {
  .remove = pciehp_remove,
 
 #ifdef CONFIG_PM
+#ifdef CONFIG_PM_SLEEP
  .suspend = pciehp_suspend,
  .resume_noirq = pciehp_resume_noirq,
  .resume = pciehp_resume,
- .runtime_suspend = pciehp_suspend,
+#endif
+ .runtime_suspend = pciehp_runtime_suspend,
  .runtime_resume = pciehp_runtime_resume,
 #endif /* PM */
 };
--
2.17.1


--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team
Reply | Threaded
Open this post in threaded view
|

[PATCH 2/2][SRU][UNSTABLE] UBUNTU: SAUCE: PCI: pciehp: Prevent deadlock on disconnect

AceLan Kao
In reply to this post by AceLan Kao
From: Mika Westerberg <[hidden email]>

BugLink: https://bugs.launchpad.net/bugs/1849269

If there are more than one PCIe switch with hotplug downstream ports
hot-removing them leads to a following deadlock:

 INFO: task irq/126-pciehp:198 blocked for more than 120 seconds.
 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 irq/126-pciehp  D    0   198      2 0x80000000
 Call Trace:
  __schedule+0x2a2/0x880
  schedule+0x2c/0x80
  schedule_timeout+0x246/0x350
  ? ttwu_do_activate+0x67/0x90
  wait_for_completion+0xb7/0x140
  ? wake_up_q+0x80/0x80
  kthread_stop+0x49/0x110
  __free_irq+0x15c/0x2a0
  free_irq+0x32/0x70
  pcie_shutdown_notification+0x2f/0x50
  pciehp_remove+0x27/0x50
  pcie_port_remove_service+0x36/0x50
  device_release_driver_internal+0x18c/0x250
  device_release_driver+0x12/0x20
  bus_remove_device+0xec/0x160
  device_del+0x13b/0x350
  ? pcie_port_find_device+0x60/0x60
  device_unregister+0x1a/0x60
  remove_iter+0x1e/0x30
  device_for_each_child+0x56/0x90
  pcie_port_device_remove+0x22/0x40
  pcie_portdrv_remove+0x20/0x60
  pci_device_remove+0x3e/0xc0
  device_release_driver_internal+0x18c/0x250
  device_release_driver+0x12/0x20
  pci_stop_bus_device+0x6f/0x90
  pci_stop_bus_device+0x31/0x90
  pci_stop_and_remove_bus_device+0x12/0x20
  pciehp_unconfigure_device+0x88/0x140
  pciehp_disable_slot+0x6a/0x110
  pciehp_handle_presence_or_link_change+0x263/0x400
  pciehp_ist+0x1c9/0x1d0
  ? irq_forced_thread_fn+0x80/0x80
  irq_thread_fn+0x24/0x60
  irq_thread+0xeb/0x190
  ? irq_thread_fn+0x60/0x60
  kthread+0x120/0x140
  ? irq_thread_check_affinity+0xf0/0xf0
  ? kthread_park+0x90/0x90
  ret_from_fork+0x35/0x40
 INFO: task irq/190-pciehp:2288 blocked for more than 120 seconds.
 irq/190-pciehp  D    0  2288      2 0x80000000
 Call Trace:
  __schedule+0x2a2/0x880
  schedule+0x2c/0x80
  schedule_preempt_disabled+0xe/0x10
  __mutex_lock.isra.9+0x2e0/0x4d0
  ? __mutex_lock_slowpath+0x13/0x20
  __mutex_lock_slowpath+0x13/0x20
  mutex_lock+0x2c/0x30
  pci_lock_rescan_remove+0x15/0x20
  pciehp_unconfigure_device+0x4d/0x140
  pciehp_disable_slot+0x6a/0x110
  pciehp_handle_presence_or_link_change+0x263/0x400
  pciehp_ist+0x1c9/0x1d0
  ? irq_forced_thread_fn+0x80/0x80
  irq_thread_fn+0x24/0x60
  irq_thread+0xeb/0x190
  ? irq_thread_fn+0x60/0x60
  kthread+0x120/0x140
  ? irq_thread_check_affinity+0xf0/0xf0
  ? kthread_park+0x90/0x90
  ret_from_fork+0x35/0x40

What happens here is that the whole hierarchy is runtime resumed and the
parent PCIe downstream port, who got the hot-remove event, starts
removing devices below it taking pci_lock_rescan_remove() lock. When the
child PCIe port is runtime resumed it calls pciehp_check_presence()
which ends up calling pciehp_card_present() and pciehp_check_link_active().
Both of these read their parts of PCIe config space by calling helper
function pcie_capability_read_word(). Now, this function notices that
the underlying device is already gone and returns PCIBIOS_DEVICE_NOT_FOUND
with the capability value set to 0. When pciehp gets this value it
thinks that its child device is also hot-removed and schedules its IRQ
thread to handle the event.

The deadlock happens when the child's IRQ thread runs and tries to
acquire pci_lock_rescan_remove() which is already taken by the parent
and the parent waits for the child's IRQ thread to finish.

We can prevent this from happening by checking the return value of
pcie_capability_read_word() and if it is PCIBIOS_DEVICE_NOT_FOUND stop
performing any hot-removal activities.

Link: https://patchwork.kernel.org/patch/11089973/
Signed-off-by: Mika Westerberg <[hidden email]>
Signed-off-by: AceLan Kao <[hidden email]>
---
 drivers/pci/hotplug/pciehp.h      |  6 +++---
 drivers/pci/hotplug/pciehp_core.c | 11 ++++++++---
 drivers/pci/hotplug/pciehp_ctrl.c |  4 ++--
 drivers/pci/hotplug/pciehp_hpc.c  | 32 +++++++++++++++++++++++--------
 4 files changed, 37 insertions(+), 16 deletions(-)

diff --git a/drivers/pci/hotplug/pciehp.h b/drivers/pci/hotplug/pciehp.h
index 654c972b8ea0..afea59a3aad2 100644
--- a/drivers/pci/hotplug/pciehp.h
+++ b/drivers/pci/hotplug/pciehp.h
@@ -172,10 +172,10 @@ void pciehp_set_indicators(struct controller *ctrl, int pwr, int attn);
 
 void pciehp_get_latch_status(struct controller *ctrl, u8 *status);
 int pciehp_query_power_fault(struct controller *ctrl);
-bool pciehp_card_present(struct controller *ctrl);
-bool pciehp_card_present_or_link_active(struct controller *ctrl);
+int pciehp_card_present(struct controller *ctrl);
+int pciehp_card_present_or_link_active(struct controller *ctrl);
 int pciehp_check_link_status(struct controller *ctrl);
-bool pciehp_check_link_active(struct controller *ctrl);
+int pciehp_check_link_active(struct controller *ctrl);
 void pciehp_release_ctrl(struct controller *ctrl);
 
 int pciehp_sysfs_enable_slot(struct hotplug_slot *hotplug_slot);
diff --git a/drivers/pci/hotplug/pciehp_core.c b/drivers/pci/hotplug/pciehp_core.c
index 56daad828c9e..312cc45c44c7 100644
--- a/drivers/pci/hotplug/pciehp_core.c
+++ b/drivers/pci/hotplug/pciehp_core.c
@@ -139,10 +139,15 @@ static int get_adapter_status(struct hotplug_slot *hotplug_slot, u8 *value)
 {
  struct controller *ctrl = to_ctrl(hotplug_slot);
  struct pci_dev *pdev = ctrl->pcie->port;
+ int ret;
 
  pci_config_pm_runtime_get(pdev);
- *value = pciehp_card_present_or_link_active(ctrl);
+ ret = pciehp_card_present_or_link_active(ctrl);
  pci_config_pm_runtime_put(pdev);
+ if (ret < 0)
+ return ret;
+
+ *value = ret;
  return 0;
 }
 
@@ -158,13 +163,13 @@ static int get_adapter_status(struct hotplug_slot *hotplug_slot, u8 *value)
  */
 static void pciehp_check_presence(struct controller *ctrl)
 {
- bool occupied;
+ int occupied;
 
  down_read(&ctrl->reset_lock);
  mutex_lock(&ctrl->state_lock);
 
  occupied = pciehp_card_present_or_link_active(ctrl);
- if ((occupied && (ctrl->state == OFF_STATE ||
+ if ((occupied > 0 && (ctrl->state == OFF_STATE ||
   ctrl->state == BLINKINGON_STATE)) ||
     (!occupied && (ctrl->state == ON_STATE ||
    ctrl->state == BLINKINGOFF_STATE)))
diff --git a/drivers/pci/hotplug/pciehp_ctrl.c b/drivers/pci/hotplug/pciehp_ctrl.c
index 21af7b16d7a4..c760a13ec7b1 100644
--- a/drivers/pci/hotplug/pciehp_ctrl.c
+++ b/drivers/pci/hotplug/pciehp_ctrl.c
@@ -226,7 +226,7 @@ void pciehp_handle_disable_request(struct controller *ctrl)
 
 void pciehp_handle_presence_or_link_change(struct controller *ctrl, u32 events)
 {
- bool present, link_active;
+ int present, link_active;
 
  /*
  * If the slot is on and presence or link has changed, turn it off.
@@ -257,7 +257,7 @@ void pciehp_handle_presence_or_link_change(struct controller *ctrl, u32 events)
  mutex_lock(&ctrl->state_lock);
  present = pciehp_card_present(ctrl);
  link_active = pciehp_check_link_active(ctrl);
- if (!present && !link_active) {
+ if (present <= 0 && link_active <= 0) {
  mutex_unlock(&ctrl->state_lock);
  return;
  }
diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index 1a522c1c4177..028bad8d5728 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -201,13 +201,16 @@ static void pcie_write_cmd_nowait(struct controller *ctrl, u16 cmd, u16 mask)
  pcie_do_write_cmd(ctrl, cmd, mask, false);
 }
 
-bool pciehp_check_link_active(struct controller *ctrl)
+int pciehp_check_link_active(struct controller *ctrl)
 {
  struct pci_dev *pdev = ctrl_dev(ctrl);
  u16 lnk_status;
- bool ret;
+ int ret;
+
+ ret = pcie_capability_read_word(pdev, PCI_EXP_LNKSTA, &lnk_status);
+ if (ret == PCIBIOS_DEVICE_NOT_FOUND)
+ return -ENODEV;
 
- pcie_capability_read_word(pdev, PCI_EXP_LNKSTA, &lnk_status);
  ret = !!(lnk_status & PCI_EXP_LNKSTA_DLLLA);
 
  if (ret)
@@ -373,13 +376,17 @@ void pciehp_get_latch_status(struct controller *ctrl, u8 *status)
  *status = !!(slot_status & PCI_EXP_SLTSTA_MRLSS);
 }
 
-bool pciehp_card_present(struct controller *ctrl)
+int pciehp_card_present(struct controller *ctrl)
 {
  struct pci_dev *pdev = ctrl_dev(ctrl);
  u16 slot_status;
+ int ret;
 
- pcie_capability_read_word(pdev, PCI_EXP_SLTSTA, &slot_status);
- return slot_status & PCI_EXP_SLTSTA_PDS;
+ ret = pcie_capability_read_word(pdev, PCI_EXP_SLTSTA, &slot_status);
+ if (ret == PCIBIOS_DEVICE_NOT_FOUND)
+ return -ENODEV;
+
+ return !!(slot_status & PCI_EXP_SLTSTA_PDS);
 }
 
 /**
@@ -390,10 +397,19 @@ bool pciehp_card_present(struct controller *ctrl)
  * Presence Detect State bit, this helper also returns true if the Link Active
  * bit is set.  This is a concession to broken hotplug ports which hardwire
  * Presence Detect State to zero, such as Wilocity's [1ae9:0200].
+ *
+ * Returns: %1 if the slot is occupied and %0 if it is not. If the hotplug
+ *    port is not present anymore returns %-ENODEV.
  */
-bool pciehp_card_present_or_link_active(struct controller *ctrl)
+int pciehp_card_present_or_link_active(struct controller *ctrl)
 {
- return pciehp_card_present(ctrl) || pciehp_check_link_active(ctrl);
+ int ret;
+
+ ret = pciehp_card_present(ctrl);
+ if (ret)
+ return ret;
+
+ return pciehp_check_link_active(ctrl);
 }
 
 int pciehp_query_power_fault(struct controller *ctrl)
--
2.17.1


--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team
Reply | Threaded
Open this post in threaded view
|

APPLIED [OEM-OSP1-B]Re: [PATCH 0/4][SRU][D][E][F][OEM-B][OEM-B-OSP1] The system cannot resume from S3 if user unplugs the TB16 during suspend state

Timo Aaltonen-6
In reply to this post by AceLan Kao
On 23.10.2019 6.14, AceLan Kao wrote:
> BugLink: https://bugs.launchpad.net/bugs/1849269
>
> [Impact]
> Unplug TBT dock during S3 and then wakes up the system leads to system
> hangs.
> This not only happens on Dell TB16, but also Dell WD19TB. It's more like
> a BIOS or TBT firmware issue that TBT firmware doesn't re-create the PCIe
> tunnels.

applied to osp1 oem-next, thanks


--
t

--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 0/4][SRU][D][E][F][OEM-B][OEM-B-OSP1] The system cannot resume from S3 if user unplugs the TB16 during suspend state

Seth Forshee
In reply to this post by AceLan Kao
On Wed, Oct 23, 2019 at 11:14:59AM +0800, AceLan Kao wrote:

> BugLink: https://bugs.launchpad.net/bugs/1849269
>
> [Impact]
> Unplug TBT dock during S3 and then wakes up the system leads to system
> hangs.
> This not only happens on Dell TB16, but also Dell WD19TB. It's more like
> a BIOS or TBT firmware issue that TBT firmware doesn't re-create the PCIe
> tunnels.
>
> [Fix]
> Mika provides 4 patches to fix this issue, but only 2 are accepted and are
> included in linus' tree. We need especially the 4th commit to fix the
> issue, so we keep the 3rd and 4th commits as sauce patches.
>
> [Test]
> Verified on Dell Precision 5530 + WD19TB
>
> [Regression Potential]
> Low, those patches doesn't change the code path much. The 4th commit adds
> some checks to return failure state, and it should be safe for those
> machines which is working well.

I was reading through the upstream thread, and as of a week ago it
appeared that they were discussing a change to the
pciehp_check_link_active() changes, which I do not see in the patches
you sent. Do you know if another iteration of the patches is
forthcoming, and if so shouldn't we wait and take that version?

Thanks,
Seth

--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 0/4][SRU][D][E][F][OEM-B][OEM-B-OSP1] The system cannot resume from S3 if user unplugs the TB16 during suspend state

AceLan Kao
Hi Seth,

I think you are mentioning the 4th commit.
https://lists.ubuntu.com/archives/kernel-team/2019-October/104951.html
It change the return type of pciehp_check_link_active() from bool to
int to indicate different state.

AFAIK, there is no other thread discuss this change, and Mika doesn't
reply to us if he will keep pushing the changes to upstream.
When I asked him if those commits introduce any regression he can
think of, he just reply "I don't think there is any risk of
regressions."

Best regards,
AceLan Kao.

Seth Forshee <[hidden email]> 於 2019年11月1日 週五 上午3:49寫道:

>
> On Wed, Oct 23, 2019 at 11:14:59AM +0800, AceLan Kao wrote:
> > BugLink: https://bugs.launchpad.net/bugs/1849269
> >
> > [Impact]
> > Unplug TBT dock during S3 and then wakes up the system leads to system
> > hangs.
> > This not only happens on Dell TB16, but also Dell WD19TB. It's more like
> > a BIOS or TBT firmware issue that TBT firmware doesn't re-create the PCIe
> > tunnels.
> >
> > [Fix]
> > Mika provides 4 patches to fix this issue, but only 2 are accepted and are
> > included in linus' tree. We need especially the 4th commit to fix the
> > issue, so we keep the 3rd and 4th commits as sauce patches.
> >
> > [Test]
> > Verified on Dell Precision 5530 + WD19TB
> >
> > [Regression Potential]
> > Low, those patches doesn't change the code path much. The 4th commit adds
> > some checks to return failure state, and it should be safe for those
> > machines which is working well.
>
> I was reading through the upstream thread, and as of a week ago it
> appeared that they were discussing a change to the
> pciehp_check_link_active() changes, which I do not see in the patches
> you sent. Do you know if another iteration of the patches is
> forthcoming, and if so shouldn't we wait and take that version?
>
> Thanks,
> Seth

--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team
Reply | Threaded
Open this post in threaded view
|

NAK: [PATCH 0/4][SRU][D][E][F][OEM-B][OEM-B-OSP1] The system cannot resume from S3 if user unplugs the TB16 during suspend state

Seth Forshee
On Fri, Nov 01, 2019 at 11:46:48AM +0800, AceLan Kao wrote:

> Hi Seth,
>
> I think you are mentioning the 4th commit.
> https://lists.ubuntu.com/archives/kernel-team/2019-October/104951.html
> It change the return type of pciehp_check_link_active() from bool to
> int to indicate different state.
>
> AFAIK, there is no other thread discuss this change, and Mika doesn't
> reply to us if he will keep pushing the changes to upstream.
> When I asked him if those commits introduce any regression he can
> think of, he just reply "I don't think there is any risk of
> regressions."

I'm still not entirely comfortable with these as sauce patches, as they
look like they could complicate backporting other fixes. But the impact
on affected systems is pretty severe, so we probably should take them.

However, it looks like these are the v2 patches and that there is a v3
upstream:

https://lore.kernel.org/lkml/20191029170022.57528-1-mika.westerberg@.../

So I think these should be refreshed to use the newer patches.

Thanks,
Seth

--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team