[PATCH 0/4][Xenial SRU] cxlflash fixes for xenial

Seth Forshee
BugLink: http://bugs.launchpad.net/bugs/1623750

Fixes for cxlflash for xenial. All patches are clean cherry picks from
4.9.

Thanks,
Seth

Matthew R. Ochs (2):
  scsi: cxlflash: Fix to avoid EEH and host reset collisions
  scsi: cxlflash: Improve EEH recovery time

Uma Krishnan (2):
  scsi: cxlflash: Scan host only after the port is ready for I/O
  scsi: cxlflash: Remove the device cleanly in the system shutdown path

 drivers/scsi/cxlflash/main.c | 41 ++++++++++++++++++++++++++---------------
 1 file changed, 26 insertions(+), 15 deletions(-)


--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team

[PATCH 1/4][Xenial SRU] scsi: cxlflash: Scan host only after the port is ready for I/O

Seth Forshee
From: Uma Krishnan <[hidden email]>

BugLink: http://bugs.launchpad.net/bugs/1623750

When a port link is established, the AFU sends a 'link up' interrupt.
After the link is up, corresponding initialization steps are performed
on the card. Following that, when the card is ready for I/O, the AFU
sends a 'login succeeded' interrupt. Today, cxlflash invokes
scsi_scan_host() upon receipt of both interrupts.

SCSI commands sent to the port prior to the 'login succeeded' interrupt
will fail with a 'port not available' error. This is not desirable.
Moreover, when async_scan is active for the host, subsequent scan calls
are terminated with an error. Due to this, the scsi_scan_host() call
performed after the 'login succeeded' interrupt could potentially return
an error and the devices may not be scanned properly.

To avoid this problem, scsi_scan_host() should be called only after the
'login succeeded' interrupt.
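The effect of the change can be sketched as a table-driven dispatcher. This is a minimal, hypothetical model written for illustration only: the status bits, `ACT_SCAN_HOST`, `struct intr_info`, and `count_scan_requests()` are assumptions loosely patterned on the driver's `ainfo[]` table, not the driver's actual code. With 'link up' no longer carrying a scan action, a link-up followed by a login yields exactly one scan request.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status bits and action flag; values are illustrative only. */
#define ST_LINK_UP    (1ULL << 0)
#define ST_LOGIN_OK   (1ULL << 1)
#define ST_LINK_DN    (1ULL << 2)

#define ACT_SCAN_HOST 0x1u

struct intr_info {
	uint64_t status;
	const char *desc;
	unsigned int action;
};

/* After the fix: only 'login succeeded' requests a host scan. */
static const struct intr_info intr_table[] = {
	{ST_LINK_UP,  "link up",         0},
	{ST_LOGIN_OK, "login succeeded", ACT_SCAN_HOST},
	{ST_LINK_DN,  "link down",       0},
	{0, "", 0} /* terminator */
};

/* Count how many host scans a set of pending status bits would request. */
static int count_scan_requests(uint64_t pending)
{
	const struct intr_info *info;
	int scans = 0;

	for (info = intr_table; info->status; info++) {
		if ((pending & info->status) && (info->action & ACT_SCAN_HOST))
			scans++;
	}
	return scans;
}
```

Keeping the policy in a table means the fix is a one-field change per entry, which matches the shape of the diff below.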

Signed-off-by: Uma Krishnan <[hidden email]>
Acked-by: Matthew R. Ochs <[hidden email]>
Signed-off-by: Martin K. Petersen <[hidden email]>
(cherry picked from commit bbbfae962b7c221237c0f92547ee0c83f7204747)
Signed-off-by: Seth Forshee <[hidden email]>
---
 drivers/scsi/cxlflash/main.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/cxlflash/main.c b/drivers/scsi/cxlflash/main.c
index 661bb94e2548..b063c41bf673 100644
--- a/drivers/scsi/cxlflash/main.c
+++ b/drivers/scsi/cxlflash/main.c
@@ -1187,7 +1187,7 @@ static const struct asyc_intr_info ainfo[] = {
  {SISL_ASTATUS_FC0_LOGI_F, "login failed", 0, CLR_FC_ERROR},
  {SISL_ASTATUS_FC0_LOGI_S, "login succeeded", 0, SCAN_HOST},
  {SISL_ASTATUS_FC0_LINK_DN, "link down", 0, 0},
- {SISL_ASTATUS_FC0_LINK_UP, "link up", 0, SCAN_HOST},
+ {SISL_ASTATUS_FC0_LINK_UP, "link up", 0, 0},
  {SISL_ASTATUS_FC1_OTHER, "other error", 1, CLR_FC_ERROR | LINK_RESET},
  {SISL_ASTATUS_FC1_LOGO, "target initiated LOGO", 1, 0},
  {SISL_ASTATUS_FC1_CRC_T, "CRC threshold exceeded", 1, LINK_RESET},
@@ -1195,7 +1195,7 @@ static const struct asyc_intr_info ainfo[] = {
  {SISL_ASTATUS_FC1_LOGI_F, "login failed", 1, CLR_FC_ERROR},
  {SISL_ASTATUS_FC1_LOGI_S, "login succeeded", 1, SCAN_HOST},
  {SISL_ASTATUS_FC1_LINK_DN, "link down", 1, 0},
- {SISL_ASTATUS_FC1_LINK_UP, "link up", 1, SCAN_HOST},
+ {SISL_ASTATUS_FC1_LINK_UP, "link up", 1, 0},
  {0x0, "", 0, 0} /* terminator */
 };
 
--
2.7.4


[PATCH 2/4][Xenial SRU] scsi: cxlflash: Remove the device cleanly in the system shutdown path

Seth Forshee
In reply to this post by Seth Forshee
From: Uma Krishnan <[hidden email]>

BugLink: http://bugs.launchpad.net/bugs/1623750

Commit 704c4b0ddc03 ("cxlflash: Shutdown notify support for CXL Flash
cards") was recently introduced to notify the AFU when a system is going
down. Due to the position of the cxlflash driver in the device stack,
cxlflash devices are _always_ removed during a reboot/shutdown. This can
lead to a crash if the cxlflash shutdown hook is invoked _after_ the
shutdown hook for the owning virtual PHB. Furthermore, the current
implementations of the cxlflash shutdown/remove hooks are not tolerant of
being invoked when the device is not enabled. This can also lead to a
crash in situations where the remove hook is invoked after the device
has been removed via the vPHB's shutdown hook. An example of this
scenario would be an EEH reset failure while a reboot/shutdown is in
progress.

To solve both problems, the shutdown hook for cxlflash is updated to
simply remove the device. This path already includes the AFU
notification and thus this solution will continue to perform the
original intent. At the same time, the remove hook is updated to protect
against being called when the device is not enabled.
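A minimal sketch of the pattern, assuming a hypothetical `struct fake_pdev` whose `enabled` flag stands in for `pci_is_enabled()` and whose `notify_count` stands in for the AFU shutdown notification: because the remove routine returns early on a disabled device, it can be registered as both the `.remove` and `.shutdown` hooks and tolerate being invoked twice.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a PCI device. */
struct fake_pdev {
	bool enabled;       /* plays the role of pci_is_enabled() */
	int notify_count;   /* times the AFU shutdown notification ran */
};

/* Modeled on the patched cxlflash_remove(): bail out on a disabled
 * device, otherwise notify and tear down (teardown disables it). */
static void fake_remove(struct fake_pdev *pdev)
{
	if (!pdev->enabled) {
		fprintf(stderr, "%s: Device is disabled\n", __func__);
		return;
	}
	pdev->notify_count++;   /* stands in for notification + teardown */
	pdev->enabled = false;
}

/* Hypothetical driver ops: the same routine serves both hooks. */
struct fake_driver {
	void (*remove)(struct fake_pdev *);
	void (*shutdown)(struct fake_pdev *);
};

static const struct fake_driver drv = {
	.remove   = fake_remove,
	.shutdown = fake_remove,
};
```

Calling `drv.shutdown()` and then `drv.remove()` on the same device notifies once; the second call only logs and returns.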

Fixes: 704c4b0ddc03 ("cxlflash: Shutdown notify support for CXL Flash cards")
Signed-off-by: Uma Krishnan <[hidden email]>
Acked-by: Matthew R. Ochs <[hidden email]>
Signed-off-by: Martin K. Petersen <[hidden email]>

(cherry picked from commit babf985d1e1b0677cb264acd01319d2b9c8f4327)
Signed-off-by: Seth Forshee <[hidden email]>
---
 drivers/scsi/cxlflash/main.c | 18 ++++++------------
 1 file changed, 6 insertions(+), 12 deletions(-)

diff --git a/drivers/scsi/cxlflash/main.c b/drivers/scsi/cxlflash/main.c
index b063c41bf673..4c2559adf723 100644
--- a/drivers/scsi/cxlflash/main.c
+++ b/drivers/scsi/cxlflash/main.c
@@ -823,17 +823,6 @@ static void notify_shutdown(struct cxlflash_cfg *cfg, bool wait)
 }
 
 /**
- * cxlflash_shutdown() - shutdown handler
- * @pdev: PCI device associated with the host.
- */
-static void cxlflash_shutdown(struct pci_dev *pdev)
-{
- struct cxlflash_cfg *cfg = pci_get_drvdata(pdev);
-
- notify_shutdown(cfg, false);
-}
-
-/**
  * cxlflash_remove() - PCI entry point to tear down host
  * @pdev: PCI device associated with the host.
  *
@@ -844,6 +833,11 @@ static void cxlflash_remove(struct pci_dev *pdev)
  struct cxlflash_cfg *cfg = pci_get_drvdata(pdev);
  ulong lock_flags;
 
+ if (!pci_is_enabled(pdev)) {
+ pr_debug("%s: Device is disabled\n", __func__);
+ return;
+ }
+
  /* If a Task Management Function is active, wait for it to complete
  * before continuing with remove.
  */
@@ -2685,7 +2679,7 @@ static struct pci_driver cxlflash_driver = {
  .id_table = cxlflash_pci_table,
  .probe = cxlflash_probe,
  .remove = cxlflash_remove,
- .shutdown = cxlflash_shutdown,
+ .shutdown = cxlflash_remove,
  .err_handler = &cxlflash_err_handler,
 };
 
--
2.7.4


[PATCH 3/4][Xenial SRU] scsi: cxlflash: Fix to avoid EEH and host reset collisions

Seth Forshee
In reply to this post by Seth Forshee
From: "Matthew R. Ochs" <[hidden email]>

BugLink: http://bugs.launchpad.net/bugs/1623750

The EEH reset handler is ignorant of the current state of the driver
when processing a frozen event and initiating a device reset. This can
be an issue if an EEH event occurs while a user or stack initiated reset
is executing. More specifically, if an EEH occurs while the SCSI host
reset handler is active, the reset initiated by the EEH thread will
likely collide with the host reset thread. This can leave the device in
an inconsistent state, or worse, cause a system crash.

As a remedy, the EEH handler is updated to evaluate the device state and
take appropriate action (proceed, wait, or disconnect host). The host
reset handler is also updated to handle situations where an EEH occurred
during a host reset. In such situations, the host reset handler will
delay reporting back a success to give the EEH reset an opportunity to
complete.
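The coordination can be sketched in user space with a POSIX mutex and condition variable standing in for `cfg->reset_waitq` and `wait_event()`. Everything here (`struct dev_cfg`, `finish_host_reset()`, `eeh_frozen()`, the `ERS_*` results) is a hypothetical model, not the driver's code. The sketch also holds a single lock across the wait, the check, and the state change; that is a choice made for the sketch, not something the patch itself shows.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical mirror of the device states used in the patch. */
enum dev_state { STATE_NORMAL, STATE_RESET, STATE_FAILTERM };
enum ers_result { ERS_NEED_RESET, ERS_DISCONNECT };

struct dev_cfg {
	pthread_mutex_t lock;
	pthread_cond_t reset_waitq;   /* plays the role of cfg->reset_waitq */
	enum dev_state state;
};

/* Host-reset side: record the outcome and wake every waiter. */
static void finish_host_reset(struct dev_cfg *cfg, int reset_ok)
{
	pthread_mutex_lock(&cfg->lock);
	cfg->state = reset_ok ? STATE_NORMAL : STATE_FAILTERM;
	pthread_cond_broadcast(&cfg->reset_waitq);
	pthread_mutex_unlock(&cfg->lock);
}

/* EEH "frozen" side: wait out an active reset, then decide. */
static enum ers_result eeh_frozen(struct dev_cfg *cfg)
{
	enum ers_result rc = ERS_NEED_RESET;

	pthread_mutex_lock(&cfg->lock);
	/* Analogue of wait_event(reset_waitq, state != STATE_RESET). */
	while (cfg->state == STATE_RESET)
		pthread_cond_wait(&cfg->reset_waitq, &cfg->lock);
	if (cfg->state == STATE_FAILTERM)
		rc = ERS_DISCONNECT;        /* host reset failed: give up */
	else
		cfg->state = STATE_RESET;   /* EEH recovery now owns the reset */
	pthread_mutex_unlock(&cfg->lock);
	return rc;
}
```

If a host reset is in flight, `eeh_frozen()` blocks until `finish_host_reset()` broadcasts, then either disconnects or claims the reset, which is the proceed/wait/disconnect choice described above.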

Signed-off-by: Matthew R. Ochs <[hidden email]>
Acked-by: Uma Krishnan <[hidden email]>
Signed-off-by: Martin K. Petersen <[hidden email]>
(cherry picked from commit 1d3324c382b1a617eb567e3650dcb51f22dfec9a)
Signed-off-by: Seth Forshee <[hidden email]>
---
 drivers/scsi/cxlflash/main.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/cxlflash/main.c b/drivers/scsi/cxlflash/main.c
index 4c2559adf723..4ef523505364 100644
--- a/drivers/scsi/cxlflash/main.c
+++ b/drivers/scsi/cxlflash/main.c
@@ -2042,6 +2042,11 @@ retry:
  * cxlflash_eh_host_reset_handler() - reset the host adapter
  * @scp: SCSI command from stack identifying host.
  *
+ * Following a reset, the state is evaluated again in case an EEH occurred
+ * during the reset. In such a scenario, the host reset will either yield
+ * until the EEH recovery is complete or return success or failure based
+ * upon the current device state.
+ *
  * Return:
  * SUCCESS as defined in scsi/scsi.h
  * FAILED as defined in scsi/scsi.h
@@ -2074,7 +2079,8 @@ static int cxlflash_eh_host_reset_handler(struct scsi_cmnd *scp)
  } else
  cfg->state = STATE_NORMAL;
  wake_up_all(&cfg->reset_waitq);
- break;
+ ssleep(1);
+ /* fall through */
  case STATE_RESET:
  wait_event(cfg->reset_waitq, cfg->state != STATE_RESET);
  if (cfg->state == STATE_NORMAL)
@@ -2590,6 +2596,9 @@ out_remove:
  * @pdev: PCI device struct.
  * @state: PCI channel state.
  *
+ * When an EEH occurs during an active reset, wait until the reset is
+ * complete and then take action based upon the device state.
+ *
  * Return: PCI_ERS_RESULT_NEED_RESET or PCI_ERS_RESULT_DISCONNECT
  */
 static pci_ers_result_t cxlflash_pci_error_detected(struct pci_dev *pdev,
@@ -2603,6 +2612,10 @@ static pci_ers_result_t cxlflash_pci_error_detected(struct pci_dev *pdev,
 
  switch (state) {
  case pci_channel_io_frozen:
+ wait_event(cfg->reset_waitq, cfg->state != STATE_RESET);
+ if (cfg->state == STATE_FAILTERM)
+ return PCI_ERS_RESULT_DISCONNECT;
+
  cfg->state = STATE_RESET;
  scsi_block_requests(cfg->host);
  drain_ioctls(cfg);
--
2.7.4


[PATCH 4/4][Xenial SRU] scsi: cxlflash: Improve EEH recovery time

Seth Forshee
In reply to this post by Seth Forshee
From: "Matthew R. Ochs" <[hidden email]>

BugLink: http://bugs.launchpad.net/bugs/1623750

When an EEH occurs during device initialization, the port timeout logic
can cause excessive delays as MMIO reads will fail. Depending on where
they are experienced, these delays can lead to a prolonged reset,
unnecessarily triggering other timeout logic in the SCSI stack or user
applications.

To expedite recovery, the port timeout logic is updated to decay the
timeout at a much faster rate when in the presence of a likely EEH
frozen event.
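The decay can be sketched as a polling loop shaped like `wait_port_online()`. This is a hypothetical model: the `read_status_fn` callback replaces the real `readq_be()` MMIO read, the `msleep()` between polls is omitted so it runs instantly, and `MTIP_STATUS_MASK`/`MTIP_STATUS_ONLINE` are illustrative stand-ins for the driver's constants. A frozen device returns all ones on every read, so halving `nretry` on each such read bounds the loop to roughly log2(nretry) polls instead of nretry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-ins for FC_MTIP_STATUS_MASK/ONLINE. */
#define MTIP_STATUS_MASK   0x30ULL
#define MTIP_STATUS_ONLINE 0x20ULL

/* Hypothetical status reader; a frozen (EEH) device reads as all ones. */
typedef uint64_t (*read_status_fn)(void *ctx);

/* Returns the number of polls performed; *online reports the outcome. */
static unsigned int poll_port_online(read_status_fn rd, void *ctx,
				     uint32_t nretry, int *online)
{
	unsigned int polls = 0;
	uint64_t status;

	do {
		status = rd(ctx);
		polls++;
		if (status == UINT64_MAX)   /* likely EEH: decay the budget */
			nretry /= 2;
	} while ((status & MTIP_STATUS_MASK) != MTIP_STATUS_ONLINE &&
		 nretry--);

	*online = (status & MTIP_STATUS_MASK) == MTIP_STATUS_ONLINE;
	return polls;
}

/* Simulated frozen device. */
static uint64_t always_frozen(void *ctx)
{
	(void)ctx;
	return UINT64_MAX;
}

/* Simulated healthy port that reports online on the n-th read. */
struct online_after { unsigned int reads, n; };

static uint64_t comes_online(void *ctx)
{
	struct online_after *s = ctx;

	return ++s->reads >= s->n ? MTIP_STATUS_ONLINE : 0;
}
```

With `nretry = 100`, a frozen device gives up after 6 polls rather than 101, while a healthy port that comes up on its third read still returns after 3 polls.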

Signed-off-by: Matthew R. Ochs <[hidden email]>
Acked-by: Uma Krishnan <[hidden email]>
Signed-off-by: Martin K. Petersen <[hidden email]>
(cherry picked from commit 05dab43230fdc0d14ca885b473a2740fe017ecb1)
Signed-off-by: Seth Forshee <[hidden email]>
---
 drivers/scsi/cxlflash/main.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/scsi/cxlflash/main.c b/drivers/scsi/cxlflash/main.c
index 4ef523505364..42970a40d49b 100644
--- a/drivers/scsi/cxlflash/main.c
+++ b/drivers/scsi/cxlflash/main.c
@@ -1040,6 +1040,8 @@ static int wait_port_online(__be64 __iomem *fc_regs, u32 delay_us, u32 nretry)
  do {
  msleep(delay_us / 1000);
  status = readq_be(&fc_regs[FC_MTIP_STATUS / 8]);
+ if (status == U64_MAX)
+ nretry /= 2;
  } while ((status & FC_MTIP_STATUS_MASK) != FC_MTIP_STATUS_ONLINE &&
  nretry--);
 
@@ -1071,6 +1073,8 @@ static int wait_port_offline(__be64 __iomem *fc_regs, u32 delay_us, u32 nretry)
  do {
  msleep(delay_us / 1000);
  status = readq_be(&fc_regs[FC_MTIP_STATUS / 8]);
+ if (status == U64_MAX)
+ nretry /= 2;
  } while ((status & FC_MTIP_STATUS_MASK) != FC_MTIP_STATUS_OFFLINE &&
  nretry--);
 
--
2.7.4


ACK: [PATCH 0/4][Xenial SRU] cxlflash fixes for xenial

Marcelo Henrique Cerri
In reply to this post by Seth Forshee

ACK: [PATCH 0/4][Xenial SRU] cxlflash fixes for xenial

Stefan Bader-2
In reply to this post by Seth Forshee

APPLIED: [PATCH 0/4][Xenial SRU] cxlflash fixes for xenial

Thadeu Lima de Souza Cascardo-3
In reply to this post by Seth Forshee
Applied to xenial master-next branch.

Thanks.
Cascardo.
