[PATCH 0/3][Yakkety SRU] cxlflash fixes for yakkety

Seth Forshee
BugLink: http://bugs.launchpad.net/bugs/1623750

Fixes for cxlflash for yakkety. All patches are clean cherry picks from
4.9.

Thanks,
Seth

Matthew R. Ochs (2):
  scsi: cxlflash: Fix to avoid EEH and host reset collisions
  scsi: cxlflash: Improve EEH recovery time

Uma Krishnan (1):
  scsi: cxlflash: Scan host only after the port is ready for I/O

 drivers/scsi/cxlflash/main.c | 23 ++++++++++++++++++++---
 1 file changed, 20 insertions(+), 3 deletions(-)


--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team

[PATCH 1/3][Yakkety SRU] scsi: cxlflash: Scan host only after the port is ready for I/O

Seth Forshee
From: Uma Krishnan <[hidden email]>

BugLink: http://bugs.launchpad.net/bugs/1623750

When a port link is established, the AFU sends a 'link up' interrupt.
After the link is up, the corresponding initialization steps are
performed on the card. Following that, when the card is ready for I/O,
the AFU sends a 'login succeeded' interrupt. Today, cxlflash invokes
scsi_scan_host() upon receipt of both interrupts.

SCSI commands sent to the port prior to the 'login succeeded' interrupt
fail with a 'port not available' error, which is not desirable.
Moreover, when async_scan is active for the host, subsequent scan calls
are terminated with an error. Because of this, the scsi_scan_host() call
performed after the 'login succeeded' interrupt could potentially return
an error, and the devices may not be scanned properly.

To avoid this problem, scsi_scan_host() should be called only after the
'login succeeded' interrupt.

Signed-off-by: Uma Krishnan <[hidden email]>
Acked-by: Matthew R. Ochs <[hidden email]>
Signed-off-by: Martin K. Petersen <[hidden email]>
(cherry picked from commit bbbfae962b7c221237c0f92547ee0c83f7204747)
Signed-off-by: Seth Forshee <[hidden email]>
---
 drivers/scsi/cxlflash/main.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/cxlflash/main.c b/drivers/scsi/cxlflash/main.c
index 228b99ee0483..4c2559adf723 100644
--- a/drivers/scsi/cxlflash/main.c
+++ b/drivers/scsi/cxlflash/main.c
@@ -1181,7 +1181,7 @@ static const struct asyc_intr_info ainfo[] = {
 	{SISL_ASTATUS_FC0_LOGI_F, "login failed", 0, CLR_FC_ERROR},
 	{SISL_ASTATUS_FC0_LOGI_S, "login succeeded", 0, SCAN_HOST},
 	{SISL_ASTATUS_FC0_LINK_DN, "link down", 0, 0},
-	{SISL_ASTATUS_FC0_LINK_UP, "link up", 0, SCAN_HOST},
+	{SISL_ASTATUS_FC0_LINK_UP, "link up", 0, 0},
 	{SISL_ASTATUS_FC1_OTHER, "other error", 1, CLR_FC_ERROR | LINK_RESET},
 	{SISL_ASTATUS_FC1_LOGO, "target initiated LOGO", 1, 0},
 	{SISL_ASTATUS_FC1_CRC_T, "CRC threshold exceeded", 1, LINK_RESET},
@@ -1189,7 +1189,7 @@ static const struct asyc_intr_info ainfo[] = {
 	{SISL_ASTATUS_FC1_LOGI_F, "login failed", 1, CLR_FC_ERROR},
 	{SISL_ASTATUS_FC1_LOGI_S, "login succeeded", 1, SCAN_HOST},
 	{SISL_ASTATUS_FC1_LINK_DN, "link down", 1, 0},
-	{SISL_ASTATUS_FC1_LINK_UP, "link up", 1, SCAN_HOST},
+	{SISL_ASTATUS_FC1_LINK_UP, "link up", 1, 0},
 	{0x0, "", 0, 0} /* terminator */
 };
 
--
2.7.4
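
Editorial sketch (not part of the patch): the table-driven policy that
this change adjusts can be modelled in a few lines of userspace C. All
identifiers below (EV_*, ACT_*, struct intr_info, actions_for,
should_scan) are hypothetical stand-ins with made-up bit values; they
are not the driver's SISL_ASTATUS_* or action flag definitions. The
sketch only illustrates that, once the 'link up' rows carry no action,
a host scan is requested solely by 'login succeeded'.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical action flags; values are arbitrary for this sketch. */
enum { ACT_CLR_FC_ERROR = 1 << 0, ACT_LINK_RESET = 1 << 1, ACT_SCAN_HOST = 1 << 2 };

/* Hypothetical status bits for a single port. */
enum { EV_LOGI_F = 1 << 0, EV_LOGI_S = 1 << 1, EV_LINK_DN = 1 << 2, EV_LINK_UP = 1 << 3 };

struct intr_info {
	uint64_t status;	/* status bit this row matches */
	const char *desc;	/* human-readable description */
	unsigned int action;	/* actions requested for this event */
};

/* Post-patch policy: 'link up' no longer requests a host scan. */
static const struct intr_info table[] = {
	{EV_LOGI_F, "login failed", ACT_CLR_FC_ERROR},
	{EV_LOGI_S, "login succeeded", ACT_SCAN_HOST},
	{EV_LINK_DN, "link down", 0},
	{EV_LINK_UP, "link up", 0},
	{0x0, "", 0}		/* terminator */
};

/* OR together the actions of every row whose bit is set in status. */
static unsigned int actions_for(uint64_t status)
{
	unsigned int actions = 0;
	const struct intr_info *info;

	for (info = table; info->status != 0; info++)
		if (status & info->status)
			actions |= info->action;
	return actions;
}

static int should_scan(uint64_t status)
{
	return (actions_for(status) & ACT_SCAN_HOST) != 0;
}

/* Usage check: a bare 'link up' is ignored, a successful login scans. */
static void check_scan_policy(void)
{
	assert(!should_scan(EV_LINK_UP));
	assert(should_scan(EV_LOGI_S));
}
```

With this shape, deferring the scan is a data change only: the lookup
code is untouched and the two 'link up' rows simply stop requesting
the scan action.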


[PATCH 2/3][Yakkety SRU] scsi: cxlflash: Fix to avoid EEH and host reset collisions

Seth Forshee
In reply to this post by Seth Forshee
From: "Matthew R. Ochs" <[hidden email]>

BugLink: http://bugs.launchpad.net/bugs/1623750

The EEH reset handler is unaware of the current state of the driver
when processing a frozen event and initiating a device reset. This can
be an issue if an EEH event occurs while a user- or stack-initiated
reset is executing. More specifically, if an EEH occurs while the SCSI
host reset handler is active, the reset initiated by the EEH thread will
likely collide with the host reset thread. This can leave the device in
an inconsistent state, or worse, cause a system crash.

As a remedy, the EEH handler is updated to evaluate the device state and
take appropriate action (proceed, wait, or disconnect host). The host
reset handler is also updated to handle situations where an EEH occurred
during a host reset. In such situations, the host reset handler will
delay reporting back a success to give the EEH reset an opportunity to
complete.

Signed-off-by: Matthew R. Ochs <[hidden email]>
Acked-by: Uma Krishnan <[hidden email]>
Signed-off-by: Martin K. Petersen <[hidden email]>
(cherry picked from commit 1d3324c382b1a617eb567e3650dcb51f22dfec9a)
Signed-off-by: Seth Forshee <[hidden email]>
---
 drivers/scsi/cxlflash/main.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/cxlflash/main.c b/drivers/scsi/cxlflash/main.c
index 4c2559adf723..4ef523505364 100644
--- a/drivers/scsi/cxlflash/main.c
+++ b/drivers/scsi/cxlflash/main.c
@@ -2042,6 +2042,11 @@ retry:
  * cxlflash_eh_host_reset_handler() - reset the host adapter
  * @scp: SCSI command from stack identifying host.
  *
+ * Following a reset, the state is evaluated again in case an EEH occurred
+ * during the reset. In such a scenario, the host reset will either yield
+ * until the EEH recovery is complete or return success or failure based
+ * upon the current device state.
+ *
  * Return:
  * SUCCESS as defined in scsi/scsi.h
  * FAILED as defined in scsi/scsi.h
@@ -2074,7 +2079,8 @@ static int cxlflash_eh_host_reset_handler(struct scsi_cmnd *scp)
 		} else
 			cfg->state = STATE_NORMAL;
 		wake_up_all(&cfg->reset_waitq);
-		break;
+		ssleep(1);
+		/* fall through */
 	case STATE_RESET:
 		wait_event(cfg->reset_waitq, cfg->state != STATE_RESET);
 		if (cfg->state == STATE_NORMAL)
@@ -2590,6 +2596,9 @@ out_remove:
  * @pdev: PCI device struct.
  * @state: PCI channel state.
  *
+ * When an EEH occurs during an active reset, wait until the reset is
+ * complete and then take action based upon the device state.
+ *
  * Return: PCI_ERS_RESULT_NEED_RESET or PCI_ERS_RESULT_DISCONNECT
  */
 static pci_ers_result_t cxlflash_pci_error_detected(struct pci_dev *pdev,
@@ -2603,6 +2612,10 @@ static pci_ers_result_t cxlflash_pci_error_detected(struct pci_dev *pdev,
 
 	switch (state) {
 	case pci_channel_io_frozen:
+		wait_event(cfg->reset_waitq, cfg->state != STATE_RESET);
+		if (cfg->state == STATE_FAILTERM)
+			return PCI_ERS_RESULT_DISCONNECT;
+
 		cfg->state = STATE_RESET;
 		scsi_block_requests(cfg->host);
 		drain_ioctls(cfg);
--
2.7.4
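
Editorial sketch (not part of the patch): the "proceed, wait, or
disconnect host" decision described in the commit message can be
written as a pure function over the device state. The names ST_*,
EEH_*, eeh_frozen_action and eeh_frozen_resolve are hypothetical; they
loosely mirror STATE_NORMAL, STATE_RESET and STATE_FAILTERM from the
diff but are not the driver's definitions. The blocking wait_event()
is replaced by a 'settled' argument, assumed to be the state observed
once the in-flight reset has finished.

```c
#include <assert.h>

/* Hypothetical mirror of the driver states referenced in the diff. */
enum dev_state { ST_NORMAL, ST_RESET, ST_FAILTERM };

/* What the frozen-event path should do for a given state. */
enum eeh_action { EEH_PROCEED, EEH_WAIT, EEH_DISCONNECT };

static enum eeh_action eeh_frozen_action(enum dev_state state)
{
	switch (state) {
	case ST_RESET:
		return EEH_WAIT;	/* another reset is in flight */
	case ST_FAILTERM:
		return EEH_DISCONNECT;	/* device already given up on */
	default:
		return EEH_PROCEED;	/* safe to start the EEH reset */
	}
}

/*
 * Resolve the action for a frozen event. 'now' is the state when the
 * event arrives; 'settled' is the state once any in-flight reset has
 * completed (the part the driver obtains by sleeping on a wait queue).
 */
static enum eeh_action eeh_frozen_resolve(enum dev_state now,
					  enum dev_state settled)
{
	enum eeh_action action = eeh_frozen_action(now);

	if (action == EEH_WAIT) {
		/* By definition the settled state is no longer a reset. */
		assert(settled != ST_RESET);
		action = eeh_frozen_action(settled);
	}
	return action;
}
```

For example, eeh_frozen_resolve(ST_RESET, ST_FAILTERM) yields
EEH_DISCONNECT: the event arrived during a host reset that then
failed, so the EEH path gives up rather than starting a second reset.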


[PATCH 3/3][Yakkety SRU] scsi: cxlflash: Improve EEH recovery time

Seth Forshee
In reply to this post by Seth Forshee
From: "Matthew R. Ochs" <[hidden email]>

BugLink: http://bugs.launchpad.net/bugs/1623750

When an EEH occurs during device initialization, the port timeout logic
can cause excessive delays as MMIO reads will fail. Depending on where
they are experienced, these delays can lead to a prolonged reset,
causing an unnecessary triggering of other timeout logic in the SCSI
stack or user applications.

To expedite recovery, the port timeout logic is updated to decay the
timeout at a much faster rate when a frozen EEH event is likely: each
status read that returns all ones (U64_MAX) halves the remaining retry
count.

Signed-off-by: Matthew R. Ochs <[hidden email]>
Acked-by: Uma Krishnan <[hidden email]>
Signed-off-by: Martin K. Petersen <[hidden email]>
(cherry picked from commit 05dab43230fdc0d14ca885b473a2740fe017ecb1)
Signed-off-by: Seth Forshee <[hidden email]>
---
 drivers/scsi/cxlflash/main.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/scsi/cxlflash/main.c b/drivers/scsi/cxlflash/main.c
index 4ef523505364..42970a40d49b 100644
--- a/drivers/scsi/cxlflash/main.c
+++ b/drivers/scsi/cxlflash/main.c
@@ -1040,6 +1040,8 @@ static int wait_port_online(__be64 __iomem *fc_regs, u32 delay_us, u32 nretry)
 	do {
 		msleep(delay_us / 1000);
 		status = readq_be(&fc_regs[FC_MTIP_STATUS / 8]);
+		if (status == U64_MAX)
+			nretry /= 2;
 	} while ((status & FC_MTIP_STATUS_MASK) != FC_MTIP_STATUS_ONLINE &&
 		 nretry--);
 
@@ -1071,6 +1073,8 @@ static int wait_port_offline(__be64 __iomem *fc_regs, u32 delay_us, u32 nretry)
 	do {
 		msleep(delay_us / 1000);
 		status = readq_be(&fc_regs[FC_MTIP_STATUS / 8]);
+		if (status == U64_MAX)
+			nretry /= 2;
 	} while ((status & FC_MTIP_STATUS_MASK) != FC_MTIP_STATUS_OFFLINE &&
 		 nretry--);
 
--
2.7.4
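
Editorial sketch (not part of the patch): the effect of halving the
retry budget on every all-ones read can be seen in a userspace model
of the polling loop. The names poll_count, read_status_fn, read_frozen,
read_stuck, read_online, MASK and ONLINE are hypothetical, and the
mask and status values are made up; the sleep between polls is
omitted so that only the number of reads is counted.

```c
#include <assert.h>
#include <stdint.h>

/* Made-up field values for this sketch, not the driver's definitions. */
#define MASK	0xFULL
#define ONLINE	0x3ULL

typedef uint64_t (*read_status_fn)(void);

static uint64_t read_frozen(void) { return UINT64_MAX; }	/* failed MMIO read */
static uint64_t read_stuck(void)  { return 0; }		/* never comes online */
static uint64_t read_online(void) { return ONLINE; }	/* online right away */

/*
 * Same loop shape as the patched wait functions: poll until the masked
 * status equals 'want' or the retry budget runs out, halving the budget
 * whenever a read returns all ones. Returns the number of reads made.
 */
static unsigned int poll_count(read_status_fn read_status, uint64_t want,
			       uint32_t nretry)
{
	unsigned int polls = 0;
	uint64_t status;

	assert(read_status != 0);
	do {
		status = read_status();
		polls++;
		if (status == UINT64_MAX)
			nretry /= 2;
	} while ((status & MASK) != want && nretry--);

	return polls;
}
```

In this model, with a budget of 1000 retries, a port that simply never
comes online costs 1001 reads, whereas a port whose reads all return
all ones gives up after 9: the budget shrinks geometrically instead of
linearly, which is where the shorter recovery time comes from.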


ACK: [PATCH 0/3][Yakkety SRU] cxlflash fixes for yakkety

Marcelo Henrique Cerri
In reply to this post by Seth Forshee

ACK: [PATCH 0/3][Yakkety SRU] cxlflash fixes for yakkety

Stefan Bader-2
In reply to this post by Seth Forshee

APPLIED: [PATCH 0/3][Yakkety SRU] cxlflash fixes for yakkety

Thadeu Lima de Souza Cascardo-3
In reply to this post by Seth Forshee
Applied to yakkety master-next branch.

Thanks.
Cascardo.