[PATCH 0/5][Bionic] hisi_sas: driver robustness fixes

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|

[PATCH 0/5][Bionic] hisi_sas: driver robustness fixes

dann frazier-4
BugLink: https://bugs.launchpad.net/bugs/1739807

This is a set of fixes that landed in the v4.15 merge window to address issues
with error recovery. All clean cherry picks, except for a trivial header
context adjustment.

Xiang Chen (2):
  scsi: hisi_sas: fix internal abort slot timeout bug
  scsi: hisi_sas: us start_phy in PHY_FUNC_LINK_RESET

Xiaofei Tan (3):
  scsi: hisi_sas: fix NULL check in SMP abort task path
  scsi: hisi_sas: fix the risk of freeing slot twice
  scsi: hisi_sas: complete all tasklets prior to host reset

 drivers/scsi/hisi_sas/hisi_sas.h       |  3 ++-
 drivers/scsi/hisi_sas/hisi_sas_main.c  | 25 ++++++++++++++++++++-----
 drivers/scsi/hisi_sas/hisi_sas_v1_hw.c |  2 +-
 drivers/scsi/hisi_sas/hisi_sas_v2_hw.c | 10 +++-------
 drivers/scsi/hisi_sas/hisi_sas_v3_hw.c |  5 +++--
 5 files changed, 29 insertions(+), 16 deletions(-)

--
2.15.1


--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team
Reply | Threaded
Open this post in threaded view
|

[PATCH 1/5][Bionic] scsi: hisi_sas: fix internal abort slot timeout bug

dann frazier-4
From: Xiang Chen <[hidden email]>

BugLink: https://bugs.launchpad.net/bugs/1739807

When an internal abort times out in hisi_sas_internal_task_abort(),
goto the exit label in and not go through the other task status
checks.

Signed-off-by: Xiang Chen <[hidden email]>
Signed-off-by: John Garry <[hidden email]>
Signed-off-by: Martin K. Petersen <[hidden email]>
(cherry picked from commit f692a677e2cb0ccab4ef91be97d8fb020cca246d)
Signed-off-by: dann frazier <[hidden email]>
---
 drivers/scsi/hisi_sas/hisi_sas_main.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/scsi/hisi_sas/hisi_sas_main.c b/drivers/scsi/hisi_sas/hisi_sas_main.c
index 16664f2e15fb..709d533e0d64 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_main.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_main.c
@@ -1471,6 +1471,7 @@ hisi_sas_internal_task_abort(struct hisi_hba *hisi_hba,
  if (slot)
  slot->task = NULL;
  dev_err(dev, "internal task abort: timeout.\n");
+ goto exit;
  }
  }
 
--
2.15.1


--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team
Reply | Threaded
Open this post in threaded view
|

[PATCH 2/5][Bionic] scsi: hisi_sas: us start_phy in PHY_FUNC_LINK_RESET

dann frazier-4
In reply to this post by dann frazier-4
From: Xiang Chen <[hidden email]>

BugLink: https://bugs.launchpad.net/bugs/1739807

When a PHY_FUNC_LINK_RESET is issued, we need to fill the transport
identify_frame to SAS controller before the PHYs are enabled.

Without this, we may find that if a PHY which belonged to a wideport
before the reset may generate a new port id.

Signed-off-by: Xiang Chen <[hidden email]>
Signed-off-by: John Garry <[hidden email]>
Signed-off-by: Martin K. Petersen <[hidden email]>
(cherry picked from commit 1eb8eeac17ee808b50b422f5ef2e27f5497f82ad)
Signed-off-by: dann frazier <[hidden email]>
---
 drivers/scsi/hisi_sas/hisi_sas.h       | 2 +-
 drivers/scsi/hisi_sas/hisi_sas_main.c  | 2 +-
 drivers/scsi/hisi_sas/hisi_sas_v1_hw.c | 2 +-
 drivers/scsi/hisi_sas/hisi_sas_v2_hw.c | 2 +-
 drivers/scsi/hisi_sas/hisi_sas_v3_hw.c | 2 +-
 5 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/scsi/hisi_sas/hisi_sas.h b/drivers/scsi/hisi_sas/hisi_sas.h
index 07f4a4cfbec1..172e19fbf20f 100644
--- a/drivers/scsi/hisi_sas/hisi_sas.h
+++ b/drivers/scsi/hisi_sas/hisi_sas.h
@@ -198,7 +198,7 @@ struct hisi_sas_hw {
  int (*slot_complete)(struct hisi_hba *hisi_hba,
      struct hisi_sas_slot *slot);
  void (*phys_init)(struct hisi_hba *hisi_hba);
- void (*phy_enable)(struct hisi_hba *hisi_hba, int phy_no);
+ void (*phy_start)(struct hisi_hba *hisi_hba, int phy_no);
  void (*phy_disable)(struct hisi_hba *hisi_hba, int phy_no);
  void (*phy_hard_reset)(struct hisi_hba *hisi_hba, int phy_no);
  void (*get_events)(struct hisi_hba *hisi_hba, int phy_no);
diff --git a/drivers/scsi/hisi_sas/hisi_sas_main.c b/drivers/scsi/hisi_sas/hisi_sas_main.c
index 709d533e0d64..4a5eface9e01 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_main.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_main.c
@@ -762,7 +762,7 @@ static int hisi_sas_control_phy(struct asd_sas_phy *sas_phy, enum phy_func func,
  case PHY_FUNC_LINK_RESET:
  hisi_hba->hw->phy_disable(hisi_hba, phy_no);
  msleep(100);
- hisi_hba->hw->phy_enable(hisi_hba, phy_no);
+ hisi_hba->hw->phy_start(hisi_hba, phy_no);
  break;
 
  case PHY_FUNC_DISABLE:
diff --git a/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c
index 08eca20b0b81..00b5ee4e49d3 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c
@@ -1857,7 +1857,7 @@ static const struct hisi_sas_hw hisi_sas_v1_hw = {
  .start_delivery = start_delivery_v1_hw,
  .slot_complete = slot_complete_v1_hw,
  .phys_init = phys_init_v1_hw,
- .phy_enable = enable_phy_v1_hw,
+ .phy_start = start_phy_v1_hw,
  .phy_disable = disable_phy_v1_hw,
  .phy_hard_reset = phy_hard_reset_v1_hw,
  .phy_set_linkrate = phy_set_linkrate_v1_hw,
diff --git a/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c
index 779af979b6db..7f71528ec59c 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c
@@ -3460,7 +3460,7 @@ static const struct hisi_sas_hw hisi_sas_v2_hw = {
  .start_delivery = start_delivery_v2_hw,
  .slot_complete = slot_complete_v2_hw,
  .phys_init = phys_init_v2_hw,
- .phy_enable = enable_phy_v2_hw,
+ .phy_start = start_phy_v2_hw,
  .phy_disable = disable_phy_v2_hw,
  .phy_hard_reset = phy_hard_reset_v2_hw,
  .get_events = phy_get_events_v2_hw,
diff --git a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
index 2e5fa9717be8..1b50c6eda226 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
@@ -1793,7 +1793,7 @@ static const struct hisi_sas_hw hisi_sas_v3_hw = {
  .start_delivery = start_delivery_v3_hw,
  .slot_complete = slot_complete_v3_hw,
  .phys_init = phys_init_v3_hw,
- .phy_enable = enable_phy_v3_hw,
+ .phy_start = start_phy_v3_hw,
  .phy_disable = disable_phy_v3_hw,
  .phy_hard_reset = phy_hard_reset_v3_hw,
  .phy_get_max_linkrate = phy_get_max_linkrate_v3_hw,
--
2.15.1


--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team
Reply | Threaded
Open this post in threaded view
|

[PATCH 3/5][Bionic] scsi: hisi_sas: fix NULL check in SMP abort task path

dann frazier-4
In reply to this post by dann frazier-4
From: Xiaofei Tan <[hidden email]>

BugLink: https://bugs.launchpad.net/bugs/1739807

This patch adds a NULL check of task->lldd_task before freeing the
slot in SMP path.

This is to guard against the scenario of the slot being freed during
the from the preceding internal abort.

Signed-off-by: Xiaofei Tan <[hidden email]>
Signed-off-by: John Garry <[hidden email]>
Signed-off-by: Martin K. Petersen <[hidden email]>
(cherry picked from commit 378c233bcb21dfb2d9c2548b9a1fa6a8d35c78dd)
Signed-off-by: dann frazier <[hidden email]>
---
 drivers/scsi/hisi_sas/hisi_sas_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/hisi_sas/hisi_sas_main.c b/drivers/scsi/hisi_sas/hisi_sas_main.c
index 4a5eface9e01..3384e7a5d4b1 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_main.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_main.c
@@ -1160,7 +1160,7 @@ static int hisi_sas_abort_task(struct sas_task *task)
 
  rc = hisi_sas_internal_task_abort(hisi_hba, device,
      HISI_SAS_INT_ABT_CMD, tag);
- if (rc == TMF_RESP_FUNC_FAILED) {
+ if (rc == TMF_RESP_FUNC_FAILED && task->lldd_task) {
  spin_lock_irqsave(&hisi_hba->lock, flags);
  hisi_sas_do_release_task(hisi_hba, task, slot);
  spin_unlock_irqrestore(&hisi_hba->lock, flags);
--
2.15.1


--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team
Reply | Threaded
Open this post in threaded view
|

[PATCH 4/5][Bionic] scsi: hisi_sas: fix the risk of freeing slot twice

dann frazier-4
In reply to this post by dann frazier-4
From: Xiaofei Tan <[hidden email]>

BugLink: https://bugs.launchpad.net/bugs/1739807

The function hisi_sas_slot_task_free() is used to free the slot and do
tidy-up of LLDD resources. The LLDD generally should know the state of
a slot and decide when to free it, and it should only be done once.

For some scenarios, we really don't know the state, like when TMF
timeout. In this case, we check task->lldd_task before calling
hisi_sas_slot_task_free().

However, we may miss some scenarios when we should also check
task->lldd_task, and it is not SMP safe to check task->lldd_task as we
don't protect it within spin lock.

This patch is to fix this risk of freeing slot twice, as follows:

  1. Check task->lldd_task in the hisi_sas_slot_task_free(), and give
     up freeing of this time if task->lldd_task is NULL.

  2. Set slot->buf to NULL after it is freed.

Signed-off-by: Xiaofei Tan <[hidden email]>
Signed-off-by: John Garry <[hidden email]>
Signed-off-by: Martin K. Petersen <[hidden email]>
(cherry picked from commit 6ba0fbc35aa9f3bc8c12be3b4047055c9ce2ac92)
Signed-off-by: dann frazier <[hidden email]>
---
 drivers/scsi/hisi_sas/hisi_sas_main.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/hisi_sas/hisi_sas_main.c b/drivers/scsi/hisi_sas/hisi_sas_main.c
index 3384e7a5d4b1..e2ed753e76dd 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_main.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_main.c
@@ -185,13 +185,16 @@ void hisi_sas_slot_task_free(struct hisi_hba *hisi_hba, struct sas_task *task,
  struct domain_device *device = task->dev;
  struct hisi_sas_device *sas_dev = device->lldd_dev;
 
+ if (!task->lldd_task)
+ return;
+
+ task->lldd_task = NULL;
+
  if (!sas_protocol_ata(task->task_proto))
  if (slot->n_elem)
  dma_unmap_sg(dev, task->scatter, slot->n_elem,
      task->data_dir);
 
- task->lldd_task = NULL;
-
  if (sas_dev)
  atomic64_dec(&sas_dev->running_req);
  }
@@ -199,8 +202,8 @@ void hisi_sas_slot_task_free(struct hisi_hba *hisi_hba, struct sas_task *task,
  if (slot->buf)
  dma_pool_free(hisi_hba->buffer_pool, slot->buf, slot->buf_dma);
 
-
  list_del_init(&slot->entry);
+ slot->buf = NULL;
  slot->task = NULL;
  slot->port = NULL;
  hisi_sas_slot_index_free(hisi_hba, slot->idx);
--
2.15.1


--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team
Reply | Threaded
Open this post in threaded view
|

[PATCH 5/5][Bionic] scsi: hisi_sas: complete all tasklets prior to host reset

dann frazier-4
In reply to this post by dann frazier-4
From: Xiaofei Tan <[hidden email]>

BugLink: https://bugs.launchpad.net/bugs/1739807

The CQ event is handled in tasklet context, and it could be delayed if
the system loading is high.

It is possible to run into some problems when executing a host reset
when cq_tasklet_vx_hw() is being executed.

So, prior to host reset, execute tasklet_kill() to ensure that all CQ
tasklets are complete.

Besides, as the function hisi_sas_wait_tasklets_done() is added to do
tasklet_kill(), this patch refactors some code where tasklet_kill() is
used.

Signed-off-by: Xiaofei Tan <[hidden email]>
Signed-off-by: John Garry <[hidden email]>
Signed-off-by: Martin K. Petersen <[hidden email]>
(backported from commit 571295f8055c0b69c9911021ae6cf1a6973cf517)
[dannf: resolved trivial merge conflict in hisi_sas.h]
Signed-off-by: dann frazier <[hidden email]>
---
 drivers/scsi/hisi_sas/hisi_sas.h       |  1 +
 drivers/scsi/hisi_sas/hisi_sas_main.c  | 11 +++++++++++
 drivers/scsi/hisi_sas/hisi_sas_v2_hw.c |  8 ++------
 drivers/scsi/hisi_sas/hisi_sas_v3_hw.c |  3 ++-
 4 files changed, 16 insertions(+), 7 deletions(-)

diff --git a/drivers/scsi/hisi_sas/hisi_sas.h b/drivers/scsi/hisi_sas/hisi_sas.h
index 172e19fbf20f..f3a767e98cee 100644
--- a/drivers/scsi/hisi_sas/hisi_sas.h
+++ b/drivers/scsi/hisi_sas/hisi_sas.h
@@ -420,4 +420,5 @@ extern void hisi_sas_slot_task_free(struct hisi_hba *hisi_hba,
     struct sas_task *task,
     struct hisi_sas_slot *slot);
 extern void hisi_sas_init_mem(struct hisi_hba *hisi_hba);
+extern void hisi_sas_kill_tasklets(struct hisi_hba *hisi_hba);
 #endif
diff --git a/drivers/scsi/hisi_sas/hisi_sas_main.c b/drivers/scsi/hisi_sas/hisi_sas_main.c
index e2ed753e76dd..74b82612c3e1 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_main.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_main.c
@@ -1546,6 +1546,17 @@ void hisi_sas_phy_down(struct hisi_hba *hisi_hba, int phy_no, int rdy)
 }
 EXPORT_SYMBOL_GPL(hisi_sas_phy_down);
 
+void hisi_sas_kill_tasklets(struct hisi_hba *hisi_hba)
+{
+ int i;
+
+ for (i = 0; i < hisi_hba->queue_count; i++) {
+ struct hisi_sas_cq *cq = &hisi_hba->cq[i];
+
+ tasklet_kill(&cq->tasklet);
+ }
+}
+EXPORT_SYMBOL_GPL(hisi_sas_kill_tasklets);
 
 struct scsi_transport_template *hisi_sas_stt;
 EXPORT_SYMBOL_GPL(hisi_sas_stt);
diff --git a/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c
index 7f71528ec59c..93b33766df3c 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c
@@ -3410,6 +3410,7 @@ static int soft_reset_v2_hw(struct hisi_hba *hisi_hba)
 
  interrupt_disable_v2_hw(hisi_hba);
  hisi_sas_write32(hisi_hba, DLVRY_QUEUE_ENABLE, 0x0);
+ hisi_sas_kill_tasklets(hisi_hba);
 
  hisi_sas_stop_phys(hisi_hba);
 
@@ -3493,16 +3494,11 @@ static int hisi_sas_v2_remove(struct platform_device *pdev)
 {
  struct sas_ha_struct *sha = platform_get_drvdata(pdev);
  struct hisi_hba *hisi_hba = sha->lldd_ha;
- int i;
 
  if (timer_pending(&hisi_hba->timer))
  del_timer(&hisi_hba->timer);
 
- for (i = 0; i < hisi_hba->queue_count; i++) {
- struct hisi_sas_cq *cq = &hisi_hba->cq[i];
-
- tasklet_kill(&cq->tasklet);
- }
+ hisi_sas_kill_tasklets(hisi_hba);
 
  return hisi_sas_remove(pdev);
 }
diff --git a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
index 1b50c6eda226..fe32e0888588 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
@@ -1757,6 +1757,7 @@ static int soft_reset_v3_hw(struct hisi_hba *hisi_hba)
 
  interrupt_disable_v3_hw(hisi_hba);
  hisi_sas_write32(hisi_hba, DLVRY_QUEUE_ENABLE, 0x0);
+ hisi_sas_kill_tasklets(hisi_hba);
 
  hisi_sas_stop_phys(hisi_hba);
 
@@ -1964,7 +1965,6 @@ hisi_sas_v3_destroy_irqs(struct pci_dev *pdev, struct hisi_hba *hisi_hba)
  struct hisi_sas_cq *cq = &hisi_hba->cq[i];
 
  free_irq(pci_irq_vector(pdev, i+16), cq);
- tasklet_kill(&cq->tasklet);
  }
  pci_free_irq_vectors(pdev);
 }
@@ -1980,6 +1980,7 @@ static void hisi_sas_v3_remove(struct pci_dev *pdev)
  sas_remove_host(sha->core.shost);
 
  hisi_sas_v3_destroy_irqs(pdev, hisi_hba);
+ hisi_sas_kill_tasklets(hisi_hba);
  pci_release_regions(pdev);
  pci_disable_device(pdev);
  hisi_sas_free(hisi_hba);
--
2.15.1


--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team
Reply | Threaded
Open this post in threaded view
|

APPLIED: [PATCH 0/5][Bionic] hisi_sas: driver robustness fixes

Seth Forshee
In reply to this post by dann frazier-4
On Fri, Dec 22, 2017 at 02:50:35PM -0700, dann frazier wrote:
> BugLink: https://bugs.launchpad.net/bugs/1739807
>
> This is a set of fixes that landed in the v4.15 merge window to address issues
> with error recovery. All clean cherry picks, except for a trivial header
> context adjustment.

"scsi: hisi_sas: fix the risk of freeing slot twice" had already come in
from upstream stable. All other patches applied to bionic/master-next,
thanks!

--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team