[PATCH 0/2][SRU][OEM-5.6] S3 stress test fails with amdgpu errors

AceLan Kao
From: "Chia-Lin Kao (AceLan)" <[hidden email]>

BugLink: https://bugs.launchpad.net/bugs/1909453


[Impact]
The system fails to resume from S3 with the following error messages:
   Nov 17 03:15:27 u kernel: amdgpu 0000:04:00.0:[drm:amdgpu_ring_test_helper [amdgpu]] ERROR ring vcn_dec test failed (-110)
   Nov 17 03:15:27 u kernel: [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] ERROR resume of IP block <vcn_v1_0> failed -110
   Nov 17 03:15:27 u kernel: [drm:amdgpu_device_resume [amdgpu]] ERROR amdgpu_device_ip_resume failed (-110).
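The -110 in these messages is a negated Linux errno value. On x86, Arm and most other Linux architectures (MIPS, Alpha and SPARC use different numbers), errno 110 is ETIMEDOUT, so the vcn_dec ring test timed out waiting for the hardware, and that error then propagates up through the IP-block and device resume paths. The following user-space sketch decodes such values with the C library's strerror(). The helper name describe_kernel_ret is hypothetical:

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/*
 * Kernel functions report failure as a negated errno value, so a "(-110)"
 * in the log is -errno. Flip the sign and ask the C library for its text.
 */
static const char *describe_kernel_ret(int ret)
{
	assert(ret < 0); /* only meaningful for failure codes */
	return strerror(-ret);
}
```

Under glibc, strerror(ETIMEDOUT) yields "Connection timed out". That wording comes from the generic C-library string table, not from a GPU-specific diagnosis.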

[Fix]
AMD provided the 2 commits below in v5.9-rc1 to fix this issue; Groovy
already has them through a stable update.
   429f3d24384b drm/amdgpu: asd function needs to be unloaded in suspend phase
   90937420c44f drm/amdgpu: add TMR destory function for psp

[Test]
Verified on the affected Dell machine: it passes a 500-cycle S3 stress test.

[Where problems could occur]
The TMR is created again on every resume, so it must be destroyed when
entering S3. Both changes are confined to the PSP suspend and hw_fini paths
in amdgpu_psp.c and do only what is required, so they should be fairly safe
to include.
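The following toy model shows why the TMR has to be torn down on every suspend when it is set up again on every resume, mirroring the 500-cycle stress test. Everything in it is hypothetical: fw_model, run_s3_cycles(), and especially the rule that the modeled firmware refuses SETUP_TMR while a TMR from an earlier cycle is still registered are assumptions made for illustration. The real PSP is only reported to fail "after multiple" suspend/resume cycles, not deterministically on the second one.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical firmware model, not amdgpu code. Assumption: SETUP_TMR is
 * rejected while a TMR from a previous cycle is still registered.
 */
struct fw_model {
	bool tmr_registered;
};

/* Resume path: ask the modeled firmware to set up the TMR. */
static int model_setup_tmr(struct fw_model *fw)
{
	if (fw->tmr_registered)
		return -1; /* stale TMR still present: setup rejected */
	fw->tmr_registered = true;
	return 0;
}

/* Suspend path with the fix applied: destroy the TMR before entering S3. */
static void model_destroy_tmr(struct fw_model *fw)
{
	fw->tmr_registered = false;
}

/*
 * Run `cycles` resume/suspend iterations. Returns the 1-based cycle on which
 * setup first failed, or 0 if every cycle succeeded.
 */
static int run_s3_cycles(int cycles, bool destroy_on_suspend)
{
	struct fw_model fw = { .tmr_registered = false };

	assert(cycles >= 0);
	for (int i = 1; i <= cycles; i++) {
		if (model_setup_tmr(&fw) != 0)
			return i;
		if (destroy_on_suspend)
			model_destroy_tmr(&fw);
	}
	return 0;
}
```

With the destroy step, run_s3_cycles(500, true) survives all 500 cycles. Without it, the model rejects the second SETUP_TMR.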

Huang Rui (2):
  drm/amdgpu: asd function needs to be unloaded in suspend phase
  drm/amdgpu: add TMR destory function for psp

 drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 68 ++++++++++++++++++++++---
 1 file changed, 60 insertions(+), 8 deletions(-)

--
2.25.1


--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team

[PATCH 1/2][SRU][OEM-5.6] drm/amdgpu: asd function needs to be unloaded in suspend phase

AceLan Kao
From: Huang Rui <[hidden email]>

BugLink: https://bugs.launchpad.net/bugs/1909453

Unload the ASD function in the suspend phase.

Signed-off-by: Huang Rui <[hidden email]>
Reviewed-by: Alex Deucher <[hidden email]>
Signed-off-by: Alex Deucher <[hidden email]>
(cherry picked from commit 429f3d24384b049925771c56b5bc2850cede958f)
Signed-off-by: Chia-Lin Kao (AceLan) <[hidden email]>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
index 146f96661b6b..d50f29b9a64e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
@@ -1580,6 +1580,12 @@ static int psp_suspend(void *handle)
 		}
 	}
 
+	ret = psp_asd_unload(psp);
+	if (ret) {
+		DRM_ERROR("Failed to unload asd\n");
+		return ret;
+	}
+
 	ret = psp_ring_stop(psp, PSP_RING_TYPE__KM);
 	if (ret) {
 		DRM_ERROR("PSP ring stop failed\n");
--
2.25.1
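Patch 1 places psp_asd_unload() before psp_ring_stop(). The sketch below illustrates why that order matters. It assumes, as psp_cmd_submit_buf() does in this driver, that every PSP command, including the ASD unload, is posted on the KM ring. The psp_model struct, the model_* functions and the -EIO result are hypothetical stand-ins, not amdgpu code:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the PSP state that psp_suspend() touches. */
struct psp_model {
	bool ring_running;
	bool asd_loaded;
};

/* Assumption: every PSP command goes over the KM ring, so it needs a live ring. */
static int model_submit(const struct psp_model *psp)
{
	return psp->ring_running ? 0 : -EIO;
}

static int model_asd_unload(struct psp_model *psp)
{
	int ret = model_submit(psp);

	if (!ret)
		psp->asd_loaded = false;
	return ret;
}

static int model_ring_stop(struct psp_model *psp)
{
	psp->ring_running = false;
	return 0;
}

typedef int (*teardown_step)(struct psp_model *psp);

/* Run the steps in order and bail out on the first error, as psp_suspend() does. */
static int model_suspend(struct psp_model *psp, const teardown_step *steps, size_t n)
{
	assert(steps != NULL);
	for (size_t i = 0; i < n; i++) {
		int ret = steps[i](psp);

		if (ret)
			return ret;
	}
	return 0;
}
```

In this model, running { model_asd_unload, model_ring_stop } succeeds and leaves the ASD unloaded. Reversing the two steps makes the unload fail and aborts the suspend with the ASD still loaded.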


[PATCH 2/2][SRU][OEM-5.6] drm/amdgpu: add TMR destory function for psp

AceLan Kao
From: Huang Rui <[hidden email]>

BugLink: https://bugs.launchpad.net/bugs/1909453

The TMR must be destroyed with GFX_CMD_ID_DESTROY_TMR when the system
goes to suspend. Otherwise, the PSP may return a failure status
(0xFFFF007) for the Gfx-2-PSP command GFX_CMD_ID_SETUP_TMR after multiple
suspend/resume cycles.

Signed-off-by: Huang Rui <[hidden email]>
Reviewed-by: Alex Deucher <[hidden email]>
Signed-off-by: Alex Deucher <[hidden email]>
(backported from commit 90937420c44f7535fd3ac4341a48c4c4dd1fe190)
Signed-off-by: Chia-Lin Kao (AceLan) <[hidden email]>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 62 +++++++++++++++++++++----
 1 file changed, 54 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
index d50f29b9a64e..849d8588c1c7 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
@@ -318,6 +318,52 @@ static int psp_tmr_load(struct psp_context *psp)
 	return ret;
 }
 
+static void psp_prep_tmr_unload_cmd_buf(struct psp_context *psp,
+					struct psp_gfx_cmd_resp *cmd)
+{
+	if (amdgpu_sriov_vf(psp->adev))
+		cmd->cmd_id = GFX_CMD_ID_DESTROY_VMR;
+	else
+		cmd->cmd_id = GFX_CMD_ID_DESTROY_TMR;
+}
+
+static int psp_tmr_unload(struct psp_context *psp)
+{
+	int ret;
+	struct psp_gfx_cmd_resp *cmd;
+
+	cmd = kzalloc(sizeof(struct psp_gfx_cmd_resp), GFP_KERNEL);
+	if (!cmd)
+		return -ENOMEM;
+
+	psp_prep_tmr_unload_cmd_buf(psp, cmd);
+	DRM_INFO("free PSP TMR buffer\n");
+
+	ret = psp_cmd_submit_buf(psp, NULL, cmd,
+				 psp->fence_buf_mc_addr);
+
+	kfree(cmd);
+
+	return ret;
+}
+
+static int psp_tmr_terminate(struct psp_context *psp)
+{
+	int ret;
+	void *tmr_buf;
+	void **pptr;
+
+	ret = psp_tmr_unload(psp);
+	if (ret)
+		return ret;
+
+	/* free TMR memory buffer */
+	pptr = amdgpu_sriov_vf(psp->adev) ? &tmr_buf : NULL;
+	amdgpu_bo_free_kernel(&psp->tmr_bo, &psp->tmr_mc_addr, pptr);
+
+	return 0;
+}
+
 static void psp_prep_asd_load_cmd_buf(struct psp_gfx_cmd_resp *cmd,
 				      uint64_t asd_mc, uint32_t size)
 {
@@ -1515,12 +1561,7 @@ static int psp_hw_fini(void *handle)
 {
 	struct amdgpu_device *adev = (struct amdgpu_device *)handle;
 	struct psp_context *psp = &adev->psp;
-	void *tmr_buf;
-	void **pptr;
-
-	if (adev->gmc.xgmi.num_physical_nodes > 1 &&
-	    psp->xgmi_context.initialized == 1)
-                psp_xgmi_terminate(psp);
+	int ret;
 
 	if (psp->adev->psp.ta_fw) {
 		psp_ras_terminate(psp);
@@ -1530,10 +1571,9 @@ static int psp_hw_fini(void *handle)
 
 	psp_asd_unload(psp);
 
+	psp_tmr_terminate(psp);
 	psp_ring_destroy(psp, PSP_RING_TYPE__KM);
 
-	pptr = amdgpu_sriov_vf(psp->adev) ? &tmr_buf : NULL;
-	amdgpu_bo_free_kernel(&psp->tmr_bo, &psp->tmr_mc_addr, pptr);
 	amdgpu_bo_free_kernel(&psp->fw_pri_bo,
 			      &psp->fw_pri_mc_addr, &psp->fw_pri_buf);
 	amdgpu_bo_free_kernel(&psp->fence_buf_bo,
@@ -1586,6 +1626,12 @@ static int psp_suspend(void *handle)
 		return ret;
 	}
 
+	ret = psp_tmr_terminate(psp);
+	if (ret) {
+		DRM_ERROR("Falied to terminate tmr\n");
+		return ret;
+	}
+
 	ret = psp_ring_stop(psp, PSP_RING_TYPE__KM);
 	if (ret) {
 		DRM_ERROR("PSP ring stop failed\n");
--
2.25.1
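psp_tmr_terminate() first asks the PSP to drop the TMR and frees tmr_bo only if that command succeeds. The command is GFX_CMD_ID_DESTROY_VMR when running as an SR-IOV VF and GFX_CMD_ID_DESTROY_TMR otherwise. The user-space sketch below follows the same control flow. The model_psp struct, the enum values and submit_result are hypothetical, and free() stands in for amdgpu_bo_free_kernel():

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical command IDs; the real ones are GFX_CMD_ID_DESTROY_TMR/VMR. */
enum model_cmd_id { MODEL_DESTROY_TMR, MODEL_DESTROY_VMR };

struct model_psp {
	bool is_sriov_vf;		/* stands in for amdgpu_sriov_vf(adev) */
	void *tmr_buf;			/* stands in for psp->tmr_bo */
	enum model_cmd_id last_cmd;	/* command most recently "submitted" */
	int submit_result;		/* what the modeled firmware returns */
};

/* Same choice as psp_prep_tmr_unload_cmd_buf(): VMR for a VF, TMR otherwise. */
static enum model_cmd_id model_prep_unload_cmd(const struct model_psp *psp)
{
	return psp->is_sriov_vf ? MODEL_DESTROY_VMR : MODEL_DESTROY_TMR;
}

static int model_tmr_unload(struct model_psp *psp)
{
	psp->last_cmd = model_prep_unload_cmd(psp);
	return psp->submit_result;
}

/* Unload first; release the buffer only if the firmware accepted the command. */
static int model_tmr_terminate(struct model_psp *psp)
{
	int ret = model_tmr_unload(psp);

	if (ret)
		return ret;
	assert(psp->tmr_buf != NULL);
	free(psp->tmr_buf);
	psp->tmr_buf = NULL;
	return 0;
}
```

Freeing only after a successful DESTROY presumably avoids releasing memory the firmware may still reference. The trade-off, visible in the diff above, is that psp_hw_fini() ignores the return value of psp_tmr_terminate(), so a failed DESTROY on that path leaves tmr_bo allocated.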


APPLIED[OEM-5.6]: [PATCH 0/2][SRU][OEM-5.6] S3 stress test fails with amdgpu errors

AceLan Kao