[Patch 0/1][SRU][E] Power9 fix MCE handling for huge pages.


Manoj Iyer
Please consider the following patch to fix the bug reported in
https://bugs.launchpad.net/bugs/1848127, where the Power9 system
is rendered unresponsive after injecting MCE errors.

IBM identified the patch that fixes this issue, tested the kernel
available in ppa:ubuntu-power-triage/lp1848127, and reported that
it resolves the problem.

The patch was cleanly cherry-picked from Linus's tree and applied to
Eoan.



--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team

[PATCH] powerpc/mce: Fix MCE handling for huge pages

Manoj Iyer
From: Balbir Singh <[hidden email]>

The current code would fail on huge pages addresses, since the shift would
be incorrect. Use the correct page shift value returned by
__find_linux_pte() to get the correct physical address. The code is more
generic and can handle both regular and compound pages.

BugLink: https://bugs.launchpad.net/bugs/1848127

Fixes: ba41e1e1ccb9 ("powerpc/mce: Hookup derror (load/store) UE errors")
Signed-off-by: Balbir Singh <[hidden email]>
[[hidden email]: Fixup pseries_do_memory_failure()]
Signed-off-by: Reza Arbab <[hidden email]>
Tested-by: Mahesh Salgaonkar <[hidden email]>
Signed-off-by: Santosh Sivaraj <[hidden email]>
Cc: [hidden email] # v4.15+
Signed-off-by: Michael Ellerman <[hidden email]>
Link: https://lore.kernel.org/r/20190820081352.8641-3-santosh@...
(cherry picked from commit 99ead78afd1128bfcebe7f88f3b102fb2da09aee)
Signed-off-by: Manoj Iyer <[hidden email]>
---
 arch/powerpc/kernel/mce_power.c | 19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/kernel/mce_power.c b/arch/powerpc/kernel/mce_power.c
index a814d2dfb5b0..714a98e0927f 100644
--- a/arch/powerpc/kernel/mce_power.c
+++ b/arch/powerpc/kernel/mce_power.c
@@ -26,6 +26,7 @@
 unsigned long addr_to_pfn(struct pt_regs *regs, unsigned long addr)
 {
 	pte_t *ptep;
+	unsigned int shift;
 	unsigned long flags;
 	struct mm_struct *mm;
 
@@ -35,13 +36,18 @@ unsigned long addr_to_pfn(struct pt_regs *regs, unsigned long addr)
 		mm = &init_mm;
 
 	local_irq_save(flags);
-	if (mm == current->mm)
-		ptep = find_current_mm_pte(mm->pgd, addr, NULL, NULL);
-	else
-		ptep = find_init_mm_pte(addr, NULL);
+	ptep = __find_linux_pte(mm->pgd, addr, NULL, &shift);
 	local_irq_restore(flags);
+
 	if (!ptep || pte_special(*ptep))
 		return ULONG_MAX;
+
+	if (shift > PAGE_SHIFT) {
+		unsigned long rpnmask = (1ul << shift) - PAGE_SIZE;
+
+		return pte_pfn(__pte(pte_val(*ptep) | (addr & rpnmask)));
+	}
+
 	return pte_pfn(*ptep);
 }
 
@@ -344,7 +350,7 @@ static const struct mce_derror_table mce_p9_derror_table[] = {
   MCE_INITIATOR_CPU,   MCE_SEV_SEVERE, true },
 { 0, false, 0, 0, 0, 0, 0 } };
 
-static int mce_find_instr_ea_and_pfn(struct pt_regs *regs, uint64_t *addr,
+static int mce_find_instr_ea_and_phys(struct pt_regs *regs, uint64_t *addr,
 					uint64_t *phys_addr)
 {
 	/*
@@ -541,7 +547,8 @@ static int mce_handle_derror(struct pt_regs *regs,
 			 * kernel/exception-64s.h
 			 */
 			if (get_paca()->in_mce < MAX_MCE_DEPTH)
-				mce_find_instr_ea_and_pfn(regs, addr, phys_addr);
+				mce_find_instr_ea_and_phys(regs, addr,
+							   phys_addr);
 		}
 		found = 1;
 	}
--
2.20.1
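
A note on the arithmetic in the new addr_to_pfn() branch: the PTE of a
huge page records only the frame of the page's first base page, so the
bits of the faulting address between PAGE_SHIFT and shift have to be
merged back in to find the base page that actually took the error. The
standalone C sketch below models that step outside the kernel. It
assumes 64 KiB base pages and a huge-page base aligned to its own size;
demo_addr_to_pfn() and the DEMO_* macros are hypothetical names used
only for illustration and are not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for illustration: 64 KiB base pages. */
#define DEMO_PAGE_SHIFT 16u
#define DEMO_PAGE_SIZE  (1ull << DEMO_PAGE_SHIFT)

/*
 * Hypothetical user-space model of the patched tail of addr_to_pfn().
 *   base_pfn: frame number stored in the PTE (first base page of the mapping)
 *   addr:     faulting effective address
 *   shift:    log2 of the mapping size, as the page-table walk would report
 */
static uint64_t demo_addr_to_pfn(uint64_t base_pfn, uint64_t addr,
                                 unsigned int shift)
{
    assert(shift >= DEMO_PAGE_SHIFT && shift < 64);

    if (shift > DEMO_PAGE_SHIFT) {
        /* Address bits that select a base page inside the huge page. */
        uint64_t rpnmask = (1ull << shift) - DEMO_PAGE_SIZE;
        /* Merge those bits into the physical address of the huge page. */
        uint64_t phys = (base_pfn << DEMO_PAGE_SHIFT) | (addr & rpnmask);

        return phys >> DEMO_PAGE_SHIFT;
    }

    /* Regular page: the PTE already names the right frame. */
    return base_pfn;
}
```

For a 2 MiB mapping (shift = 21) with base_pfn = 0x1000 and
addr = 0x10153456, the sketch yields 0x1015. Ignoring the shift, as the
old code effectively did, yields 0x1000, which is the first frame of the
huge page rather than the one containing the faulting address.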



Re: [Patch 0/1][SRU][E] Power9 fix MCE handling for huge pages.

Connor Kuehl
In reply to this post by Manoj Iyer
On 10/16/19 12:50 PM, Manoj Iyer wrote:
> Please consider the following patch to fix the bug reported in
> https://bugs.launchpad.net/bugs/1848127, where the Power9 system
> is rendered unresponsive after injecting MCE errors.

Hi Manoj,

I see there are tasks for Bionic and Disco in the bug report. Are those
patches forthcoming?

- Connor



ACK: [PATCH] powerpc/mce: Fix MCE handling for huge pages

Kleber Souza
In reply to this post by Manoj Iyer
On 16.10.19 21:50, Manoj Iyer wrote:

> From: Balbir Singh <[hidden email]>
>
> The current code would fail on huge pages addresses, since the shift would
> be incorrect. Use the correct page shift value returned by
> __find_linux_pte() to get the correct physical address. The code is more
> generic and can handle both regular and compound pages.
>
> BugLink: https://bugs.launchpad.net/bugs/1848127
>
> Fixes: ba41e1e1ccb9 ("powerpc/mce: Hookup derror (load/store) UE errors")
> Signed-off-by: Balbir Singh <[hidden email]>
> [[hidden email]: Fixup pseries_do_memory_failure()]
> Signed-off-by: Reza Arbab <[hidden email]>
> Tested-by: Mahesh Salgaonkar <[hidden email]>
> Signed-off-by: Santosh Sivaraj <[hidden email]>
> Cc: [hidden email] # v4.15+
> Signed-off-by: Michael Ellerman <[hidden email]>
> Link: https://lore.kernel.org/r/20190820081352.8641-3-santosh@...
> (cherry picked from commit 99ead78afd1128bfcebe7f88f3b102fb2da09aee)
> Signed-off-by: Manoj Iyer <[hidden email]>

Acked-by: Kleber Sacilotto de Souza <[hidden email]>




ACK: [PATCH] powerpc/mce: Fix MCE handling for huge pages

Stefan Bader
In reply to this post by Manoj Iyer
On 16.10.19 21:50, Manoj Iyer wrote:

> From: Balbir Singh <[hidden email]>
>
> The current code would fail on huge pages addresses, since the shift would
> be incorrect. Use the correct page shift value returned by
> __find_linux_pte() to get the correct physical address. The code is more
> generic and can handle both regular and compound pages.
>
> BugLink: https://bugs.launchpad.net/bugs/1848127
>
> Fixes: ba41e1e1ccb9 ("powerpc/mce: Hookup derror (load/store) UE errors")
> Signed-off-by: Balbir Singh <[hidden email]>
> [[hidden email]: Fixup pseries_do_memory_failure()]
> Signed-off-by: Reza Arbab <[hidden email]>
> Tested-by: Mahesh Salgaonkar <[hidden email]>
> Signed-off-by: Santosh Sivaraj <[hidden email]>
> Cc: [hidden email] # v4.15+
> Signed-off-by: Michael Ellerman <[hidden email]>
> Link: https://lore.kernel.org/r/20190820081352.8641-3-santosh@...
> (cherry picked from commit 99ead78afd1128bfcebe7f88f3b102fb2da09aee)
> Signed-off-by: Manoj Iyer <[hidden email]>
Acked-by: Stefan Bader <[hidden email]>




NAK: [Patch 0/1][SRU][E] Power9 fix MCE handling for huge pages.

Kleber Souza
In reply to this post by Manoj Iyer
On 16.10.19 21:50, Manoj Iyer wrote:

> Please consider the following patch to fix the bug reported in
> https://bugs.launchpad.net/bugs/1848127, where the Power9 system
> is rendered unresponsive after injecting MCE errors.
>
> The patch identified by IBM fixes this issue and the test kernel
> available in ppa:ubuntu-power-triage/lp1848127 was tested by
> IBM and reported to fix the issue.
>
> The patch was cleanly cherry-picked from Linus's tree and applied to
> Eoan.
>
>
>

This patch has already been applied to Eoan for the following
upstream stable update:

Eoan update: v5.3.6 upstream stable release
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1848039

Thanks,
Kleber
