[SRU][Artful][PATCH 0/3] Fixes for LP:1776211

classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

[SRU][Artful][PATCH 0/3] Fixes for LP:1776211

Joseph Salisbury-3
BugLink: http://bugs.launchpad.net/bugs/1776211

== SRU Justification ==
IBM has requested these three commits in Artful.  In Artful, kdump fails to
capture dump when smt=2 or off.

Including these three commits allows kdump to work properly.

== Fixes ==
4388c9b3a6ee ("powerpc: Do not send system reset request through the oops path")
04b9c96eae72 ("powerpc/crash: Remove the test for cpu_online in the IPI callback")
4552d128c26e ("powerpc: System reset avoid interleaving oops using die synchronisation")

== Regression Potential ==
Low.  Fixes are limited to powerpc.

== Test Case ==
A test kernel was built with these patches and tested by the original bug reporter.
The bug reporter states the test kernel resolved the bug.

Balbir Singh (1):
  powerpc/crash: Remove the test for cpu_online in the IPI callback

Nicholas Piggin (2):
  powerpc: Do not send system reset request through the oops path
  powerpc: System reset avoid interleaving oops using die
    synchronisation

 arch/powerpc/kernel/crash.c |  3 ---
 arch/powerpc/kernel/traps.c | 45 ++++++++++++++++++++++++++++++---------------
 2 files changed, 30 insertions(+), 18 deletions(-)

--
2.7.4


--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team
Reply | Threaded
Open this post in threaded view
|

[SRU][Artful][PATCH 1/3] powerpc: Do not send system reset request through the oops path

Joseph Salisbury-3
From: Nicholas Piggin <[hidden email]>

BugLink: http://bugs.launchpad.net/bugs/1776211

A system reset is a request to crash / debug the system rather than
necessarily caused by encountering a BUG. So there is no need to
serialize all CPUs behind the die lock, adding taints to all
subsequent traces beyond the first, breaking console locks, etc.

The system reset is NMI context which has its own printk buffers to
prevent output being interleaved. Then it's better to have all
secondaries print out their debug as quickly as possible and the
primary will flush out all printk buffers during panic().

So remove the 0x100 path from die, and move it into system_reset. Name
the crash/dump reasons "System Reset".

This gives "not tained" traces when crashing an untainted kernel. It
also gives the panic reason as "System Reset" as opposed to "Fatal
exception in interrupt" (or "die oops" for fadump).

Signed-off-by: Nicholas Piggin <[hidden email]>
Signed-off-by: Michael Ellerman <[hidden email]>
(cherry picked from commit 4388c9b3a6ee7d6afc36c8a0bb5579b1606229b5)
Signed-off-by: Joseph Salisbury <[hidden email]>
---
 arch/powerpc/kernel/traps.c | 47 ++++++++++++++++++++++++++++++---------------
 1 file changed, 31 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 5fbe81d..8332c77e 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -163,21 +163,9 @@ static void oops_end(unsigned long flags, struct pt_regs *regs,
 
  crash_fadump(regs, "die oops");
 
- /*
- * A system reset (0x100) is a request to dump, so we always send
- * it through the crashdump code.
- */
- if (kexec_should_crash(current) || (TRAP(regs) == 0x100)) {
+ if (kexec_should_crash(current))
  crash_kexec(regs);
 
- /*
- * We aren't the primary crash CPU. We need to send it
- * to a holding pattern to avoid it ending up in the panic
- * code.
- */
- crash_kexec_secondary(regs);
- }
-
  if (!signr)
  return;
 
@@ -295,17 +283,44 @@ void system_reset_exception(struct pt_regs *regs)
  goto out;
  }
 
- die("System Reset", regs, SIGABRT);
+ if (debugger(regs))
+ goto out;
+
+ /*
+ * A system reset is a request to dump, so we always send
+ * it through the crashdump code (if fadump or kdump are
+ * registered).
+ */
+ crash_fadump(regs, "System Reset");
+
+ crash_kexec(regs);
+
+ /*
+ * We aren't the primary crash CPU. We need to send it
+ * to a holding pattern to avoid it ending up in the panic
+ * code.
+ */
+ crash_kexec_secondary(regs);
+
+ /*
+ * No debugger or crash dump registered, print logs then
+ * panic.
+ */
+ __die("System Reset", regs, SIGABRT);
+
+ mdelay(2*MSEC_PER_SEC); /* Wait a little while for others to print */
+ add_taint(TAINT_DIE, LOCKDEP_NOW_UNRELIABLE);
+ nmi_panic(regs, "System Reset");
 
 out:
 #ifdef CONFIG_PPC_BOOK3S_64
  BUG_ON(get_paca()->in_nmi == 0);
  if (get_paca()->in_nmi > 1)
- panic("Unrecoverable nested System Reset");
+ nmi_panic(regs, "Unrecoverable nested System Reset");
 #endif
  /* Must die if the interrupt is not recoverable */
  if (!(regs->msr & MSR_RI))
- panic("Unrecoverable System Reset");
+ nmi_panic(regs, "Unrecoverable System Reset");
 
  if (!nested)
  nmi_exit();
--
2.7.4


--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team
Reply | Threaded
Open this post in threaded view
|

[SRU][Artful][PATCH 2/3] powerpc/crash: Remove the test for cpu_online in the IPI callback

Joseph Salisbury-3
In reply to this post by Joseph Salisbury-3
From: Balbir Singh <[hidden email]>

BugLink: http://bugs.launchpad.net/bugs/1776211

Our check was extra cautious, we've audited crash_send_ipi
and it sends an IPI only to online CPU's. Removal of this
check should have not functional impact on crash kdump.

Signed-off-by: Balbir Singh <[hidden email]>
Reviewed-by: Nicholas Piggin <[hidden email]>
Signed-off-by: Michael Ellerman <[hidden email]>
(cherry picked from commit 04b9c96eae72d862726f2f4bfcec2078240c33c5)
Signed-off-by: Joseph Salisbury <[hidden email]>
---
 arch/powerpc/kernel/crash.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/arch/powerpc/kernel/crash.c b/arch/powerpc/kernel/crash.c
index cbabb5a..29c56ca 100644
--- a/arch/powerpc/kernel/crash.c
+++ b/arch/powerpc/kernel/crash.c
@@ -69,9 +69,6 @@ static void crash_ipi_callback(struct pt_regs *regs)
 
  int cpu = smp_processor_id();
 
- if (!cpu_online(cpu))
- return;
-
  hard_irq_disable();
  if (!cpumask_test_cpu(cpu, &cpus_state_saved)) {
  crash_save_cpu(regs, cpu);
--
2.7.4


--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team
Reply | Threaded
Open this post in threaded view
|

[SRU][Artful][PATCH 3/3] powerpc: System reset avoid interleaving oops using die synchronisation

Joseph Salisbury-3
In reply to this post by Joseph Salisbury-3
From: Nicholas Piggin <[hidden email]>

BugLink: http://bugs.launchpad.net/bugs/1776211

The die() oops path contains a serializing lock to prevent oops
messages from being interleaved. In the case of a system reset
initiated oops (e.g., qemu nmi command), __die was being called
which lacks that synchronisation and oops reports could be
interleaved across CPUs.

A recent patch 4388c9b3a6ee7 ("powerpc: Do not send system reset
request through the oops path") changed this to __die to avoid
the debugger() call, but there is no real harm to calling it twice
if the first time fell through. So go back to using die() here.
This was observed to fix the problem.

Fixes: 4388c9b3a6ee7 ("powerpc: Do not send system reset request through the oops path")
Signed-off-by: Nicholas Piggin <[hidden email]>
Reviewed-by: David Gibson <[hidden email]>
Signed-off-by: Michael Ellerman <[hidden email]>
(cherry picked from commit 4552d128c26e0f0f27a5bd2fadc24092b8f6c1d7)
Signed-off-by: Joseph Salisbury <[hidden email]>
---
 arch/powerpc/kernel/traps.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 8332c77e..174524b 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -306,7 +306,7 @@ void system_reset_exception(struct pt_regs *regs)
  * No debugger or crash dump registered, print logs then
  * panic.
  */
- __die("System Reset", regs, SIGABRT);
+ die("System Reset", regs, SIGABRT);
 
  mdelay(2*MSEC_PER_SEC); /* Wait a little while for others to print */
  add_taint(TAINT_DIE, LOCKDEP_NOW_UNRELIABLE);
--
2.7.4


--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team
Reply | Threaded
Open this post in threaded view
|

ACK: [SRU][Artful][PATCH 0/3] Fixes for LP:1776211

Thadeu Lima de Souza Cascardo-3
In reply to this post by Joseph Salisbury-3
Acked-by: Thadeu Lima de Souza Cascardo <[hidden email]>

--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team
Reply | Threaded
Open this post in threaded view
|

NACK: [SRU][Artful][PATCH 0/3] Fixes for LP:1776211

Stefan Bader-2
In reply to this post by Joseph Salisbury-3
On 13.06.2018 17:54, Joseph Salisbury wrote:

> BugLink: http://bugs.launchpad.net/bugs/1776211
>
> == SRU Justification ==
> IBM has requested these three commits in Artful.  In Artful, kdump fails to
> capture dump when smt=2 or off.
>
> Including these three commits allows kdump to work properly.
>
> == Fixes ==
> 4388c9b3a6ee ("powerpc: Do not send system reset request through the oops path")
> 04b9c96eae72 ("powerpc/crash: Remove the test for cpu_online in the IPI callback")
> 4552d128c26e ("powerpc: System reset avoid interleaving oops using die synchronisation")
>
> == Regression Potential ==
> Low.  Fixes are limited to powerpc.
>
> == Test Case ==
> A test kernel was built with these patches and tested by the original bug reporter.
> The bug reporter states the test kernel resolved the bug.
>
> Balbir Singh (1):
>   powerpc/crash: Remove the test for cpu_online in the IPI callback
>
> Nicholas Piggin (2):
>   powerpc: Do not send system reset request through the oops path
>   powerpc: System reset avoid interleaving oops using die
>     synchronisation
>
>  arch/powerpc/kernel/crash.c |  3 ---
>  arch/powerpc/kernel/traps.c | 45 ++++++++++++++++++++++++++++++---------------
>  2 files changed, 30 insertions(+), 18 deletions(-)
>
Artful reaches EOL and this issue does not sound like a critical one to me. If
those changes are not yet in Bionic, we should get them applied there instead.

-Stefan


--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team

signature.asc (836 bytes) Download Attachment