[B][C][SRU][PATCH 0/1] Fix for ftrace test hang issue

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

[B][C][SRU][PATCH 0/1] Fix for ftrace test hang issue

Po-Hsu Lin (Sam)
BugLink: https://bugs.launchpad.net/bugs/1826385

== Justification ==
Running the ftrace in ubuntu_kernel_selftests repetitively against x86
Cosmic kernel will cause system hang.

When this happens, you won't be able to ssh into this system, and no log
can be found in syslog.

This hang is caused by one of the sub-test: kprobe/multiple_kprobes

Masami's comment from upstream discussion (https://lkml.org/lkml/2018/12/3/1219):
In arch/x86/kernel/kprobes/opt.c, copy_optimized_instructions() does a
copy loop, but only update src and dest cursors, but not update real
address which is used for adjusting RIP relative instruction.

== Fix ==
43a1b0cb4 (kprobes/x86: Fix instruction patching corruption when copying
more than one RIP-relative instruction)

This patch is already in D.
For B/C, they all have this ill-commit 63fef14 and this patch can be
cherry-picked. Note that for Bionic kernel it can only be triggered in
this way with a kernel built with GCC-8.

Although it's a bit difficult to trigger this on Bionic, I think it
worth this fix as it's quite straightforward.

For X, the ill-commit 63fef14 does not exist.

== Test ==
Test kernel for Cosmic and Bionic built with GCC-8:
http://people.canonical.com/~phlin/kernel/lp-1826385-ftrace-hang/

(To verify this for the Bionic, you will need to build a kernel with GCC-8.)

Patch tested with a bare-metal and a KVM node, both of them can pass the
beating repetitively.

== Regression Potential ==
Low, upstream fix specific for kprobe and limited to x86 architecture.


Masami Hiramatsu (1):
  kprobes/x86: Fix instruction patching corruption when copying more
    than one RIP-relative instruction

 arch/x86/kernel/kprobes/opt.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--
2.7.4


--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team
Reply | Threaded
Open this post in threaded view
|

[B][C][SRU][PATCH 1/1] kprobes/x86: Fix instruction patching corruption when copying more than one RIP-relative instruction

Po-Hsu Lin (Sam)
From: Masami Hiramatsu <[hidden email]>

BugLink: https://bugs.launchpad.net/bugs/1826385

After copy_optimized_instructions() copies several instructions
to the working buffer it tries to fix up the real RIP address, but it
adjusts the RIP-relative instruction with an incorrect RIP address
for the 2nd and subsequent instructions due to a bug in the logic.

This will break the kernel pretty badly (with likely outcomes such as
a kernel freeze, a crash, or worse) because probed instructions can refer
to the wrong data.

For example putting kprobes on cpumask_next() typically hits this bug.

cpumask_next() is normally like below if CONFIG_CPUMASK_OFFSTACK=y
(in this case nr_cpumask_bits is an alias of nr_cpu_ids):

 <cpumask_next>:
        48 89 f0 mov    %rsi,%rax
        8b 35 7b fb e2 00 mov    0xe2fb7b(%rip),%esi # ffffffff82db9e64 <nr_cpu_ids>
        55 push   %rbp
...

If we put a kprobe on it and it gets jump-optimized, it gets
patched by the kprobes code like this:

 <cpumask_next>:
        e9 95 7d 07 1e jmpq   0xffffffffa000207a
        7b fb jnp    0xffffffff81f8a2e2 <cpumask_next+2>
        e2 00 loop   0xffffffff81f8a2e9 <cpumask_next+9>
        55 push   %rbp

This shows that the first two MOV instructions were copied to a
trampoline buffer at 0xffffffffa000207a.

Here is the disassembled result of the trampoline, skipping
the optprobe template instructions:

        # Dump of assembly code from 0xffffffffa000207a to 0xffffffffa00020ea:

        54 push   %rsp
        ...
        48 83 c4 08 add    $0x8,%rsp
        9d popfq
        48 89 f0 mov    %rsi,%rax
        8b 35 82 7d db e2 mov    -0x1d24827e(%rip),%esi # 0xffffffff82db9e67 <nr_cpu_ids+3>

This dump shows that the second MOV accesses *(nr_cpu_ids+3) instead of
the original *nr_cpu_ids. This leads to a kernel freeze because
cpumask_next() always returns 0 and for_each_cpu() never ends.

Fix this by adding 'len' correctly to the real RIP address while
copying.

[ mingo: Improved the changelog. ]

Reported-by: Michael Rodin <[hidden email]>
Signed-off-by: Masami Hiramatsu <[hidden email]>
Reviewed-by: Steven Rostedt (VMware) <[hidden email]>
Cc: Arnaldo Carvalho de Melo <[hidden email]>
Cc: Linus Torvalds <[hidden email]>
Cc: Peter Zijlstra <[hidden email]>
Cc: Ravi Bangoria <[hidden email]>
Cc: Steven Rostedt <[hidden email]>
Cc: Thomas Gleixner <[hidden email]>
Cc: [hidden email] # v4.15+
Fixes: 63fef14fc98a ("kprobes/x86: Make insn buffer always ROX and use text_poke()")
Link: http://lkml.kernel.org/r/153504457253.22602.1314289671019919596.stgit@devbox
Signed-off-by: Ingo Molnar <[hidden email]>
(cherry picked from commit 43a1b0cb4cd6dbfd3cd9c10da663368394d299d8)
Signed-off-by: Po-Hsu Lin <[hidden email]>
---
 arch/x86/kernel/kprobes/opt.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/kprobes/opt.c b/arch/x86/kernel/kprobes/opt.c
index 1467f96..72e1286 100644
--- a/arch/x86/kernel/kprobes/opt.c
+++ b/arch/x86/kernel/kprobes/opt.c
@@ -189,7 +189,7 @@ static int copy_optimized_instructions(u8 *dest, u8 *src, u8 *real)
  int len = 0, ret;
 
  while (len < RELATIVEJUMP_SIZE) {
- ret = __copy_instruction(dest + len, src + len, real, &insn);
+ ret = __copy_instruction(dest + len, src + len, real + len, &insn);
  if (!ret || !can_boost(&insn, src + len))
  return -EINVAL;
  len += ret;
--
2.7.4


--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team
Reply | Threaded
Open this post in threaded view
|

ACK: [B][C][SRU][PATCH 1/1] kprobes/x86: Fix instruction patching corruption when copying more than one RIP-relative instruction

Kleber Souza
On 5/7/19 10:50 AM, Po-Hsu Lin wrote:

> From: Masami Hiramatsu <[hidden email]>
>
> BugLink: https://bugs.launchpad.net/bugs/1826385
>
> After copy_optimized_instructions() copies several instructions
> to the working buffer it tries to fix up the real RIP address, but it
> adjusts the RIP-relative instruction with an incorrect RIP address
> for the 2nd and subsequent instructions due to a bug in the logic.
>
> This will break the kernel pretty badly (with likely outcomes such as
> a kernel freeze, a crash, or worse) because probed instructions can refer
> to the wrong data.
>
> For example putting kprobes on cpumask_next() typically hits this bug.
>
> cpumask_next() is normally like below if CONFIG_CPUMASK_OFFSTACK=y
> (in this case nr_cpumask_bits is an alias of nr_cpu_ids):
>
>  <cpumask_next>:
> 48 89 f0 mov    %rsi,%rax
> 8b 35 7b fb e2 00 mov    0xe2fb7b(%rip),%esi # ffffffff82db9e64 <nr_cpu_ids>
> 55 push   %rbp
> ...
>
> If we put a kprobe on it and it gets jump-optimized, it gets
> patched by the kprobes code like this:
>
>  <cpumask_next>:
> e9 95 7d 07 1e jmpq   0xffffffffa000207a
> 7b fb jnp    0xffffffff81f8a2e2 <cpumask_next+2>
> e2 00 loop   0xffffffff81f8a2e9 <cpumask_next+9>
> 55 push   %rbp
>
> This shows that the first two MOV instructions were copied to a
> trampoline buffer at 0xffffffffa000207a.
>
> Here is the disassembled result of the trampoline, skipping
> the optprobe template instructions:
>
> # Dump of assembly code from 0xffffffffa000207a to 0xffffffffa00020ea:
>
> 54 push   %rsp
> ...
> 48 83 c4 08 add    $0x8,%rsp
> 9d popfq
> 48 89 f0 mov    %rsi,%rax
> 8b 35 82 7d db e2 mov    -0x1d24827e(%rip),%esi # 0xffffffff82db9e67 <nr_cpu_ids+3>
>
> This dump shows that the second MOV accesses *(nr_cpu_ids+3) instead of
> the original *nr_cpu_ids. This leads to a kernel freeze because
> cpumask_next() always returns 0 and for_each_cpu() never ends.
>
> Fix this by adding 'len' correctly to the real RIP address while
> copying.
>
> [ mingo: Improved the changelog. ]
>
> Reported-by: Michael Rodin <[hidden email]>
> Signed-off-by: Masami Hiramatsu <[hidden email]>
> Reviewed-by: Steven Rostedt (VMware) <[hidden email]>
> Cc: Arnaldo Carvalho de Melo <[hidden email]>
> Cc: Linus Torvalds <[hidden email]>
> Cc: Peter Zijlstra <[hidden email]>
> Cc: Ravi Bangoria <[hidden email]>
> Cc: Steven Rostedt <[hidden email]>
> Cc: Thomas Gleixner <[hidden email]>
> Cc: [hidden email] # v4.15+
> Fixes: 63fef14fc98a ("kprobes/x86: Make insn buffer always ROX and use text_poke()")
> Link: http://lkml.kernel.org/r/153504457253.22602.1314289671019919596.stgit@devbox
> Signed-off-by: Ingo Molnar <[hidden email]>
> (cherry picked from commit 43a1b0cb4cd6dbfd3cd9c10da663368394d299d8)
> Signed-off-by: Po-Hsu Lin <[hidden email]>

Acked-by: Kleber Sacilotto de Souza <[hidden email]>

> ---
>  arch/x86/kernel/kprobes/opt.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/x86/kernel/kprobes/opt.c b/arch/x86/kernel/kprobes/opt.c
> index 1467f96..72e1286 100644
> --- a/arch/x86/kernel/kprobes/opt.c
> +++ b/arch/x86/kernel/kprobes/opt.c
> @@ -189,7 +189,7 @@ static int copy_optimized_instructions(u8 *dest, u8 *src, u8 *real)
>   int len = 0, ret;
>  
>   while (len < RELATIVEJUMP_SIZE) {
> - ret = __copy_instruction(dest + len, src + len, real, &insn);
> + ret = __copy_instruction(dest + len, src + len, real + len, &insn);
>   if (!ret || !can_boost(&insn, src + len))
>   return -EINVAL;
>   len += ret;



--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team
Reply | Threaded
Open this post in threaded view
|

ACK: [B][C][SRU][PATCH 1/1] kprobes/x86: Fix instruction patching corruption when copying more than one RIP-relative instruction

Connor Kuehl
In reply to this post by Po-Hsu Lin (Sam)
On 5/7/19 1:50 AM, Po-Hsu Lin wrote:

> From: Masami Hiramatsu <[hidden email]>
>
> BugLink: https://bugs.launchpad.net/bugs/1826385
>
> After copy_optimized_instructions() copies several instructions
> to the working buffer it tries to fix up the real RIP address, but it
> adjusts the RIP-relative instruction with an incorrect RIP address
> for the 2nd and subsequent instructions due to a bug in the logic.
>
> This will break the kernel pretty badly (with likely outcomes such as
> a kernel freeze, a crash, or worse) because probed instructions can refer
> to the wrong data.
>
> For example putting kprobes on cpumask_next() typically hits this bug.
>
> cpumask_next() is normally like below if CONFIG_CPUMASK_OFFSTACK=y
> (in this case nr_cpumask_bits is an alias of nr_cpu_ids):
>
>  <cpumask_next>:
> 48 89 f0 mov    %rsi,%rax
> 8b 35 7b fb e2 00 mov    0xe2fb7b(%rip),%esi # ffffffff82db9e64 <nr_cpu_ids>
> 55 push   %rbp
> ...
>
> If we put a kprobe on it and it gets jump-optimized, it gets
> patched by the kprobes code like this:
>
>  <cpumask_next>:
> e9 95 7d 07 1e jmpq   0xffffffffa000207a
> 7b fb jnp    0xffffffff81f8a2e2 <cpumask_next+2>
> e2 00 loop   0xffffffff81f8a2e9 <cpumask_next+9>
> 55 push   %rbp
>
> This shows that the first two MOV instructions were copied to a
> trampoline buffer at 0xffffffffa000207a.
>
> Here is the disassembled result of the trampoline, skipping
> the optprobe template instructions:
>
> # Dump of assembly code from 0xffffffffa000207a to 0xffffffffa00020ea:
>
> 54 push   %rsp
> ...
> 48 83 c4 08 add    $0x8,%rsp
> 9d popfq
> 48 89 f0 mov    %rsi,%rax
> 8b 35 82 7d db e2 mov    -0x1d24827e(%rip),%esi # 0xffffffff82db9e67 <nr_cpu_ids+3>
>
> This dump shows that the second MOV accesses *(nr_cpu_ids+3) instead of
> the original *nr_cpu_ids. This leads to a kernel freeze because
> cpumask_next() always returns 0 and for_each_cpu() never ends.
>
> Fix this by adding 'len' correctly to the real RIP address while
> copying.
>
> [ mingo: Improved the changelog. ]
>
> Reported-by: Michael Rodin <[hidden email]>
> Signed-off-by: Masami Hiramatsu <[hidden email]>
> Reviewed-by: Steven Rostedt (VMware) <[hidden email]>
> Cc: Arnaldo Carvalho de Melo <[hidden email]>
> Cc: Linus Torvalds <[hidden email]>
> Cc: Peter Zijlstra <[hidden email]>
> Cc: Ravi Bangoria <[hidden email]>
> Cc: Steven Rostedt <[hidden email]>
> Cc: Thomas Gleixner <[hidden email]>
> Cc: [hidden email] # v4.15+
> Fixes: 63fef14fc98a ("kprobes/x86: Make insn buffer always ROX and use text_poke()")
> Link: http://lkml.kernel.org/r/153504457253.22602.1314289671019919596.stgit@devbox
> Signed-off-by: Ingo Molnar <[hidden email]>
> (cherry picked from commit 43a1b0cb4cd6dbfd3cd9c10da663368394d299d8)
> Signed-off-by: Po-Hsu Lin <[hidden email]>
> ---
Acked-by: Connor Kuehl <[hidden email]>

>  arch/x86/kernel/kprobes/opt.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/x86/kernel/kprobes/opt.c b/arch/x86/kernel/kprobes/opt.c
> index 1467f96..72e1286 100644
> --- a/arch/x86/kernel/kprobes/opt.c
> +++ b/arch/x86/kernel/kprobes/opt.c
> @@ -189,7 +189,7 @@ static int copy_optimized_instructions(u8 *dest, u8 *src, u8 *real)
>   int len = 0, ret;
>  
>   while (len < RELATIVEJUMP_SIZE) {
> - ret = __copy_instruction(dest + len, src + len, real, &insn);
> + ret = __copy_instruction(dest + len, src + len, real + len, &insn);
>   if (!ret || !can_boost(&insn, src + len))
>   return -EINVAL;
>   len += ret;
>

--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team

pEpkey.asc (5K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

APPLIED: [B][C][SRU][PATCH 0/1] Fix for ftrace test hang issue

Khaled Elmously
In reply to this post by Po-Hsu Lin (Sam)
On 2019-05-07 16:50:05 , Po-Hsu Lin wrote:

> BugLink: https://bugs.launchpad.net/bugs/1826385
>
> == Justification ==
> Running the ftrace in ubuntu_kernel_selftests repetitively against x86
> Cosmic kernel will cause system hang.
>
> When this happens, you won't be able to ssh into this system, and no log
> can be found in syslog.
>
> This hang is caused by one of the sub-test: kprobe/multiple_kprobes
>
> Masami's comment from upstream discussion (https://lkml.org/lkml/2018/12/3/1219):
> In arch/x86/kernel/kprobes/opt.c, copy_optimized_instructions() does a
> copy loop, but only update src and dest cursors, but not update real
> address which is used for adjusting RIP relative instruction.
>
> == Fix ==
> 43a1b0cb4 (kprobes/x86: Fix instruction patching corruption when copying
> more than one RIP-relative instruction)
>
> This patch is already in D.
> For B/C, they all have this ill-commit 63fef14 and this patch can be
> cherry-picked. Note that for Bionic kernel it can only be triggered in
> this way with a kernel built with GCC-8.
>
> Although it's a bit difficult to trigger this on Bionic, I think it
> worth this fix as it's quite straightforward.
>
> For X, the ill-commit 63fef14 does not exist.
>
> == Test ==
> Test kernel for Cosmic and Bionic built with GCC-8:
> http://people.canonical.com/~phlin/kernel/lp-1826385-ftrace-hang/
>
> (To verify this for the Bionic, you will need to build a kernel with GCC-8.)
>
> Patch tested with a bare-metal and a KVM node, both of them can pass the
> beating repetitively.
>
> == Regression Potential ==
> Low, upstream fix specific for kprobe and limited to x86 architecture.
>
>
> Masami Hiramatsu (1):
>   kprobes/x86: Fix instruction patching corruption when copying more
>     than one RIP-relative instruction
>
>  arch/x86/kernel/kprobes/opt.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> --
> 2.7.4
>
>
> --
> kernel-team mailing list
> [hidden email]
> https://lists.ubuntu.com/mailman/listinfo/kernel-team

--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team