[SRU][Disco][PATCH 0/2] SUNRPC: Use after free when GSSD credentials are invalid causes oops

classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

[SRU][Disco][PATCH 0/2] SUNRPC: Use after free when GSSD credentials are invalid causes oops

Matthew Ruffell
BugLink: https://bugs.launchpad.net/bugs/1842037

[Impact]

There is a use after free which normally causes a null pointer dereference when
NFS clients send invalid credentials via GSSD to a NFS server which has shares
protected by kerberos krb5* security.

The call trace is below:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000098
Call Trace:
  rpc_check_timeout+0x22/0xe0 [sunrpc]
  call_decode+0x12c/0x190 [sunrpc]
  __rpc_execute+0x7a/0x340 [sunrpc]
  rpc_execute+0xa3/0xb0 [sunrpc]
  rpc_run_task+0xf7/0x140 [sunrpc]
  nfs4_proc_get_lease_time+0xf3/0x130 [nfsv4]
  nfs41_setup_state_renewal+0x3d/0x90 [nfsv4]
  ? nfs4_realloc_slot_table+0x5b/0x130 [nfsv4]
  ? nfs4_setup_session_slot_tables+0x77/0xc0 [nfsv4]
  nfs41_finish_session_reset+0x26/0x30 [nfsv4]
  nfs41_init_clientid+0x44/0x70 [nfsv4]
  nfs4_establish_lease+0x61/0xa0 [nfsv4]
  ? nfs4_handle_reclaim_lease_error+0x108/0x130 [nfsv4]
  nfs4_state_manager+0x1b1/0x750 [nfsv4]
  ? kernel_sigaction+0x43/0xe0
  nfs4_run_state_manager+0x24/0x40 [nfsv4]
  kthread+0x120/0x140
  ? nfs4_state_manager+0x750/0x750 [nfsv4]
  ? __kthread_parkme+0x70/0x70
  ret_from_fork+0x35/0x40
 
This then causes outages on a heavily trafficked NFS server and no one can
access their shares until the server is rebooted.

[Fix]

The fix comes in the following two commits:

commit cea57789e4081870ac3498fbefabbbd0d0fd8434
Author: Trond Myklebust <[hidden email]>
Date:   Sat Mar 9 16:06:47 2019 -0500
Subject: SUNRPC: Clean up

commit 7987b694ade8cc465ce10fb3dceaa614f13ceaf3
Author: Trond Myklebust <[hidden email]>
Date:   Wed May 29 12:49:52 2019 -0400
Subject: SUNRPC: Fix a use after free when a server rejects the RPCSEC_GSS
 credential

There is a small subtlety to be addressed here.
7987b694ade8cc465ce10fb3dceaa614f13ceaf3 is marked as "Fixes"
cea57789e4081870ac3498fbefabbbd0d0fd8434, and
cea57789e4081870ac3498fbefabbbd0d0fd8434 is not present in the disco kernel.
The code path which triggers the use after free can still be reached without
cea57789e4081870ac3498fbefabbbd0d0fd8434 being present, and applying both
commits resolves the problem, while still maintaining as little backporting as
necessary.

Please note, both commits have been backported.

cea57789e4081870ac3498fbefabbbd0d0fd8434 required a small change to the final
comment, as well as some minor context adjustments in the final hunk.

7987b694ade8cc465ce10fb3dceaa614f13ceaf3 required medium to heavy backporting.
The upstream patch uses a switch statement based on error codes, while the disco
kernel uses a goto and label type architecture. The commits in the middle were
numerous and completely unrelated, so I backported the commit to use goto and
labels. Please review this backport a little more closely than you normally do,
but it has been tested, and I believe the code to be sound.

cea57789e4081870ac3498fbefabbbd0d0fd8434 landed in 5.1 upstream.
7987b694ade8cc465ce10fb3dceaa614f13ceaf3 landed in 5.2 upstream.

7987b694ade8cc465ce10fb3dceaa614f13ceaf3 was also selected for upstream -stable,
in 5.1.9, and was omitted from disco stable updates process, probably because of
the patch not cleanly applying and requiring the backport and infrastructure
provided in cea57789e4081870ac3498fbefabbbd0d0fd8434.

[Testcase]

This is difficult to reproduce on test systems, and has instead been verified on
a production NFS v4.1 system in a customer environment. This server is heavily
trafficked and has a large number of different NFS clients connected to it.

I have built a test kernel that contains the above patch, and also patches for
Bug 1828978. It is available here:

https://launchpad.net/~mruffell/+archive/ubuntu/sf241068-test

Note that the above kernel is for bionic HWE, and not explicitly disco.

Discussion about the patch validation can be found at the bottom of Bug 1842037.

On unpatched kernels, expect to see the symptoms mentioned in Impact, and on
patched systems, everything working as intended.

[Regression Potential]

The changes are limited to users of sunrpc and the change itself is limited to
cases where the RPCSEC_GSS credential is rejected. Under blue skies scenarios,
the code should only be triggered when misbehaving clients do not keep their
authentication tickets up to date.

In case of regression, misbehaving clients may cause outages on services which
use sunrpc. In which case, the server administrator would have to revert to an
earlier kernel in order to restore those services, as you cannot stop
misbehaving clients from connecting.

Since the fix was selected for upstream stable, it has been vetted and widely
accepted by the community. The backport I performed should have no functional
difference at all.

Trond Myklebust (2):
  SUNRPC: Clean up
  SUNRPC: Fix a use after free when a server rejects the RPCSEC_GSS
    credential

 net/sunrpc/clnt.c | 76 +++++++++++++++++++----------------------------
 1 file changed, 30 insertions(+), 46 deletions(-)

--
2.20.1


--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team
Reply | Threaded
Open this post in threaded view
|

[SRU][Disco][PATCH 1/2] SUNRPC: Clean up

Matthew Ruffell
From: Trond Myklebust <[hidden email]>

BugLink: https://bugs.launchpad.net/bugs/1842037

Replace remaining callers of call_timeout() with rpc_check_timeout().

Signed-off-by: Trond Myklebust <[hidden email]>
(backported from commit cea57789e4081870ac3498fbefabbbd0d0fd8434)
[mruffell: changed comment and minor context adjustment]
Signed-off-by: Matthew Ruffell <[hidden email]>
---
 net/sunrpc/clnt.c | 52 ++++++++++++++++-------------------------------
 1 file changed, 17 insertions(+), 35 deletions(-)

diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index 0f3ee58aeaf1..f9568b0dc63e 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -70,7 +70,6 @@ static void call_status(struct rpc_task *task);
 static void call_transmit_status(struct rpc_task *task);
 static void call_refresh(struct rpc_task *task);
 static void call_refreshresult(struct rpc_task *task);
-static void call_timeout(struct rpc_task *task);
 static void call_connect(struct rpc_task *task);
 static void call_connect_status(struct rpc_task *task);
 
@@ -1887,7 +1886,8 @@ call_bind_status(struct rpc_task *task)
 
 retry_timeout:
  task->tk_status = 0;
- task->tk_action = call_timeout;
+ task->tk_action = call_encode;
+ rpc_check_timeout(task);
 }
 
 /*
@@ -2176,10 +2176,8 @@ call_status(struct rpc_task *task)
  case -EHOSTUNREACH:
  case -ENETUNREACH:
  case -EPERM:
- if (RPC_IS_SOFTCONN(task)) {
- rpc_exit(task, status);
- break;
- }
+ if (RPC_IS_SOFTCONN(task))
+ goto out_exit;
  /*
  * Delay any retries for 3 seconds, then handle as if it
  * were a timeout.
@@ -2187,7 +2185,6 @@ call_status(struct rpc_task *task)
  rpc_delay(task, 3*HZ);
  /* fall through */
  case -ETIMEDOUT:
- task->tk_action = call_timeout;
  break;
  case -ECONNREFUSED:
  case -ECONNRESET:
@@ -2200,18 +2197,21 @@ call_status(struct rpc_task *task)
  /* fall through */
  case -EPIPE:
  case -EAGAIN:
- task->tk_action = call_timeout;
  break;
  case -EIO:
  /* shutdown or soft timeout */
- rpc_exit(task, status);
- break;
+ goto out_exit;
  default:
  if (clnt->cl_chatty)
  printk("%s: RPC call returned error %d\n",
        clnt->cl_program->name, -status);
- rpc_exit(task, status);
+ goto out_exit;
  }
+ task->tk_action = call_encode;
+ rpc_check_timeout(task);
+ return;
+out_exit:
+ rpc_exit(task, status);
 }
 
 static void
@@ -2258,19 +2258,6 @@ rpc_check_timeout(struct rpc_task *task)
  rpcauth_invalcred(task);
 }
 
-/*
- * 6a. Handle RPC timeout
- * We do not release the request slot, so we keep using the
- * same XID for all retransmits.
- */
-static void
-call_timeout(struct rpc_task *task)
-{
- task->tk_action = call_encode;
- task->tk_status = 0;
- rpc_check_timeout(task);
-}
-
 /*
  * 7. Decode the RPC reply
  */
@@ -2309,16 +2296,8 @@ call_decode(struct rpc_task *task)
  WARN_ON(memcmp(&req->rq_rcv_buf, &req->rq_private_buf,
  sizeof(req->rq_rcv_buf)) != 0);
 
- if (req->rq_rcv_buf.len < 12) {
- if (!RPC_IS_SOFT(task)) {
- task->tk_action = call_encode;
- goto out_retry;
- }
- dprintk("RPC:       %s: too small RPC reply size (%d bytes)\n",
- clnt->cl_program->name, task->tk_status);
- task->tk_action = call_timeout;
+ if (req->rq_rcv_buf.len < 12)
  goto out_retry;
- }
 
  p = rpc_verify_header(task);
  if (IS_ERR(p)) {
@@ -2339,11 +2318,14 @@ call_decode(struct rpc_task *task)
  /* Note: rpc_verify_header() may have freed the RPC slot */
  if (task->tk_rqstp == req) {
  xdr_free_bvec(&req->rq_rcv_buf);
- req->rq_reply_bytes_recvd = req->rq_rcv_buf.len = 0;
+ req->rq_reply_bytes_recvd = 0;
+ req->rq_rcv_buf.len = 0;
  if (task->tk_client->cl_discrtry)
  xprt_conditional_disconnect(req->rq_xprt,
- req->rq_connect_cookie);
+    req->rq_connect_cookie);
  }
+ task->tk_action = call_encode;
+ rpc_check_timeout(task);
 }
 
 static __be32 *
--
2.20.1


--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team
Reply | Threaded
Open this post in threaded view
|

[SRU][Disco][PATCH 2/2] SUNRPC: Fix a use after free when a server rejects the RPCSEC_GSS credential

Matthew Ruffell
In reply to this post by Matthew Ruffell
From: Trond Myklebust <[hidden email]>

BugLink: https://bugs.launchpad.net/bugs/1842037

The addition of rpc_check_timeout() to call_decode causes an Oops
when the RPCSEC_GSS credential is rejected.
The reason is that rpc_decode_header() will call xprt_release() in
order to free task->tk_rqstp, which is needed by rpc_check_timeout()
to check whether or not we should exit due to a soft timeout.

The fix is to move the call to xprt_release() into call_decode() so
we can perform it after rpc_check_timeout().

Reported-by: Olga Kornievskaia <[hidden email]>
Reported-by: Nick Bowler <[hidden email]>
Fixes: cea57789e408 ("SUNRPC: Clean up")
Cc: [hidden email] # v5.1+
Signed-off-by: Trond Myklebust <[hidden email]>
Signed-off-by: Anna Schumaker <[hidden email]>
(backported from commit 7987b694ade8cc465ce10fb3dceaa614f13ceaf3)
[mruffell: rewrite goto error handling, medium context adjustments]
Signed-off-by: Matthew Ruffell <[hidden email]>
---
 net/sunrpc/clnt.c | 30 ++++++++++++++++--------------
 1 file changed, 16 insertions(+), 14 deletions(-)

diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index f9568b0dc63e..5ea3c62fff9f 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -2303,6 +2303,8 @@ call_decode(struct rpc_task *task)
  if (IS_ERR(p)) {
  if (p == ERR_PTR(-EAGAIN))
  goto out_retry;
+ if (p == ERR_PTR(-EKEYREJECTED))
+ goto out_key_rejected;
  return;
  }
  task->tk_action = rpc_exit_task;
@@ -2315,17 +2317,21 @@ call_decode(struct rpc_task *task)
  return;
 out_retry:
  task->tk_status = 0;
- /* Note: rpc_verify_header() may have freed the RPC slot */
- if (task->tk_rqstp == req) {
- xdr_free_bvec(&req->rq_rcv_buf);
- req->rq_reply_bytes_recvd = 0;
- req->rq_rcv_buf.len = 0;
- if (task->tk_client->cl_discrtry)
- xprt_conditional_disconnect(req->rq_xprt,
-    req->rq_connect_cookie);
- }
+ xdr_free_bvec(&req->rq_rcv_buf);
+ req->rq_reply_bytes_recvd = 0;
+ req->rq_rcv_buf.len = 0;
+ if (task->tk_client->cl_discrtry)
+ xprt_conditional_disconnect(req->rq_xprt,
+    req->rq_connect_cookie);
  task->tk_action = call_encode;
  rpc_check_timeout(task);
+ return;
+out_key_rejected:
+ task->tk_action = call_reserve;
+ rpc_check_timeout(task);
+ rpcauth_invalcred(task);
+ /* Ensure we obtain a new XID if we retry! */
+ xprt_release(task);
 }
 
 static __be32 *
@@ -2413,11 +2419,7 @@ rpc_verify_header(struct rpc_task *task)
  task->tk_cred_retry--;
  dprintk("RPC: %5u %s: retry stale creds\n",
  task->tk_pid, __func__);
- rpcauth_invalcred(task);
- /* Ensure we obtain a new XID! */
- xprt_release(task);
- task->tk_action = call_reserve;
- goto out_retry;
+ return ERR_PTR(-EKEYREJECTED);
  case RPC_AUTH_BADCRED:
  case RPC_AUTH_BADVERF:
  /* possibly garbled cred/verf? */
--
2.20.1


--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team
Reply | Threaded
Open this post in threaded view
|

ACK: [SRU][Disco][PATCH 0/2] SUNRPC: Use after free when GSSD credentials are invalid causes oops

Andrea Righi
In reply to this post by Matthew Ruffell
On Wed, Oct 30, 2019 at 04:50:04PM +1300, Matthew Ruffell wrote:

> BugLink: https://bugs.launchpad.net/bugs/1842037
>
> [Impact]
>
> There is a use after free which normally causes a null pointer dereference when
> NFS clients send invalid credentials via GSSD to a NFS server which has shares
> protected by kerberos krb5* security.
>
> The call trace is below:
>
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000098
> Call Trace:
>   rpc_check_timeout+0x22/0xe0 [sunrpc]
>   call_decode+0x12c/0x190 [sunrpc]
>   __rpc_execute+0x7a/0x340 [sunrpc]
>   rpc_execute+0xa3/0xb0 [sunrpc]
>   rpc_run_task+0xf7/0x140 [sunrpc]
>   nfs4_proc_get_lease_time+0xf3/0x130 [nfsv4]
>   nfs41_setup_state_renewal+0x3d/0x90 [nfsv4]
>   ? nfs4_realloc_slot_table+0x5b/0x130 [nfsv4]
>   ? nfs4_setup_session_slot_tables+0x77/0xc0 [nfsv4]
>   nfs41_finish_session_reset+0x26/0x30 [nfsv4]
>   nfs41_init_clientid+0x44/0x70 [nfsv4]
>   nfs4_establish_lease+0x61/0xa0 [nfsv4]
>   ? nfs4_handle_reclaim_lease_error+0x108/0x130 [nfsv4]
>   nfs4_state_manager+0x1b1/0x750 [nfsv4]
>   ? kernel_sigaction+0x43/0xe0
>   nfs4_run_state_manager+0x24/0x40 [nfsv4]
>   kthread+0x120/0x140
>   ? nfs4_state_manager+0x750/0x750 [nfsv4]
>   ? __kthread_parkme+0x70/0x70
>   ret_from_fork+0x35/0x40
>  
> This then causes outages on a heavily trafficked NFS server and no one can
> access their shares until the server is rebooted.
>
> [Fix]
>
> The fix comes in the following two commits:
>
> commit cea57789e4081870ac3498fbefabbbd0d0fd8434
> Author: Trond Myklebust <[hidden email]>
> Date:   Sat Mar 9 16:06:47 2019 -0500
> Subject: SUNRPC: Clean up
>
> commit 7987b694ade8cc465ce10fb3dceaa614f13ceaf3
> Author: Trond Myklebust <[hidden email]>
> Date:   Wed May 29 12:49:52 2019 -0400
> Subject: SUNRPC: Fix a use after free when a server rejects the RPCSEC_GSS
>  credential
>
> There is a small subtlety to be addressed here.
> 7987b694ade8cc465ce10fb3dceaa614f13ceaf3 is marked as "Fixes"
> cea57789e4081870ac3498fbefabbbd0d0fd8434, and
> cea57789e4081870ac3498fbefabbbd0d0fd8434 is not present in the disco kernel.
> The code path which triggers the use after free can still be reached without
> cea57789e4081870ac3498fbefabbbd0d0fd8434 being present, and applying both
> commits resolves the problem, while still maintaining as little backporting as
> necessary.
>
> Please note, both commits have been backported.
>
> cea57789e4081870ac3498fbefabbbd0d0fd8434 required a small change to the final
> comment, as well as some minor context adjustments in the final hunk.
>
> 7987b694ade8cc465ce10fb3dceaa614f13ceaf3 required medium to heavy backporting.
> The upstream patch uses a switch statement based on error codes, while the disco
> kernel uses a goto and label type architecture. The commits in the middle were
> numerous and completely unrelated, so I backported the commit to use goto and
> labels. Please review this backport a little more closely than you normally do,
> but it has been tested, and I believe the code to be sound.
>
> cea57789e4081870ac3498fbefabbbd0d0fd8434 landed in 5.1 upstream.
> 7987b694ade8cc465ce10fb3dceaa614f13ceaf3 landed in 5.2 upstream.
>
> 7987b694ade8cc465ce10fb3dceaa614f13ceaf3 was also selected for upstream -stable,
> in 5.1.9, and was omitted from disco stable updates process, probably because of
> the patch not cleanly applying and requiring the backport and infrastructure
> provided in cea57789e4081870ac3498fbefabbbd0d0fd8434.
>
> [Testcase]
>
> This is difficult to reproduce on test systems, and has instead been verified on
> a production NFS v4.1 system in a customer environment. This server is heavily
> trafficked and has a large number of different NFS clients connected to it.
>
> I have built a test kernel that contains the above patch, and also patches for
> Bug 1828978. It is available here:
>
> https://launchpad.net/~mruffell/+archive/ubuntu/sf241068-test
>
> Note that the above kernel is for bionic HWE, and not explicitly disco.
>
> Discussion about the patch validation can be found at the bottom of Bug 1842037.
>
> On unpatched kernels, expect to see the symptoms mentioned in Impact, and on
> patched systems, everything working as intended.
>
> [Regression Potential]
>
> The changes are limited to users of sunrpc and the change itself is limited to
> cases where the RPCSEC_GSS credential is rejected. Under blue skies scenarios,
> the code should only be triggered when misbehaving clients do not keep their
> authentication tickets up to date.
>
> In case of regression, misbehaving clients may cause outages on services which
> use sunrpc. In which case, the server administrator would have to revert to an
> earlier kernel in order to restore those services, as you cannot stop
> misbehaving clients from connecting.
>
> Since the fix was selected for upstream stable, it has been vetted and widely
> accepted by the community. The backport I performed should have no functional
> difference at all.
>
> Trond Myklebust (2):
>   SUNRPC: Clean up
>   SUNRPC: Fix a use after free when a server rejects the RPCSEC_GSS
>     credential
>
>  net/sunrpc/clnt.c | 76 +++++++++++++++++++----------------------------
>  1 file changed, 30 insertions(+), 46 deletions(-)

Makes sense to me and backports look coorect.

Acked-by: Andrea Righi <[hidden email]>

--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team
Reply | Threaded
Open this post in threaded view
|

ACK: [SRU][Disco][PATCH 0/2] SUNRPC: Use after free when GSSD credentials are invalid causes oops

Stefan Bader-2
In reply to this post by Matthew Ruffell
On 30.10.19 04:50, Matthew Ruffell wrote:

> BugLink: https://bugs.launchpad.net/bugs/1842037
>
> [Impact]
>
> There is a use after free which normally causes a null pointer dereference when
> NFS clients send invalid credentials via GSSD to a NFS server which has shares
> protected by kerberos krb5* security.
>
> The call trace is below:
>
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000098
> Call Trace:
>   rpc_check_timeout+0x22/0xe0 [sunrpc]
>   call_decode+0x12c/0x190 [sunrpc]
>   __rpc_execute+0x7a/0x340 [sunrpc]
>   rpc_execute+0xa3/0xb0 [sunrpc]
>   rpc_run_task+0xf7/0x140 [sunrpc]
>   nfs4_proc_get_lease_time+0xf3/0x130 [nfsv4]
>   nfs41_setup_state_renewal+0x3d/0x90 [nfsv4]
>   ? nfs4_realloc_slot_table+0x5b/0x130 [nfsv4]
>   ? nfs4_setup_session_slot_tables+0x77/0xc0 [nfsv4]
>   nfs41_finish_session_reset+0x26/0x30 [nfsv4]
>   nfs41_init_clientid+0x44/0x70 [nfsv4]
>   nfs4_establish_lease+0x61/0xa0 [nfsv4]
>   ? nfs4_handle_reclaim_lease_error+0x108/0x130 [nfsv4]
>   nfs4_state_manager+0x1b1/0x750 [nfsv4]
>   ? kernel_sigaction+0x43/0xe0
>   nfs4_run_state_manager+0x24/0x40 [nfsv4]
>   kthread+0x120/0x140
>   ? nfs4_state_manager+0x750/0x750 [nfsv4]
>   ? __kthread_parkme+0x70/0x70
>   ret_from_fork+0x35/0x40
>  
> This then causes outages on a heavily trafficked NFS server and no one can
> access their shares until the server is rebooted.
>
> [Fix]
>
> The fix comes in the following two commits:
>
> commit cea57789e4081870ac3498fbefabbbd0d0fd8434
> Author: Trond Myklebust <[hidden email]>
> Date:   Sat Mar 9 16:06:47 2019 -0500
> Subject: SUNRPC: Clean up
>
> commit 7987b694ade8cc465ce10fb3dceaa614f13ceaf3
> Author: Trond Myklebust <[hidden email]>
> Date:   Wed May 29 12:49:52 2019 -0400
> Subject: SUNRPC: Fix a use after free when a server rejects the RPCSEC_GSS
>  credential
>
> There is a small subtlety to be addressed here.
> 7987b694ade8cc465ce10fb3dceaa614f13ceaf3 is marked as "Fixes"
> cea57789e4081870ac3498fbefabbbd0d0fd8434, and
> cea57789e4081870ac3498fbefabbbd0d0fd8434 is not present in the disco kernel.
> The code path which triggers the use after free can still be reached without
> cea57789e4081870ac3498fbefabbbd0d0fd8434 being present, and applying both
> commits resolves the problem, while still maintaining as little backporting as
> necessary.
>
> Please note, both commits have been backported.
>
> cea57789e4081870ac3498fbefabbbd0d0fd8434 required a small change to the final
> comment, as well as some minor context adjustments in the final hunk.
>
> 7987b694ade8cc465ce10fb3dceaa614f13ceaf3 required medium to heavy backporting.
> The upstream patch uses a switch statement based on error codes, while the disco
> kernel uses a goto and label type architecture. The commits in the middle were
> numerous and completely unrelated, so I backported the commit to use goto and
> labels. Please review this backport a little more closely than you normally do,
> but it has been tested, and I believe the code to be sound.
>
> cea57789e4081870ac3498fbefabbbd0d0fd8434 landed in 5.1 upstream.
> 7987b694ade8cc465ce10fb3dceaa614f13ceaf3 landed in 5.2 upstream.
>
> 7987b694ade8cc465ce10fb3dceaa614f13ceaf3 was also selected for upstream -stable,
> in 5.1.9, and was omitted from disco stable updates process, probably because of
> the patch not cleanly applying and requiring the backport and infrastructure
> provided in cea57789e4081870ac3498fbefabbbd0d0fd8434.
>
> [Testcase]
>
> This is difficult to reproduce on test systems, and has instead been verified on
> a production NFS v4.1 system in a customer environment. This server is heavily
> trafficked and has a large number of different NFS clients connected to it.
>
> I have built a test kernel that contains the above patch, and also patches for
> Bug 1828978. It is available here:
>
> https://launchpad.net/~mruffell/+archive/ubuntu/sf241068-test
>
> Note that the above kernel is for bionic HWE, and not explicitly disco.
>
> Discussion about the patch validation can be found at the bottom of Bug 1842037.
>
> On unpatched kernels, expect to see the symptoms mentioned in Impact, and on
> patched systems, everything working as intended.
>
> [Regression Potential]
>
> The changes are limited to users of sunrpc and the change itself is limited to
> cases where the RPCSEC_GSS credential is rejected. Under blue skies scenarios,
> the code should only be triggered when misbehaving clients do not keep their
> authentication tickets up to date.
>
> In case of regression, misbehaving clients may cause outages on services which
> use sunrpc. In which case, the server administrator would have to revert to an
> earlier kernel in order to restore those services, as you cannot stop
> misbehaving clients from connecting.
>
> Since the fix was selected for upstream stable, it has been vetted and widely
> accepted by the community. The backport I performed should have no functional
> difference at all.
>
> Trond Myklebust (2):
>   SUNRPC: Clean up
>   SUNRPC: Fix a use after free when a server rejects the RPCSEC_GSS
>     credential
>
>  net/sunrpc/clnt.c | 76 +++++++++++++++++++----------------------------
>  1 file changed, 30 insertions(+), 46 deletions(-)
>
Acked-by: Stefan Bader <[hidden email]>


--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

APPLIED: [SRU][Disco][PATCH 0/2] SUNRPC: Use after free when GSSD credentials are invalid causes oops

Khaled Elmously
In reply to this post by Matthew Ruffell
On 2019-10-30 16:50:04 , Matthew Ruffell wrote:

> BugLink: https://bugs.launchpad.net/bugs/1842037
>
> [Impact]
>
> There is a use after free which normally causes a null pointer dereference when
> NFS clients send invalid credentials via GSSD to a NFS server which has shares
> protected by kerberos krb5* security.
>
> The call trace is below:
>
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000098
> Call Trace:
>   rpc_check_timeout+0x22/0xe0 [sunrpc]
>   call_decode+0x12c/0x190 [sunrpc]
>   __rpc_execute+0x7a/0x340 [sunrpc]
>   rpc_execute+0xa3/0xb0 [sunrpc]
>   rpc_run_task+0xf7/0x140 [sunrpc]
>   nfs4_proc_get_lease_time+0xf3/0x130 [nfsv4]
>   nfs41_setup_state_renewal+0x3d/0x90 [nfsv4]
>   ? nfs4_realloc_slot_table+0x5b/0x130 [nfsv4]
>   ? nfs4_setup_session_slot_tables+0x77/0xc0 [nfsv4]
>   nfs41_finish_session_reset+0x26/0x30 [nfsv4]
>   nfs41_init_clientid+0x44/0x70 [nfsv4]
>   nfs4_establish_lease+0x61/0xa0 [nfsv4]
>   ? nfs4_handle_reclaim_lease_error+0x108/0x130 [nfsv4]
>   nfs4_state_manager+0x1b1/0x750 [nfsv4]
>   ? kernel_sigaction+0x43/0xe0
>   nfs4_run_state_manager+0x24/0x40 [nfsv4]
>   kthread+0x120/0x140
>   ? nfs4_state_manager+0x750/0x750 [nfsv4]
>   ? __kthread_parkme+0x70/0x70
>   ret_from_fork+0x35/0x40
>  
> This then causes outages on a heavily trafficked NFS server and no one can
> access their shares until the server is rebooted.
>
> [Fix]
>
> The fix comes in the following two commits:
>
> commit cea57789e4081870ac3498fbefabbbd0d0fd8434
> Author: Trond Myklebust <[hidden email]>
> Date:   Sat Mar 9 16:06:47 2019 -0500
> Subject: SUNRPC: Clean up
>
> commit 7987b694ade8cc465ce10fb3dceaa614f13ceaf3
> Author: Trond Myklebust <[hidden email]>
> Date:   Wed May 29 12:49:52 2019 -0400
> Subject: SUNRPC: Fix a use after free when a server rejects the RPCSEC_GSS
>  credential
>
> There is a small subtlety to be addressed here.
> 7987b694ade8cc465ce10fb3dceaa614f13ceaf3 is marked as "Fixes"
> cea57789e4081870ac3498fbefabbbd0d0fd8434, and
> cea57789e4081870ac3498fbefabbbd0d0fd8434 is not present in the disco kernel.
> The code path which triggers the use after free can still be reached without
> cea57789e4081870ac3498fbefabbbd0d0fd8434 being present, and applying both
> commits resolves the problem, while still maintaining as little backporting as
> necessary.
>
> Please note, both commits have been backported.
>
> cea57789e4081870ac3498fbefabbbd0d0fd8434 required a small change to the final
> comment, as well as some minor context adjustments in the final hunk.
>
> 7987b694ade8cc465ce10fb3dceaa614f13ceaf3 required medium to heavy backporting.
> The upstream patch uses a switch statement based on error codes, while the disco
> kernel uses a goto and label type architecture. The commits in the middle were
> numerous and completely unrelated, so I backported the commit to use goto and
> labels. Please review this backport a little more closely than you normally do,
> but it has been tested, and I believe the code to be sound.
>
> cea57789e4081870ac3498fbefabbbd0d0fd8434 landed in 5.1 upstream.
> 7987b694ade8cc465ce10fb3dceaa614f13ceaf3 landed in 5.2 upstream.
>
> 7987b694ade8cc465ce10fb3dceaa614f13ceaf3 was also selected for upstream -stable,
> in 5.1.9, and was omitted from disco stable updates process, probably because of
> the patch not cleanly applying and requiring the backport and infrastructure
> provided in cea57789e4081870ac3498fbefabbbd0d0fd8434.
>
> [Testcase]
>
> This is difficult to reproduce on test systems, and has instead been verified on
> a production NFS v4.1 system in a customer environment. This server is heavily
> trafficked and has a large number of different NFS clients connected to it.
>
> I have built a test kernel that contains the above patch, and also patches for
> Bug 1828978. It is available here:
>
> https://launchpad.net/~mruffell/+archive/ubuntu/sf241068-test
>
> Note that the above kernel is for bionic HWE, and not explicitly disco.
>
> Discussion about the patch validation can be found at the bottom of Bug 1842037.
>
> On unpatched kernels, expect to see the symptoms mentioned in Impact, and on
> patched systems, everything working as intended.
>
> [Regression Potential]
>
> The changes are limited to users of sunrpc and the change itself is limited to
> cases where the RPCSEC_GSS credential is rejected. Under blue skies scenarios,
> the code should only be triggered when misbehaving clients do not keep their
> authentication tickets up to date.
>
> In case of regression, misbehaving clients may cause outages on services which
> use sunrpc. In which case, the server administrator would have to revert to an
> earlier kernel in order to restore those services, as you cannot stop
> misbehaving clients from connecting.
>
> Since the fix was selected for upstream stable, it has been vetted and widely
> accepted by the community. The backport I performed should have no functional
> difference at all.
>
> Trond Myklebust (2):
>   SUNRPC: Clean up
>   SUNRPC: Fix a use after free when a server rejects the RPCSEC_GSS
>     credential
>
>  net/sunrpc/clnt.c | 76 +++++++++++++++++++----------------------------
>  1 file changed, 30 insertions(+), 46 deletions(-)
>
> --
> 2.20.1
>
>
> --
> kernel-team mailing list
> [hidden email]
> https://lists.ubuntu.com/mailman/listinfo/kernel-team

--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team