[PATCH 0/2][SRU][EOAN] UBUNTU: SAUCE: seccomp: backport SECCOMP_USER_NOTIF_FLAG_CONTINUE


Christian Brauner-3
Hey everyone,

BugLink: https://bugs.launchpad.net/bugs/1847744

Recently we landed seccomp support for SECCOMP_RET_USER_NOTIF (cf. [4])
which enables a process (watchee) to retrieve an fd for its seccomp
filter. This fd can then be handed to another (usually more privileged)
process (watcher). The watcher will then be able to receive seccomp
notifications about the syscalls attempted by the watchee.

This feature is heavily used by LXD, but its usability is currently
limited, which is why we urgently need this series.
For example, it is currently used to intercept mknod() syscalls in
unprivileged containers. The mknod() syscall can be easily filtered
based on dev_t. This allows us to only intercept a very specific subset
of mknod() syscalls. Furthermore, mknod() is not possible in user
namespaces toto coelo and so accidentally intercepting and denying
syscalls that are not in the whitelist is not a big deal. The watchee
won't notice a difference.
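
To illustrate the dev_t-based selection described above, here is a
minimal sketch of the decision in plain C. In a real setup the check
would presumably be expressed in the seccomp filter itself; the list of
device numbers and the helper name mknod_should_intercept() are
assumptions made for illustration, not LXD's actual policy.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>
#include <sys/types.h>

/* Hypothetical list of character devices the watcher is willing to create. */
static const struct {
	unsigned int maj;
	unsigned int min;
} allowed_chr[] = {
	{ 1, 3 },	/* /dev/null */
	{ 1, 5 },	/* /dev/zero */
	{ 1, 8 },	/* /dev/random */
	{ 1, 9 },	/* /dev/urandom */
};

/* Return true if this mknod() call is one the watcher wants to see. */
static bool mknod_should_intercept(mode_t mode, dev_t dev)
{
	size_t i;

	/* Only character devices are considered in this sketch. */
	if (!S_ISCHR(mode))
		return false;

	for (i = 0; i < sizeof(allowed_chr) / sizeof(allowed_chr[0]); i++)
		if (major(dev) == allowed_chr[i].maj &&
		    minor(dev) == allowed_chr[i].min)
			return true;

	return false;
}
```

Anything that does not match is simply not handled by the watcher,
which is harmless here because mknod() fails in a user namespace anyway.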

In contrast to mknod(), a lot of the other syscalls we intercept (e.g.
setxattr(), and soon mount()) cannot be easily filtered like mknod()
because they have pointer arguments. Additionally, some of them might
actually succeed in user namespaces (e.g. setxattr() for all "user.*"
xattrs). Since we currently cannot tell seccomp to continue from a user
notifier we are stuck with performing all of the syscalls on behalf of
the container. This is a huge security liability since it is extremely
difficult to correctly assume all of the necessary privileges of the
calling task such that the syscall can be successfully emulated without
escaping other additional security restrictions (think missing CAP_MKNOD
for mknod(), or MS_NODEV on a filesystem etc.). This can be solved by
telling seccomp to resume the syscall.
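
As a rough sketch of what telling seccomp to resume the syscall looks
like from the watcher's side: the reply carries the notification id and
the continue flag, with error and val left at zero. The struct below is
a local stand-in that mirrors struct seccomp_notif_resp so the sketch
builds against older headers, and both helper names are hypothetical.
The validation mirrors the checks patch 1/2 adds to
seccomp_notify_send().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Same value as the fallback define in the selftest of patch 2/2. */
#ifndef SECCOMP_USER_NOTIF_FLAG_CONTINUE
#define SECCOMP_USER_NOTIF_FLAG_CONTINUE 0x00000001
#endif

/* Local stand-in for struct seccomp_notif_resp (assumption: same layout). */
struct notif_resp {
	uint64_t id;
	int64_t val;
	int32_t error;
	uint32_t flags;
};

/* Build the reply that asks the kernel to run the intercepted syscall. */
static struct notif_resp make_continue_resp(uint64_t id)
{
	struct notif_resp resp;

	memset(&resp, 0, sizeof(resp));
	resp.id = id;
	resp.flags = SECCOMP_USER_NOTIF_FLAG_CONTINUE;
	return resp;
}

/* Userspace mirror of the flag checks in seccomp_notify_send(). */
static bool resp_flags_valid(const struct notif_resp *resp)
{
	assert(resp != NULL);

	/* Unknown flags are rejected. */
	if (resp->flags & ~(uint32_t)SECCOMP_USER_NOTIF_FLAG_CONTINUE)
		return false;

	/* Continuing and faking a return value are mutually exclusive. */
	if ((resp->flags & SECCOMP_USER_NOTIF_FLAG_CONTINUE) &&
	    (resp->error || resp->val))
		return false;

	return true;
}
```

A watcher would hand such a reply to ioctl(listener,
SECCOMP_IOCTL_NOTIF_SEND, &resp), as the selftest in patch 2/2 does.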

Until we have backported this patch we are blocked on intercepting the
mount() syscall. It would be excellent if we could backport this patch.

I've also backported the selftests since they are worth running!
Please note that these patches are up for the v5.5 merge window and will
not be carried as Ubuntu specific patches indefinitely!

Thanks!
Christian

Christian Brauner (2):
  UBUNTU: SAUCE: seccomp: add SECCOMP_USER_NOTIF_FLAG_CONTINUE
  UBUNTU: SAUCE: seccomp: test SECCOMP_USER_NOTIF_FLAG_CONTINUE

 include/uapi/linux/seccomp.h                  |  29 +++++
 kernel/seccomp.c                              |  28 ++++-
 tools/testing/selftests/seccomp/seccomp_bpf.c | 107 ++++++++++++++++++
 3 files changed, 158 insertions(+), 6 deletions(-)

--
2.23.0


--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team

[PATCH 1/2][SRU][EOAN] UBUNTU: SAUCE: seccomp: add SECCOMP_USER_NOTIF_FLAG_CONTINUE

Christian Brauner-3
BugLink: https://bugs.launchpad.net/bugs/1847744

This allows the seccomp notifier to continue a syscall. A positive
discussion about this feature was triggered by a post to the
ksummit-discuss mailing list (cf. [3]) and took place during KSummit
(cf. [1]) and again at the containers/checkpoint-restore
micro-conference at Linux Plumbers.

Recently we landed seccomp support for SECCOMP_RET_USER_NOTIF (cf. [4])
which enables a process (watchee) to retrieve an fd for its seccomp
filter. This fd can then be handed to another (usually more privileged)
process (watcher). The watcher will then be able to receive seccomp
notifications about the syscalls attempted by the watchee.

This feature is heavily used in some userspace workloads. For example,
it is currently used to intercept mknod() syscalls in user namespaces
aka in containers.
The mknod() syscall can be easily filtered based on dev_t. This allows
us to only intercept a very specific subset of mknod() syscalls.
Furthermore, mknod() is not possible in user namespaces toto coelo and
so intercepting and denying syscalls that are not in the whitelist on
accident is not a big deal. The watchee won't notice a difference.

In contrast to mknod(), a lot of other syscalls we intercept (e.g.
setxattr()) cannot be easily filtered like mknod() because they have
pointer arguments. Additionally, some of them might actually succeed in
user namespaces (e.g. setxattr() for all "user.*" xattrs). Since we
currently cannot tell seccomp to continue from a user notifier we are
stuck with performing all of the syscalls in lieu of the container. This
is a huge security liability since it is extremely difficult to
correctly assume all of the necessary privileges of the calling task
such that the syscall can be successfully emulated without escaping
other additional security restrictions (think missing CAP_MKNOD for
mknod(), or MS_NODEV on a filesystem etc.). This can be solved by
telling seccomp to resume the syscall.

One thing that came up in the discussion was the problem that another
thread could change the memory after userspace has decided to let the
syscall continue which is a well known TOCTOU with seccomp which is
present in other ways already.
The discussion showed that this feature is already very useful for any
syscall without pointer arguments. For any accidentally intercepted
non-pointer syscall it is safe to continue.
For syscalls with pointer arguments there is a race but for any cautious
userspace and the main use cases the race doesn't matter. The notifier
is intended to be used in a scenario where a more privileged watcher
supervises the syscalls of a lesser privileged watchee to allow it to get
around kernel-enforced limitations by performing the syscall for it
whenever deemed safe by the watcher. Hence, if a user tricks the watcher
into allowing a syscall they will either get a deny based on
kernel-enforced restrictions later or they will have changed the
arguments in such a way that they manage to perform a syscall with
arguments that they would've been allowed to do anyway.
In general, it is good to point out again, that the notifier fd was not
intended to allow userspace to implement a security policy but rather to
work around kernel security mechanisms in cases where the watcher knows
that a given action is safe to perform.
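
A cautious watcher could encode the distinction drawn above, between
syscalls that only take by-value arguments and syscalls that take
pointers, as a simple lookup before it decides to continue. The
following is a minimal sketch under that assumption; the table contents
and the helper name continue_is_race_free() are hypothetical and not
part of this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/syscall.h>

/*
 * Hypothetical list of intercepted syscalls whose arguments are all passed
 * by value, so there is no memory another thread could rewrite between the
 * watcher's decision and the kernel resuming the syscall.
 */
static const long by_value_syscalls[] = {
	SYS_dup,
	SYS_close,
	SYS_fchmod,
	SYS_fchown,
};

static bool continue_is_race_free(long nr)
{
	size_t i;

	for (i = 0; i < sizeof(by_value_syscalls) / sizeof(by_value_syscalls[0]); i++)
		if (by_value_syscalls[i] == nr)
			return true;

	/* Anything else (setxattr(), mount(), ...) may carry pointers. */
	return false;
}
```

For syscalls outside the table the watcher can still continue, but only
when a kernel-enforced restriction would catch rewritten arguments.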

/* References */
[1]: https://linuxplumbersconf.org/event/4/contributions/560
[2]: https://linuxplumbersconf.org/event/4/contributions/477
[3]: https://lore.kernel.org/r/20190719093538.dhyopljyr5ns33qx@...
[4]: commit 6a21cc50f0c7 ("seccomp: add a return code to trap to userspace")

Co-developed-by: Kees Cook <[hidden email]>
Signed-off-by: Christian Brauner <[hidden email]>
Reviewed-by: Tycho Andersen <[hidden email]>
Cc: Andy Lutomirski <[hidden email]>
Cc: Will Drewry <[hidden email]>
CC: Tyler Hicks <[hidden email]>
Link: https://lore.kernel.org/r/20190920083007.11475-2-christian.brauner@...
Signed-off-by: Kees Cook <[hidden email]>
(cherry picked from commit fb3c5386b382d4097476ce9647260fc89b34afdb
 https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git)
Signed-off-by: Christian Brauner <[hidden email]>
---
 include/uapi/linux/seccomp.h | 29 +++++++++++++++++++++++++++++
 kernel/seccomp.c             | 28 ++++++++++++++++++++++------
 2 files changed, 51 insertions(+), 6 deletions(-)

diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h
index 90734aa5aa36..e48e2fa2d248 100644
--- a/include/uapi/linux/seccomp.h
+++ b/include/uapi/linux/seccomp.h
@@ -76,6 +76,35 @@ struct seccomp_notif {
 	struct seccomp_data data;
 };
 
+/*
+ * Valid flags for struct seccomp_notif_resp
+ *
+ * Note, the SECCOMP_USER_NOTIF_FLAG_CONTINUE flag must be used with caution!
+ * If set by the process supervising the syscalls of another process the
+ * syscall will continue. This is problematic because of an inherent TOCTOU.
+ * An attacker can exploit the time while the supervised process is waiting on
+ * a response from the supervising process to rewrite syscall arguments which
+ * are passed as pointers of the intercepted syscall.
+ * It should be absolutely clear that this means that the seccomp notifier
+ * _cannot_ be used to implement a security policy! It should only ever be used
+ * in scenarios where a more privileged process supervises the syscalls of a
+ * lesser privileged process to get around kernel-enforced security
+ * restrictions when the privileged process deems this safe. In other words,
+ * in order to continue a syscall the supervising process should be sure that
+ * another security mechanism or the kernel itself will sufficiently block
+ * syscalls if arguments are rewritten to something unsafe.
+ *
+ * Similar precautions should be applied when stacking SECCOMP_RET_USER_NOTIF
+ * or SECCOMP_RET_TRACE. For SECCOMP_RET_USER_NOTIF filters acting on the
+ * same syscall, the most recently added filter takes precedence. This means
+ * that the new SECCOMP_RET_USER_NOTIF filter can override any
+ * SECCOMP_IOCTL_NOTIF_SEND from earlier filters, essentially allowing all
+ * such filtered syscalls to be executed by sending the response
+ * SECCOMP_USER_NOTIF_FLAG_CONTINUE. Note that SECCOMP_RET_TRACE can equally
+ * be overridden by SECCOMP_USER_NOTIF_FLAG_CONTINUE.
+ */
+#define SECCOMP_USER_NOTIF_FLAG_CONTINUE BIT(0)
+
 struct seccomp_notif_resp {
 	__u64 id;
 	__s64 val;
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index dba52a7db5e8..12d2227e5786 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -75,6 +75,7 @@ struct seccomp_knotif {
 	/* The return values, only valid when in SECCOMP_NOTIFY_REPLIED */
 	int error;
 	long val;
+	u32 flags;
 
 	/* Signals when this has entered SECCOMP_NOTIFY_REPLIED */
 	struct completion ready;
@@ -732,11 +733,12 @@ static u64 seccomp_next_notify_id(struct seccomp_filter *filter)
 	return filter->notif->next_id++;
 }
 
-static void seccomp_do_user_notification(int this_syscall,
-					 struct seccomp_filter *match,
-					 const struct seccomp_data *sd)
+static int seccomp_do_user_notification(int this_syscall,
+					struct seccomp_filter *match,
+					const struct seccomp_data *sd)
 {
 	int err;
+	u32 flags = 0;
 	long ret = 0;
 	struct seccomp_knotif n = {};
 
@@ -764,6 +766,7 @@ static void seccomp_do_user_notification(int this_syscall,
 	if (err == 0) {
 		ret = n.val;
 		err = n.error;
+		flags = n.flags;
 	}
 
 	/*
@@ -780,8 +783,14 @@ static void seccomp_do_user_notification(int this_syscall,
 		list_del(&n.list);
 out:
 	mutex_unlock(&match->notify_lock);
+
+	/* Userspace requests to continue the syscall. */
+	if (flags & SECCOMP_USER_NOTIF_FLAG_CONTINUE)
+		return 0;
+
 	syscall_set_return_value(current, task_pt_regs(current),
 				 err, ret);
+	return -1;
 }
 
 static int __seccomp_filter(int this_syscall, const struct seccomp_data *sd,
@@ -867,8 +876,10 @@ static int __seccomp_filter(int this_syscall, const struct seccomp_data *sd,
 		return 0;
 
 	case SECCOMP_RET_USER_NOTIF:
-		seccomp_do_user_notification(this_syscall, match, sd);
-		goto skip;
+		if (seccomp_do_user_notification(this_syscall, match, sd))
+			goto skip;
+
+		return 0;
 
 	case SECCOMP_RET_LOG:
 		seccomp_log(this_syscall, 0, action, true);
@@ -1087,7 +1098,11 @@ static long seccomp_notify_send(struct seccomp_filter *filter,
 	if (copy_from_user(&resp, buf, sizeof(resp)))
 		return -EFAULT;
 
-	if (resp.flags)
+	if (resp.flags & ~SECCOMP_USER_NOTIF_FLAG_CONTINUE)
+		return -EINVAL;
+
+	if ((resp.flags & SECCOMP_USER_NOTIF_FLAG_CONTINUE) &&
+	    (resp.error || resp.val))
 		return -EINVAL;
 
 	ret = mutex_lock_interruptible(&filter->notify_lock);
@@ -1116,6 +1131,7 @@ static long seccomp_notify_send(struct seccomp_filter *filter,
 	knotif->state = SECCOMP_NOTIFY_REPLIED;
 	knotif->error = resp.error;
 	knotif->val = resp.val;
+	knotif->flags = resp.flags;
 	complete(&knotif->ready);
 out:
 	mutex_unlock(&filter->notify_lock);
--
2.23.0



[PATCH 2/2][SRU][EOAN] UBUNTU: SAUCE: seccomp: test SECCOMP_USER_NOTIF_FLAG_CONTINUE

Christian Brauner-3
In reply to this post by Christian Brauner-3
BugLink: https://bugs.launchpad.net/bugs/1847744

Test whether a syscall can be performed after having been intercepted by
the seccomp notifier. The test uses dup() and kcmp() since this lets us
check whether the dup() syscall actually succeeded by comparing whether
the fds refer to the same underlying struct file.

Signed-off-by: Christian Brauner <[hidden email]>
Cc: Andy Lutomirski <[hidden email]>
Cc: Will Drewry <[hidden email]>
Cc: Shuah Khan <[hidden email]>
Cc: Alexei Starovoitov <[hidden email]>
Cc: Daniel Borkmann <[hidden email]>
Cc: Martin KaFai Lau <[hidden email]>
Cc: Song Liu <[hidden email]>
Cc: Yonghong Song <[hidden email]>
Cc: Tycho Andersen <[hidden email]>
CC: Tyler Hicks <[hidden email]>
Cc: [hidden email]
Cc: [hidden email]
Cc: [hidden email]
Cc: [hidden email]
Link: https://lore.kernel.org/r/20190920083007.11475-4-christian.brauner@...
Signed-off-by: Kees Cook <[hidden email]>
(cherry picked from commit 0eebfed2954f152259cae0ad57b91d3ea92968e8
 https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git)
Signed-off-by: Christian Brauner <[hidden email]>
---
 tools/testing/selftests/seccomp/seccomp_bpf.c | 107 ++++++++++++++++++
 1 file changed, 107 insertions(+)

diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c
index 6ef7f16c4cf5..31a0e3daf326 100644
--- a/tools/testing/selftests/seccomp/seccomp_bpf.c
+++ b/tools/testing/selftests/seccomp/seccomp_bpf.c
@@ -43,6 +43,7 @@
 #include <sys/times.h>
 #include <sys/socket.h>
 #include <sys/ioctl.h>
+#include <linux/kcmp.h>
 
 #include <unistd.h>
 #include <sys/syscall.h>
@@ -166,6 +167,10 @@ struct seccomp_metadata {
 
 #define SECCOMP_RET_USER_NOTIF 0x7fc00000U
 
+#ifndef SECCOMP_USER_NOTIF_FLAG_CONTINUE
+#define SECCOMP_USER_NOTIF_FLAG_CONTINUE 0x00000001
+#endif
+
 #define SECCOMP_IOC_MAGIC '!'
 #define SECCOMP_IO(nr) _IO(SECCOMP_IOC_MAGIC, nr)
 #define SECCOMP_IOR(nr, type) _IOR(SECCOMP_IOC_MAGIC, nr, type)
@@ -3480,6 +3485,108 @@ TEST(seccomp_get_notif_sizes)
 	EXPECT_EQ(sizes.seccomp_notif_resp, sizeof(struct seccomp_notif_resp));
 }
 
+static int filecmp(pid_t pid1, pid_t pid2, int fd1, int fd2)
+{
+#ifdef __NR_kcmp
+	return syscall(__NR_kcmp, pid1, pid2, KCMP_FILE, fd1, fd2);
+#else
+	errno = ENOSYS;
+	return -1;
+#endif
+}
+
+TEST(user_notification_continue)
+{
+	pid_t pid;
+	long ret;
+	int status, listener;
+	struct seccomp_notif req = {};
+	struct seccomp_notif_resp resp = {};
+	struct pollfd pollfd;
+
+	ret = prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
+	ASSERT_EQ(0, ret) {
+		TH_LOG("Kernel does not support PR_SET_NO_NEW_PRIVS!");
+	}
+
+	listener = user_trap_syscall(__NR_dup, SECCOMP_FILTER_FLAG_NEW_LISTENER);
+	ASSERT_GE(listener, 0);
+
+	pid = fork();
+	ASSERT_GE(pid, 0);
+
+	if (pid == 0) {
+		int dup_fd, pipe_fds[2];
+		pid_t self;
+
+		ret = pipe(pipe_fds);
+		if (ret < 0)
+			exit(1);
+
+		dup_fd = dup(pipe_fds[0]);
+		if (dup_fd < 0)
+			exit(1);
+
+		self = getpid();
+
+		ret = filecmp(self, self, pipe_fds[0], dup_fd);
+		if (ret)
+			exit(2);
+
+		exit(0);
+	}
+
+	pollfd.fd = listener;
+	pollfd.events = POLLIN | POLLOUT;
+
+	EXPECT_GT(poll(&pollfd, 1, -1), 0);
+	EXPECT_EQ(pollfd.revents, POLLIN);
+
+	EXPECT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_RECV, &req), 0);
+
+	pollfd.fd = listener;
+	pollfd.events = POLLIN | POLLOUT;
+
+	EXPECT_GT(poll(&pollfd, 1, -1), 0);
+	EXPECT_EQ(pollfd.revents, POLLOUT);
+
+	EXPECT_EQ(req.data.nr, __NR_dup);
+
+	resp.id = req.id;
+	resp.flags = SECCOMP_USER_NOTIF_FLAG_CONTINUE;
+
+	/*
+	 * Verify that setting SECCOMP_USER_NOTIF_FLAG_CONTINUE enforces other
+	 * args be set to 0.
+	 */
+	resp.error = 0;
+	resp.val = USER_NOTIF_MAGIC;
+	EXPECT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_SEND, &resp), -1);
+	EXPECT_EQ(errno, EINVAL);
+
+	resp.error = USER_NOTIF_MAGIC;
+	resp.val = 0;
+	EXPECT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_SEND, &resp), -1);
+	EXPECT_EQ(errno, EINVAL);
+
+	resp.error = 0;
+	resp.val = 0;
+	EXPECT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_SEND, &resp), 0) {
+		if (errno == EINVAL)
+			XFAIL(goto skip, "Kernel does not support SECCOMP_USER_NOTIF_FLAG_CONTINUE");
+	}
+
+skip:
+	EXPECT_EQ(waitpid(pid, &status, 0), pid);
+	EXPECT_EQ(true, WIFEXITED(status));
+	EXPECT_EQ(0, WEXITSTATUS(status)) {
+		if (WEXITSTATUS(status) == 2) {
+			XFAIL(return, "Kernel does not support kcmp() syscall");
+			return;
+		}
+	}
+}
+
 /*
  * TODO:
  * - add microbenchmarks
--
2.23.0



ACK: [PATCH 0/2][SRU][EOAN] UBUNTU: SAUCE: seccomp: backport SECCOMP_USER_NOTIF_FLAG_CONTINUE

Stefan Bader-2
In reply to this post by Christian Brauner-3
On 16.10.19 16:04, Christian Brauner wrote:

> Christian Brauner (2):
>   UBUNTU: SAUCE: seccomp: add SECCOMP_USER_NOTIF_FLAG_CONTINUE
>   UBUNTU: SAUCE: seccomp: test SECCOMP_USER_NOTIF_FLAG_CONTINUE
>
>  include/uapi/linux/seccomp.h                  |  29 +++++
>  kernel/seccomp.c                              |  28 ++++-
>  tools/testing/selftests/seccomp/seccomp_bpf.c | 107 ++++++++++++++++++
>  3 files changed, 158 insertions(+), 6 deletions(-)
>
Acked-by: Stefan Bader <[hidden email]>



ACK: [PATCH 0/2][SRU][EOAN] UBUNTU: SAUCE: seccomp: backport SECCOMP_USER_NOTIF_FLAG_CONTINUE

Khaled Elmously
In reply to this post by Christian Brauner-3
On 2019-10-16 16:04:30 , Christian Brauner wrote:

> Christian Brauner (2):
>   UBUNTU: SAUCE: seccomp: add SECCOMP_USER_NOTIF_FLAG_CONTINUE
>   UBUNTU: SAUCE: seccomp: test SECCOMP_USER_NOTIF_FLAG_CONTINUE
>
>  include/uapi/linux/seccomp.h                  |  29 +++++
>  kernel/seccomp.c                              |  28 ++++-
>  tools/testing/selftests/seccomp/seccomp_bpf.c | 107 ++++++++++++++++++
>  3 files changed, 158 insertions(+), 6 deletions(-)
>

APPLIED: [PATCH 0/2][SRU][EOAN] UBUNTU: SAUCE: seccomp: backport SECCOMP_USER_NOTIF_FLAG_CONTINUE

Khaled Elmously
In reply to this post by Christian Brauner-3
On 2019-10-16 16:04:30 , Christian Brauner wrote:

> Christian Brauner (2):
>   UBUNTU: SAUCE: seccomp: add SECCOMP_USER_NOTIF_FLAG_CONTINUE
>   UBUNTU: SAUCE: seccomp: test SECCOMP_USER_NOTIF_FLAG_CONTINUE
>
>  include/uapi/linux/seccomp.h                  |  29 +++++
>  kernel/seccomp.c                              |  28 ++++-
>  tools/testing/selftests/seccomp/seccomp_bpf.c | 107 ++++++++++++++++++
>  3 files changed, 158 insertions(+), 6 deletions(-)
>