[SRU][G][PATCH 0/2] vfio: pass DMA availability information to userspace (LP: 1907421)


frank.heimes
BugLink: https://bugs.launchpad.net/bugs/1907421

SRU Justification:

[Impact]

* If a vfio-pci device on s390x is under I/O load, the device may end up in an error state.

* The commit 492855939bdb added a limit to the number of concurrent DMA requests for a vfio container.

* However, lazy unmapping on s390x can in fact cause quite a large number of outstanding DMA requests
  to build up prior to being purged - potentially the entire guest DMA space.

* This results in unexpected errors seen in qemu such as 'VFIO_MAP_DMA failed: No space left on device'.

* The solution requires a change to both kernel and qemu.

* The kernel side of things is addressed by this SRU.

* The fix adds the ability to report the number of allowable DMA requests to userspace via the VFIO_IOMMU_GET_INFO ioctl (see the userspace sketch at the end of this section).

* The actual fix comes with commit 7d6e1329652e, but the refactoring commit ccd59dce1a21 is needed for it to apply cleanly.
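
* A minimal userspace sketch (illustrative only - not the actual qemu change, and the helper name is made up) of how a consumer could read the new capability on a kernel that carries 7d6e1329652e:

  /*
   * Walk the VFIO_IOMMU_GET_INFO capability chain and return the reported
   * number of available DMA mappings, or 0 if the capability is absent
   * (e.g. on an unpatched kernel) or the query fails. Assumes a container
   * fd that already has an IOMMU set via VFIO_SET_IOMMU, and <linux/vfio.h>
   * headers that define VFIO_IOMMU_TYPE1_INFO_DMA_AVAIL.
   */
  #include <stdlib.h>
  #include <sys/ioctl.h>
  #include <linux/vfio.h>

  static unsigned int vfio_query_dma_avail(int container_fd)
  {
          struct vfio_iommu_type1_info *info;
          struct vfio_info_cap_header *hdr;
          unsigned int avail = 0;
          size_t argsz = sizeof(*info);

          /* First pass: learn how large the buffer must be to hold the caps. */
          info = calloc(1, argsz);
          if (!info)
                  return 0;
          info->argsz = argsz;
          if (ioctl(container_fd, VFIO_IOMMU_GET_INFO, info))
                  goto out;

          if (info->argsz > argsz) {
                  /* Second pass: re-query with a buffer big enough for the caps. */
                  argsz = info->argsz;
                  free(info);
                  info = calloc(1, argsz);
                  if (!info)
                          return 0;
                  info->argsz = argsz;
                  if (ioctl(container_fd, VFIO_IOMMU_GET_INFO, info))
                          goto out;
          }

          if (!(info->flags & VFIO_IOMMU_INFO_CAPS) || !info->cap_offset)
                  goto out;

          /* Chase the 'next' offsets until the DMA-available cap is found. */
          for (hdr = (void *)((char *)info + info->cap_offset); ;
               hdr = (void *)((char *)info + hdr->next)) {
                  if (hdr->id == VFIO_IOMMU_TYPE1_INFO_DMA_AVAIL) {
                          avail = ((struct vfio_iommu_type1_info_dma_avail *)hdr)->avail;
                          break;
                  }
                  if (!hdr->next)
                          break;
          }
  out:
          free(info);
          return avail;
  }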

[Fix]

* ccd59dce1a21f473518bf273bdf5b182bab955b3 ccd59dce1a21 "vfio/type1: Refactor vfio_iommu_type1_ioctl()"

* 7d6e1329652ed971d1b6e0e7bea66fba5044e271 7d6e1329652e "vfio iommu: Add dma available capability"

[Test Case]

* IBM Z or LinuxONE hardware with Ubuntu Server 20.10 installed.

* PCIe adapters in place that provide vfio, like RoCE Express 2.

* A KVM host needs to be set up, along with a KVM guest (again on 20.10) that uses vfio.

* Generate I/O that flows through the VF and watch for errors like 'VFIO_MAP_DMA failed: No space left on device' in the log.

[Regression Potential]

* The first patch ccd59dce1a21 modifies the common code file drivers/vfio/vfio_iommu_type1.c quite significantly.

* However, that is not because it introduces a lot of new functionality - it is a refactoring patch.

* Nevertheless, if done badly it could significantly harm the I/O memory management of virtual function adapters.

* In the worst case it may break them entirely, instead of 'just' exhausting the entire DMA space.

* Things could also go wrong in the DMA mapping and unmapping paths, which may even have an impact beyond VF adapters - harming other DMA devices.

* The handling of dirty pages is also touched, as is the ioctl itself - which is important for keeping control of the devices.

* But as said before, it is refactoring work, it has been accepted upstream since 5.9, and the provenance shows that many engineers had an eye on these changes.

* The second patch 7d6e1329652e - which includes the needed fix - comes with far fewer modifications.

* It also touches drivers/vfio/vfio_iommu_type1.c, but mainly
  adds one new function (vfio_iommu_dma_avail_build_caps) and an additional if statement in vfio_iommu_type1_get_info.

* That allows the current allowable number of DMA mappings to be provided to userspace via the IOMMU info chain,
  so that userspace can take appropriate mitigation.

* A potential problem would be a wrong value for the current allowable number of DMA mappings:
  in the best case some mappings are merely wasted,
  and in the worst case more mappings are reported than are actually available, which could have a severe impact.

* What happens in such a case depends somewhat on userspace.

* This patch was accepted upstream with kernel 5.10 and has been pre-tested by IBM.

* In addition, a PPA with a patched groovy kernel was created and shared for further testing.

[Other]

* The patch was accepted upstream with kernel v5.10, hence it will land in Hirsute once the target 5.10 kernel is in.

* For 5.4 this will come as upstream stable update 5.4.90/91.

* For 5.8 (and 5.9) upstream stable support has already ended, hence this SRU is only needed for groovy.

Liu Yi L (1):
  vfio/type1: Refactor vfio_iommu_type1_ioctl()

Matthew Rosato (1):
  vfio iommu: Add dma available capability

 drivers/vfio/vfio_iommu_type1.c | 409 ++++++++++++++++++--------------
 include/uapi/linux/vfio.h       |  15 ++
 2 files changed, 244 insertions(+), 180 deletions(-)

--
2.25.1


--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team

[SRU][G][PATCH 1/2] vfio/type1: Refactor vfio_iommu_type1_ioctl()

frank.heimes
From: Liu Yi L <[hidden email]>

BugLink: https://bugs.launchpad.net/bugs/1907421

This patch refactors the vfio_iommu_type1_ioctl() to use switch instead of
if-else, and each command got a helper function.

Cc: Kevin Tian <[hidden email]>
CC: Jacob Pan <[hidden email]>
Cc: Alex Williamson <[hidden email]>
Cc: Eric Auger <[hidden email]>
Cc: Jean-Philippe Brucker <[hidden email]>
Cc: Joerg Roedel <[hidden email]>
Cc: Lu Baolu <[hidden email]>
Reviewed-by: Eric Auger <[hidden email]>
Suggested-by: Christoph Hellwig <[hidden email]>
Signed-off-by: Liu Yi L <[hidden email]>
Signed-off-by: Alex Williamson <[hidden email]>
(cherry picked from commit ccd59dce1a21f473518bf273bdf5b182bab955b3)
Signed-off-by: Frank Heimes <[hidden email]>
---
 drivers/vfio/vfio_iommu_type1.c | 394 +++++++++++++++++---------------
 1 file changed, 213 insertions(+), 181 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 00d3cf12e92c..88efc85c6b49 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -2515,6 +2515,23 @@ static int vfio_domains_have_iommu_cache(struct vfio_iommu *iommu)
  return ret;
 }
 
+static int vfio_iommu_type1_check_extension(struct vfio_iommu *iommu,
+    unsigned long arg)
+{
+ switch (arg) {
+ case VFIO_TYPE1_IOMMU:
+ case VFIO_TYPE1v2_IOMMU:
+ case VFIO_TYPE1_NESTING_IOMMU:
+ return 1;
+ case VFIO_DMA_CC_IOMMU:
+ if (!iommu)
+ return 0;
+ return vfio_domains_have_iommu_cache(iommu);
+ default:
+ return 0;
+ }
+}
+
 static int vfio_iommu_iova_add_cap(struct vfio_info_cap *caps,
  struct vfio_iommu_type1_info_cap_iova_range *cap_iovas,
  size_t size)
@@ -2591,241 +2608,256 @@ static int vfio_iommu_migration_build_caps(struct vfio_iommu *iommu,
  return vfio_info_add_capability(caps, &cap_mig.header, sizeof(cap_mig));
 }
 
-static long vfio_iommu_type1_ioctl(void *iommu_data,
-   unsigned int cmd, unsigned long arg)
+static int vfio_iommu_type1_get_info(struct vfio_iommu *iommu,
+     unsigned long arg)
 {
- struct vfio_iommu *iommu = iommu_data;
+ struct vfio_iommu_type1_info info;
  unsigned long minsz;
+ struct vfio_info_cap caps = { .buf = NULL, .size = 0 };
+ unsigned long capsz;
+ int ret;
 
- if (cmd == VFIO_CHECK_EXTENSION) {
- switch (arg) {
- case VFIO_TYPE1_IOMMU:
- case VFIO_TYPE1v2_IOMMU:
- case VFIO_TYPE1_NESTING_IOMMU:
- return 1;
- case VFIO_DMA_CC_IOMMU:
- if (!iommu)
- return 0;
- return vfio_domains_have_iommu_cache(iommu);
- default:
- return 0;
- }
- } else if (cmd == VFIO_IOMMU_GET_INFO) {
- struct vfio_iommu_type1_info info;
- struct vfio_info_cap caps = { .buf = NULL, .size = 0 };
- unsigned long capsz;
- int ret;
-
- minsz = offsetofend(struct vfio_iommu_type1_info, iova_pgsizes);
+ minsz = offsetofend(struct vfio_iommu_type1_info, iova_pgsizes);
 
- /* For backward compatibility, cannot require this */
- capsz = offsetofend(struct vfio_iommu_type1_info, cap_offset);
+ /* For backward compatibility, cannot require this */
+ capsz = offsetofend(struct vfio_iommu_type1_info, cap_offset);
 
- if (copy_from_user(&info, (void __user *)arg, minsz))
- return -EFAULT;
+ if (copy_from_user(&info, (void __user *)arg, minsz))
+ return -EFAULT;
 
- if (info.argsz < minsz)
- return -EINVAL;
+ if (info.argsz < minsz)
+ return -EINVAL;
 
- if (info.argsz >= capsz) {
- minsz = capsz;
- info.cap_offset = 0; /* output, no-recopy necessary */
- }
+ if (info.argsz >= capsz) {
+ minsz = capsz;
+ info.cap_offset = 0; /* output, no-recopy necessary */
+ }
 
- mutex_lock(&iommu->lock);
- info.flags = VFIO_IOMMU_INFO_PGSIZES;
+ mutex_lock(&iommu->lock);
+ info.flags = VFIO_IOMMU_INFO_PGSIZES;
 
- info.iova_pgsizes = iommu->pgsize_bitmap;
+ info.iova_pgsizes = iommu->pgsize_bitmap;
 
- ret = vfio_iommu_migration_build_caps(iommu, &caps);
+ ret = vfio_iommu_migration_build_caps(iommu, &caps);
 
- if (!ret)
- ret = vfio_iommu_iova_build_caps(iommu, &caps);
+ if (!ret)
+ ret = vfio_iommu_iova_build_caps(iommu, &caps);
 
- mutex_unlock(&iommu->lock);
+ mutex_unlock(&iommu->lock);
 
- if (ret)
- return ret;
+ if (ret)
+ return ret;
 
- if (caps.size) {
- info.flags |= VFIO_IOMMU_INFO_CAPS;
+ if (caps.size) {
+ info.flags |= VFIO_IOMMU_INFO_CAPS;
 
- if (info.argsz < sizeof(info) + caps.size) {
- info.argsz = sizeof(info) + caps.size;
- } else {
- vfio_info_cap_shift(&caps, sizeof(info));
- if (copy_to_user((void __user *)arg +
- sizeof(info), caps.buf,
- caps.size)) {
- kfree(caps.buf);
- return -EFAULT;
- }
- info.cap_offset = sizeof(info);
+ if (info.argsz < sizeof(info) + caps.size) {
+ info.argsz = sizeof(info) + caps.size;
+ } else {
+ vfio_info_cap_shift(&caps, sizeof(info));
+ if (copy_to_user((void __user *)arg +
+ sizeof(info), caps.buf,
+ caps.size)) {
+ kfree(caps.buf);
+ return -EFAULT;
  }
-
- kfree(caps.buf);
+ info.cap_offset = sizeof(info);
  }
 
- return copy_to_user((void __user *)arg, &info, minsz) ?
- -EFAULT : 0;
+ kfree(caps.buf);
+ }
 
- } else if (cmd == VFIO_IOMMU_MAP_DMA) {
- struct vfio_iommu_type1_dma_map map;
- uint32_t mask = VFIO_DMA_MAP_FLAG_READ |
- VFIO_DMA_MAP_FLAG_WRITE;
+ return copy_to_user((void __user *)arg, &info, minsz) ?
+ -EFAULT : 0;
+}
 
- minsz = offsetofend(struct vfio_iommu_type1_dma_map, size);
+static int vfio_iommu_type1_map_dma(struct vfio_iommu *iommu,
+    unsigned long arg)
+{
+ struct vfio_iommu_type1_dma_map map;
+ unsigned long minsz;
+ uint32_t mask = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
 
- if (copy_from_user(&map, (void __user *)arg, minsz))
- return -EFAULT;
+ minsz = offsetofend(struct vfio_iommu_type1_dma_map, size);
 
- if (map.argsz < minsz || map.flags & ~mask)
- return -EINVAL;
+ if (copy_from_user(&map, (void __user *)arg, minsz))
+ return -EFAULT;
 
- return vfio_dma_do_map(iommu, &map);
+ if (map.argsz < minsz || map.flags & ~mask)
+ return -EINVAL;
 
- } else if (cmd == VFIO_IOMMU_UNMAP_DMA) {
- struct vfio_iommu_type1_dma_unmap unmap;
- struct vfio_bitmap bitmap = { 0 };
- int ret;
+ return vfio_dma_do_map(iommu, &map);
+}
 
- minsz = offsetofend(struct vfio_iommu_type1_dma_unmap, size);
+static int vfio_iommu_type1_unmap_dma(struct vfio_iommu *iommu,
+      unsigned long arg)
+{
+ struct vfio_iommu_type1_dma_unmap unmap;
+ struct vfio_bitmap bitmap = { 0 };
+ unsigned long minsz;
+ int ret;
 
- if (copy_from_user(&unmap, (void __user *)arg, minsz))
- return -EFAULT;
+ minsz = offsetofend(struct vfio_iommu_type1_dma_unmap, size);
 
- if (unmap.argsz < minsz ||
-    unmap.flags & ~VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP)
- return -EINVAL;
+ if (copy_from_user(&unmap, (void __user *)arg, minsz))
+ return -EFAULT;
 
- if (unmap.flags & VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP) {
- unsigned long pgshift;
+ if (unmap.argsz < minsz ||
+    unmap.flags & ~VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP)
+ return -EINVAL;
 
- if (unmap.argsz < (minsz + sizeof(bitmap)))
- return -EINVAL;
+ if (unmap.flags & VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP) {
+ unsigned long pgshift;
 
- if (copy_from_user(&bitmap,
-   (void __user *)(arg + minsz),
-   sizeof(bitmap)))
- return -EFAULT;
+ if (unmap.argsz < (minsz + sizeof(bitmap)))
+ return -EINVAL;
 
- if (!access_ok((void __user *)bitmap.data, bitmap.size))
- return -EINVAL;
+ if (copy_from_user(&bitmap,
+   (void __user *)(arg + minsz),
+   sizeof(bitmap)))
+ return -EFAULT;
 
- pgshift = __ffs(bitmap.pgsize);
- ret = verify_bitmap_size(unmap.size >> pgshift,
- bitmap.size);
- if (ret)
- return ret;
- }
+ if (!access_ok((void __user *)bitmap.data, bitmap.size))
+ return -EINVAL;
 
- ret = vfio_dma_do_unmap(iommu, &unmap, &bitmap);
+ pgshift = __ffs(bitmap.pgsize);
+ ret = verify_bitmap_size(unmap.size >> pgshift,
+ bitmap.size);
  if (ret)
  return ret;
+ }
+
+ ret = vfio_dma_do_unmap(iommu, &unmap, &bitmap);
+ if (ret)
+ return ret;
 
- return copy_to_user((void __user *)arg, &unmap, minsz) ?
+ return copy_to_user((void __user *)arg, &unmap, minsz) ?
  -EFAULT : 0;
- } else if (cmd == VFIO_IOMMU_DIRTY_PAGES) {
- struct vfio_iommu_type1_dirty_bitmap dirty;
- uint32_t mask = VFIO_IOMMU_DIRTY_PAGES_FLAG_START |
- VFIO_IOMMU_DIRTY_PAGES_FLAG_STOP |
- VFIO_IOMMU_DIRTY_PAGES_FLAG_GET_BITMAP;
- int ret = 0;
+}
 
- if (!iommu->v2)
- return -EACCES;
+static int vfio_iommu_type1_dirty_pages(struct vfio_iommu *iommu,
+ unsigned long arg)
+{
+ struct vfio_iommu_type1_dirty_bitmap dirty;
+ uint32_t mask = VFIO_IOMMU_DIRTY_PAGES_FLAG_START |
+ VFIO_IOMMU_DIRTY_PAGES_FLAG_STOP |
+ VFIO_IOMMU_DIRTY_PAGES_FLAG_GET_BITMAP;
+ unsigned long minsz;
+ int ret = 0;
 
- minsz = offsetofend(struct vfio_iommu_type1_dirty_bitmap,
-    flags);
+ if (!iommu->v2)
+ return -EACCES;
 
- if (copy_from_user(&dirty, (void __user *)arg, minsz))
- return -EFAULT;
+ minsz = offsetofend(struct vfio_iommu_type1_dirty_bitmap, flags);
 
- if (dirty.argsz < minsz || dirty.flags & ~mask)
- return -EINVAL;
+ if (copy_from_user(&dirty, (void __user *)arg, minsz))
+ return -EFAULT;
 
- /* only one flag should be set at a time */
- if (__ffs(dirty.flags) != __fls(dirty.flags))
- return -EINVAL;
+ if (dirty.argsz < minsz || dirty.flags & ~mask)
+ return -EINVAL;
 
- if (dirty.flags & VFIO_IOMMU_DIRTY_PAGES_FLAG_START) {
- size_t pgsize;
+ /* only one flag should be set at a time */
+ if (__ffs(dirty.flags) != __fls(dirty.flags))
+ return -EINVAL;
 
- mutex_lock(&iommu->lock);
- pgsize = 1 << __ffs(iommu->pgsize_bitmap);
- if (!iommu->dirty_page_tracking) {
- ret = vfio_dma_bitmap_alloc_all(iommu, pgsize);
- if (!ret)
- iommu->dirty_page_tracking = true;
- }
- mutex_unlock(&iommu->lock);
- return ret;
- } else if (dirty.flags & VFIO_IOMMU_DIRTY_PAGES_FLAG_STOP) {
- mutex_lock(&iommu->lock);
- if (iommu->dirty_page_tracking) {
- iommu->dirty_page_tracking = false;
- vfio_dma_bitmap_free_all(iommu);
- }
- mutex_unlock(&iommu->lock);
- return 0;
- } else if (dirty.flags &
- VFIO_IOMMU_DIRTY_PAGES_FLAG_GET_BITMAP) {
- struct vfio_iommu_type1_dirty_bitmap_get range;
- unsigned long pgshift;
- size_t data_size = dirty.argsz - minsz;
- size_t iommu_pgsize;
-
- if (!data_size || data_size < sizeof(range))
- return -EINVAL;
-
- if (copy_from_user(&range, (void __user *)(arg + minsz),
-   sizeof(range)))
- return -EFAULT;
+ if (dirty.flags & VFIO_IOMMU_DIRTY_PAGES_FLAG_START) {
+ size_t pgsize;
 
- if (range.iova + range.size < range.iova)
- return -EINVAL;
- if (!access_ok((void __user *)range.bitmap.data,
-       range.bitmap.size))
- return -EINVAL;
+ mutex_lock(&iommu->lock);
+ pgsize = 1 << __ffs(iommu->pgsize_bitmap);
+ if (!iommu->dirty_page_tracking) {
+ ret = vfio_dma_bitmap_alloc_all(iommu, pgsize);
+ if (!ret)
+ iommu->dirty_page_tracking = true;
+ }
+ mutex_unlock(&iommu->lock);
+ return ret;
+ } else if (dirty.flags & VFIO_IOMMU_DIRTY_PAGES_FLAG_STOP) {
+ mutex_lock(&iommu->lock);
+ if (iommu->dirty_page_tracking) {
+ iommu->dirty_page_tracking = false;
+ vfio_dma_bitmap_free_all(iommu);
+ }
+ mutex_unlock(&iommu->lock);
+ return 0;
+ } else if (dirty.flags & VFIO_IOMMU_DIRTY_PAGES_FLAG_GET_BITMAP) {
+ struct vfio_iommu_type1_dirty_bitmap_get range;
+ unsigned long pgshift;
+ size_t data_size = dirty.argsz - minsz;
+ size_t iommu_pgsize;
 
- pgshift = __ffs(range.bitmap.pgsize);
- ret = verify_bitmap_size(range.size >> pgshift,
- range.bitmap.size);
- if (ret)
- return ret;
+ if (!data_size || data_size < sizeof(range))
+ return -EINVAL;
 
- mutex_lock(&iommu->lock);
+ if (copy_from_user(&range, (void __user *)(arg + minsz),
+   sizeof(range)))
+ return -EFAULT;
 
- iommu_pgsize = (size_t)1 << __ffs(iommu->pgsize_bitmap);
+ if (range.iova + range.size < range.iova)
+ return -EINVAL;
+ if (!access_ok((void __user *)range.bitmap.data,
+       range.bitmap.size))
+ return -EINVAL;
 
- /* allow only smallest supported pgsize */
- if (range.bitmap.pgsize != iommu_pgsize) {
- ret = -EINVAL;
- goto out_unlock;
- }
- if (range.iova & (iommu_pgsize - 1)) {
- ret = -EINVAL;
- goto out_unlock;
- }
- if (!range.size || range.size & (iommu_pgsize - 1)) {
- ret = -EINVAL;
- goto out_unlock;
- }
+ pgshift = __ffs(range.bitmap.pgsize);
+ ret = verify_bitmap_size(range.size >> pgshift,
+ range.bitmap.size);
+ if (ret)
+ return ret;
 
- if (iommu->dirty_page_tracking)
- ret = vfio_iova_dirty_bitmap(range.bitmap.data,
- iommu, range.iova, range.size,
- range.bitmap.pgsize);
- else
- ret = -EINVAL;
-out_unlock:
- mutex_unlock(&iommu->lock);
+ mutex_lock(&iommu->lock);
 
- return ret;
+ iommu_pgsize = (size_t)1 << __ffs(iommu->pgsize_bitmap);
+
+ /* allow only smallest supported pgsize */
+ if (range.bitmap.pgsize != iommu_pgsize) {
+ ret = -EINVAL;
+ goto out_unlock;
+ }
+ if (range.iova & (iommu_pgsize - 1)) {
+ ret = -EINVAL;
+ goto out_unlock;
+ }
+ if (!range.size || range.size & (iommu_pgsize - 1)) {
+ ret = -EINVAL;
+ goto out_unlock;
  }
+
+ if (iommu->dirty_page_tracking)
+ ret = vfio_iova_dirty_bitmap(range.bitmap.data,
+     iommu, range.iova,
+     range.size,
+     range.bitmap.pgsize);
+ else
+ ret = -EINVAL;
+out_unlock:
+ mutex_unlock(&iommu->lock);
+
+ return ret;
  }
 
- return -ENOTTY;
+ return -EINVAL;
+}
+
+static long vfio_iommu_type1_ioctl(void *iommu_data,
+   unsigned int cmd, unsigned long arg)
+{
+ struct vfio_iommu *iommu = iommu_data;
+
+ switch (cmd) {
+ case VFIO_CHECK_EXTENSION:
+ return vfio_iommu_type1_check_extension(iommu, arg);
+ case VFIO_IOMMU_GET_INFO:
+ return vfio_iommu_type1_get_info(iommu, arg);
+ case VFIO_IOMMU_MAP_DMA:
+ return vfio_iommu_type1_map_dma(iommu, arg);
+ case VFIO_IOMMU_UNMAP_DMA:
+ return vfio_iommu_type1_unmap_dma(iommu, arg);
+ case VFIO_IOMMU_DIRTY_PAGES:
+ return vfio_iommu_type1_dirty_pages(iommu, arg);
+ default:
+ return -ENOTTY;
+ }
 }
 
 static int vfio_iommu_type1_register_notifier(void *iommu_data,
--
2.25.1



[SRU][G][PATCH 2/2] vfio iommu: Add dma available capability

frank.heimes
From: Matthew Rosato <[hidden email]>

BugLink: https://bugs.launchpad.net/bugs/1907421

Commit 492855939bdb ("vfio/type1: Limit DMA mappings per container")
added the ability to limit the number of memory backed DMA mappings.
However on s390x, when lazy mapping is in use, we use a very large
number of concurrent mappings.  Let's provide the current allowable
number of DMA mappings to userspace via the IOMMU info chain so that
userspace can take appropriate mitigation.

Signed-off-by: Matthew Rosato <[hidden email]>
Reviewed-by: Cornelia Huck <[hidden email]>
Signed-off-by: Alex Williamson <[hidden email]>
(cherry picked from commit 7d6e1329652ed971d1b6e0e7bea66fba5044e271)
Signed-off-by: Frank Heimes <[hidden email]>
---
 drivers/vfio/vfio_iommu_type1.c | 17 +++++++++++++++++
 include/uapi/linux/vfio.h       | 15 +++++++++++++++
 2 files changed, 32 insertions(+)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 88efc85c6b49..f8a30d3eccb3 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -2608,6 +2608,20 @@ static int vfio_iommu_migration_build_caps(struct vfio_iommu *iommu,
  return vfio_info_add_capability(caps, &cap_mig.header, sizeof(cap_mig));
 }
 
+static int vfio_iommu_dma_avail_build_caps(struct vfio_iommu *iommu,
+   struct vfio_info_cap *caps)
+{
+ struct vfio_iommu_type1_info_dma_avail cap_dma_avail;
+
+ cap_dma_avail.header.id = VFIO_IOMMU_TYPE1_INFO_DMA_AVAIL;
+ cap_dma_avail.header.version = 1;
+
+ cap_dma_avail.avail = iommu->dma_avail;
+
+ return vfio_info_add_capability(caps, &cap_dma_avail.header,
+ sizeof(cap_dma_avail));
+}
+
 static int vfio_iommu_type1_get_info(struct vfio_iommu *iommu,
      unsigned long arg)
 {
@@ -2640,6 +2654,9 @@ static int vfio_iommu_type1_get_info(struct vfio_iommu *iommu,
 
  ret = vfio_iommu_migration_build_caps(iommu, &caps);
 
+ if (!ret)
+ ret = vfio_iommu_dma_avail_build_caps(iommu, &caps);
+
  if (!ret)
  ret = vfio_iommu_iova_build_caps(iommu, &caps);
 
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 920470502329..3891e03d3af0 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -1039,6 +1039,21 @@ struct vfio_iommu_type1_info_cap_migration {
  __u64 max_dirty_bitmap_size; /* in bytes */
 };
 
+/*
+ * The DMA available capability allows to report the current number of
+ * simultaneously outstanding DMA mappings that are allowed.
+ *
+ * The structure below defines version 1 of this capability.
+ *
+ * avail: specifies the current number of outstanding DMA mappings allowed.
+ */
+#define VFIO_IOMMU_TYPE1_INFO_DMA_AVAIL 3
+
+struct vfio_iommu_type1_info_dma_avail {
+ struct vfio_info_cap_header header;
+ __u32 avail;
+};
+
 #define VFIO_IOMMU_GET_INFO _IO(VFIO_TYPE, VFIO_BASE + 12)
 
 /**
--
2.25.1



ACK: [SRU][G][PATCH 0/2] vfio: pass DMA availability information to userspace (LP: 1907421)

William Breathitt Gray
On Wed, Jan 20, 2021 at 09:04:36PM +0100, [hidden email] wrote:

> BugLink: https://bugs.launchpad.net/bugs/1907421
Acked-by: William Breathitt Gray <[hidden email]>


ACK: [SRU][G][PATCH 0/2] vfio: pass DMA availability information to userspace (LP: 1907421)

Ian May
Patches look good, apply cleanly and build tested.

Acked-by: Ian May <[hidden email]>

On 2021-01-20 21:04:36 , [hidden email] wrote:

> BugLink: https://bugs.launchpad.net/bugs/1907421


APPLIED: [SRU][G][PATCH 0/2] vfio: pass DMA availability information to userspace (LP: 1907421)

Kelsey Skunberg
Applied to Groovy/master-next. Thank you!

-Kelsey

On 2021-01-20 21:04:36 , [hidden email] wrote:

> BugLink: https://bugs.launchpad.net/bugs/1907421
