[SRU cosmic-aws][PATCH 0/6] LP: #1801305: xen-blkfront: restore request mode


[SRU cosmic-aws][PATCH 0/6] LP: #1801305: xen-blkfront: restore request mode

Daniel Axtens
In current Ubuntu kernels, PV blkfront drivers have blk-mq enabled by
default and cannot use the old I/O scheduler.

[Impact]
blk-mq is not as fast as the old request-based scheduler for some
workloads on HDD disks.

[Fix]
Amazon Linux has a commit which reintroduces the request-based
mode. It disables blk-mq by default but allows it to be switched back
on with a kernel parameter.
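
For reference, patch 5 in this series adds a module parameter to
xen-blkfront (use_blk_mq, default 0). Assuming the usual module
parameter syntax, re-enabling blk-mq would presumably look like
(illustrative only, not a tested command line):

  xen_blkfront.use_blk_mq=1

on the kernel command line, or use_blk_mq=1 when loading xen-blkfront
as a module.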

For X this needs a small patch from upstream for error handling.

For B/C this patchset is bigger as it includes the suspend/resume
patches already in X, and a new fixup. These are desirable as the
request mode patch assumes their presence.

[Regression Potential]
Could potentially break xen based disks on AWS.

For B/C, the patches also add some code to the xen core around suspend
and resume, this code is much smaller and also mirrors code already in
Xenial.

[Tests]
Tested by AWS for Xenial, and their kernel engineers vetted the Xenial
patches. I tested the Bionic and Cosmic patchsets with fio, the system
appears stable and the IOPS promised for EBS Provisioned IOPS disks
were met in my testing. I did an apt update/upgrade and everything
worked (no hash-sum mismatches).
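
For anyone wanting to reproduce the fio check against an EBS
Provisioned IOPS volume, a run of roughly this shape could be used
(a sketch only, not the exact invocation used here; the device name
and parameters are assumptions):

  fio --name=randread --filename=/dev/xvdf --direct=1 --ioengine=libaio \
      --rw=randread --bs=4k --iodepth=32 --numjobs=4 --runtime=60 \
      --time_based --group_reporting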

Anchal Agarwal (1):
  xen-blkfront: Fixed blkfront_restore to remove a call to negotiate_mq

Munehisa Kamata (5):
  xen/manage: keep track of the on-going suspend mode
  xen/manage: introduce helper function to know the on-going suspend
    mode
  xenbus: add freeze/thaw/restore callbacks support
  xen-blkfront: add callbacks for PM suspend and hibernation
  xen-blkfront: resurrect request-based mode

 drivers/block/xen-blkfront.c      | 475 ++++++++++++++++++++++++++----
 drivers/xen/manage.c              |  73 +++++
 drivers/xen/xenbus/xenbus_probe.c | 102 ++++++-
 include/xen/xen-ops.h             |   4 +
 include/xen/xenbus.h              |   3 +
 5 files changed, 576 insertions(+), 81 deletions(-)

--
2.17.1



[SRU cosmic-aws][PATCH 1/6] xen/manage: keep track of the on-going suspend mode

Daniel Axtens
From: Munehisa Kamata <[hidden email]>

BugLink: https://bugs.launchpad.net/bugs/1801305

To differentiate between Xen suspend, PM suspend and PM hibernation,
keep track of the on-going suspend mode, mainly by using a new PM
notifier. Since Xen suspend doesn't have a corresponding PM event, its
main logic is modified to acquire pm_mutex and set the current mode.

Note that we may see a deadlock if PM suspend/hibernation is interrupted
by Xen suspend. PM suspend/hibernation depends on the xenwatch thread to
process xenbus state transactions, but in that scenario the thread will
sleep waiting for pm_mutex, which is already held by the PM
suspend/hibernation context. Still, acquiring pm_mutex is the right
thing to do, and we would need to modify the Xen shutdown code to avoid
the issue. This will be fixed by a separate patch.

Signed-off-by: Munehisa Kamata <[hidden email]>
Signed-off-by: Anchal Agarwal <[hidden email]>
Reviewed-by: Sebastian Biemueller <[hidden email]>
Reviewed-by: Munehisa Kamata <[hidden email]>
Reviewed-by: Eduardo Valentin <[hidden email]>
CR: https://cr.amazon.com/r/8273194/
(cherry-picked from
 0013-xen-manage-keep-track-of-the-on-going-suspend-mode.patch
 in AWS 4.14 kernel SRPM)
Signed-off-by: Daniel Axtens <[hidden email]>
---
 drivers/xen/manage.c | 58 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 58 insertions(+)

diff --git a/drivers/xen/manage.c b/drivers/xen/manage.c
index c93d8ef8df34..efdcb6fe34b4 100644
--- a/drivers/xen/manage.c
+++ b/drivers/xen/manage.c
@@ -13,6 +13,7 @@
 #include <linux/freezer.h>
 #include <linux/syscore_ops.h>
 #include <linux/export.h>
+#include <linux/suspend.h>
 
 #include <xen/xen.h>
 #include <xen/xenbus.h>
@@ -39,6 +40,16 @@ enum shutdown_state {
 /* Ignore multiple shutdown requests. */
 static enum shutdown_state shutting_down = SHUTDOWN_INVALID;
 
+enum suspend_modes {
+        NO_SUSPEND = 0,
+        XEN_SUSPEND,
+        PM_SUSPEND,
+        PM_HIBERNATION,
+};
+
+/* Protected by pm_mutex */
+static enum suspend_modes suspend_mode = NO_SUSPEND;
+
 struct suspend_info {
  int cancelled;
 };
@@ -98,6 +109,10 @@ static void do_suspend(void)
  int err;
  struct suspend_info si;
 
+ lock_system_sleep();
+
+ suspend_mode = XEN_SUSPEND;
+
  shutting_down = SHUTDOWN_SUSPEND;
 
  err = freeze_processes();
@@ -161,6 +176,10 @@ static void do_suspend(void)
  thaw_processes();
 out:
  shutting_down = SHUTDOWN_INVALID;
+
+ suspend_mode = NO_SUSPEND;
+
+ unlock_system_sleep();
 }
 #endif /* CONFIG_HIBERNATE_CALLBACKS */
 
@@ -384,3 +403,42 @@ int xen_setup_shutdown_event(void)
 EXPORT_SYMBOL_GPL(xen_setup_shutdown_event);
 
 subsys_initcall(xen_setup_shutdown_event);
+
+static int xen_pm_notifier(struct notifier_block *notifier,
+   unsigned long pm_event, void *unused)
+{
+ switch (pm_event) {
+ case PM_SUSPEND_PREPARE:
+ suspend_mode = PM_SUSPEND;
+ break;
+ case PM_HIBERNATION_PREPARE:
+ case PM_RESTORE_PREPARE:
+ suspend_mode = PM_HIBERNATION;
+ break;
+ case PM_POST_SUSPEND:
+ case PM_POST_RESTORE:
+ case PM_POST_HIBERNATION:
+ /* Set back to the default */
+ suspend_mode = NO_SUSPEND;
+ break;
+ default:
+ pr_warn("Receive unknown PM event 0x%lx\n", pm_event);
+ return -EINVAL;
+ }
+
+ return 0;
+};
+
+static struct notifier_block xen_pm_notifier_block = {
+ .notifier_call = xen_pm_notifier
+};
+
+static int xen_setup_pm_notifier(void)
+{
+ if (!xen_hvm_domain())
+ return -ENODEV;
+
+ return register_pm_notifier(&xen_pm_notifier_block);
+}
+
+subsys_initcall(xen_setup_pm_notifier);
--
2.17.1



[SRU cosmic-aws][PATCH 2/6] xen/manage: introduce helper function to know the on-going suspend mode

Daniel Axtens
In reply to this post by Daniel Axtens
From: Munehisa Kamata <[hidden email]>

BugLink: https://bugs.launchpad.net/bugs/1801305

Introduce simple functions which help to know the on-going suspend mode
so that other Xen-related code can behave differently according to the
current suspend mode.

Signed-off-by: Munehisa Kamata <[hidden email]>
Signed-off-by: Anchal Agarwal <[hidden email]>
Reviewed-by: Alakesh Haloi <[hidden email]>
Reviewed-by: Sebastian Biemueller <[hidden email]>
Reviewed-by: Munehisa Kamata <[hidden email]>
Reviewed-by: Eduardo Valentin <[hidden email]>
CR: https://cr.amazon.com/r/8273190/
(cherry-picked from
 0014-xen-manage-introduce-helper-function-to-know-the-on-.patch
 in AWS 4.14 kernel SRPM)
Signed-off-by: Daniel Axtens <[hidden email]>
---
 drivers/xen/manage.c  | 15 +++++++++++++++
 include/xen/xen-ops.h |  4 ++++
 2 files changed, 19 insertions(+)

diff --git a/drivers/xen/manage.c b/drivers/xen/manage.c
index efdcb6fe34b4..5562aba04e16 100644
--- a/drivers/xen/manage.c
+++ b/drivers/xen/manage.c
@@ -50,6 +50,21 @@ enum suspend_modes {
 /* Protected by pm_mutex */
 static enum suspend_modes suspend_mode = NO_SUSPEND;
 
+bool xen_suspend_mode_is_xen_suspend(void)
+{
+ return suspend_mode == XEN_SUSPEND;
+}
+
+bool xen_suspend_mode_is_pm_suspend(void)
+{
+ return suspend_mode == PM_SUSPEND;
+}
+
+bool xen_suspend_mode_is_pm_hibernation(void)
+{
+ return suspend_mode == PM_HIBERNATION;
+}
+
 struct suspend_info {
  int cancelled;
 };
diff --git a/include/xen/xen-ops.h b/include/xen/xen-ops.h
index fd18c974a619..6ca7cbf2c5b4 100644
--- a/include/xen/xen-ops.h
+++ b/include/xen/xen-ops.h
@@ -39,6 +39,10 @@ u64 xen_steal_clock(int cpu);
 
 int xen_setup_shutdown_event(void);
 
+bool xen_suspend_mode_is_xen_suspend(void);
+bool xen_suspend_mode_is_pm_suspend(void);
+bool xen_suspend_mode_is_pm_hibernation(void);
+
 extern unsigned long *xen_contiguous_bitmap;
 
 #ifdef CONFIG_XEN_PV
--
2.17.1



[SRU cosmic-aws][PATCH 3/6] xenbus: add freeze/thaw/restore callbacks support

Daniel Axtens
In reply to this post by Daniel Axtens
From: Munehisa Kamata <[hidden email]>

BugLink: https://bugs.launchpad.net/bugs/1801305

Since commit b3e96c0c7562 ("xen: use freeze/restore/thaw PM events for
suspend/resume/chkpt"), xenbus uses PMSG_FREEZE, PMSG_THAW and
PMSG_RESTORE events for Xen suspend. However, they're actually assigned
to xenbus_dev_suspend(), xenbus_dev_cancel() and xenbus_dev_resume()
respectively, and only suspend and resume callbacks are supported at
driver level. To support PM suspend and PM hibernation, modify the bus
level PM callbacks to invoke not only device driver's suspend/resume but
also freeze/thaw/restore.

Note that we'll use the freeze/restore callbacks even for PM suspend,
whereas the suspend/resume callbacks would normally be used in that
case, because the existing xenbus device drivers already have
suspend/resume callbacks specifically designed for Xen suspend. This
allows the device drivers to keep the existing callbacks without
modification.

Signed-off-by: Munehisa Kamata <[hidden email]>
Signed-off-by: Anchal Agarwal <[hidden email]>
Reviewed-by: Munehisa Kamata <[hidden email]>
Reviewed-by: Eduardo Valentin <[hidden email]>
CR: https://cr.amazon.com/r/8273200/
(cherry-picked from
 0015-xenbus-add-freeze-thaw-restore-callbacks-support.patch
 in AWS 4.14 kernel SRPM)
Signed-off-by: Daniel Axtens <[hidden email]>
---
 drivers/xen/xenbus/xenbus_probe.c | 102 +++++++++++++++++++++++++-----
 include/xen/xenbus.h              |   3 +
 2 files changed, 89 insertions(+), 16 deletions(-)

diff --git a/drivers/xen/xenbus/xenbus_probe.c b/drivers/xen/xenbus/xenbus_probe.c
index f2088838f690..809dc380e184 100644
--- a/drivers/xen/xenbus/xenbus_probe.c
+++ b/drivers/xen/xenbus/xenbus_probe.c
@@ -49,6 +49,7 @@
 #include <linux/io.h>
 #include <linux/slab.h>
 #include <linux/module.h>
+#include <linux/suspend.h>
 
 #include <asm/page.h>
 #include <asm/pgtable.h>
@@ -588,26 +589,47 @@ int xenbus_dev_suspend(struct device *dev)
  struct xenbus_driver *drv;
  struct xenbus_device *xdev
  = container_of(dev, struct xenbus_device, dev);
+ int (*cb)(struct xenbus_device *) = NULL;
+ bool xen_suspend = xen_suspend_mode_is_xen_suspend();
 
  DPRINTK("%s", xdev->nodename);
 
  if (dev->driver == NULL)
  return 0;
  drv = to_xenbus_driver(dev->driver);
- if (drv->suspend)
- err = drv->suspend(xdev);
- if (err)
- pr_warn("suspend %s failed: %i\n", dev_name(dev), err);
+
+ if (xen_suspend)
+ cb = drv->suspend;
+ else
+ cb = drv->freeze;
+
+ if (cb)
+ err = cb(xdev);
+
+ if (err) {
+ pr_warn("%s %s failed: %i\n", xen_suspend ?
+ "suspend" : "freeze", dev_name(dev), err);
+ return err;
+ }
+
+ if (!xen_suspend) {
+ /* Forget otherend since this can become stale after restore */
+ free_otherend_watch(xdev);
+ free_otherend_details(xdev);
+ }
+
  return 0;
 }
 EXPORT_SYMBOL_GPL(xenbus_dev_suspend);
 
 int xenbus_dev_resume(struct device *dev)
 {
- int err;
+ int err = 0;
  struct xenbus_driver *drv;
  struct xenbus_device *xdev
  = container_of(dev, struct xenbus_device, dev);
+ int (*cb)(struct xenbus_device *) = NULL;
+ bool xen_suspend = xen_suspend_mode_is_xen_suspend();
 
  DPRINTK("%s", xdev->nodename);
 
@@ -616,24 +638,34 @@ int xenbus_dev_resume(struct device *dev)
  drv = to_xenbus_driver(dev->driver);
  err = talk_to_otherend(xdev);
  if (err) {
- pr_warn("resume (talk_to_otherend) %s failed: %i\n",
+ pr_warn("%s (talk_to_otherend) %s failed: %i\n",
+ xen_suspend ? "resume" : "restore",
  dev_name(dev), err);
  return err;
  }
 
- xdev->state = XenbusStateInitialising;
+ if (xen_suspend)
+ xdev->state = XenbusStateInitialising;
 
- if (drv->resume) {
- err = drv->resume(xdev);
- if (err) {
- pr_warn("resume %s failed: %i\n", dev_name(dev), err);
- return err;
- }
+ if (xen_suspend)
+ cb = drv->resume;
+ else
+ cb = drv->restore;
+
+ if (cb)
+ err = cb(xdev);
+
+ if (err) {
+ pr_warn("%s %s failed: %i\n",
+ xen_suspend ? "resume" : "restore",
+ dev_name(dev), err);
+ return err;
  }
 
  err = watch_otherend(xdev);
  if (err) {
- pr_warn("resume (watch_otherend) %s failed: %d.\n",
+ pr_warn("%s (watch_otherend) %s failed: %d.\n",
+ xen_suspend ? "resume" : "restore",
  dev_name(dev), err);
  return err;
  }
@@ -644,8 +676,46 @@ EXPORT_SYMBOL_GPL(xenbus_dev_resume);
 
 int xenbus_dev_cancel(struct device *dev)
 {
- /* Do nothing */
- DPRINTK("cancel");
+ int err = 0;
+ struct xenbus_driver *drv;
+ struct xenbus_device *xdev
+ = container_of(dev, struct xenbus_device, dev);
+ bool xen_suspend = xen_suspend_mode_is_xen_suspend();
+
+ if (xen_suspend) {
+ /* Do nothing */
+ DPRINTK("cancel");
+ return 0;
+ }
+
+ DPRINTK("%s", xdev->nodename);
+
+ if (dev->driver == NULL)
+ return 0;
+ drv = to_xenbus_driver(dev->driver);
+
+ err = talk_to_otherend(xdev);
+ if (err) {
+ pr_warn("thaw (talk_to_otherend) %s failed: %d.\n",
+ dev_name(dev), err);
+ return err;
+ }
+
+ if (drv->thaw) {
+ err = drv->thaw(xdev);
+ if (err) {
+ pr_warn("thaw %s failed: %i\n", dev_name(dev), err);
+ return err;
+ }
+ }
+
+ err = watch_otherend(xdev);
+ if (err) {
+ pr_warn("thaw (watch_otherend) %s failed: %d.\n",
+ dev_name(dev), err);
+ return err;
+ }
+
  return 0;
 }
 EXPORT_SYMBOL_GPL(xenbus_dev_cancel);
diff --git a/include/xen/xenbus.h b/include/xen/xenbus.h
index 869c816d5f8c..20261d5f4e78 100644
--- a/include/xen/xenbus.h
+++ b/include/xen/xenbus.h
@@ -100,6 +100,9 @@ struct xenbus_driver {
  int (*remove)(struct xenbus_device *dev);
  int (*suspend)(struct xenbus_device *dev);
  int (*resume)(struct xenbus_device *dev);
+ int (*freeze)(struct xenbus_device *dev);
+ int (*thaw)(struct xenbus_device *dev);
+ int (*restore)(struct xenbus_device *dev);
  int (*uevent)(struct xenbus_device *, struct kobj_uevent_env *);
  struct device_driver driver;
  int (*read_otherend_details)(struct xenbus_device *dev);
--
2.17.1



[SRU cosmic-aws][PATCH 4/6] xen-blkfront: add callbacks for PM suspend and hibernation

Daniel Axtens
In reply to this post by Daniel Axtens
From: Munehisa Kamata <[hidden email]>

BugLink: https://bugs.launchpad.net/bugs/1801305

Add freeze and restore callbacks for PM suspend and hibernation support.
The freeze handler stops the block-layer queue and disconnects the
frontend from the backend while freeing ring_info and associated
resources. The restore handler re-allocates ring_info and re-connects to
the backend, so the rest of the kernel can continue to use the block
device transparently. Also, the handlers are used for both PM suspend
and hibernation so that we can keep the existing suspend/resume
callbacks for Xen suspend without modification.

If a backend doesn't have commit 12ea729645ac ("xen/blkback: unmap all
persistent grants when frontend gets disconnected"), the frontend may
see a massive number of grant table warnings when freeing resources,
such as:

 [   36.852659] deferring g.e. 0xf9 (pfn 0xffffffffffffffff)
 [   36.855089] xen:grant_table: WARNING: g.e. 0x112 still in use!

In this case, persistent grants would need to be disabled.

Ensure there are no reqs/rsps in the rings before disconnecting. When
disconnecting the frontend from the backend in blkfront_freeze(), there
may still be unconsumed requests or responses in the rings, especially
when the backend is backed by a network-based device. If the frontend
gets disconnected with such reqs/rsps remaining, it can cause grant
warnings and/or loss of reqs/rsps when pages are freed afterward. This
can leave the resumed kernel in an unrecoverable state, such as
unexpected freeing of a granted page and/or hung tasks due to the lost
reqs or rsps. Therefore we have to ensure that there are no unconsumed
requests or responses before disconnecting.

In practice, the frontend just needs to wait for some amount of time so
that the backend can process the requests, post responses and notify the
frontend back. The timeout used here is a heuristic. If we somehow hit
the timeout, it means something serious has happened in the backend; the
frontend will simply return an error to the PM core and PM
suspend/hibernation will be aborted. Such a case may need to be fixed on
the backend side, but a frontend-side safeguard is probably still worth
having to work with a broader range of backends.

Backport note:
Unlike the 4.9 kernel, blk-mq is the default in the 4.14 kernel and the
request-based mode code is not included in this frontend driver.

Signed-off-by: Munehisa Kamata <[hidden email]>
Signed-off-by: Anchal Agarwal <[hidden email]>
Reviewed-by: Munehisa Kamata <[hidden email]>
Reviewed-by: Eduardo Valentin <[hidden email]>
CR: https://cr.amazon.com/r/8297625/
(cherry-picked from
 0018-xen-blkfront-add-callbacks-for-PM-suspend-and-hibern.patch
 in AWS 4.14 kernel SRPM)
Signed-off-by: Daniel Axtens <[hidden email]>
---
 drivers/block/xen-blkfront.c | 164 +++++++++++++++++++++++++++++++++--
 1 file changed, 156 insertions(+), 8 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index b5cedccb5d7d..f6367d03ba63 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -46,6 +46,8 @@
 #include <linux/scatterlist.h>
 #include <linux/bitmap.h>
 #include <linux/list.h>
+#include <linux/completion.h>
+#include <linux/delay.h>
 
 #include <xen/xen.h>
 #include <xen/xenbus.h>
@@ -78,6 +80,8 @@ enum blkif_state {
  BLKIF_STATE_DISCONNECTED,
  BLKIF_STATE_CONNECTED,
  BLKIF_STATE_SUSPENDED,
+ BLKIF_STATE_FREEZING,
+ BLKIF_STATE_FROZEN
 };
 
 struct grant {
@@ -216,6 +220,7 @@ struct blkfront_info
  /* Save uncomplete reqs and bios for migration. */
  struct list_head requests;
  struct bio_list bio_list;
+ struct completion wait_backend_disconnected;
 };
 
 static unsigned int nr_minors;
@@ -262,6 +267,16 @@ static DEFINE_SPINLOCK(minor_lock);
 static int blkfront_setup_indirect(struct blkfront_ring_info *rinfo);
 static void blkfront_gather_backend_features(struct blkfront_info *info);
 static int negotiate_mq(struct blkfront_info *info);
+static void __blkif_free(struct blkfront_info *info);
+
+static inline bool blkfront_ring_is_busy(struct blkif_front_ring *ring)
+{
+ if (RING_SIZE(ring) > RING_FREE_REQUESTS(ring) ||
+    RING_HAS_UNCONSUMED_RESPONSES(ring))
+ return true;
+ else
+ return false;
+}
 
 static int get_id_from_freelist(struct blkfront_ring_info *rinfo)
 {
@@ -996,6 +1011,7 @@ static int xlvbd_init_blk_queue(struct gendisk *gd, u16 sector_size,
  info->sector_size = sector_size;
  info->physical_sector_size = physical_sector_size;
  blkif_set_queue_limits(info);
+ init_completion(&info->wait_backend_disconnected);
 
  return 0;
 }
@@ -1219,6 +1235,8 @@ static void xlvbd_release_gendisk(struct blkfront_info *info)
 /* Already hold rinfo->ring_lock. */
 static inline void kick_pending_request_queues_locked(struct blkfront_ring_info *rinfo)
 {
+ if (unlikely(rinfo->dev_info->connected == BLKIF_STATE_FREEZING))
+                return;
  if (!RING_FULL(&rinfo->ring))
  blk_mq_start_stopped_hw_queues(rinfo->dev_info->rq, true);
 }
@@ -1342,8 +1360,6 @@ static void blkif_free_ring(struct blkfront_ring_info *rinfo)
 
 static void blkif_free(struct blkfront_info *info, int suspend)
 {
- unsigned int i;
-
  /* Prevent new requests being issued until we fix things up. */
  info->connected = suspend ?
  BLKIF_STATE_SUSPENDED : BLKIF_STATE_DISCONNECTED;
@@ -1351,6 +1367,13 @@ static void blkif_free(struct blkfront_info *info, int suspend)
  if (info->rq)
  blk_mq_stop_hw_queues(info->rq);
 
+ __blkif_free(info);
+}
+
+static void __blkif_free(struct blkfront_info *info)
+{
+ unsigned int i;
+
  for (i = 0; i < info->nr_rings; i++)
  blkif_free_ring(&info->rinfo[i]);
 
@@ -1554,8 +1577,10 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
  struct blkfront_ring_info *rinfo = (struct blkfront_ring_info *)dev_id;
  struct blkfront_info *info = rinfo->dev_info;
 
- if (unlikely(info->connected != BLKIF_STATE_CONNECTED))
- return IRQ_HANDLED;
+ if (unlikely(info->connected != BLKIF_STATE_CONNECTED)) {
+ if (info->connected != BLKIF_STATE_FREEZING)
+ return IRQ_HANDLED;
+ }
 
  spin_lock_irqsave(&rinfo->ring_lock, flags);
  again:
@@ -2007,6 +2032,7 @@ static int blkif_recover(struct blkfront_info *info)
  struct bio *bio;
  unsigned int segs;
 
+ bool frozen = info->connected == BLKIF_STATE_FROZEN;
  blkfront_gather_backend_features(info);
  /* Reset limits changed by blk_mq_update_nr_hw_queues(). */
  blkif_set_queue_limits(info);
@@ -2033,6 +2059,9 @@ static int blkif_recover(struct blkfront_info *info)
  kick_pending_request_queues(rinfo);
  }
 
+ if (frozen)
+ return 0;
+
  list_for_each_entry_safe(req, n, &info->requests, queuelist) {
  /* Requeue pending requests (flush or discard) */
  list_del_init(&req->queuelist);
@@ -2340,6 +2369,7 @@ static void blkfront_connect(struct blkfront_info *info)
 
  return;
  case BLKIF_STATE_SUSPENDED:
+ case BLKIF_STATE_FROZEN:
  /*
  * If we are recovering from suspension, we need to wait
  * for the backend to announce it's features before
@@ -2457,13 +2487,38 @@ static void blkback_changed(struct xenbus_device *dev,
  break;
 
  case XenbusStateClosed:
- if (dev->state == XenbusStateClosed)
+ if (dev->state == XenbusStateClosed) {
+ if (info->connected == BLKIF_STATE_FREEZING) {
+ __blkif_free(info);
+ info->connected = BLKIF_STATE_FROZEN;
+ complete(&info->wait_backend_disconnected);
+ break;
+ }
+
  break;
+ }
+
+ /*
+ * We may somehow receive backend's Closed again while thawing
+ * or restoring and it causes thawing or restoring to fail.
+ * Ignore such unexpected state anyway.
+ */
+ if (info->connected == BLKIF_STATE_FROZEN &&
+ dev->state == XenbusStateInitialised) {
+ dev_dbg(&dev->dev,
+ "ignore the backend's Closed state: %s",
+ dev->nodename);
+ break;
+ }
  /* fall through */
  case XenbusStateClosing:
- if (info)
- blkfront_closing(info);
- break;
+ if (info) {
+                        if (info->connected == BLKIF_STATE_FREEZING)
+                                xenbus_frontend_closed(dev);
+                        else
+                                blkfront_closing(info);
+                }
+                break;
  }
 }
 
@@ -2599,6 +2654,96 @@ static void blkif_release(struct gendisk *disk, fmode_t mode)
  mutex_unlock(&blkfront_mutex);
 }
 
+static int blkfront_freeze(struct xenbus_device *dev)
+{
+ unsigned int i;
+ struct blkfront_info *info = dev_get_drvdata(&dev->dev);
+        struct blkfront_ring_info *rinfo;
+        struct blkif_front_ring *ring;
+ /* This would be reasonable timeout as used in xenbus_dev_shutdown() */
+ unsigned int timeout = 5 * HZ;
+ int err = 0;
+
+ info->connected = BLKIF_STATE_FREEZING;
+
+ blk_mq_stop_hw_queues(info->rq);
+
+ for (i = 0; i < info->nr_rings; i++) {
+ rinfo = &info->rinfo[i];
+
+ gnttab_cancel_free_callback(&rinfo->callback);
+ flush_work(&rinfo->work);
+ }
+
+ for (i = 0; i < info->nr_rings; i++) {
+                spinlock_t *lock;
+                bool busy;
+                unsigned long req_timeout_ms = 25;
+                unsigned long ring_timeout;
+
+                rinfo = &info->rinfo[i];
+                ring = &rinfo->ring;
+
+                lock = &rinfo->ring_lock;
+
+                ring_timeout = jiffies +
+                        msecs_to_jiffies(req_timeout_ms * RING_SIZE(ring));
+
+                do {
+                        spin_lock_irq(lock);
+                        busy = blkfront_ring_is_busy(ring);
+                        spin_unlock_irq(lock);
+
+                        if (busy)
+                                msleep(req_timeout_ms);
+                        else
+                                break;
+                } while (time_is_after_jiffies(ring_timeout));
+
+                /* Timed out */
+                if (busy) {
+                        xenbus_dev_error(dev, err, "the ring is still busy");
+                        info->connected = BLKIF_STATE_CONNECTED;
+                        return -EBUSY;
+                }
+        }
+
+ /* Kick the backend to disconnect */
+ xenbus_switch_state(dev, XenbusStateClosing);
+
+ /*
+ * We don't want to move forward before the frontend is diconnected
+ * from the backend cleanly.
+ */
+ timeout = wait_for_completion_timeout(&info->wait_backend_disconnected,
+      timeout);
+ if (!timeout) {
+ err = -EBUSY;
+ xenbus_dev_error(dev, err, "Freezing timed out;"
+ "the device may become inconsistent state");
+ }
+
+ return err;
+}
+
+static int blkfront_restore(struct xenbus_device *dev)
+{
+ struct blkfront_info *info = dev_get_drvdata(&dev->dev);
+ int err = 0;
+
+ err = negotiate_mq(info);
+ if (err)
+ goto out;
+
+ err = talk_to_blkback(dev, info);
+ if (err)
+ goto out;
+ blk_mq_update_nr_hw_queues(&info->tag_set, info->nr_rings);
+
+out:
+ return err;
+}
+
 static const struct block_device_operations xlvbd_block_fops =
 {
  .owner = THIS_MODULE,
@@ -2621,6 +2766,9 @@ static struct xenbus_driver blkfront_driver = {
  .resume = blkfront_resume,
  .otherend_changed = blkback_changed,
  .is_ready = blkfront_is_ready,
+ .freeze = blkfront_freeze,
+ .thaw = blkfront_restore,
+ .restore = blkfront_restore
 };
 
 static int __init xlblk_init(void)
--
2.17.1



[SRU cosmic-aws][PATCH 5/6] xen-blkfront: resurrect request-based mode

Daniel Axtens
In reply to this post by Daniel Axtens
From: Munehisa Kamata <[hidden email]>

BugLink: https://bugs.launchpad.net/bugs/1801305

This change resurrects the request-based mode which was completely
dropped in commit 907c3eb18e0b ("xen-blkfront: convert to blk-mq APIs").

To avoid using a stale queue lock, resurrect the per-device (vbd) lock
in blkfront_info, which is never freed during Xen suspend, and use it in
request-based mode. This is basically the same as what the driver was
doing until commit 11659569f720 ("xen/blkfront: split per device
io_lock"). If the driver is in blk-mq mode, just use the lock(s) in
blkfront_ring_info.

In commit b7420c1eaeac ("drivers/amazon: xen-blkfront: resurrect
request-based mode"), we accidentally didn't bring over the piece of
code which empties the request queue while saving bios. The logic was
originally introduced in commit 402b27f9f2c2 ("xen-block: implement
indirect descriptors"). It still seems to be required for request-based
mode, so just do the same thing as before.

Note that some suspend/resume logic was moved from blkif_recover() to
blkfront_resume() in commit 7b427a59538a ("xen-blkfront: save
uncompleted reqs in blkfront_resume()"), so add the logic to
blkfront_resume().

Forward-port notes: As part of this forward port, we are no longer using
the out-of-tree xen-blkfront. The request-based patch and its related
per-device vbd lock have now been ported on top of the in-tree
xen-blkfront.
For reference:
4.9 CR for resurrect request based mode: https://cr.amazon.com/r/6834653/
4.9 CR for resurrect per-device (vbd) lock: https://cr.amazon.com/r/7475903/
4.9 CR for empty the request queue while resuming: https://cr.amazon.com/r/7475918/
As part of the forward port, all three of the above related patches have
been merged into a single commit.
In the 4.14.y kernel, we realized during forward-porting and testing
that blk-mq stashes the error code for a request right after the request
structure in memory. Care was taken not to reuse this piece of memory
for stashing the error code in request mode, as this can cause memory
corruption.
Hibernation: To avoid breaking git bisect and the hibernation feature,
blkfront_freeze() and blkfront_resume() were modified as well to support
request-based mode.

Reported-by: Imre Palik <[hidden email]>
Reviewed-by: Eduardo Valentin <[hidden email]>
Reviewed-by: Munehisa Kamata <[hidden email]>
Reviewed-by: Anchal Agarwal <[hidden email]>
Signed-off-by: Munehisa Kamata <[hidden email]>
Signed-off-by: Vallish Vaidyeshwara <[hidden email]>

CR: https://cr.amazon.com/r/8309443
(forward-ported from
 0026-xen-blkfront-resurrect-request-based-mode.patch
 in AWS 4.14 kernel SRPM. The patch doesn't apply cleanly,
 but once applied to the bionic source is a clean cherry-pick
 to cosmic.)
Signed-off-by: Daniel Axtens <[hidden email]>
---
 drivers/block/xen-blkfront.c | 334 ++++++++++++++++++++++++++++-------
 1 file changed, 268 insertions(+), 66 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index f6367d03ba63..792e7d2593d5 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -155,6 +155,15 @@ MODULE_PARM_DESC(max_ring_page_order, "Maximum order of pages to be used for the
 #define BLK_MAX_RING_SIZE \
  __CONST_RING_SIZE(blkif, XEN_PAGE_SIZE * XENBUS_MAX_RING_GRANTS)
 
+static unsigned int blkfront_use_blk_mq = 0;
+module_param_named(use_blk_mq, blkfront_use_blk_mq, int, S_IRUGO);
+MODULE_PARM_DESC(use_blk_mq, "Enable blk-mq (default is 0)");
+
+/*
+ * Index to the first available ring.
+ */
+#define FIRST_RING_ID (0)
+
 /*
  * ring-ref%u i=(-1UL) would take 11 characters + 'ring-ref' is 8, so 19
  * characters are enough. Define to 20 to keep consistent with backend.
@@ -193,6 +202,12 @@ struct blkfront_ring_info {
  */
 struct blkfront_info
 {
+ /*
+ * Per vbd lock which protects an associated blkfront_ring_info if the
+ * driver is in request-based mode. Use this lock always instead of per
+ * ring lock in that mode.
+ */
+ spinlock_t io_lock;
  struct mutex mutex;
  struct xenbus_device *xbdev;
  struct gendisk *gd;
@@ -264,6 +279,19 @@ static DEFINE_SPINLOCK(minor_lock);
 
 #define GREFS(_psegs) ((_psegs) * GRANTS_PER_PSEG)
 
+/* Macro to save error status */
+#define BLKIF_REQ_PUT_ERROR_STATUS(req, error, status) \
+ do { \
+ if (blkfront_use_blk_mq) \
+ blkif_req(req)->error = status; \
+ else \
+ error = status; \
+ } while (0)
+
+/* Macro to retrieve error status */
+#define BLKIF_REQ_GET_ERROR_STATUS(req, error) \
+ ((blkfront_use_blk_mq) ? blkif_req(req)->error : error)
+
 static int blkfront_setup_indirect(struct blkfront_ring_info *rinfo);
 static void blkfront_gather_backend_features(struct blkfront_info *info);
 static int negotiate_mq(struct blkfront_info *info);
@@ -894,6 +922,62 @@ static inline bool blkif_request_flush_invalid(struct request *req,
  !info->feature_fua));
 }
 
+static inline void blkif_complete_request(struct request *req, int error)
+{
+ if (blkfront_use_blk_mq)
+ blk_mq_complete_request(req);
+ else
+ __blk_end_request_all(req, error);
+}
+
+/*
+ * do_blkif_request
+ * read a block; request is in a request queue
+ */
+static void do_blkif_request(struct request_queue *rq)
+{
+ struct blkfront_info *info = NULL;
+ struct request *req;
+ int queued;
+
+ pr_debug("Entered do_blkif_request\n");
+
+ queued = 0;
+
+ while ((req = blk_peek_request(rq)) != NULL) {
+ info = req->rq_disk->private_data;
+
+ if (RING_FULL(&info->rinfo[FIRST_RING_ID].ring))
+ goto wait;
+
+ blk_start_request(req);
+
+ if (blkif_request_flush_invalid(req, info)) {
+ __blk_end_request_all(req, BLK_STS_NOTSUPP);
+ continue;
+ }
+
+ pr_debug("do_blk req %p: cmd_flags %u, sec %lx, "
+ "(%u/%u) [%s]\n",
+ req, req->cmd_flags, (unsigned long)blk_rq_pos(req),
+ blk_rq_cur_sectors(req), blk_rq_sectors(req),
+ rq_data_dir(req) ? "write" : "read");
+
+ if (blkif_queue_request(req, &info->rinfo[FIRST_RING_ID])) {
+ blk_requeue_request(rq, req);
+wait:
+ /* Avoid pointless unplugs. */
+ blk_stop_queue(rq);
+ break;
+ }
+
+ queued++;
+ }
+
+ if(queued != 0)
+ flush_requests(&info->rinfo[FIRST_RING_ID]);
+}
+
 static blk_status_t blkif_queue_rq(struct blk_mq_hw_ctx *hctx,
   const struct blk_mq_queue_data *qd)
 {
@@ -979,30 +1063,37 @@ static int xlvbd_init_blk_queue(struct gendisk *gd, u16 sector_size,
  struct request_queue *rq;
  struct blkfront_info *info = gd->private_data;
 
- memset(&info->tag_set, 0, sizeof(info->tag_set));
- info->tag_set.ops = &blkfront_mq_ops;
- info->tag_set.nr_hw_queues = info->nr_rings;
- if (HAS_EXTRA_REQ && info->max_indirect_segments == 0) {
- /*
- * When indirect descriptior is not supported, the I/O request
- * will be split between multiple request in the ring.
- * To avoid problems when sending the request, divide by
- * 2 the depth of the queue.
- */
- info->tag_set.queue_depth =  BLK_RING_SIZE(info) / 2;
- } else
- info->tag_set.queue_depth = BLK_RING_SIZE(info);
- info->tag_set.numa_node = NUMA_NO_NODE;
- info->tag_set.flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_SG_MERGE;
- info->tag_set.cmd_size = sizeof(struct blkif_req);
- info->tag_set.driver_data = info;
-
- if (blk_mq_alloc_tag_set(&info->tag_set))
- return -EINVAL;
- rq = blk_mq_init_queue(&info->tag_set);
- if (IS_ERR(rq)) {
- blk_mq_free_tag_set(&info->tag_set);
- return PTR_ERR(rq);
+ if (blkfront_use_blk_mq) {
+ memset(&info->tag_set, 0, sizeof(info->tag_set));
+ info->tag_set.ops = &blkfront_mq_ops;
+ info->tag_set.nr_hw_queues = info->nr_rings;
+ if (HAS_EXTRA_REQ && info->max_indirect_segments == 0) {
+ /*
+ * When indirect descriptior is not supported, the I/O request
+ * will be split between multiple request in the ring.
+ * To avoid problems when sending the request, divide by
+ * 2 the depth of the queue.
+ */
+ info->tag_set.queue_depth =  BLK_RING_SIZE(info) / 2;
+ } else
+ info->tag_set.queue_depth = BLK_RING_SIZE(info);
+ info->tag_set.numa_node = NUMA_NO_NODE;
+ info->tag_set.flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_SG_MERGE;
+ info->tag_set.cmd_size = sizeof(struct blkif_req);
+ info->tag_set.driver_data = info;
+
+ if (blk_mq_alloc_tag_set(&info->tag_set))
+ return -EINVAL;
+ rq = blk_mq_init_queue(&info->tag_set);
+ if (IS_ERR(rq)) {
+ blk_mq_free_tag_set(&info->tag_set);
+ return PTR_ERR(rq);
+ }
+ } else {
+ spin_lock_init(&info->io_lock);
+ rq = blk_init_queue(do_blkif_request, &info->io_lock);
+ if (IS_ERR(rq))
+ return PTR_ERR(rq);
  }
 
  rq->queuedata = info;
@@ -1201,21 +1292,29 @@ static int xlvbd_alloc_gendisk(blkif_sector_t capacity,
 static void xlvbd_release_gendisk(struct blkfront_info *info)
 {
  unsigned int minor, nr_minors, i;
+ unsigned long flags;
 
  if (info->rq == NULL)
  return;
 
  /* No more blkif_request(). */
- blk_mq_stop_hw_queues(info->rq);
+ if (blkfront_use_blk_mq) {
+ blk_mq_stop_hw_queues(info->rq);
 
- for (i = 0; i < info->nr_rings; i++) {
- struct blkfront_ring_info *rinfo = &info->rinfo[i];
+ for (i = 0; i < info->nr_rings; i++) {
+ struct blkfront_ring_info *rinfo = &info->rinfo[i];
 
- /* No more gnttab callback work. */
- gnttab_cancel_free_callback(&rinfo->callback);
+ /* No more gnttab callback work. */
+ gnttab_cancel_free_callback(&rinfo->callback);
 
- /* Flush gnttab callback work. Must be done with no locks held. */
- flush_work(&rinfo->work);
+ /* Flush gnttab callback work. Must be done with no locks held. */
+ flush_work(&rinfo->work);
+ }
+ } else {
+ spin_lock_irqsave(&info->io_lock, flags);
+ blk_stop_queue(info->rq);
+ gnttab_cancel_free_callback(&info->rinfo[FIRST_RING_ID].callback);
+ spin_unlock_irqrestore(&info->io_lock, flags);
  }
 
  del_gendisk(info->gd);
@@ -1225,7 +1324,8 @@ static void xlvbd_release_gendisk(struct blkfront_info *info)
  xlbd_release_minors(minor, nr_minors);
 
  blk_cleanup_queue(info->rq);
- blk_mq_free_tag_set(&info->tag_set);
+ if (blkfront_use_blk_mq)
+ blk_mq_free_tag_set(&info->tag_set);
  info->rq = NULL;
 
  put_disk(info->gd);
@@ -1237,17 +1337,31 @@ static inline void kick_pending_request_queues_locked(struct blkfront_ring_info
 {
  if (unlikely(rinfo->dev_info->connected == BLKIF_STATE_FREEZING))
                 return;
- if (!RING_FULL(&rinfo->ring))
+
+ if (RING_FULL(&rinfo->ring))
+ return;
+
+ if (blkfront_use_blk_mq) {
  blk_mq_start_stopped_hw_queues(rinfo->dev_info->rq, true);
+ } else {
+ /* Re-enable calldowns */
+ blk_start_queue(rinfo->dev_info->rq);
+ /* Kick things off immediately */
+ do_blkif_request(rinfo->dev_info->rq);
+ }
 }
 
 static void kick_pending_request_queues(struct blkfront_ring_info *rinfo)
 {
  unsigned long flags;
+ struct blkfront_info *info = rinfo->dev_info;
+ spinlock_t *lock;
 
- spin_lock_irqsave(&rinfo->ring_lock, flags);
+ lock = blkfront_use_blk_mq ? &rinfo->ring_lock : &info->io_lock;
+
+ spin_lock_irqsave(lock, flags);
  kick_pending_request_queues_locked(rinfo);
- spin_unlock_irqrestore(&rinfo->ring_lock, flags);
+ spin_unlock_irqrestore(lock, flags);
 }
 
 static void blkif_restart_queue(struct work_struct *work)
@@ -1258,6 +1372,7 @@ static void blkif_restart_queue(struct work_struct *work)
  kick_pending_request_queues(rinfo);
 }
 
+/* Must be called with per vbd lock held if the frontend uses request-based */
 static void blkif_free_ring(struct blkfront_ring_info *rinfo)
 {
  struct grant *persistent_gnt, *n;
@@ -1340,6 +1455,9 @@ static void blkif_free_ring(struct blkfront_ring_info *rinfo)
  /* No more gnttab callback work. */
  gnttab_cancel_free_callback(&rinfo->callback);
 
+ if (!blkfront_use_blk_mq)
+ spin_unlock_irq(&info->io_lock);
+
  /* Flush gnttab callback work. Must be done with no locks held. */
  flush_work(&rinfo->work);
 
@@ -1361,11 +1479,18 @@ static void blkif_free_ring(struct blkfront_ring_info *rinfo)
 static void blkif_free(struct blkfront_info *info, int suspend)
 {
  /* Prevent new requests being issued until we fix things up. */
+ if (!blkfront_use_blk_mq)
+ spin_lock_irq(&info->io_lock);
+
  info->connected = suspend ?
  BLKIF_STATE_SUSPENDED : BLKIF_STATE_DISCONNECTED;
  /* No more blkif_request(). */
- if (info->rq)
- blk_mq_stop_hw_queues(info->rq);
+ if (info->rq) {
+ if (blkfront_use_blk_mq)
+ blk_mq_stop_hw_queues(info->rq);
+ else
+ blk_stop_queue(info->rq);
+ }
 
  __blkif_free(info);
 }
@@ -1576,13 +1701,17 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
  unsigned long flags;
  struct blkfront_ring_info *rinfo = (struct blkfront_ring_info *)dev_id;
  struct blkfront_info *info = rinfo->dev_info;
+ spinlock_t *lock;
+ int error = BLK_STS_OK;
 
  if (unlikely(info->connected != BLKIF_STATE_CONNECTED)) {
  if (info->connected != BLKIF_STATE_FREEZING)
  return IRQ_HANDLED;
  }
 
- spin_lock_irqsave(&rinfo->ring_lock, flags);
+ lock = blkfront_use_blk_mq ? &rinfo->ring_lock : &info->io_lock;
+
+ spin_lock_irqsave(lock, flags);
  again:
  rp = rinfo->ring.sring->rsp_prod;
  rmb(); /* Ensure we see queued responses up to 'rp'. */
@@ -1622,9 +1751,9 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
  }
 
  if (bret->status == BLKIF_RSP_OKAY)
- blkif_req(req)->error = BLK_STS_OK;
+ BLKIF_REQ_PUT_ERROR_STATUS(req, error, BLK_STS_OK);
  else
- blkif_req(req)->error = BLK_STS_IOERR;
+ BLKIF_REQ_PUT_ERROR_STATUS(req, error, BLK_STS_IOERR);
 
  switch (bret->operation) {
  case BLKIF_OP_DISCARD:
@@ -1632,7 +1761,7 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
  struct request_queue *rq = info->rq;
  printk(KERN_WARNING "blkfront: %s: %s op failed\n",
    info->gd->disk_name, op_name(bret->operation));
- blkif_req(req)->error = BLK_STS_NOTSUPP;
+ BLKIF_REQ_PUT_ERROR_STATUS(req, error, BLK_STS_NOTSUPP);
  info->feature_discard = 0;
  info->feature_secdiscard = 0;
  blk_queue_flag_clear(QUEUE_FLAG_DISCARD, rq);
@@ -1644,17 +1773,19 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
  if (unlikely(bret->status == BLKIF_RSP_EOPNOTSUPP)) {
  printk(KERN_WARNING "blkfront: %s: %s op failed\n",
        info->gd->disk_name, op_name(bret->operation));
- blkif_req(req)->error = BLK_STS_NOTSUPP;
+ BLKIF_REQ_PUT_ERROR_STATUS(req, error, BLK_STS_NOTSUPP);
  }
  if (unlikely(bret->status == BLKIF_RSP_ERROR &&
      rinfo->shadow[id].req.u.rw.nr_segments == 0)) {
  printk(KERN_WARNING "blkfront: %s: empty %s op failed\n",
        info->gd->disk_name, op_name(bret->operation));
- blkif_req(req)->error = BLK_STS_NOTSUPP;
+ BLKIF_REQ_PUT_ERROR_STATUS(req, error, BLK_STS_NOTSUPP);
  }
- if (unlikely(blkif_req(req)->error)) {
- if (blkif_req(req)->error == BLK_STS_NOTSUPP)
- blkif_req(req)->error = BLK_STS_OK;
+ if (unlikely(BLKIF_REQ_GET_ERROR_STATUS(req, error))) {
+ if (BLKIF_REQ_GET_ERROR_STATUS(req, error)
+ == BLK_STS_NOTSUPP)
+ BLKIF_REQ_PUT_ERROR_STATUS(req, error,
+ BLK_STS_OK);
  info->feature_fua = 0;
  info->feature_flush = 0;
  xlvbd_flush(info);
@@ -1671,7 +1802,7 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
  BUG();
  }
 
- blk_mq_complete_request(req);
+ blkif_complete_request(req, BLKIF_REQ_GET_ERROR_STATUS(req, error));
  }
 
  rinfo->ring.rsp_cons = i;
@@ -1686,7 +1817,7 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
 
  kick_pending_request_queues_locked(rinfo);
 
- spin_unlock_irqrestore(&rinfo->ring_lock, flags);
+ spin_unlock_irqrestore(lock, flags);
 
  return IRQ_HANDLED;
 }
@@ -1927,8 +2058,11 @@ static int negotiate_mq(struct blkfront_info *info)
  backend_max_queues = xenbus_read_unsigned(info->xbdev->otherend,
   "multi-queue-max-queues", 1);
  info->nr_rings = min(backend_max_queues, xen_blkif_max_queues);
- /* We need at least one ring. */
- if (!info->nr_rings)
+ /*
+ * We need at least one ring. Also, do not allow to have multiple rings if blk-mq is
+ * not used.
+ */
+ if (!info->nr_rings || !blkfront_use_blk_mq)
  info->nr_rings = 1;
 
  info->rinfo = kcalloc(info->nr_rings,
@@ -1947,7 +2081,8 @@ static int negotiate_mq(struct blkfront_info *info)
  INIT_LIST_HEAD(&rinfo->grants);
  rinfo->dev_info = info;
  INIT_WORK(&rinfo->work, blkif_restart_queue);
- spin_lock_init(&rinfo->ring_lock);
+ if (blkfront_use_blk_mq)
+ spin_lock_init(&rinfo->ring_lock);
  }
  return 0;
 }
@@ -2048,6 +2183,10 @@ static int blkif_recover(struct blkfront_info *info)
  }
  xenbus_switch_state(info->xbdev, XenbusStateConnected);
 
+ /* blk_requeue_request below must be called with queue lock held */
+ if (!blkfront_use_blk_mq)
+ spin_lock_irq(&info->io_lock);
+
  /* Now safe for us to use the shared ring */
  info->connected = BLKIF_STATE_CONNECTED;
 
@@ -2056,20 +2195,34 @@ static int blkif_recover(struct blkfront_info *info)
 
  rinfo = &info->rinfo[r_index];
  /* Kick any other new requests queued since we resumed */
- kick_pending_request_queues(rinfo);
+ if (blkfront_use_blk_mq)
+ kick_pending_request_queues(rinfo);
+ else
+ kick_pending_request_queues_locked(rinfo);
  }
 
- if (frozen)
+ if (frozen) {
+ if (!blkfront_use_blk_mq)
+ spin_unlock_irq(&info->io_lock);
  return 0;
+ }
 
  list_for_each_entry_safe(req, n, &info->requests, queuelist) {
  /* Requeue pending requests (flush or discard) */
  list_del_init(&req->queuelist);
  BUG_ON(req->nr_phys_segments > segs);
- blk_mq_requeue_request(req, false);
+ if (blkfront_use_blk_mq)
+ blk_mq_requeue_request(req, false);
+ else
+ blk_requeue_request(info->rq, req);
+ }
+
+ if (blkfront_use_blk_mq) {
+ blk_mq_start_stopped_hw_queues(info->rq, true);
+ blk_mq_kick_requeue_list(info->rq);
+ } else {
+ spin_unlock_irq(&info->io_lock);
  }
- blk_mq_start_stopped_hw_queues(info->rq, true);
- blk_mq_kick_requeue_list(info->rq);
 
  while ((bio = bio_list_pop(&info->bio_list)) != NULL) {
  /* Traverse the list of pending bios and re-queue them */
@@ -2126,14 +2279,47 @@ static int blkfront_resume(struct xenbus_device *dev)
  merge_bio.tail = shadow[j].request->biotail;
  bio_list_merge(&info->bio_list, &merge_bio);
  shadow[j].request->bio = NULL;
- blk_mq_end_request(shadow[j].request, BLK_STS_OK);
+ if (blkfront_use_blk_mq)
+ blk_mq_end_request(shadow[j].request, BLK_STS_OK);
+ else
+ blk_end_request_all(shadow[j].request, BLK_STS_OK);
  }
  }
 
+ if (!blkfront_use_blk_mq) {
+ struct request *req;
+ struct bio_list merge_bio;
+
+ /*
+ * Empty the queue, this is important because we might have
+ * requests in the queue with more segments than what we
+ * can handle now.
+ */
+ spin_lock_irq(&info->io_lock);
+ while ((req = blk_fetch_request(info->rq)) != NULL) {
+ if (req_op(req) == REQ_OP_FLUSH ||
+    req_op(req) == REQ_OP_DISCARD ||
+    req_op(req) == REQ_OP_SECURE_ERASE ||
+    req->cmd_flags & REQ_FUA) {
+ list_add(&req->queuelist, &info->requests);
+ continue;
+ }
+ merge_bio.head = req->bio;
+ merge_bio.tail = req->biotail;
+ bio_list_merge(&info->bio_list, &merge_bio);
+ req->bio = NULL;
+ if (req_op(req) == REQ_OP_FLUSH ||
+    req->cmd_flags & REQ_FUA)
+ pr_alert("diskcache flush request found!\n");
+ __blk_end_request_all(req, BLK_STS_OK);
+ }
+ spin_unlock_irq(&info->io_lock);
+ }
+
  blkif_free(info, info->connected == BLKIF_STATE_CONNECTED);
 
  err = talk_to_blkback(dev, info);
- if (!err)
+ if (!err && blkfront_use_blk_mq)
  blk_mq_update_nr_hw_queues(&info->tag_set, info->nr_rings);
 
  /*
@@ -2489,6 +2675,8 @@ static void blkback_changed(struct xenbus_device *dev,
  case XenbusStateClosed:
  if (dev->state == XenbusStateClosed) {
  if (info->connected == BLKIF_STATE_FREEZING) {
+ if (!blkfront_use_blk_mq)
+ spin_lock_irq(&info->io_lock);
  __blkif_free(info);
  info->connected = BLKIF_STATE_FROZEN;
  complete(&info->wait_backend_disconnected);
@@ -2665,14 +2853,25 @@ static int blkfront_freeze(struct xenbus_device *dev)
  int err = 0;
 
  info->connected = BLKIF_STATE_FREEZING;
+
+ if (blkfront_use_blk_mq) {
+ blk_mq_stop_hw_queues(info->rq);
 
- blk_mq_stop_hw_queues(info->rq);
-
- for (i = 0; i < info->nr_rings; i++) {
- rinfo = &info->rinfo[i];
+ for (i = 0; i < info->nr_rings; i++) {
+ rinfo = &info->rinfo[i];
+
+ gnttab_cancel_free_callback(&rinfo->callback);
+ flush_work(&rinfo->work);
+ }
+ } else {
+ spin_lock_irq(&info->io_lock);
+ blk_stop_queue(info->rq);
+ gnttab_cancel_free_callback(
+ &info->rinfo[FIRST_RING_ID].callback);
+ spin_unlock_irq(&info->io_lock);
 
- gnttab_cancel_free_callback(&rinfo->callback);
- flush_work(&rinfo->work);
+ blk_sync_queue(info->rq);
+ flush_work(&info->rinfo[FIRST_RING_ID].work);
  }
 
  for (i = 0; i < info->nr_rings; i++) {
@@ -2684,7 +2883,8 @@ static int blkfront_freeze(struct xenbus_device *dev)
                 rinfo = &info->rinfo[i];
                 ring = &rinfo->ring;
 
-                lock = &rinfo->ring_lock;
+                lock = blkfront_use_blk_mq ?
+ &rinfo->ring_lock : &info->io_lock;
 
                 ring_timeout = jiffies +
                         msecs_to_jiffies(req_timeout_ms * RING_SIZE(ring));
@@ -2738,7 +2938,9 @@ static int blkfront_restore(struct xenbus_device *dev)
  err = talk_to_blkback(dev, info);
  if (err)
  goto out;
- blk_mq_update_nr_hw_queues(&info->tag_set, info->nr_rings);
+
+ if (blkfront_use_blk_mq)
+ blk_mq_update_nr_hw_queues(&info->tag_set, info->nr_rings);
 
 out:
  return err;
--
2.17.1



[SRU cosmic-aws][PATCH 6/6] xen-blkfront: Fixed blkfront_restore to remove a call to negotiate_mq

Daniel Axtens
In reply to this post by Daniel Axtens
From: Anchal Agarwal <[hidden email]>

BugLink: https://bugs.launchpad.net/bugs/1801305

The code for the talk_to_blkback API changed in kernel 4.14.45 to
include a call to negotiate_mq, so calling negotiate_mq again from
blkfront_restore() results in a duplicate call that causes a kernel
panic:
[   84.440105] Call Trace:
[   84.443707]  talk_to_blkback+0x6d/0x8b0 [xen_blkfront]
[   84.449147]  blkfront_restore+0x33/0x60 [xen_blkfront]
[   84.453336]  ? xenbus_read_otherend_details+0x50/0xb0
[   84.457804]  xenbus_dev_cancel+0x5f/0x160
[   84.463286]  ? xenbus_dev_resume+0x170/0x170
[   84.466891]  dpm_run_callback+0x3b/0x100
[   84.470516]  device_resume+0x10d/0x420
[   84.473844]  dpm_resume+0xfd/0x2f0
[   84.476984]  hibernation_snapshot+0x218/0x410
[   84.480794]  hibernate+0x14b/0x270
[   84.484030]  state_store+0x50/0x60
[   84.487443]  kernfs_fop_write+0x105/0x180
[   84.492695]  __vfs_write+0x36/0x160
[   84.496672]  ? __audit_syscall_entry+0xbc/0x110
[   84.502123]  vfs_write+0xad/0x1a0
[   84.506857]  SyS_write+0x52/0xc0
[   84.511420]  do_syscall_64+0x67/0x100
[   84.516365]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[   84.522571] RIP: 0033:0x7f44a03407e4
[   84.526210] RSP: 002b:00007ffd5e0ec3c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[   84.534041] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007f44a03407e4
[   84.542571] RDX: 0000000000000004 RSI: 0000000001e94990 RDI: 0000000000000001
[   84.549142] RBP: 0000000001e94990 R08: 00007f44a060c8c0 R09: 00007f44a0c57740
[   84.554658] R10: 00007f44a03cd320 R11: 0000000000000246 R12: 0000000000000004
[   84.560411] R13: 0000000000000001 R14: 00007f44a060b760 R15: 0000000000000004
[   84.565744] Code: 39 ab e8 00 00 00 77 8a 31 c0 5b 5d c3 44 8b 05 50 57 00 00 45 85 c0 0f 84 2f ff ff ff 89 c0 48 69 f8 e0 40 01 00 e9 30 ff ff ff <0f> 0b 48 8b 7b 28 48 c7 c2 78 58 16 a0 be f4 ff ff ff e8 7e 37
[   84.580594] RIP: negotiate_mq+0x12b/0x150 [xen_blkfront] RSP: ffffc90000ebbc70

Signed-off-by: Anchal Agarwal <[hidden email]>
Reviewed-by: Frank van der Linden <[hidden email]>
Reviewed-by: Vallish Vaidyeshwara <[hidden email]>
(cherry-picked from
 0035-xen-blkfront-Fixed-blkfront_restore-to-remove-a-call.patch
 in AWS 4.14 kernel SRPM)
Signed-off-by: Daniel Axtens <[hidden email]>
---
 drivers/block/xen-blkfront.c | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 792e7d2593d5..ab35ca992958 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -2930,11 +2930,6 @@ static int blkfront_restore(struct xenbus_device *dev)
 {
  struct blkfront_info *info = dev_get_drvdata(&dev->dev);
  int err = 0;
-
- err = negotiate_mq(info);
- if (err)
- goto out;
-
  err = talk_to_blkback(dev, info);
  if (err)
  goto out;
--
2.17.1



ACK: [SRU cosmic-aws][PATCH 0/6] LP: #1801305: xen-blkfront: restore request mode

Stefan Bader-2
In reply to this post by Daniel Axtens
On 02.11.18 09:57, Daniel Axtens wrote:

> In current Ubuntu kernels, PV blkfront drivers have blk-mq enabled by
> default and cannot use the old I/O scheduler.
>
> [Impact]
> blk-mq is not as fast as the old request-based scheduler for some
> workloads on HDD disks.
>
> [Fix]
> Amazon Linux has a commit which reintroduces the request-based
> mode. It disables blk-mq by default but allows it to be switched back
> on with a kernel parameter.
>
> For X this needs a small patch from upstream for error handling.
>
> For B/C this patchset is bigger as it includes the suspend/resume
> patches already in X, and a new fixup. These are desirable as the
> request mode patch assumes their presence.
>
> [Regression Potential]
> Could potentially break xen based disks on AWS.
>
> For B/C, the patches also add some code to the xen core around suspend
> and resume, this code is much smaller and also mirrors code already in
> Xenial.
>
> [Tests]
> Tested by AWS for Xenial, and their kernel engineers vetted the Xenial
> patches. I tested the Bionic and Cosmic patchsets with fio, the system
> appears stable and the IOPS promised for EBS Provisioned IOPS disks
> were met in my testing. I did an apt update/upgrade and everything
> worked (no hash-sum mismatches).
>
> Anchal Agarwal (1):
>   xen-blkfront: Fixed blkfront_restore to remove a call to negotiate_mq
>
> Munehisa Kamata (5):
>   xen/manage: keep track of the on-going suspend mode
>   xen/manage: introduce helper function to know the on-going suspend
>     mode
>   xenbus: add freeze/thaw/restore callbacks support
>   xen-blkfront: add callbacks for PM suspend and hibernation
>   xen-blkfront: resurrect request-based mode
>
>  drivers/block/xen-blkfront.c      | 475 ++++++++++++++++++++++++++----
>  drivers/xen/manage.c              |  73 +++++
>  drivers/xen/xenbus/xenbus_probe.c | 102 ++++++-
>  include/xen/xen-ops.h             |   4 +
>  include/xen/xenbus.h              |   3 +
>  5 files changed, 576 insertions(+), 81 deletions(-)
>
Acked-by: Stefan Bader <[hidden email]>



Re: [SRU cosmic-aws][PATCH 0/6] LP: #1801305: xen-blkfront: restore request mode

Kleber Sacilotto de Souza
In reply to this post by Daniel Axtens
On 11/02/18 09:57, Daniel Axtens wrote:

> In current Ubuntu kernels, PV blkfront drivers have blk-mq enabled by
> default and cannot use the old I/O scheduler.
>
> [Impact]
> blk-mq is not as fast as the old request-based scheduler for some
> workloads on HDD disks.
>
> [Fix]
> Amazon Linux has a commit which reintroduces the request-based
> mode. It disables blk-mq by default but allows it to be switched back
> on with a kernel parameter.
>
> For X this needs a small patch from upstream for error handling.
>
> For B/C this patchset is bigger as it includes the suspend/resume
> patches already in X, and a new fixup. These are desirable as the
> request mode patch assumes their presence.
>
> [Regression Potential]
> Could potentially break xen based disks on AWS.
>
> For B/C, the patches also add some code to the xen core around suspend
> and resume, this code is much smaller and also mirrors code already in
> Xenial.
>
> [Tests]
> Tested by AWS for Xenial, and their kernel engineers vetted the Xenial
> patches. I tested the Bionic and Cosmic patchsets with fio, the system
> appears stable and the IOPS promised for EBS Provisioned IOPS disks
> were met in my testing. I did an apt update/upgrade and everything
> worked (no hash-sum mismatches).
>
> Anchal Agarwal (1):
>   xen-blkfront: Fixed blkfront_restore to remove a call to negotiate_mq
>
> Munehisa Kamata (5):
>   xen/manage: keep track of the on-going suspend mode
>   xen/manage: introduce helper function to know the on-going suspend
>     mode
>   xenbus: add freeze/thaw/restore callbacks support
>   xen-blkfront: add callbacks for PM suspend and hibernation
>   xen-blkfront: resurrect request-based mode
>
>  drivers/block/xen-blkfront.c      | 475 ++++++++++++++++++++++++++----
>  drivers/xen/manage.c              |  73 +++++
>  drivers/xen/xenbus/xenbus_probe.c | 102 ++++++-
>  include/xen/xen-ops.h             |   4 +
>  include/xen/xenbus.h              |   3 +
>  5 files changed, 576 insertions(+), 81 deletions(-)
>
Hi Daniel,

The patches come from the "AWS 4.14 kernel SRPM". Does that mean they
were never upstream? If that's the case, we need to mark the patches as
SAUCE (we could do it when applying them).


Thanks,

Kleber



ACK/cmnt: [SRU cosmic-aws][PATCH 0/6] LP: #1801305: xen-blkfront: restore request mode

Kleber Sacilotto de Souza
In reply to this post by Daniel Axtens
On 11/02/18 09:57, Daniel Axtens wrote:

> In current Ubuntu kernels, PV blkfront drivers have blk-mq enabled by
> default and cannot use the old I/O scheduler.
>
> [Impact]
> blk-mq is not as fast as the old request-based scheduler for some
> workloads on HDD disks.
>
> [Fix]
> Amazon Linux has a commit which reintroduces the request-based
> mode. It disables blk-mq by default but allows it to be switched back
> on with a kernel parameter.
>
> For X this needs a small patch from upstream for error handling.
>
> For B/C this patchset is bigger as it includes the suspend/resume
> patches already in X, and a new fixup. These are desirable as the
> request mode patch assumes their presence.
>
> [Regression Potential]
> Could potentially break xen based disks on AWS.
>
> For B/C, the patches also add some code to the xen core around suspend
> and resume, this code is much smaller and also mirrors code already in
> Xenial.
>
> [Tests]
> Tested by AWS for Xenial, and their kernel engineers vetted the Xenial
> patches. I tested the Bionic and Cosmic patchsets with fio, the system
> appears stable and the IOPS promised for EBS Provisioned IOPS disks
> were met in my testing. I did an apt update/upgrade and everything
> worked (no hash-sum mismatches).
>
> Anchal Agarwal (1):
>   xen-blkfront: Fixed blkfront_restore to remove a call to negotiate_mq
>
> Munehisa Kamata (5):
>   xen/manage: keep track of the on-going suspend mode
>   xen/manage: introduce helper function to know the on-going suspend
>     mode
>   xenbus: add freeze/thaw/restore callbacks support
>   xen-blkfront: add callbacks for PM suspend and hibernation
>   xen-blkfront: resurrect request-based mode
>
>  drivers/block/xen-blkfront.c      | 475 ++++++++++++++++++++++++++----
>  drivers/xen/manage.c              |  73 +++++
>  drivers/xen/xenbus/xenbus_probe.c | 102 ++++++-
>  include/xen/xen-ops.h             |   4 +
>  include/xen/xenbus.h              |   3 +
>  5 files changed, 576 insertions(+), 81 deletions(-)
>
The patches probably need a SAUCE tag since they don't seem to come
from mainline.

Acked-by: Kleber Sacilotto de Souza <[hidden email]>


--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team

Re: [SRU cosmic-aws][PATCH 0/6] LP: #1801305: xen-blkfront: restore request mode

Daniel Axtens
In reply to this post by Kleber Sacilotto de Souza
Hi Kleber,

Sorry, just replied to your other email - yes, the patches from the
SRPM were never upstream and should therefore be marked as SAUCE. I
didn't do that because I (mistakenly!) thought that was limited to
patches written specifically for the Ubuntu kernel.

Regards,
Daniel
On Wed, Nov 7, 2018 at 7:52 PM Kleber Souza <[hidden email]> wrote:

>
> On 11/02/18 09:57, Daniel Axtens wrote:
> > In current Ubuntu kernels, PV blkfront drivers have blk-mq enabled by
> > default and cannot use the old I/O scheduler.
> >
> > [Impact]
> > blk-mq is not as fast as the old request-based scheduler for some
> > workloads on HDD disks.
> >
> > [Fix]
> > Amazon Linux has a commit which reintroduces the request-based
> > mode. It disables blk-mq by default but allows it to be switched back
> > on with a kernel parameter.
> >
> > For X this needs a small patch from upstream for error handling.
> >
> > For B/C this patchset is bigger as it includes the suspend/resume
> > patches already in X, and a new fixup. These are desirable as the
> > request mode patch assumes their presence.
> >
> > [Regression Potential]
> > Could potentially break xen based disks on AWS.
> >
> > For B/C, the patches also add some code to the xen core around suspend
> > and resume, this code is much smaller and also mirrors code already in
> > Xenial.
> >
> > [Tests]
> > Tested by AWS for Xenial, and their kernel engineers vetted the Xenial
> > patches. I tested the Bionic and Cosmic patchsets with fio, the system
> > appears stable and the IOPS promised for EBS Provisioned IOPS disks
> > were met in my testing. I did an apt update/upgrade and everything
> > worked (no hash-sum mismatches).
> >
> > Anchal Agarwal (1):
> >   xen-blkfront: Fixed blkfront_restore to remove a call to negotiate_mq
> >
> > Munehisa Kamata (5):
> >   xen/manage: keep track of the on-going suspend mode
> >   xen/manage: introduce helper function to know the on-going suspend
> >     mode
> >   xenbus: add freeze/thaw/restore callbacks support
> >   xen-blkfront: add callbacks for PM suspend and hibernation
> >   xen-blkfront: resurrect request-based mode
> >
> >  drivers/block/xen-blkfront.c      | 475 ++++++++++++++++++++++++++----
> >  drivers/xen/manage.c              |  73 +++++
> >  drivers/xen/xenbus/xenbus_probe.c | 102 ++++++-
> >  include/xen/xen-ops.h             |   4 +
> >  include/xen/xenbus.h              |   3 +
> >  5 files changed, 576 insertions(+), 81 deletions(-)
> >
> Hi Daniel,
>
> The patches come from "AWS 4.14 kernel SRPM", does it mean they were
> never upstream? If that's the case we need to mark the patches as SAUCE
> (we could do it when applying them).
>
>
> Thanks,
>
> Kleber
>

--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team

APPLIED/cmnt: [SRU cosmic-aws][PATCH 0/6] LP: #1801305: xen-blkfront: restore request mode

Khaled Elmously
In reply to this post by Daniel Axtens
Applied to cosmic-aws/master. Added "SAUCE: " to all patches
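
For anyone reproducing this locally, the apply-and-retitle step amounts
to something like the sketch below; the mbox file name is hypothetical
and the interactive rebase is just one way to reword the subjects:

  # apply the series on the target branch
  git checkout cosmic-aws/master
  git am lp1801305-series.mbox      # hypothetical mbox file name
  # then mark all six commits as "reword" and prepend "SAUCE: "
  # to each subject line
  git rebase -i HEAD~6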

On 2018-11-02 19:57:26, Daniel Axtens wrote:

> In current Ubuntu kernels, PV blkfront drivers have blk-mq enabled by
> default and cannot use the old I/O scheduler.
>
> [Impact]
> blk-mq is not as fast as the old request-based scheduler for some
> workloads on HDD disks.
>
> [Fix]
> Amazon Linux has a commit which reintroduces the request-based
> mode. It disables blk-mq by default but allows it to be switched back
> on with a kernel parameter.
>
> For X this needs a small patch from upstream for error handling.
>
> For B/C this patchset is bigger as it includes the suspend/resume
> patches already in X, and a new fixup. These are desirable as the
> request mode patch assumes their presence.
>
> [Regression Potential]
> Could potentially break xen based disks on AWS.
>
> For B/C, the patches also add some code to the xen core around suspend
> and resume, this code is much smaller and also mirrors code already in
> Xenial.
>
> [Tests]
> Tested by AWS for Xenial, and their kernel engineers vetted the Xenial
> patches. I tested the Bionic and Cosmic patchsets with fio, the system
> appears stable and the IOPS promised for EBS Provisioned IOPS disks
> were met in my testing. I did an apt update/upgrade and everything
> worked (no hash-sum mismatches).
>
> Anchal Agarwal (1):
>   xen-blkfront: Fixed blkfront_restore to remove a call to negotiate_mq
>
> Munehisa Kamata (5):
>   xen/manage: keep track of the on-going suspend mode
>   xen/manage: introduce helper function to know the on-going suspend
>     mode
>   xenbus: add freeze/thaw/restore callbacks support
>   xen-blkfront: add callbacks for PM suspend and hibernation
>   xen-blkfront: resurrect request-based mode
>
>  drivers/block/xen-blkfront.c      | 475 ++++++++++++++++++++++++++----
>  drivers/xen/manage.c              |  73 +++++
>  drivers/xen/xenbus/xenbus_probe.c | 102 ++++++-
>  include/xen/xen-ops.h             |   4 +
>  include/xen/xenbus.h              |   3 +
>  5 files changed, 576 insertions(+), 81 deletions(-)
>
> --
> 2.17.1
>
>
> --
> kernel-team mailing list
> [hidden email]
> https://lists.ubuntu.com/mailman/listinfo/kernel-team

--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team

Re: [SRU cosmic-aws][PATCH 0/6] LP: #1801305: xen-blkfront: restore request mode

Kleber Sacilotto de Souza
In reply to this post by Daniel Axtens
On 11/08/18 01:23, Daniel Axtens wrote:
> Hi Kleber,
>
> Sorry, just replied to your other email - yes, the patches from the
> SRPM were never upstream and should therefore be marked as SAUCE. I
> didn't do that because I (mistakenly!) thought that was limited to
> patches written specifically for the Ubuntu kernel.

Hi Daniel,

Thank you for the clarification. There's no need to send the patches
again now; we can edit the subjects when applying them.


Thanks,
Kleber

> Regards,
> Daniel
> On Wed, Nov 7, 2018 at 7:52 PM Kleber Souza <[hidden email]> wrote:
>> On 11/02/18 09:57, Daniel Axtens wrote:
>>> In current Ubuntu kernels, PV blkfront drivers have blk-mq enabled by
>>> default and cannot use the old I/O scheduler.
>>>
>>> [Impact]
>>> blk-mq is not as fast as the old request-based scheduler for some
>>> workloads on HDD disks.
>>>
>>> [Fix]
>>> Amazon Linux has a commit which reintroduces the request-based
>>> mode. It disables blk-mq by default but allows it to be switched back
>>> on with a kernel parameter.
>>>
>>> For X this needs a small patch from upstream for error handling.
>>>
>>> For B/C this patchset is bigger as it includes the suspend/resume
>>> patches already in X, and a new fixup. These are desirable as the
>>> request mode patch assumes their presence.
>>>
>>> [Regression Potential]
>>> Could potentially break xen based disks on AWS.
>>>
>>> For B/C, the patches also add some code to the xen core around suspend
>>> and resume, this code is much smaller and also mirrors code already in
>>> Xenial.
>>>
>>> [Tests]
>>> Tested by AWS for Xenial, and their kernel engineers vetted the Xenial
>>> patches. I tested the Bionic and Cosmic patchsets with fio, the system
>>> appears stable and the IOPS promised for EBS Provisioned IOPS disks
>>> were met in my testing. I did an apt update/upgrade and everything
>>> worked (no hash-sum mismatches).
>>>
>>> Anchal Agarwal (1):
>>>   xen-blkfront: Fixed blkfront_restore to remove a call to negotiate_mq
>>>
>>> Munehisa Kamata (5):
>>>   xen/manage: keep track of the on-going suspend mode
>>>   xen/manage: introduce helper function to know the on-going suspend
>>>     mode
>>>   xenbus: add freeze/thaw/restore callbacks support
>>>   xen-blkfront: add callbacks for PM suspend and hibernation
>>>   xen-blkfront: resurrect request-based mode
>>>
>>>  drivers/block/xen-blkfront.c      | 475 ++++++++++++++++++++++++++----
>>>  drivers/xen/manage.c              |  73 +++++
>>>  drivers/xen/xenbus/xenbus_probe.c | 102 ++++++-
>>>  include/xen/xen-ops.h             |   4 +
>>>  include/xen/xenbus.h              |   3 +
>>>  5 files changed, 576 insertions(+), 81 deletions(-)
>>>
>> Hi Daniel,
>>
>> The patches come from "AWS 4.14 kernel SRPM", does it mean they were
>> never upstream? If that's the case we need to mark the patches as SAUCE
>> (we could do it when applying them).
>>
>>
>> Thanks,
>>
>> Kleber
>>


--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team