[Xenial][PATCH 0/1] Fix for libmbim-proxy using 100% CPU

[Xenial][PATCH 0/1] Fix for libmbim-proxy using 100% CPU

Wen-chien Jesse Sung
BugLink: https://launchpad.net/bugs/1851347

== Impact ==
A Dell Edge Gateway 3002 user reported that `top` shows around
100% CPU usage for the libmbim-proxy process. This was seen on at
least 40% of their devices at some point during the last 6 months.
CPU usage stays at ~100% for days or weeks, but does return to
normal without a reboot. LTE connectivity still seems to work as
usual.

The issue starts after an -EPIPE error appears in the syslog:
cdc_mbim 1-3:1.12: nonzero urb status received: -EPIPE
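
To check whether a device has hit this condition, the log can be searched for that message. The following is only a minimal userspace sketch, assuming the log line keeps the format quoted above; `epipe_stall_logged` and `EPIPE_MARKER` are hypothetical names used for illustration and are not part of any existing tool.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Message text copied from the syslog line quoted above. */
static const char EPIPE_MARKER[] = "nonzero urb status received: -EPIPE";

/*
 * Hypothetical helper: return true if a syslog line was emitted by
 * cdc_mbim and carries the -EPIPE URB status message.
 */
static bool epipe_stall_logged(const char *line)
{
    assert(line != NULL);

    const char *driver = strstr(line, "cdc_mbim ");
    if (driver == NULL)
        return false;

    /* The message must follow the driver prefix on the same line. */
    return strstr(driver, EPIPE_MARKER) != NULL;
}
```

Feeding each line of the syslog through this function would flag the point at which the high CPU usage is expected to begin.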

== Fix ==
8fec9355a968 USB: cdc-wdm: ignore -EPIPE from GetEncapsulatedResponse
This has been in mainline kernel since 4.14.
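
The effect of that commit can be illustrated outside the kernel. The sketch below is a userspace model, not driver code: `filter_read_error` mirrors the single assignment changed by the patch, while `error_seen_by_userspace` is a simplifying assumption that every remaining error is reported as -EIO (the commit message only says that most of them are). Both function names are hypothetical.

```c
#include <assert.h>
#include <errno.h>

/*
 * Mirrors the patched assignment: a stall (-EPIPE) is recorded as
 * "no error", every other status is kept unchanged.
 */
static int filter_read_error(int status)
{
    assert(status <= 0); /* a URB status is 0 or a negative errno */
    return (status != -EPIPE) ? status : 0;
}

/*
 * Simplified assumption: any recorded error reaches userspace as -EIO.
 */
static int error_seen_by_userspace(int rerr)
{
    return rerr ? -EIO : 0;
}
```

With this model, a stalled GetEncapsulatedResponse yields 0 (an empty read) instead of -EIO, while other failures are still reported.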

== Testcase ==
Connect to the LTE network and check whether libmbim-proxy CPU usage
reaches 100% at some point.
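
One way to automate this check instead of watching `top` is to sample the process's CPU time from `/proc/<pid>/stat`. The following is a rough sketch under that assumption; `parse_cpu_ticks` is a hypothetical helper, and locating the PID and reading the file are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: extract utime and stime (in clock ticks) from
 * the contents of /proc/<pid>/stat. Returns true on success.
 */
static bool parse_cpu_ticks(const char *stat_line,
                            unsigned long *utime, unsigned long *stime)
{
    assert(stat_line && utime && stime);

    /*
     * The command name is parenthesised and may contain spaces,
     * so start parsing after the last ')'.
     */
    const char *rest = strrchr(stat_line, ')');
    if (rest == NULL)
        return false;

    /* After ')': state, ten fields that are skipped, then utime, stime. */
    return sscanf(rest + 1,
                  " %*c %*d %*d %*d %*d %*d %*u %*u %*u %*u %*u %lu %lu",
                  utime, stime) == 2;
}
```

Taking two samples a known interval apart and dividing the increase in `utime + stime` by `sysconf(_SC_CLK_TCK)` times the interval in seconds gives the fraction of one CPU used; a value close to 1.0 matches the reported symptom.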

== Risk of Regression ==
Low, since:
1. The original reporter has tested the fix and found no issue after
   more than a week (without it, the failure usually occurs twice a
   week).
2. Ignoring -EPIPE has been the default behavior since 4.14.


Bjørn Mork (1):
  USB: cdc-wdm: ignore -EPIPE from GetEncapsulatedResponse

 drivers/usb/class/cdc-wdm.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

--
2.20.1


--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team

[Xenial][PATCH 1/1] USB: cdc-wdm: ignore -EPIPE from GetEncapsulatedResponse

Wen-chien Jesse Sung
From: Bjørn Mork <[hidden email]>

BugLink: https://launchpad.net/bugs/1851347

The driver will forward errors to userspace after turning most of them
into -EIO. But all status codes are not equal. The -EPIPE (stall) in
particular can be seen more as a result of normal USB signaling than
an actual error. The state is automatically cleared by the USB core
without intervention from either driver or userspace.

And most devices and firmwares will never trigger a stall as a result
of GetEncapsulatedResponse. This is in fact a requirement for CDC WDM
devices. Quoting from section 7.1 of the CDC WMC spec revision 1.1:

  The function shall not return STALL in response to
  GetEncapsulatedResponse.

But this driver is also handling GetEncapsulatedResponse on behalf of
the qmi_wwan and cdc_mbim drivers. Unfortunately the relevant specs
are not as clear wrt stall. So some QMI and MBIM devices *will*
occasionally stall, causing the GetEncapsulatedResponse to return an
-EPIPE status. Translating this into -EIO for userspace has proven to
be harmful. Treating it as an empty read is safer, making the driver
behave as if the device was conforming to the CDC WDM spec.

There have been numerous reports of issues related to -EPIPE errors
from some newer CDC MBIM devices in particular, like for example the
Fibocom L831-EAU.  Testing on this device has shown that the issues
go away if we simply ignore the -EPIPE status.  Similar handling of
-EPIPE is already known from e.g. usb_get_string().

The -EPIPE log message is still kept to let us track devices with this
unexpected behaviour, hoping that it attracts attention from firmware
developers.

Cc: <[hidden email]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100938
Reported-and-tested-by: Christian Ehrig <[hidden email]>
Reported-and-tested-by: Patrick Chilton <[hidden email]>
Reported-and-tested-by: Andreas Böhler <[hidden email]>
Signed-off-by: Bjørn Mork <[hidden email]>
Acked-by: Oliver Neukum <[hidden email]>
Signed-off-by: Greg Kroah-Hartman <[hidden email]>
(backported from commit 8fec9355a968ad240f3a2e9ad55b823cf1cc52ff)
Signed-off-by: Wen-chien Jesse Sung <[hidden email]>
---
 drivers/usb/class/cdc-wdm.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/usb/class/cdc-wdm.c b/drivers/usb/class/cdc-wdm.c
index 71ad04d54212..c2e3dcfa4ccd 100644
--- a/drivers/usb/class/cdc-wdm.c
+++ b/drivers/usb/class/cdc-wdm.c
@@ -188,7 +188,12 @@ static void wdm_in_callback(struct urb *urb)
  }
  }
 
- desc->rerr = status;
+ /*
+ * Avoid propagating -EPIPE (stall) to userspace since it is
+ * better handled as an empty read
+ */
+ desc->rerr = (status != -EPIPE) ? status : 0;
+
  if (length + desc->length > desc->wMaxCommand) {
  /* The buffer would overflow */
  set_bit(WDM_OVERFLOW, &desc->flags);
--
2.20.1



ACK: [Xenial][PATCH 1/1] USB: cdc-wdm: ignore -EPIPE from GetEncapsulatedResponse

Connor Kuehl
On 11/5/19 1:41 AM, Wen-chien Jesse Sung wrote:
> (backported from commit 8fec9355a968ad240f3a2e9ad55b823cf1cc52ff)
> Signed-off-by: Wen-chien Jesse Sung <[hidden email]>

Acked-by: Connor Kuehl <[hidden email]>

> ---
>   drivers/usb/class/cdc-wdm.c | 7 ++++++-
>   1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/usb/class/cdc-wdm.c b/drivers/usb/class/cdc-wdm.c
> index 71ad04d54212..c2e3dcfa4ccd 100644
> --- a/drivers/usb/class/cdc-wdm.c
> +++ b/drivers/usb/class/cdc-wdm.c
> @@ -188,7 +188,12 @@ static void wdm_in_callback(struct urb *urb)
>   }
>   }
>  
> - desc->rerr = status;
> + /*
> + * Avoid propagating -EPIPE (stall) to userspace since it is
> + * better handled as an empty read
> + */
> + desc->rerr = (status != -EPIPE) ? status : 0;
> +
>   if (length + desc->length > desc->wMaxCommand) {
>   /* The buffer would overflow */
>   set_bit(WDM_OVERFLOW, &desc->flags);
>



ACK: [Xenial][PATCH 1/1] USB: cdc-wdm: ignore -EPIPE from GetEncapsulatedResponse

AceLan Kao
In reply to this post by Wen-chien Jesse Sung
Acked-by: AceLan Kao <[hidden email]>


APPLIED: [Xenial][PATCH 0/1] Fix for libmbim-proxy using 100% CPU

Khaled Elmously
In reply to this post by Wen-chien Jesse Sung
On 2019-11-05 17:41:08, Wen-chien Jesse Sung wrote:

> BugLink: https://launchpad.net/bugs/1851347
>
> == Impact ==
> A Dell Edge Gateway 3002 user reported that `top` shows around
> 100% CPU usage for the libmbim-proxy process. This was seen on at
> least 40% of their devices at some point during the last 6 months.
> CPU usage stays at ~100% for days or weeks, but does return to
> normal without a reboot. LTE connectivity still seems to work as
> usual.
>
> The issue starts after an -EPIPE error appears in the syslog:
> cdc_mbim 1-3:1.12: nonzero urb status received: -EPIPE
>
> == Fix ==
> 8fec9355a968 USB: cdc-wdm: ignore -EPIPE from GetEncapsulatedResponse
> This has been in mainline kernel since 4.14.
>
> == Testcase ==
> Connect to the LTE network and check whether libmbim-proxy CPU usage
> reaches 100% at some point.
>
> == Risk of Regression ==
> Low, since:
> 1. The original reporter has tested the fix and found no issue after
>    more than a week (without it, the failure usually occurs twice a
>    week).
> 2. Ignoring -EPIPE has been the default behavior since 4.14.
>
>
> Bjørn Mork (1):
>   USB: cdc-wdm: ignore -EPIPE from GetEncapsulatedResponse
>
>  drivers/usb/class/cdc-wdm.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> --
> 2.20.1
>
