[B/SRU A][PATCH 0/2] Fixes for LP: #1715519/CVE 2018-1000026

Daniel Axtens
From: Daniel Axtens <[hidden email]>

SRU Justification
=================

A ppc64le system runs as a guest under PowerVM. This guest has a bnx2x
card attached, and uses Open vSwitch to bridge an ibmveth interface for
traffic from other LPARs.

We sometimes see the following crash when running netperf:
May 10 17:16:32 tuk6r1phn2 kernel: bnx2x: [bnx2x_attn_int_deasserted3:4323(enP24p1s0f2)]MC assert!
May 10 17:16:32 tuk6r1phn2 kernel: bnx2x: [bnx2x_mc_assert:720(enP24p1s0f2)]XSTORM_ASSERT_LIST_INDEX 0x2
May 10 17:16:32 tuk6r1phn2 kernel: bnx2x: [bnx2x_mc_assert:736(enP24p1s0f2)]XSTORM_ASSERT_INDEX 0x0 = 0x00000000 0x25e42a7e 0x00462a38 0x00010052
May 10 17:16:32 tuk6r1phn2 kernel: bnx2x: [bnx2x_mc_assert:750(enP24p1s0f2)]Chip Revision: everest3, FW Version: 7_13_1
May 10 17:16:32 tuk6r1phn2 kernel: bnx2x: [bnx2x_attn_int_deasserted3:4329(enP24p1s0f2)]driver assert
May 10 17:16:32 tuk6r1phn2 kernel: bnx2x: [bnx2x_panic_dump:923(enP24p1s0f2)]begin crash dump -----------------
... (dump of registers follows) ...

Subsequent debugging reveals that the packets causing the issue come
through the ibmveth interface - from the AIX LPAR. The veth protocol
is 'special' - communication between LPARs on the same chassis can use
very large (64k) frames to reduce overhead. Normal networks cannot
handle such large packets, so traditionally, the VIOS partition would
signal to the AIX partitions that it was 'special', and AIX would send
regular, Ethernet-sized packets to VIOS, which VIOS would then send
out.

This signalling between VIOS and AIX is done in a way that is not
standards-compliant, and so was never made part of Linux. Instead, the
Linux driver has always understood large frames and passed them up the
network stack.

In some cases (e.g. with TCP), multiple TCP segments are coalesced
into one large packet. In Linux, this goes through the generic receive
offload (GRO) code, using a mechanism similar to GSO. These segments
can be very large, which presents as a very large MSS (maximum segment
size) or gso_size.
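
As a minimal illustrative sketch (not part of the patches below, and
the helper name is made up), this is roughly how lower-layer code sees
that coalesced size:

  #include <linux/skbuff.h>

  /* Illustrative only: report the gso_size carried by a GRO-coalesced
   * skb. For a coalesced TCP flow this is the MSS of the original
   * segments, so oversized segments from ibmveth show up here as-is.
   */
  static unsigned int example_coalesced_mss(const struct sk_buff *skb)
  {
          if (!skb_is_gso(skb))
                  return 0;       /* not a GSO/coalesced packet */
          return skb_shinfo(skb)->gso_size;
  }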

Normally, the large packet is simply passed to whatever network
application on Linux is going to consume it, and everything is OK.

However, in this case, the packets go through Open vSwitch, and are
then passed to the bnx2x driver. The bnx2x driver/hardware supports
TSO and GSO, but with a restriction: the maximum segment size is
limited to around 9700 bytes. Normally this is more than adequate,
but if a large packet with very large (>9700-byte) TCP segments
arrives through ibmveth and is passed to bnx2x, the hardware will
panic.
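
For illustration (numbers made up but representative): if the AIX side
sends 32000-byte segments over the veth link, the coalesced skb that
Linux builds carries gso_size = 32000, well over the ~9700-byte limit
the bnx2x firmware can handle, so handing that skb to bnx2x for TSO
triggers the firmware assert shown above.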

[Impact]

The bnx2x card panics, requiring a power cycle to restore functionality.

The workaround is to turn off TSO, which prevents the crash because
the kernel then resegments *all* packets in software, not just the
ones that are too big. This has a performance cost.
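
For reference, the workaround amounts to something like the following
(interface name taken from the logs above; adjust to the affected
port):

  ethtool -K enP24p1s0f2 tso off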

[Fix]

Test the packet size in the bnx2x feature-check path and disable GSO
if it is too large.

[Regression Potential]

A/B/X: The changes to the network core are easily reviewed. The
changes to behaviour are limited to the bnx2x card driver. The most
likely failure case is a false positive on the size check, which would
lead only to a performance regression.

T: This also involves a different change to the networking core to add
the old-style GSO checking, which is more invasive. However, the
changes are simple and easily reviewed.

Daniel Axtens (2):
  net: create skb_gso_validate_mac_len()
  bnx2x: disable GSO where gso_size is too big for hardware

 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c | 18 +++++++
 include/linux/skbuff.h                           | 16 ++++++
 net/core/skbuff.c                                | 63 +++++++++++++++++++-----
 net/sched/sch_tbf.c                              | 10 ----
 4 files changed, 84 insertions(+), 23 deletions(-)

--
2.14.1


--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team

[B/SRU A][PATCH 1/2] net: create skb_gso_validate_mac_len()

Daniel Axtens
From: Daniel Axtens <[hidden email]>

BugLink: https://bugs.launchpad.net/bugs/1715519
CVE-2018-1000026

If you take a GSO skb, and split it into packets, will the MAC
length (L2 + L3 + L4 headers + payload) of those packets be small
enough to fit within a given length?
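
For example (illustrative figures): a TCP/IPv4 GSO skb with a 14-byte
Ethernet header, a 20-byte IP header, a 20-byte TCP header and a
gso_size of 9000 splits into segments with a MAC length of
14 + 20 + 20 + 9000 = 9054 bytes, which fits under a 9700-byte limit;
a gso_size of 16000 would not.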

Move skb_gso_mac_seglen() to skbuff.h with other related functions
like skb_gso_network_seglen() so we can use it, and then create
skb_gso_validate_mac_len to do the full calculation.

Signed-off-by: Daniel Axtens <[hidden email]>
Signed-off-by: David S. Miller <[hidden email]>
(cherry picked from commit 2b16f048729bf35e6c28a40cbfad07239f9dcd90)
Signed-off-by: Daniel Axtens <[hidden email]>
---
 include/linux/skbuff.h | 16 +++++++++++++
 net/core/skbuff.c      | 63 +++++++++++++++++++++++++++++++++++++++-----------
 net/sched/sch_tbf.c    | 10 --------
 3 files changed, 66 insertions(+), 23 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index a38c80e9f91e..247cb89f7956 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -3282,6 +3282,7 @@ int skb_shift(struct sk_buff *tgt, struct sk_buff *skb, int shiftlen);
 void skb_scrub_packet(struct sk_buff *skb, bool xnet);
 unsigned int skb_gso_transport_seglen(const struct sk_buff *skb);
 bool skb_gso_validate_mtu(const struct sk_buff *skb, unsigned int mtu);
+bool skb_gso_validate_mac_len(const struct sk_buff *skb, unsigned int len);
 struct sk_buff *skb_segment(struct sk_buff *skb, netdev_features_t features);
 struct sk_buff *skb_vlan_untag(struct sk_buff *skb);
 int skb_ensure_writable(struct sk_buff *skb, int write_len);
@@ -4115,6 +4116,21 @@ static inline unsigned int skb_gso_network_seglen(const struct sk_buff *skb)
  return hdr_len + skb_gso_transport_seglen(skb);
 }
 
+/**
+ * skb_gso_mac_seglen - Return length of individual segments of a gso packet
+ *
+ * @skb: GSO skb
+ *
+ * skb_gso_mac_seglen is used to determine the real size of the
+ * individual segments, including MAC/L2, Layer3 (IP, IPv6) and L4
+ * headers (TCP/UDP).
+ */
+static inline unsigned int skb_gso_mac_seglen(const struct sk_buff *skb)
+{
+ unsigned int hdr_len = skb_transport_header(skb) - skb_mac_header(skb);
+ return hdr_len + skb_gso_transport_seglen(skb);
+}
+
 /* Local Checksum Offload.
  * Compute outer checksum based on the assumption that the
  * inner checksum will be offloaded later.
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 08f574081315..a9435b6a4c9f 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -4910,37 +4910,74 @@ unsigned int skb_gso_transport_seglen(const struct sk_buff *skb)
 EXPORT_SYMBOL_GPL(skb_gso_transport_seglen);
 
 /**
- * skb_gso_validate_mtu - Return in case such skb fits a given MTU
+ * skb_gso_size_check - check the skb size, considering GSO_BY_FRAGS
  *
- * @skb: GSO skb
- * @mtu: MTU to validate against
+ * There are a couple of instances where we have a GSO skb, and we
+ * want to determine what size it would be after it is segmented.
  *
- * skb_gso_validate_mtu validates if a given skb will fit a wanted MTU
- * once split.
+ * We might want to check:
+ * -    L3+L4+payload size (e.g. IP forwarding)
+ * - L2+L3+L4+payload size (e.g. sanity check before passing to driver)
+ *
+ * This is a helper to do that correctly considering GSO_BY_FRAGS.
+ *
+ * @seg_len: The segmented length (from skb_gso_*_seglen). In the
+ *           GSO_BY_FRAGS case this will be [header sizes + GSO_BY_FRAGS].
+ *
+ * @max_len: The maximum permissible length.
+ *
+ * Returns true if the segmented length <= max length.
  */
-bool skb_gso_validate_mtu(const struct sk_buff *skb, unsigned int mtu)
-{
+static inline bool skb_gso_size_check(const struct sk_buff *skb,
+      unsigned int seg_len,
+      unsigned int max_len) {
  const struct skb_shared_info *shinfo = skb_shinfo(skb);
  const struct sk_buff *iter;
- unsigned int hlen;
-
- hlen = skb_gso_network_seglen(skb);
 
  if (shinfo->gso_size != GSO_BY_FRAGS)
- return hlen <= mtu;
+ return seg_len <= max_len;
 
  /* Undo this so we can re-use header sizes */
- hlen -= GSO_BY_FRAGS;
+ seg_len -= GSO_BY_FRAGS;
 
  skb_walk_frags(skb, iter) {
- if (hlen + skb_headlen(iter) > mtu)
+ if (seg_len + skb_headlen(iter) > max_len)
  return false;
  }
 
  return true;
 }
+
+/**
+ * skb_gso_validate_mtu - Return in case such skb fits a given MTU
+ *
+ * @skb: GSO skb
+ * @mtu: MTU to validate against
+ *
+ * skb_gso_validate_mtu validates if a given skb will fit a wanted MTU
+ * once split.
+ */
+bool skb_gso_validate_mtu(const struct sk_buff *skb, unsigned int mtu)
+{
+ return skb_gso_size_check(skb, skb_gso_network_seglen(skb), mtu);
+}
 EXPORT_SYMBOL_GPL(skb_gso_validate_mtu);
 
+/**
+ * skb_gso_validate_mac_len - Will a split GSO skb fit in a given length?
+ *
+ * @skb: GSO skb
+ * @len: length to validate against
+ *
+ * skb_gso_validate_mac_len validates if a given skb will fit a wanted
+ * length once split, including L2, L3 and L4 headers and the payload.
+ */
+bool skb_gso_validate_mac_len(const struct sk_buff *skb, unsigned int len)
+{
+ return skb_gso_size_check(skb, skb_gso_mac_seglen(skb), len);
+}
+EXPORT_SYMBOL_GPL(skb_gso_validate_mac_len);
+
 static struct sk_buff *skb_reorder_vlan_header(struct sk_buff *skb)
 {
  if (skb_cow(skb, skb_headroom(skb)) < 0) {
diff --git a/net/sched/sch_tbf.c b/net/sched/sch_tbf.c
index 120f4f365967..635c280f13b8 100644
--- a/net/sched/sch_tbf.c
+++ b/net/sched/sch_tbf.c
@@ -142,16 +142,6 @@ static u64 psched_ns_t2l(const struct psched_ratecfg *r,
  return len;
 }
 
-/*
- * Return length of individual segments of a gso packet,
- * including all headers (MAC, IP, TCP/UDP)
- */
-static unsigned int skb_gso_mac_seglen(const struct sk_buff *skb)
-{
- unsigned int hdr_len = skb_transport_header(skb) - skb_mac_header(skb);
- return hdr_len + skb_gso_transport_seglen(skb);
-}
-
 /* GSO packet is too big, segment it so that tbf can transmit
  * each segment in time
  */
--
2.14.1


--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team

[B/SRU A][PATCH 2/2] bnx2x: disable GSO where gso_size is too big for hardware

Daniel Axtens
In reply to this post by Daniel Axtens
From: Daniel Axtens <[hidden email]>

BugLink: https://bugs.launchpad.net/bugs/1715519
CVE-2018-1000026

If a bnx2x card is passed a GSO packet with a gso_size larger than
~9700 bytes, it will cause a firmware error that will bring the card
down:

bnx2x: [bnx2x_attn_int_deasserted3:4323(enP24p1s0f0)]MC assert!
bnx2x: [bnx2x_mc_assert:720(enP24p1s0f0)]XSTORM_ASSERT_LIST_INDEX 0x2
bnx2x: [bnx2x_mc_assert:736(enP24p1s0f0)]XSTORM_ASSERT_INDEX 0x0 = 0x00000000 0x25e43e47 0x00463e01 0x00010052
bnx2x: [bnx2x_mc_assert:750(enP24p1s0f0)]Chip Revision: everest3, FW Version: 7_13_1
... (dump of values continues) ...

Detect when the mac length of a GSO packet is greater than the maximum
packet size (9700 bytes) and disable GSO.

Signed-off-by: Daniel Axtens <[hidden email]>
Reviewed-by: Eric Dumazet <[hidden email]>
Signed-off-by: David S. Miller <[hidden email]>
(cherry picked from commit 8914a595110a6eca69a5e275b323f5d09e18f4f9)
Signed-off-by: Daniel Axtens <[hidden email]>
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
index ddd5d3ebd201..eefb8f64136e 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
@@ -12934,6 +12934,24 @@ static netdev_features_t bnx2x_features_check(struct sk_buff *skb,
       struct net_device *dev,
       netdev_features_t features)
 {
+ /*
+ * A skb with gso_size + header length > 9700 will cause a
+ * firmware panic. Drop GSO support.
+ *
+ * Eventually the upper layer should not pass these packets down.
+ *
+ * For speed, if the gso_size is <= 9000, assume there will
+ * not be 700 bytes of headers and pass it through. Only do a
+ * full (slow) validation if the gso_size is > 9000.
+ *
+ * (Due to the way SKB_BY_FRAGS works this will also do a full
+ * validation in that case.)
+ */
+ if (unlikely(skb_is_gso(skb) &&
+     (skb_shinfo(skb)->gso_size > 9000) &&
+     !skb_gso_validate_mac_len(skb, 9700)))
+ features &= ~NETIF_F_GSO_MASK;
+
  features = vlan_features_check(skb, features);
  return vxlan_features_check(skb, features);
 }
--
2.14.1


--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team

ACK / APPLIED[B]: [B/SRU A][PATCH 0/2] Fixes for LP: #1715519/CVE 2018-1000026

Seth Forshee
In reply to this post by Daniel Axtens
On Mon, Feb 12, 2018 at 03:43:34PM +1100, Daniel Axtens wrote:

Acked-by: Seth Forshee <[hidden email]>

Applied to bionic/master-next, thanks!

--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team