[SRU][F][G][H][PATCH 0/1] qede: Kubernetes Internal DNS Failure due to QL41xxx NIC not supporting IPIP tx csum offload.

Matthew Ruffell
BugLink: https://bugs.launchpad.net/bugs/1909062

[Impact]

Users of QLogic QL41xxx series NICs, such as the FastLinQ QL41000 Series
10/25/40/50GbE Controller, find that Kubernetes internal DNS requests fail after
upgrading from the 4.15 kernel to the 5.4 kernel, because the request packets
get corrupted.

Kubernetes uses IPIP tunnelled packets for internal DNS resolution. The NIC does
not support hardware tx checksum offload for this packet type, so the packets
end up corrupted when the qede driver attempts to offload their checksums.
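
To make the packet type concrete, here is a minimal userspace sketch of how an
IPIP packet can be recognised from its raw bytes: the outer IPv4 header carries
protocol number 4, and its payload is itself an IPv4 packet. The function name
is_ipip_packet() and the PROTO_IPIP macro are made up for this illustration and
are not part of the qede driver.

```c
#include <stddef.h>
#include <stdint.h>

/* IANA protocol number for IPv4-in-IPv4, same value as IPPROTO_IPIP. */
#define PROTO_IPIP 4

/*
 * Hypothetical helper: return 1 if `pkt` starts with an IPv4 header whose
 * payload is another IPv4 packet (an IPIP tunnel), 0 otherwise.
 */
static int is_ipip_packet(const uint8_t *pkt, size_t len)
{
	size_t ihl;

	if (len < 20 || (pkt[0] >> 4) != 4)
		return 0;			/* too short, or not IPv4 */

	ihl = (size_t)(pkt[0] & 0x0f) * 4;	/* outer header length in bytes */
	if (ihl < 20 || len < ihl + 20)
		return 0;			/* no room for an inner header */

	/* Byte 9 of the IPv4 header is the protocol field. */
	return pkt[9] == PROTO_IPIP && (pkt[ihl] >> 4) == 4;
}
```

A plain UDP DNS query has protocol 17 in that field instead, which is why
lookups for external domains are not affected.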

Only internal Kubernetes DNS is affected. Regular DNS lookups for external
domains succeed because they do not use IPIP packets.

[Fix]

Marvell has developed a fix for the qede driver, which checks the outer L4
protocol in qede_features_check() and, if it is IPPROTO_IPIP, disables checksum
and GSO offloads for that socket buffer.

commit 5d5647dad259bb416fd5d3d87012760386d97530
Author: Manish Chopra <[hidden email]>
Date: Mon Dec 21 06:55:30 2020 -0800
Subject: qede: fix offload for IPIP tunnel packets
Link: https://github.com/torvalds/linux/commit/5d5647dad259bb416fd5d3d87012760386d97530
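
The per-packet decision made by the fix can be modelled in userspace roughly as
follows. This is only a sketch: the F_* bits are hypothetical stand-ins for the
kernel's NETIF_F_* masks, features_check() is not the driver function, and the
existing UDP tunnel (VXLAN/GENEVE) handling in qede_features_check() is left
out.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's netdev feature bits. */
#define F_CSUM (1u << 0)	/* models NETIF_F_CSUM_MASK */
#define F_GSO  (1u << 1)	/* models NETIF_F_GSO_MASK */
#define F_SG   (1u << 2)	/* an unrelated feature that must survive */

#define PROTO_IPIP 4		/* same value as IPPROTO_IPIP */
#define PROTO_UDP  17		/* same value as IPPROTO_UDP */

/*
 * Model of the check: for an encapsulated packet whose outer L4 protocol
 * is IPIP, clear the checksum and segmentation offload bits so the stack
 * does that work in software. Everything else is returned unchanged.
 */
static uint32_t features_check(int encapsulated, uint8_t l4_proto,
			       uint32_t features)
{
	if (encapsulated && l4_proto == PROTO_IPIP)
		return features & ~(F_CSUM | F_GSO);
	return features;
}
```

Because the mask is computed per packet, offloads stay enabled for all other
traffic on the same interface.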

This commit landed in mainline in 5.11-rc3. The commit was accepted into upstream
stable 4.14.215, 4.19.167, 5.4.89 and 5.10.7.

Note, this SRU isn't targeted for Bionic due to tx csum offload support only
landing in 5.0 and onward, meaning the 4.15 kernel still works even without this
patch. Because of this, Bionic can pick the patch up naturally from upstream
stable.

[Testcase]

The system must have a QLogic QL41xxx series NIC fitted, and needs to be a part
of a Kubernetes cluster.

Firstly, get a list of all devices in the system:

$ sudo ifconfig

Next, set all devices down with:

$ sudo ifconfig <device> down

Next, bring up the QLogic QL41xxx device:

$ sudo ifconfig <qlogic nic device> up

Then, attempt to look up an internal Kubernetes domain:

$ nslookup <internal kubernetes domain address>

Without the patch, the connection will time out:

;; connection timed out; no servers could be reached

Packet traces taken with tcpdump show the request leaving the source but never
arriving at the destination.

There is a test kernel available in the following PPA:

https://launchpad.net/~mruffell/+archive/ubuntu/sf297772-test

With it installed, Kubernetes internal DNS lookups succeed.

[Where problems could occur]

If a regression were to occur, then users of the qede driver would be affected.
This is limited to those with QLogic QL41xxx series NICs. The patch explicitly
checks for IPIP packets, so only those packets would be affected.

Because most traffic is not IPIP tunnelled, a regression would not cause a total
outage. It could cause problems for users who frequently handle VPN or
Kubernetes internal DNS traffic.

A workaround would be to use ethtool to disable tx csum offload for all packet
types (ethtool -K <device> tx off), or to revert to an older kernel.

Manish Chopra (1):
  qede: fix offload for IPIP tunnel packets

 drivers/net/ethernet/qlogic/qede/qede_fp.c | 5 +++++
 1 file changed, 5 insertions(+)

--
2.27.0


--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team

[SRU][F][G][H][PATCH 1/1] qede: fix offload for IPIP tunnel packets

Matthew Ruffell
From: Manish Chopra <[hidden email]>

BugLink: https://bugs.launchpad.net/bugs/1909062

IPIP tunnel packets are unknown to the device,
hence these packets are incorrectly parsed,
causing packet corruption, so disable offloads
for such packets at run time.

Signed-off-by: Manish Chopra <[hidden email]>
Signed-off-by: Sudarsana Kalluru <[hidden email]>
Signed-off-by: Igor Russkikh <[hidden email]>
Link: https://lore.kernel.org/r/20201221145530.7771-1-manishc@...
Signed-off-by: Jakub Kicinski <[hidden email]>
(cherry picked from commit 5d5647dad259bb416fd5d3d87012760386d97530)
Signed-off-by: Matthew Ruffell <[hidden email]>
---
 drivers/net/ethernet/qlogic/qede/qede_fp.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/net/ethernet/qlogic/qede/qede_fp.c b/drivers/net/ethernet/qlogic/qede/qede_fp.c
index 004c0bfec41d..f310a94e0489 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_fp.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_fp.c
@@ -1737,6 +1737,11 @@ netdev_features_t qede_features_check(struct sk_buff *skb,
 			      ntohs(udp_hdr(skb)->dest) != gnv_port))
 				return features & ~(NETIF_F_CSUM_MASK |
 						    NETIF_F_GSO_MASK);
+		} else if (l4_proto == IPPROTO_IPIP) {
+			/* IPIP tunnels are unknown to the device or at least unsupported natively,
+			 * offloads for them can't be done trivially, so disable them for such skb.
+			 */
+			return features & ~(NETIF_F_CSUM_MASK | NETIF_F_GSO_MASK);
 		}
 	}
 
--
2.27.0



APPLIED H: Re: [SRU][F][G][H][PATCH 1/1] qede: fix offload for IPIP tunnel packets

Paolo Pisati-5
On Fri, Jan 15, 2021 at 11:12:43AM +1300, Matthew Ruffell wrote:
> From: Manish Chopra <[hidden email]>
>
> BugLink: https://bugs.launchpad.net/bugs/1909062

Already part of 5.10 stable updates, so consider this applied in H+.
--
bye,
p.

--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team
Reply | Threaded
Open this post in threaded view
|

ACK: [SRU][F][G][H][PATCH 0/1] qede: Kubernetes Internal DNS Failure due to QL41xxx NIC not supporting IPIP tx csum offload.

William Breathitt Gray
In reply to this post by Matthew Ruffell
Acked-by: William Breathitt Gray <[hidden email]>


ACK: [SRU][F][G][H][PATCH 0/1] qede: Kubernetes Internal DNS Failure due to QL41xxx NIC not supporting IPIP tx csum offload.

Marcelo Henrique Cerri
In reply to this post by Matthew Ruffell
Acked-by: Marcelo Henrique Cerri <[hidden email]>

--
Regards,
Marcelo



APPLIED[F/G]: [SRU][F][G][H][PATCH 0/1] qede: Kubernetes Internal DNS Failure due to QL41xxx NIC not supporting IPIP tx csum offload.

Kelsey Skunberg
In reply to this post by Matthew Ruffell
Applied to F/G master-next. Thank you!

-Kelsey

