Quantcast

[Xenial][PATCH 0/1] net: better skb->sender_cpu and skb->napi_id cohabitation

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[Xenial][PATCH 0/1] net: better skb->sender_cpu and skb->napi_id cohabitation

Leann Ogasawara
From: Leann Ogasawara <[hidden email]>

== Xenial SRU ==
BugLink: http://bugs.launchpad.net/bugs/1673303

We've tried to roll out new firewalls and had to revert back when the
new firewalls almost immediately hung after cutover.

At first we thought it was hardware issues, but after we reproduced it
on 4 different firewalls, we realised it was more likely to be a
problem with the Xenial kernel.

We think we're running into something similar to:

  https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1579943

We have had success testing the following patch.  We've just migrated
four firewalls that are running with a patched kernel. Previously two
of them would have survived for less than 2 minutes, both have now been
running in production for over an hour.  Please consider the following
patch for a Xenial SRU:

  https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=52bd2d62ce6758d811edcbd2256eb9ea7f6a56cb

Eric Dumazet (1):
  net: better skb->sender_cpu and skb->napi_id cohabitation

 include/linux/skbuff.h |  3 ---
 net/core/dev.c         | 33 ++++++++++++++++-----------------
 2 files changed, 16 insertions(+), 20 deletions(-)

--
2.10.2


--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[Xenial][PATCH 1/1] net: better skb->sender_cpu and skb->napi_id cohabitation

Leann Ogasawara
From: Eric Dumazet <[hidden email]>

BugLink: http://bugs.launchpad.net/bugs/1673303

skb->sender_cpu and skb->napi_id share a common storage,
and we had various bugs about this.

We had to call skb_sender_cpu_clear() in some places to
not leave a prior skb->napi_id and fool netdev_pick_tx()

As suggested by Alexei, we could split the space so that
these errors can not happen.

0 value being reserved as the common (not initialized) value,
let's reserve [1 .. NR_CPUS] range for valid sender_cpu,
and [NR_CPUS+1 .. ~0U] for valid napi_id.

This will allow proper busy polling support over tunnels.

Signed-off-by: Eric Dumazet <[hidden email]>
Suggested-by: Alexei Starovoitov <[hidden email]>
Acked-by: Alexei Starovoitov <[hidden email]>
Signed-off-by: David S. Miller <[hidden email]>
(cherry picked from commit 52bd2d62ce6758d811edcbd2256eb9ea7f6a56cb)
Signed-off-by: Leann Ogasawara <[hidden email]>
---
 include/linux/skbuff.h |  3 ---
 net/core/dev.c         | 33 ++++++++++++++++-----------------
 2 files changed, 16 insertions(+), 20 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 16e6429..bbec150 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1084,9 +1084,6 @@ static inline void skb_copy_hash(struct sk_buff *to, const struct sk_buff *from)
 
 static inline void skb_sender_cpu_clear(struct sk_buff *skb)
 {
-#ifdef CONFIG_XPS
- skb->sender_cpu = 0;
-#endif
 }
 
 #ifdef NET_SKBUFF_DATA_USES_OFFSET
diff --git a/net/core/dev.c b/net/core/dev.c
index 3e3f95e..57f539e 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -182,7 +182,7 @@ EXPORT_SYMBOL(dev_base_lock);
 /* protects napi_hash addition/deletion and napi_gen_id */
 static DEFINE_SPINLOCK(napi_hash_lock);
 
-static unsigned int napi_gen_id;
+static unsigned int napi_gen_id = NR_CPUS;
 static DEFINE_HASHTABLE(napi_hash, 8);
 
 static seqcount_t devnet_rename_seq;
@@ -3022,7 +3022,9 @@ struct netdev_queue *netdev_pick_tx(struct net_device *dev,
  int queue_index = 0;
 
 #ifdef CONFIG_XPS
- if (skb->sender_cpu == 0)
+ u32 sender_cpu = skb->sender_cpu - 1;
+
+ if (sender_cpu >= (u32)NR_CPUS)
  skb->sender_cpu = raw_smp_processor_id() + 1;
 #endif
 
@@ -4699,25 +4701,22 @@ EXPORT_SYMBOL_GPL(napi_by_id);
 
 void napi_hash_add(struct napi_struct *napi)
 {
- if (!test_and_set_bit(NAPI_STATE_HASHED, &napi->state)) {
+ if (test_and_set_bit(NAPI_STATE_HASHED, &napi->state))
+ return;
 
- spin_lock(&napi_hash_lock);
+ spin_lock(&napi_hash_lock);
 
- /* 0 is not a valid id, we also skip an id that is taken
- * we expect both events to be extremely rare
- */
- napi->napi_id = 0;
- while (!napi->napi_id) {
- napi->napi_id = ++napi_gen_id;
- if (napi_by_id(napi->napi_id))
- napi->napi_id = 0;
- }
+ /* 0..NR_CPUS+1 range is reserved for sender_cpu use */
+ do {
+ if (unlikely(++napi_gen_id < NR_CPUS + 1))
+ napi_gen_id = NR_CPUS + 1;
+ } while (napi_by_id(napi_gen_id));
+ napi->napi_id = napi_gen_id;
 
- hlist_add_head_rcu(&napi->napi_hash_node,
- &napi_hash[napi->napi_id % HASH_SIZE(napi_hash)]);
+ hlist_add_head_rcu(&napi->napi_hash_node,
+   &napi_hash[napi->napi_id % HASH_SIZE(napi_hash)]);
 
- spin_unlock(&napi_hash_lock);
- }
+ spin_unlock(&napi_hash_lock);
 }
 EXPORT_SYMBOL_GPL(napi_hash_add);
 
--
2.10.2


--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

ACK/cmnt: [Xenial][PATCH 1/1] net: better skb->sender_cpu and skb->napi_id cohabitation

Stefan Bader-2
Looks ok and good testing. Picked from 4.5, so Yakkety+ should be fine. Not sure
when sharing storage started but we have (afaik) not heard about issues with Trusty.

-Stefan


--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team

signature.asc (836 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

ACK: [Xenial][PATCH 0/1] net: better skb->sender_cpu and skb->napi_id cohabitation

Tim Gardner-2
In reply to this post by Leann Ogasawara
Loading...