[SRU][Artful][PATCH 0/2] Fixes for LP:1748408

[SRU][Artful][PATCH 0/2] Fixes for LP:1748408

Joseph Salisbury-3
BugLink: http://bugs.launchpad.net/bugs/1748408

== SRU Justification ==
The bug reporter saw OOM conditions on multiple servers after upgrading from
the previous 4.10 series HWE kernels to the new 4.13 HWE series. With the new
kernel, free memory decreases continuously at a high rate, and the servers
start swapping and finally OOM-kill services within days. With the 4.10 kernel,
free memory decreases more slowly and stabilizes after a while.

It was found that upstream commits 2b9478ffc550 and 62b4c6694dfd resolve
this issue.

== Fixes ==
2b9478ffc550("i40e: Fix memory leak related filter programming status")
62b4c6694dfd("i40e: Add programming descriptors to cleaned_count")

== Regression Potential ==
Low. The changes are limited to the i40e driver and fix an existing regression.

== Test Case ==
A test kernel was built with these patches and tested by the original bug reporter.
The bug reporter states the test kernel resolved the bug.

Alexander Duyck (2):
  i40e: Fix memory leak related filter programming status
  i40e: Add programming descriptors to cleaned_count

 drivers/net/ethernet/intel/i40e/i40e_txrx.c | 64 +++++++++++++++++------------
 1 file changed, 37 insertions(+), 27 deletions(-)

--
2.7.4


--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team

[SRU][Artful][PATCH 1/2] i40e: Fix memory leak related filter programming status

Joseph Salisbury-3
From: Alexander Duyck <[hidden email]>

BugLink: http://bugs.launchpad.net/bugs/1748408

It looks like we weren't correctly placing the pages from buffers that had
been used to return a filter programming status back on the ring. As a
result they were being overwritten and tracking of the pages was lost.

This change works to correct that by incorporating part of
i40e_put_rx_buffer into the programming status handler code. As a result we
should now be correctly placing the pages for those buffers on the
re-allocation list instead of letting them stay in place.

Fixes: 0e626ff7ccbf ("i40e: Fix support for flow director programming status")
Reported-by: Anders K. Pedersen <[hidden email]>
Signed-off-by: Alexander Duyck <[hidden email]>
Tested-by: Anders K Pedersen <[hidden email]>
Signed-off-by: Jeff Kirsher <[hidden email]>
(cherry picked from commit 2b9478ffc550f17c6cd8c69057234e91150f5972)
Signed-off-by: Joseph Salisbury <[hidden email]>
---
 drivers/net/ethernet/intel/i40e/i40e_txrx.c | 63 ++++++++++++++++-------------
 1 file changed, 36 insertions(+), 27 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index 2194960..391b187 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -1043,6 +1043,32 @@ static bool i40e_set_new_dynamic_itr(struct i40e_ring_container *rc)
 }
 
 /**
+ * i40e_reuse_rx_page - page flip buffer and store it back on the ring
+ * @rx_ring: rx descriptor ring to store buffers on
+ * @old_buff: donor buffer to have page reused
+ *
+ * Synchronizes page for reuse by the adapter
+ **/
+static void i40e_reuse_rx_page(struct i40e_ring *rx_ring,
+       struct i40e_rx_buffer *old_buff)
+{
+ struct i40e_rx_buffer *new_buff;
+ u16 nta = rx_ring->next_to_alloc;
+
+ new_buff = &rx_ring->rx_bi[nta];
+
+ /* update, and store next to alloc */
+ nta++;
+ rx_ring->next_to_alloc = (nta < rx_ring->count) ? nta : 0;
+
+ /* transfer page from old buffer to new buffer */
+ new_buff->dma = old_buff->dma;
+ new_buff->page = old_buff->page;
+ new_buff->page_offset = old_buff->page_offset;
+ new_buff->pagecnt_bias = old_buff->pagecnt_bias;
+}
+
+/**
  * i40e_rx_is_programming_status - check for programming status descriptor
  * @qw: qword representing status_error_len in CPU ordering
  *
@@ -1076,15 +1102,24 @@ static void i40e_clean_programming_status(struct i40e_ring *rx_ring,
   union i40e_rx_desc *rx_desc,
   u64 qw)
 {
- u32 ntc = rx_ring->next_to_clean + 1;
+ struct i40e_rx_buffer *rx_buffer;
+ u32 ntc = rx_ring->next_to_clean;
  u8 id;
 
  /* fetch, update, and store next to clean */
+ rx_buffer = &rx_ring->rx_bi[ntc++];
  ntc = (ntc < rx_ring->count) ? ntc : 0;
  rx_ring->next_to_clean = ntc;
 
  prefetch(I40E_RX_DESC(rx_ring, ntc));
 
+ /* place unused page back on the ring */
+ i40e_reuse_rx_page(rx_ring, rx_buffer);
+ rx_ring->rx_stats.page_reuse_count++;
+
+ /* clear contents of buffer_info */
+ rx_buffer->page = NULL;
+
  id = (qw & I40E_RX_PROG_STATUS_DESC_QW1_PROGID_MASK) >>
   I40E_RX_PROG_STATUS_DESC_QW1_PROGID_SHIFT;
 
@@ -1644,32 +1679,6 @@ static bool i40e_cleanup_headers(struct i40e_ring *rx_ring, struct sk_buff *skb,
 }
 
 /**
- * i40e_reuse_rx_page - page flip buffer and store it back on the ring
- * @rx_ring: rx descriptor ring to store buffers on
- * @old_buff: donor buffer to have page reused
- *
- * Synchronizes page for reuse by the adapter
- **/
-static void i40e_reuse_rx_page(struct i40e_ring *rx_ring,
-       struct i40e_rx_buffer *old_buff)
-{
- struct i40e_rx_buffer *new_buff;
- u16 nta = rx_ring->next_to_alloc;
-
- new_buff = &rx_ring->rx_bi[nta];
-
- /* update, and store next to alloc */
- nta++;
- rx_ring->next_to_alloc = (nta < rx_ring->count) ? nta : 0;
-
- /* transfer page from old buffer to new buffer */
- new_buff->dma = old_buff->dma;
- new_buff->page = old_buff->page;
- new_buff->page_offset = old_buff->page_offset;
- new_buff->pagecnt_bias = old_buff->pagecnt_bias;
-}
-
-/**
  * i40e_page_is_reusable - check if any reuse is possible
  * @page: page struct to check
  *
--
2.7.4
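
The page handoff described in the commit message can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the driver code: `struct model_ring`, `struct model_rx_buffer`, `model_reuse_rx_page`, `model_clean_programming_status`, `model_count_pages` and `RING_COUNT` are hypothetical names, and DMA state, page offsets and statistics are left out. It shows the page from the slot at next_to_clean being moved to the slot at next_to_alloc, so the number of tracked pages stays the same after a programming status descriptor is cleaned.

```c
#include <assert.h>
#include <stddef.h>

#define RING_COUNT 4 /* hypothetical ring size for this model */

/* Toy stand-in for struct i40e_rx_buffer: only the page pointer is modelled. */
struct model_rx_buffer {
    void *page;
};

/* Toy stand-in for struct i40e_ring. */
struct model_ring {
    struct model_rx_buffer rx_bi[RING_COUNT];
    unsigned int next_to_clean;
    unsigned int next_to_alloc;
    unsigned int count;
};

/* Same shape as i40e_reuse_rx_page(): hand the page to the slot at next_to_alloc. */
static void model_reuse_rx_page(struct model_ring *ring,
                                struct model_rx_buffer *old_buff)
{
    unsigned int nta = ring->next_to_alloc;
    struct model_rx_buffer *new_buff = &ring->rx_bi[nta];

    /* Assumption of this model: the donor and target slots are distinct. */
    assert(new_buff != old_buff);

    /* update, and store next to alloc */
    nta++;
    ring->next_to_alloc = (nta < ring->count) ? nta : 0;

    /* transfer page from old buffer to new buffer */
    new_buff->page = old_buff->page;
}

/* Patched behaviour: advance next_to_clean, then recycle the unused page. */
static void model_clean_programming_status(struct model_ring *ring)
{
    unsigned int ntc = ring->next_to_clean;
    struct model_rx_buffer *rx_buffer = &ring->rx_bi[ntc++];

    ring->next_to_clean = (ntc < ring->count) ? ntc : 0;

    /* place unused page back on the ring, then clear the old slot */
    model_reuse_rx_page(ring, rx_buffer);
    rx_buffer->page = NULL;
}

/* Count the pages the ring still tracks. */
static unsigned int model_count_pages(const struct model_ring *ring)
{
    unsigned int i, pages = 0;

    for (i = 0; i < ring->count; i++)
        if (ring->rx_bi[i].page)
            pages++;
    return pages;
}
```

In this model, a ring with pages in slots 1 to 3, next_to_clean at 1 and next_to_alloc at 0 ends up with the first page in slot 0 and slot 1 empty, and `model_count_pages()` reports the same total before and after the call.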



[SRU][Artful][PATCH 2/2] i40e: Add programming descriptors to cleaned_count

Joseph Salisbury-3
From: Alexander Duyck <[hidden email]>

BugLink: http://bugs.launchpad.net/bugs/1748408

This patch updates the i40e driver to include programming descriptors in
the cleaned_count. Without this change it becomes possible for us to leak
memory as we don't trigger a large enough allocation when the time comes to
allocate new buffers and we end up overwriting a number of rx_buffers equal
to the number of programming descriptors we encountered.

Fixes: 0e626ff7ccbf ("i40e: Fix support for flow director programming status")
Signed-off-by: Alexander Duyck <[hidden email]>
Tested-by: Anders K. Pedersen <[hidden email]>
Signed-off-by: Jeff Kirsher <[hidden email]>
(cherry picked from commit 62b4c6694dfd3821bd5ea5bed48238bbabd5fe8b)
Signed-off-by: Joseph Salisbury <[hidden email]>
---
 drivers/net/ethernet/intel/i40e/i40e_txrx.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index 391b187..d970e23 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -2107,6 +2107,7 @@ static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget)
 
  if (unlikely(i40e_rx_is_programming_status(qword))) {
  i40e_clean_programming_status(rx_ring, rx_desc, qword);
+ cleaned_count++;
  continue;
  }
  size = (qword & I40E_RXD_QW1_LENGTH_PBUF_MASK) >>
--
2.7.4
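
The accounting gap that this one-line change closes can be shown with a toy model. This is a sketch under stated assumptions rather than the driver loop: `model_cleaned_count` and its parameters are hypothetical, and the real `i40e_clean_rx_irq()` does considerably more per descriptor. The model walks a list of descriptors and returns the cleaned_count that would be handed to the buffer allocator, with and without counting programming status descriptors.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy model of one RX clean pass. Each entry of is_prog_status[] says
 * whether that descriptor is a programming status descriptor.
 * count_prog_status selects the patched (true) or unpatched (false)
 * accounting. Returns the resulting cleaned_count.
 */
static unsigned int model_cleaned_count(const bool *is_prog_status, size_t n,
                                        bool count_prog_status)
{
    unsigned int cleaned_count = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (is_prog_status[i]) {
            /* The patch adds this increment before the continue. */
            if (count_prog_status)
                cleaned_count++;
            continue;
        }
        /* Ordinary packet descriptors were always counted. */
        cleaned_count++;
    }
    return cleaned_count;
}
```

For five consumed descriptors of which two are programming status descriptors, the unpatched accounting yields 3 and the patched accounting yields 5. The shortfall of 2 equals the number of programming descriptors, which matches the number of rx_buffers the commit message says would be overwritten.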



ACK: [SRU][Artful][PATCH 0/2] Fixes for LP:1748408

Thadeu Lima de Souza Cascardo-3
Clean cherry picks, restricted to a single driver, tested by reporter.

Acked-by: Thadeu Lima de Souza Cascardo <[hidden email]>


ACK: [SRU][Artful][PATCH 0/2] Fixes for LP:1748408

Stefan Bader-2
On 09.03.2018 09:12, Joseph Salisbury wrote:

> BugLink: http://bugs.launchpad.net/bugs/1748408
>
> == SRU Justification ==
> The bug reporter was seeing an OOM on multiple servers after upgrading from
> previous 4.10 series HWE kernels to the new 4.13 HWE series. With the new
> kernel, free memory is continuously decreasing at a high rate and the servers
> start swapping and finally OOMing services within days. With the 4.10 kernel,
> decrease of free memory is slower and stabilizes after a while.
>
> It was found that upstream commits 2b9478ffc550 and 62b4c6694dfd resolve
> this issue.
>
> == Fixes ==
> 2b9478ffc550("i40e: Fix memory leak related filter programming status")
> 62b4c6694dfd("i40e: Add programming descriptors to cleaned_count")
>
> == Regression Potential ==
> Low.  Limited to i40e and fix existing regression.
>
> == Test Case ==
> A test kernel was built with these patches and tested by the original bug reporter.
> The bug reporter states the test kernel resolved the bug.
>
> Alexander Duyck (2):
>   i40e: Fix memory leak related filter programming status
>   i40e: Add programming descriptors to cleaned_count
>
>  drivers/net/ethernet/intel/i40e/i40e_txrx.c | 64 +++++++++++++++++------------
>  1 file changed, 37 insertions(+), 27 deletions(-)
>
Acked-by: Stefan Bader <[hidden email]>




APPLIED[Artful/backlog]: [SRU][Artful][PATCH 0/2] Fixes for LP:1748408

Kleber Souza
On 03/09/18 09:12, Joseph Salisbury wrote:

> BugLink: http://bugs.launchpad.net/bugs/1748408
>
> == SRU Justification ==
> The bug reporter was seeing an OOM on multiple servers after upgrading from
> previous 4.10 series HWE kernels to the new 4.13 HWE series. With the new
> kernel, free memory is continuously decreasing at a high rate and the servers
> start swapping and finally OOMing services within days. With the 4.10 kernel,
> decrease of free memory is slower and stabilizes after a while.
>
> It was found that upstream commits 2b9478ffc550 and 62b4c6694dfd resolve
> this issue.
>
> == Fixes ==
> 2b9478ffc550("i40e: Fix memory leak related filter programming status")
> 62b4c6694dfd("i40e: Add programming descriptors to cleaned_count")
>
> == Regression Potential ==
> Low.  Limited to i40e and fix existing regression.
>
> == Test Case ==
> A test kernel was built with these patches and tested by the original bug reporter.
> The bug reporter states the test kernel resolved the bug.
>
> Alexander Duyck (2):
>   i40e: Fix memory leak related filter programming status
>   i40e: Add programming descriptors to cleaned_count
>
>  drivers/net/ethernet/intel/i40e/i40e_txrx.c | 64 +++++++++++++++++------------
>  1 file changed, 37 insertions(+), 27 deletions(-)
>

Applied to artful/master-next-backlog branch.

Thanks,
Kleber
