[SRU][X][PATCH 0/1] LP: #1750038 - stuck in D state in NFS op


[SRU][X][PATCH 0/1] LP: #1750038 - stuck in D state in NFS op

Daniel Axtens
From: Daniel Axtens <[hidden email]>

== SRU Justification ==

[Impact]

Occasionally an application gets stuck in "D" state on NFS reads/sync
and close system calls. All the subsequent operations on the NFS
mounts are stuck and a reboot is required to rectify the situation.

[Fix]

Use GFP_NOIO for some allocations in writeback to avoid a
deadlock. This is fixed upstream by commit ae97aa524ef4 ("NFS: Use
GFP_NOIO for two allocations in writeback").

[Testcase]
See "Test scenario" in the previous description below.

A test kernel with this patch was tested heavily (over 100 hours of the
test suite) without issue.

[Regression Potential]
This changes memory allocation in NFS to use a different policy. This
could potentially affect NFS, or increase the general risk of complex
OOMs.

However, the patch is already in Artful and Bionic without issue.

The patch does not apply to Trusty.

== Previous Description ==

On Ubuntu Xenial, a user reports processes hanging in "D" state waiting
for disk I/O.

Occasionally one of the applications gets into "D" state on NFS
reads/sync and close system calls. Based on the kernel backtraces, it
appears to be stuck in a kmalloc allocation during cleanup of dirty NFS
pages.

All subsequent operations on the NFS mounts are stuck, and a reboot is
required to rectify the situation.

[Test scenario]

1) Applications running in a Docker environment
2) Applications have cgroup limits: --cpu-shares, --memory, --shm-limit
3) Python and C++ based applications (Torch and Caffe)
4) Applications read big LMDB files and write results to NFS shares
5) NFS v3 with the "hard" mount option, and fscache enabled
6) No swap space is configured

This causes all other I/O activity on that mount to hang.

We are running into this issue more frequently and have identified a
few applications causing the problem.

As noted in the description, the problem seems to happen when
exercising this stack:

try_to_free_mem_cgroup_pages+0xba/0x1a0

We see this with Docker containers using the cgroup option --memory
<USER_SPECIFIED_MEM>.

Whenever there is a deadlock, we see that the hung process has reached
the maximum cgroup limit multiple times; normally it cleans up dirty
data and caches to bring usage back under the limit.

This reclaim path is taken many times, and eventually we probably hit a
race and end up in a deadlock.

Benjamin Coddington (1):
  NFS: Use GFP_NOIO for two allocations in writeback

 fs/nfs/pagelist.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

--
2.17.0


--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team

[SRU][X][PATCH 1/1] NFS: Use GFP_NOIO for two allocations in writeback

Daniel Axtens
From: Benjamin Coddington <[hidden email]>

BugLink: https://bugs.launchpad.net/bugs/1750038

Prevent a deadlock that can occur if we wait on allocations
that try to write back our pages.

Signed-off-by: Benjamin Coddington <[hidden email]>
Fixes: 00bfa30abe869 ("NFS: Create a common pgio_alloc and pgio_release...")
Cc: [hidden email] # 3.16+
Signed-off-by: Trond Myklebust <[hidden email]>
(backported from commit ae97aa524ef495b6276fd26f5d5449fb22975d7c)
Signed-off-by: Daniel Axtens <[hidden email]>
---
 fs/nfs/pagelist.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index 8ebfdd00044b..1dc47d1aa7ce 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -29,13 +29,14 @@
 static struct kmem_cache *nfs_page_cachep;
 static const struct rpc_call_ops nfs_pgio_common_ops;
 
-static bool nfs_pgarray_set(struct nfs_page_array *p, unsigned int pagecount)
+static bool nfs_pgarray_set(struct nfs_page_array *p, unsigned int pagecount,
+			    gfp_t gfp_flags)
 {
 	p->npages = pagecount;
 	if (pagecount <= ARRAY_SIZE(p->page_array))
 		p->pagevec = p->page_array;
 	else {
-		p->pagevec = kcalloc(pagecount, sizeof(struct page *), GFP_KERNEL);
+		p->pagevec = kcalloc(pagecount, sizeof(struct page *), gfp_flags);
 		if (!p->pagevec)
 			p->npages = 0;
 	}
@@ -722,6 +723,7 @@ void nfs_pageio_init(struct nfs_pageio_descriptor *desc,
 {
 	struct nfs_pgio_mirror *new;
 	int i;
+	gfp_t gfp_flags = GFP_KERNEL;
 
 	desc->pg_moreio = 0;
 	desc->pg_inode = inode;
@@ -741,8 +743,10 @@ void nfs_pageio_init(struct nfs_pageio_descriptor *desc,
 	if (pg_ops->pg_get_mirror_count) {
 		/* until we have a request, we don't have an lseg and no
 		 * idea how many mirrors there will be */
+		if (desc->pg_rw_ops->rw_mode == FMODE_WRITE)
+			gfp_flags = GFP_NOIO;
 		new = kcalloc(NFS_PAGEIO_DESCRIPTOR_MIRROR_MAX,
-			      sizeof(struct nfs_pgio_mirror), GFP_KERNEL);
+			      sizeof(struct nfs_pgio_mirror), gfp_flags);
 		desc->pg_mirrors_dynamic = new;
 		desc->pg_mirrors = new;
 
@@ -796,10 +800,14 @@ int nfs_generic_pgio(struct nfs_pageio_descriptor *desc,
 	struct list_head *head = &mirror->pg_list;
 	struct nfs_commit_info cinfo;
 	unsigned int pagecount, pageused;
+	gfp_t gfp_flags = GFP_KERNEL;
 
 	pagecount = nfs_page_array_len(mirror->pg_base, mirror->pg_count);
-	if (!nfs_pgarray_set(&hdr->page_array, pagecount))
+	if (desc->pg_rw_ops->rw_mode == FMODE_WRITE)
+		gfp_flags = GFP_NOIO;
+	if (!nfs_pgarray_set(&hdr->page_array, pagecount, gfp_flags)) {
 		return nfs_pgio_error(desc, hdr);
+	}
 
 	nfs_init_cinfo(&cinfo, desc->pg_inode, desc->pg_dreq);
 	pages = hdr->page_array.pagevec;
--
2.17.0



ACK: [SRU][X][PATCH 1/1] NFS: Use GFP_NOIO for two allocations in writeback

Juerg Haefliger
On 05/08/2018 08:34 AM, Daniel Axtens wrote:

> From: Benjamin Coddington <[hidden email]>
>
> BugLink: https://bugs.launchpad.net/bugs/1750038
>
> Prevent a deadlock that can occur if we wait on allocations
> that try to write back our pages.
>
> Signed-off-by: Benjamin Coddington <[hidden email]>
> Fixes: 00bfa30abe869 ("NFS: Create a common pgio_alloc and pgio_release...")
> Cc: [hidden email] # 3.16+
> Signed-off-by: Trond Myklebust <[hidden email]>
> (backported from commit ae97aa524ef495b6276fd26f5d5449fb22975d7c)
> Signed-off-by: Daniel Axtens <[hidden email]>
Looks reasonable.

Acked-by: Juerg Haefliger <[hidden email]>





ACK: [SRU][X][PATCH 1/1] NFS: Use GFP_NOIO for two allocations in writeback

Kleber Souza
In reply to this post by Daniel Axtens
On 05/08/18 08:34, Daniel Axtens wrote:

> From: Benjamin Coddington <[hidden email]>
>
> BugLink: https://bugs.launchpad.net/bugs/1750038
>
> Prevent a deadlock that can occur if we wait on allocations
> that try to write back our pages.
>
> Signed-off-by: Benjamin Coddington <[hidden email]>
> Fixes: 00bfa30abe869 ("NFS: Create a common pgio_alloc and pgio_release...")
> Cc: [hidden email] # 3.16+
> Signed-off-by: Trond Myklebust <[hidden email]>
> (backported from commit ae97aa524ef495b6276fd26f5d5449fb22975d7c)
> Signed-off-by: Daniel Axtens <[hidden email]>

Acked-by: Kleber Sacilotto de Souza <[hidden email]>



APPLIED: [SRU][X][PATCH 0/1] LP: #1750038 - stuck in D state in NFS op

Kleber Souza
In reply to this post by Daniel Axtens
On 05/08/18 08:34, Daniel Axtens wrote:

> From: Daniel Axtens <[hidden email]>
>

Applied to xenial/master-next branch.

Thanks,
Kleber
