Yakkety SRU - ubuntu16.04.2: number of numa_miss and numa_foreign wrong in numastat

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

Yakkety SRU - ubuntu16.04.2: number of numa_miss and numa_foreign wrong in numastat

Tim Gardner-2
https://bugs.launchpad.net/bugs/1672953

[PATCH 1/2] mm: fix remote numa hits statistics
[PATCH 2/2] mm: get rid of __GFP_OTHER_NODE

rtg

--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team
Reply | Threaded
Open this post in threaded view
|

[PATCH 1/2] mm: fix remote numa hits statistics

Tim Gardner-2
From: Michal Hocko <[hidden email]>

BugLink: http://bugs.launchpad.net/bugs/1672953

Jia He has noticed that commit b9f00e147f27 ("mm, page_alloc: reduce
branches in zone_statistics") has an unintentional side effect that
remote node allocation requests are accounted as NUMA_MISS rathat than
NUMA_HIT and NUMA_OTHER if such a request doesn't use __GFP_OTHER_NODE.

There are many of these potentially because the flag is used very rarely
while we have many users of __alloc_pages_node.

Fix this by simply ignoring __GFP_OTHER_NODE (it can be removed in a
follow up patch) and treat all allocations that were satisfied from the
preferred zone's node as NUMA_HITS because this is the same node we
requested the allocation from in most cases.  If this is not the local
node then we just account it as NUMA_OTHER rather than NUMA_LOCAL.

One downsize would be that an allocation request for a node which is
outside of the mempolicy nodemask would be reported as a hit which is a
bit weird but that was the case before b9f00e147f27 already.

Fixes: b9f00e147f27 ("mm, page_alloc: reduce branches in zone_statistics")
Link: http://lkml.kernel.org/r/20170102153057.9451-2-mhocko@...
Signed-off-by: Michal Hocko <[hidden email]>
Reported-by: Jia He <[hidden email]>
Reviewed-by: Vlastimil Babka <[hidden email]> # with cbmc[1] superpowers
Acked-by: Mel Gorman <[hidden email]>
Cc: Johannes Weiner <[hidden email]>
Cc: Joonsoo Kim <[hidden email]>
Cc: Taku Izumi <[hidden email]>
Signed-off-by: Andrew Morton <[hidden email]>
Signed-off-by: Linus Torvalds <[hidden email]>
(cherry picked from commit 2df26639e708a88dcc22171949da638a9998f3bc)
Signed-off-by: Tim Gardner <[hidden email]>
---
 mm/page_alloc.c | 15 ++++-----------
 1 file changed, 4 insertions(+), 11 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 212a017..0a9bda6 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2532,30 +2532,23 @@ int __isolate_free_page(struct page *page, unsigned int order)
  * Update NUMA hit/miss statistics
  *
  * Must be called with interrupts disabled.
- *
- * When __GFP_OTHER_NODE is set assume the node of the preferred
- * zone is the local node. This is useful for daemons who allocate
- * memory on behalf of other processes.
  */
 static inline void zone_statistics(struct zone *preferred_zone, struct zone *z,
  gfp_t flags)
 {
 #ifdef CONFIG_NUMA
- int local_nid = numa_node_id();
  enum zone_stat_item local_stat = NUMA_LOCAL;
 
- if (unlikely(flags & __GFP_OTHER_NODE)) {
+ if (z->node != numa_node_id())
  local_stat = NUMA_OTHER;
- local_nid = preferred_zone->node;
- }
 
- if (z->node == local_nid) {
+ if (z->node == preferred_zone->node)
  __inc_zone_state(z, NUMA_HIT);
- __inc_zone_state(z, local_stat);
- } else {
+ else {
  __inc_zone_state(z, NUMA_MISS);
  __inc_zone_state(preferred_zone, NUMA_FOREIGN);
  }
+ __inc_zone_state(z, local_stat);
 #endif
 }
 
--
2.7.4


--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team
Reply | Threaded
Open this post in threaded view
|

[PATCH 2/2] mm: get rid of __GFP_OTHER_NODE

Tim Gardner-2
In reply to this post by Tim Gardner-2
From: Michal Hocko <[hidden email]>

BugLink: http://bugs.launchpad.net/bugs/1672953

The flag was introduced by commit 78afd5612deb ("mm: add
__GFP_OTHER_NODE flag") to allow proper accounting of remote node
allocations done by kernel daemons on behalf of a process - e.g.
khugepaged.

After "mm: fix remote numa hits statistics" we do not need and actually
use the flag so we can safely remove it because all allocations which
are satisfied from their "home" node are accounted properly.

[[hidden email]: fix build]
Link: http://lkml.kernel.org/r/20170106122225.GK5556@...
Link: http://lkml.kernel.org/r/20170102153057.9451-3-mhocko@...
Signed-off-by: Michal Hocko <[hidden email]>
Acked-by: Mel Gorman <[hidden email]>
Acked-by: Vlastimil Babka <[hidden email]>
Cc: Michal Hocko <[hidden email]>
Cc: Johannes Weiner <[hidden email]>
Cc: Joonsoo Kim <[hidden email]>
Cc: Taku Izumi <[hidden email]>
Signed-off-by: Andrew Morton <[hidden email]>
Signed-off-by: Linus Torvalds <[hidden email]>

(back ported from commit 41b6167e8f746b475668f1da78599fc4284f18db)
Signed-off-by: Tim Gardner <[hidden email]>

Conflicts:
        mm/huge_memory.c
---
 include/linux/gfp.h            | 13 +++----------
 include/trace/events/mmflags.h |  3 +--
 mm/huge_memory.c               |  3 +--
 mm/khugepaged.c                |  5 ++---
 mm/page_alloc.c                |  5 ++---
 tools/perf/builtin-kmem.c      |  1 -
 6 files changed, 9 insertions(+), 21 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index f8041f9de..9a435ff 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -38,9 +38,8 @@ struct vm_area_struct;
 #define ___GFP_ACCOUNT 0x100000u
 #define ___GFP_NOTRACK 0x200000u
 #define ___GFP_DIRECT_RECLAIM 0x400000u
-#define ___GFP_OTHER_NODE 0x800000u
-#define ___GFP_WRITE 0x1000000u
-#define ___GFP_KSWAPD_RECLAIM 0x2000000u
+#define ___GFP_WRITE 0x800000u
+#define ___GFP_KSWAPD_RECLAIM 0x1000000u
 /* If the above are modified, __GFP_BITS_SHIFT may need updating */
 
 /*
@@ -172,11 +171,6 @@ struct vm_area_struct;
  * __GFP_NOTRACK_FALSE_POSITIVE is an alias of __GFP_NOTRACK. It's a means of
  *   distinguishing in the source between false positives and allocations that
  *   cannot be supported (e.g. page tables).
- *
- * __GFP_OTHER_NODE is for allocations that are on a remote node but that
- *   should not be accounted for as a remote allocation in vmstat. A
- *   typical user would be khugepaged collapsing a huge page on a remote
- *   node.
  */
 #define __GFP_COLD ((__force gfp_t)___GFP_COLD)
 #define __GFP_NOWARN ((__force gfp_t)___GFP_NOWARN)
@@ -184,10 +178,9 @@ struct vm_area_struct;
 #define __GFP_ZERO ((__force gfp_t)___GFP_ZERO)
 #define __GFP_NOTRACK ((__force gfp_t)___GFP_NOTRACK)
 #define __GFP_NOTRACK_FALSE_POSITIVE (__GFP_NOTRACK)
-#define __GFP_OTHER_NODE ((__force gfp_t)___GFP_OTHER_NODE)
 
 /* Room for N __GFP_FOO bits */
-#define __GFP_BITS_SHIFT 26
+#define __GFP_BITS_SHIFT 25
 #define __GFP_BITS_MASK ((__force gfp_t)((1 << __GFP_BITS_SHIFT) - 1))
 
 /*
diff --git a/include/trace/events/mmflags.h b/include/trace/events/mmflags.h
index 5a81ab4..06c86fb 100644
--- a/include/trace/events/mmflags.h
+++ b/include/trace/events/mmflags.h
@@ -47,8 +47,7 @@
  {(unsigned long)__GFP_WRITE, "__GFP_WRITE"}, \
  {(unsigned long)__GFP_RECLAIM, "__GFP_RECLAIM"}, \
  {(unsigned long)__GFP_DIRECT_RECLAIM, "__GFP_DIRECT_RECLAIM"},\
- {(unsigned long)__GFP_KSWAPD_RECLAIM, "__GFP_KSWAPD_RECLAIM"},\
- {(unsigned long)__GFP_OTHER_NODE, "__GFP_OTHER_NODE"} \
+ {(unsigned long)__GFP_KSWAPD_RECLAIM, "__GFP_KSWAPD_RECLAIM"}\
 
 #define show_gfp_flags(flags) \
  (flags) ? __print_flags(flags, "|", \
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index b764fddd..fe61d69 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -847,8 +847,7 @@ static int do_huge_pmd_wp_page_fallback(struct fault_env *fe, pmd_t orig_pmd,
  }
 
  for (i = 0; i < HPAGE_PMD_NR; i++) {
- pages[i] = alloc_page_vma_node(GFP_HIGHUSER_MOVABLE |
-       __GFP_OTHER_NODE, vma,
+ pages[i] = alloc_page_vma_node(GFP_HIGHUSER_MOVABLE, vma,
        fe->address, page_to_nid(page));
  if (unlikely(!pages[i] ||
      mem_cgroup_try_charge(pages[i], vma->vm_mm,
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 87e1a7ca..a678267 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -943,7 +943,7 @@ static void collapse_huge_page(struct mm_struct *mm,
  VM_BUG_ON(address & ~HPAGE_PMD_MASK);
 
  /* Only allocate from the target node */
- gfp = alloc_hugepage_khugepaged_gfpmask() | __GFP_OTHER_NODE | __GFP_THISNODE;
+ gfp = alloc_hugepage_khugepaged_gfpmask() | __GFP_THISNODE;
 
  /*
  * Before allocating the hugepage, release the mmap_sem read lock.
@@ -1309,8 +1309,7 @@ static void collapse_shmem(struct mm_struct *mm,
  VM_BUG_ON(start & (HPAGE_PMD_NR - 1));
 
  /* Only allocate from the target node */
- gfp = alloc_hugepage_khugepaged_gfpmask() |
- __GFP_OTHER_NODE | __GFP_THISNODE;
+ gfp = alloc_hugepage_khugepaged_gfpmask() | __GFP_THISNODE;
 
  new_page = khugepaged_alloc_page(hpage, gfp, node);
  if (!new_page) {
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 0a9bda6..e0a450c 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2533,8 +2533,7 @@ int __isolate_free_page(struct page *page, unsigned int order)
  *
  * Must be called with interrupts disabled.
  */
-static inline void zone_statistics(struct zone *preferred_zone, struct zone *z,
- gfp_t flags)
+static inline void zone_statistics(struct zone *preferred_zone, struct zone *z)
 {
 #ifdef CONFIG_NUMA
  enum zone_stat_item local_stat = NUMA_LOCAL;
@@ -2616,7 +2615,7 @@ struct page *buffered_rmqueue(struct zone *preferred_zone,
  }
 
  __count_zid_vm_events(PGALLOC, page_zonenum(page), 1 << order);
- zone_statistics(preferred_zone, zone, gfp_flags);
+ zone_statistics(preferred_zone, zone);
  local_irq_restore(flags);
 
  VM_BUG_ON_PAGE(bad_range(zone, page), page);
diff --git a/tools/perf/builtin-kmem.c b/tools/perf/builtin-kmem.c
index fdde1bd..e87fc98 100644
--- a/tools/perf/builtin-kmem.c
+++ b/tools/perf/builtin-kmem.c
@@ -645,7 +645,6 @@ static const struct {
  { "__GFP_RECLAIM", "R" },
  { "__GFP_DIRECT_RECLAIM", "DR" },
  { "__GFP_KSWAPD_RECLAIM", "KR" },
- { "__GFP_OTHER_NODE", "ON" },
 };
 
 static size_t max_gfp_len;
--
2.7.4


--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team
Reply | Threaded
Open this post in threaded view
|

ACK: Yakkety SRU - ubuntu16.04.2: number of numa_miss and numa_foreign wrong in numastat

Stefan Bader-2
In reply to this post by Tim Gardner-2



--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team

signature.asc (836 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

ACK: Yakkety SRU - ubuntu16.04.2: number of numa_miss and numa_foreign wrong in numastat

brad.figg
In reply to this post by Tim Gardner-2
Reply | Threaded
Open this post in threaded view
|

APPLIED: Yakkety SRU - ubuntu16.04.2: number of numa_miss and numa_foreign wrong in numastat

Thadeu Lima de Souza Cascardo-3
In reply to this post by Tim Gardner-2
Applied to yakkety master-next branch.

Thanks.
Cascardo.

--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team