[Artful][PATCH 00/12] Backport for Power9 Nest PMU Instrumentation


[Artful][PATCH 00/12] Backport for Power9 Nest PMU Instrumentation

Gustavo Walbon
BugLink: https://bugs.launchpad.net/bugs/1481347

This patchset enables the Power9 Nest PMU counters for use with the perf tool.

These Nest PMU counters provide per-chip metrics such as memory bandwidth,
PowerBus bandwidth, and so on.

Testing requires a Power9 machine with the latest BMC firmware and skiboot
v5.9-rc5 or newer.

Anju T (1):
  powerpc/perf/imc: Fix nest events on muti socket system

Anju T Sudhakar (6):
  powerpc/perf: Add nest IMC PMU support
  powerpc/perf: Add core IMC PMU support
  powerpc/perf: Add thread IMC PMU support
  powerpc/perf: Fix for core/nest imc call trace on cpuhotplug
  powerpc/perf: Add ___GFP_NOWARN flag to alloc_pages_node()
  powerpc/perf: Fix IMC initialization crash

Dan Carpenter (1):
  powerpc/perf: Fix double unlock in imc_common_cpuhp_mem_free()

LABBE Corentin (1):
  powerpc/powernv: Fix build error in opal-imc.c when NUMA=n

Madhavan Srinivasan (3):
  powerpc/powernv: Add IMC OPAL APIs
  powerpc/powernv: Detect and create IMC device
  powerpc/perf: Fix usage of nest_imc_refc

 arch/powerpc/include/asm/imc-pmu.h             |  128 +++
 arch/powerpc/include/asm/opal-api.h            |   12 +-
 arch/powerpc/include/asm/opal.h                |    5 +
 arch/powerpc/perf/Makefile                     |    1 +
 arch/powerpc/perf/imc-pmu.c                    | 1335 ++++++++++++++++++++++++
 arch/powerpc/platforms/powernv/Makefile        |    1 +
 arch/powerpc/platforms/powernv/opal-imc.c      |  226 ++++
 arch/powerpc/platforms/powernv/opal-wrappers.S |    3 +
 arch/powerpc/platforms/powernv/opal.c          |   14 +
 include/linux/cpuhotplug.h                     |    3 +
 10 files changed, 1727 insertions(+), 1 deletion(-)
 create mode 100644 arch/powerpc/include/asm/imc-pmu.h
 create mode 100644 arch/powerpc/perf/imc-pmu.c
 create mode 100644 arch/powerpc/platforms/powernv/opal-imc.c

--
2.13.3



[Artful][PATCH 01/12] powerpc/powernv: Add IMC OPAL APIs

Gustavo Walbon
From: Madhavan Srinivasan <[hidden email]>

BugLink: https://bugs.launchpad.net/bugs/1481347

In-Memory Collection (IMC) counters are a performance monitoring
infrastructure. These counters need a special sequence of SCOMs to
init/start/stop, which is handled by OPAL, and OPAL provides three APIs
to initialize and control these IMC engines.

OPAL API documentation:
  https://github.com/open-power/skiboot/blob/master/doc/opal-api/opal-imc-counters.rst

This patch updates the kernel-side powernv platform code to support the
new OPAL APIs.
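
For illustration, a minimal sketch of how a kernel-side user is expected to
drive these calls, using the wrappers added below. The function name and the
counter buffer are placeholders, and the real callers only arrive in the
later IMC PMU patches of this series:

/*
 * Illustrative sketch only -- not part of this patch.  "counter_mem" is
 * assumed to be per-core counter memory allocated elsewhere.
 */
#include <linux/errno.h>
#include <asm/opal.h>
#include <asm/page.h>
#include <asm/smp.h>

static int example_core_imc_counters(int cpu, void *counter_mem)
{
	uint64_t pir = get_hard_smp_processor_id(cpu);

	/* Point the core IMC engine at its counter memory ... */
	if (opal_imc_counters_init(OPAL_IMC_COUNTERS_CORE,
				   __pa(counter_mem), pir))
		return -ENODEV;

	/* ... start the engine counting ... */
	if (opal_imc_counters_start(OPAL_IMC_COUNTERS_CORE, pir))
		return -ENODEV;

	/* ... and stop it again when the last user goes away. */
	return opal_imc_counters_stop(OPAL_IMC_COUNTERS_CORE, pir) ? -ENODEV : 0;
}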

Signed-off-by: Hemant Kumar <[hidden email]>
Signed-off-by: Anju T Sudhakar <[hidden email]>
Signed-off-by: Madhavan Srinivasan <[hidden email]>
Signed-off-by: Michael Ellerman <[hidden email]>
(cherry-picked from commit 28a5db0061014c8afbbb98560cf420c29bc4d8e1)
Signed-off-by: Gustavo Walbon <[hidden email]>
---
 arch/powerpc/include/asm/opal-api.h            | 12 +++++++++++-
 arch/powerpc/include/asm/opal.h                |  5 +++++
 arch/powerpc/platforms/powernv/opal-wrappers.S |  3 +++
 3 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/opal-api.h b/arch/powerpc/include/asm/opal-api.h
index 3130a73652c7..7df005965634 100644
--- a/arch/powerpc/include/asm/opal-api.h
+++ b/arch/powerpc/include/asm/opal-api.h
@@ -190,7 +190,10 @@
 #define OPAL_NPU_INIT_CONTEXT 146
 #define OPAL_NPU_DESTROY_CONTEXT 147
 #define OPAL_NPU_MAP_LPAR 148
-#define OPAL_LAST 148
+#define OPAL_IMC_COUNTERS_INIT 149
+#define OPAL_IMC_COUNTERS_START 150
+#define OPAL_IMC_COUNTERS_STOP 151
+#define OPAL_LAST 151
 
 /* Device tree flags */
 
@@ -1084,6 +1087,13 @@ enum {
  XIVE_DUMP_EMU_STATE = 5,
 };
 
+/* "type" argument options for OPAL_IMC_COUNTERS_* calls */
+enum {
+ OPAL_IMC_COUNTERS_NEST = 1,
+ OPAL_IMC_COUNTERS_CORE = 2,
+};
+
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* __OPAL_API_H */
diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 588fb1c23af9..6b8513c3ad40 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -268,6 +268,11 @@ int64_t opal_xive_free_irq(uint32_t girq);
 int64_t opal_xive_sync(uint32_t type, uint32_t id);
 int64_t opal_xive_dump(uint32_t type, uint32_t id);
 
+int64_t opal_imc_counters_init(uint32_t type, uint64_t address,
+ uint64_t cpu_pir);
+int64_t opal_imc_counters_start(uint32_t type, uint64_t cpu_pir);
+int64_t opal_imc_counters_stop(uint32_t type, uint64_t cpu_pir);
+
 /* Internal functions */
 extern int early_init_dt_scan_opal(unsigned long node, const char *uname,
    int depth, void *data);
diff --git a/arch/powerpc/platforms/powernv/opal-wrappers.S b/arch/powerpc/platforms/powernv/opal-wrappers.S
index 4ca6c26a56d5..b77f52ee8263 100644
--- a/arch/powerpc/platforms/powernv/opal-wrappers.S
+++ b/arch/powerpc/platforms/powernv/opal-wrappers.S
@@ -310,3 +310,6 @@ OPAL_CALL(opal_xive_dump, OPAL_XIVE_DUMP);
 OPAL_CALL(opal_npu_init_context, OPAL_NPU_INIT_CONTEXT);
 OPAL_CALL(opal_npu_destroy_context, OPAL_NPU_DESTROY_CONTEXT);
 OPAL_CALL(opal_npu_map_lpar, OPAL_NPU_MAP_LPAR);
+OPAL_CALL(opal_imc_counters_init, OPAL_IMC_COUNTERS_INIT);
+OPAL_CALL(opal_imc_counters_start, OPAL_IMC_COUNTERS_START);
+OPAL_CALL(opal_imc_counters_stop, OPAL_IMC_COUNTERS_STOP);
--
2.13.3



[Artful][PATCH 02/12] powerpc/powernv: Detect and create IMC device

Gustavo Walbon
From: Madhavan Srinivasan <[hidden email]>

BugLink: https://bugs.launchpad.net/bugs/1481347

Add code to create a platform device for the In-Memory Collection (IMC)
counters. Platform devices are created based on the IMC compatibility
string. A new header file is added to hold the data structures and macros
needed by the In-Memory Collection (IMC) counter PMU devices.

The device tree for IMC counters starts at the node "imc-counters". This
node contains all the IMC PMU nodes and the event nodes for those PMUs.
The probe() routine parses the device tree to locate the three possible
IMC device types (Nest/Core/Thread), then branches out to parse each unit
node and populate vital information such as device memory sizes, event
node information, the base address for reserved-memory access (if any),
and so on. A simple, bare-minimum shutdown function is added which only
"stops" the engines.

Signed-off-by: Anju T Sudhakar <[hidden email]>
Signed-off-by: Hemant Kumar <[hidden email]>
Signed-off-by: Madhavan Srinivasan <[hidden email]>
[mpe: Fix build with CONFIG_PERF_EVENTS=n]
Signed-off-by: Michael Ellerman <[hidden email]>
(cherry-picked from 8f95faaac56c18b32d0e23ace55417a440abdb7e)
Signed-off-by: Gustavo Walbon <[hidden email]>
---
 arch/powerpc/include/asm/imc-pmu.h        | 128 +++++++++++++++++
 arch/powerpc/platforms/powernv/Makefile   |   1 +
 arch/powerpc/platforms/powernv/opal-imc.c | 221 ++++++++++++++++++++++++++++++
 arch/powerpc/platforms/powernv/opal.c     |  14 ++
 4 files changed, 364 insertions(+)
 create mode 100644 arch/powerpc/include/asm/imc-pmu.h
 create mode 100644 arch/powerpc/platforms/powernv/opal-imc.c

diff --git a/arch/powerpc/include/asm/imc-pmu.h b/arch/powerpc/include/asm/imc-pmu.h
new file mode 100644
index 000000000000..7f74c282710f
--- /dev/null
+++ b/arch/powerpc/include/asm/imc-pmu.h
@@ -0,0 +1,128 @@
+#ifndef __ASM_POWERPC_IMC_PMU_H
+#define __ASM_POWERPC_IMC_PMU_H
+
+/*
+ * IMC Nest Performance Monitor counter support.
+ *
+ * Copyright (C) 2017 Madhavan Srinivasan, IBM Corporation.
+ *           (C) 2017 Anju T Sudhakar, IBM Corporation.
+ *           (C) 2017 Hemant K Shaw, IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or later version.
+ */
+
+#include <linux/perf_event.h>
+#include <linux/slab.h>
+#include <linux/of.h>
+#include <linux/io.h>
+#include <asm/opal.h>
+
+/*
+ * For static allocation of some of the structures.
+ */
+#define IMC_MAX_PMUS 32
+
+/*
+ * Compatibility macros for IMC devices
+ */
+#define IMC_DTB_COMPAT "ibm,opal-in-memory-counters"
+#define IMC_DTB_UNIT_COMPAT "ibm,imc-counters"
+
+
+/*
+ * LDBAR: Counter address and Enable/Disable macro.
+ * perf/imc-pmu.c has the LDBAR layout information.
+ */
+#define THREAD_IMC_LDBAR_MASK           0x0003ffffffffe000ULL
+#define THREAD_IMC_ENABLE               0x8000000000000000ULL
+
+/*
+ * Structure to hold memory address information for imc units.
+ */
+struct imc_mem_info {
+ u64 *vbase;
+ u32 id;
+};
+
+/*
+ * Place holder for nest pmu events and values.
+ */
+struct imc_events {
+ u32 value;
+ char *name;
+ char *unit;
+ char *scale;
+};
+
+/* Event attribute array index */
+#define IMC_FORMAT_ATTR 0
+#define IMC_EVENT_ATTR 1
+#define IMC_CPUMASK_ATTR 2
+#define IMC_NULL_ATTR 3
+
+/* PMU Format attribute macros */
+#define IMC_EVENT_OFFSET_MASK 0xffffffffULL
+
+/*
+ * Device tree parser code detects IMC pmu support and
+ * registers new IMC pmus. This structure will hold the
+ * pmu functions, events, counter memory information
+ * and attrs for each imc pmu and will be referenced at
+ * the time of pmu registration.
+ */
+struct imc_pmu {
+ struct pmu pmu;
+ struct imc_mem_info *mem_info;
+ struct imc_events **events;
+ /*
+ * Attribute groups for the PMU. Slot 0 used for
+ * format attribute, slot 1 used for cpusmask attribute,
+ * slot 2 used for event attribute. Slot 3 keep as
+ * NULL.
+ */
+ const struct attribute_group *attr_groups[4];
+ u32 counter_mem_size;
+ int domain;
+ /*
+ * flag to notify whether the memory is mmaped
+ * or allocated by kernel.
+ */
+ bool imc_counter_mmaped;
+};
+
+/*
+ * Structure to hold id, lock and reference count for the imc events which
+ * are inited.
+ */
+struct imc_pmu_ref {
+ struct mutex lock;
+ unsigned int id;
+ int refc;
+};
+
+/*
+ * In-Memory Collection Counters type.
+ * Data comes from Device tree.
+ * Three device type are supported.
+ */
+
+enum {
+ IMC_TYPE_THREAD = 0x1,
+ IMC_TYPE_CORE = 0x4,
+ IMC_TYPE_CHIP           = 0x10,
+};
+
+/*
+ * Domains for IMC PMUs
+ */
+#define IMC_DOMAIN_NEST 1
+#define IMC_DOMAIN_CORE 2
+#define IMC_DOMAIN_THREAD 3
+
+extern int init_imc_pmu(struct device_node *parent,
+ struct imc_pmu *pmu_ptr, int pmu_id);
+extern void thread_imc_disable(void);
+#endif /* __ASM_POWERPC_IMC_PMU_H */
diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile
index b5d98cb3f482..a0d4353a17c9 100644
--- a/arch/powerpc/platforms/powernv/Makefile
+++ b/arch/powerpc/platforms/powernv/Makefile
@@ -12,3 +12,4 @@ obj-$(CONFIG_PPC_SCOM) += opal-xscom.o
 obj-$(CONFIG_MEMORY_FAILURE) += opal-memory-errors.o
 obj-$(CONFIG_TRACEPOINTS) += opal-tracepoints.o
 obj-$(CONFIG_OPAL_PRD) += opal-prd.o
+obj-$(CONFIG_PERF_EVENTS) += opal-imc.o
diff --git a/arch/powerpc/platforms/powernv/opal-imc.c b/arch/powerpc/platforms/powernv/opal-imc.c
new file mode 100644
index 000000000000..f57a6fbd3f57
--- /dev/null
+++ b/arch/powerpc/platforms/powernv/opal-imc.c
@@ -0,0 +1,221 @@
+/*
+ * OPAL IMC interface detection driver
+ * Supported on POWERNV platform
+ *
+ * Copyright (C) 2017 Madhavan Srinivasan, IBM Corporation.
+ * (C) 2017 Anju T Sudhakar, IBM Corporation.
+ * (C) 2017 Hemant K Shaw, IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or later version.
+ */
+#include <linux/kernel.h>
+#include <linux/platform_device.h>
+#include <linux/of.h>
+#include <linux/of_address.h>
+#include <linux/of_platform.h>
+#include <linux/crash_dump.h>
+#include <asm/opal.h>
+#include <asm/io.h>
+#include <asm/imc-pmu.h>
+#include <asm/cputhreads.h>
+
+/*
+ * imc_get_mem_addr_nest: Function to get nest counter memory region
+ * for each chip
+ */
+static int imc_get_mem_addr_nest(struct device_node *node,
+ struct imc_pmu *pmu_ptr,
+ u32 offset)
+{
+ int nr_chips = 0, i;
+ u64 *base_addr_arr, baddr;
+ u32 *chipid_arr;
+
+ nr_chips = of_property_count_u32_elems(node, "chip-id");
+ if (nr_chips <= 0)
+ return -ENODEV;
+
+ base_addr_arr = kcalloc(nr_chips, sizeof(u64), GFP_KERNEL);
+ if (!base_addr_arr)
+ return -ENOMEM;
+
+ chipid_arr = kcalloc(nr_chips, sizeof(u32), GFP_KERNEL);
+ if (!chipid_arr)
+ return -ENOMEM;
+
+ if (of_property_read_u32_array(node, "chip-id", chipid_arr, nr_chips))
+ goto error;
+
+ if (of_property_read_u64_array(node, "base-addr", base_addr_arr,
+ nr_chips))
+ goto error;
+
+ pmu_ptr->mem_info = kcalloc(nr_chips, sizeof(struct imc_mem_info),
+ GFP_KERNEL);
+ if (!pmu_ptr->mem_info)
+ goto error;
+
+ for (i = 0; i < nr_chips; i++) {
+ pmu_ptr->mem_info[i].id = chipid_arr[i];
+ baddr = base_addr_arr[i] + offset;
+ pmu_ptr->mem_info[i].vbase = phys_to_virt(baddr);
+ }
+
+ pmu_ptr->imc_counter_mmaped = true;
+ kfree(base_addr_arr);
+ kfree(chipid_arr);
+ return 0;
+
+error:
+ kfree(pmu_ptr->mem_info);
+ kfree(base_addr_arr);
+ kfree(chipid_arr);
+ return -1;
+}
+
+/*
+ * imc_pmu_create : Takes the parent device which is the pmu unit, pmu_index
+ *    and domain as the inputs.
+ * Allocates memory for the struct imc_pmu, sets up its domain, size and offsets
+ */
+static int imc_pmu_create(struct device_node *parent, int pmu_index, int domain)
+{
+ int ret = 0;
+ struct imc_pmu *pmu_ptr;
+ u32 offset;
+
+ /* memory for pmu */
+ pmu_ptr = kzalloc(sizeof(struct imc_pmu), GFP_KERNEL);
+ if (!pmu_ptr)
+ return -ENOMEM;
+
+ /* Set the domain */
+ pmu_ptr->domain = domain;
+
+ ret = of_property_read_u32(parent, "size", &pmu_ptr->counter_mem_size);
+ if (ret) {
+ ret = -EINVAL;
+ goto free_pmu;
+ }
+
+ if (!of_property_read_u32(parent, "offset", &offset)) {
+ if (imc_get_mem_addr_nest(parent, pmu_ptr, offset)) {
+ ret = -EINVAL;
+ goto free_pmu;
+ }
+ }
+
+ return 0;
+
+free_pmu:
+ kfree(pmu_ptr);
+ return ret;
+}
+
+static void disable_nest_pmu_counters(void)
+{
+ int nid, cpu;
+ struct cpumask *l_cpumask;
+
+ get_online_cpus();
+ for_each_online_node(nid) {
+ l_cpumask = cpumask_of_node(nid);
+ cpu = cpumask_first(l_cpumask);
+ opal_imc_counters_stop(OPAL_IMC_COUNTERS_NEST,
+       get_hard_smp_processor_id(cpu));
+ }
+ put_online_cpus();
+}
+
+static void disable_core_pmu_counters(void)
+{
+ cpumask_t cores_map;
+ int cpu, rc;
+
+ get_online_cpus();
+ /* Disable the IMC Core functions */
+ cores_map = cpu_online_cores_map();
+ for_each_cpu(cpu, &cores_map) {
+ rc = opal_imc_counters_stop(OPAL_IMC_COUNTERS_CORE,
+    get_hard_smp_processor_id(cpu));
+ if (rc)
+ pr_err("%s: Failed to stop Core (cpu = %d)\n",
+ __FUNCTION__, cpu);
+ }
+ put_online_cpus();
+}
+
+static int opal_imc_counters_probe(struct platform_device *pdev)
+{
+ struct device_node *imc_dev = pdev->dev.of_node;
+ int pmu_count = 0, domain;
+ u32 type;
+
+ /*
+ * Check whether this is kdump kernel. If yes, force the engines to
+ * stop and return.
+ */
+ if (is_kdump_kernel()) {
+ disable_nest_pmu_counters();
+ disable_core_pmu_counters();
+ return -ENODEV;
+ }
+
+ for_each_compatible_node(imc_dev, NULL, IMC_DTB_UNIT_COMPAT) {
+ if (of_property_read_u32(imc_dev, "type", &type)) {
+ pr_warn("IMC Device without type property\n");
+ continue;
+ }
+
+ switch (type) {
+ case IMC_TYPE_CHIP:
+ domain = IMC_DOMAIN_NEST;
+ break;
+ case IMC_TYPE_CORE:
+ domain =IMC_DOMAIN_CORE;
+ break;
+ case IMC_TYPE_THREAD:
+ domain = IMC_DOMAIN_THREAD;
+ break;
+ default:
+ pr_warn("IMC Unknown Device type \n");
+ domain = -1;
+ break;
+ }
+
+ if (!imc_pmu_create(imc_dev, pmu_count, domain))
+ pmu_count++;
+ }
+
+ return 0;
+}
+
+static void opal_imc_counters_shutdown(struct platform_device *pdev)
+{
+ /*
+ * Function only stops the engines which is bare minimum.
+ * TODO: Need to handle proper memory cleanup and pmu
+ * unregister.
+ */
+ disable_nest_pmu_counters();
+ disable_core_pmu_counters();
+}
+
+static const struct of_device_id opal_imc_match[] = {
+ { .compatible = IMC_DTB_COMPAT },
+ {},
+};
+
+static struct platform_driver opal_imc_driver = {
+ .driver = {
+ .name = "opal-imc-counters",
+ .of_match_table = opal_imc_match,
+ },
+ .probe = opal_imc_counters_probe,
+ .shutdown = opal_imc_counters_shutdown,
+};
+
+builtin_platform_driver(opal_imc_driver);
diff --git a/arch/powerpc/platforms/powernv/opal.c b/arch/powerpc/platforms/powernv/opal.c
index cad6b57ce494..946158e6fc1d 100644
--- a/arch/powerpc/platforms/powernv/opal.c
+++ b/arch/powerpc/platforms/powernv/opal.c
@@ -16,6 +16,7 @@
 #include <linux/of.h>
 #include <linux/of_fdt.h>
 #include <linux/of_platform.h>
+#include <linux/of_address.h>
 #include <linux/interrupt.h>
 #include <linux/notifier.h>
 #include <linux/slab.h>
@@ -30,6 +31,7 @@
 #include <asm/opal.h>
 #include <asm/firmware.h>
 #include <asm/mce.h>
+#include <asm/imc-pmu.h>
 
 #include "powernv.h"
 
@@ -720,6 +722,15 @@ static void opal_pdev_init(const char *compatible)
  of_platform_device_create(np, NULL, NULL);
 }
 
+static void __init opal_imc_init_dev(void)
+{
+ struct device_node *np;
+
+ np = of_find_compatible_node(NULL, NULL, IMC_DTB_COMPAT);
+ if (np)
+ of_platform_device_create(np, NULL, NULL);
+}
+
 static int kopald(void *unused)
 {
  unsigned long timeout = msecs_to_jiffies(opal_heartbeat) + 1;
@@ -793,6 +804,9 @@ static int __init opal_init(void)
  /* Setup a heatbeat thread if requested by OPAL */
  opal_init_heartbeat();
 
+ /* Detect In-Memory Collection counters and create devices*/
+ opal_imc_init_dev();
+
  /* Create leds platform devices */
  leds = of_find_node_by_path("/ibm,opal/leds");
  if (leds) {
--
2.13.3



[Artful][PATCH 03/12] powerpc/perf: Add nest IMC PMU support

Gustavo Walbon
From: Anju T Sudhakar <[hidden email]>

BugLink: https://bugs.launchpad.net/bugs/1481347

Add support to register Nest In-Memory Collection PMU counters.
The patch adds a new file, "imc-pmu.c", under the powerpc/perf folder to
contain all the IMC PMU device functions.

Device tree parser code is added to parse the PMU event information and
create sysfs event attributes for the PMU.

A cpumask attribute is added, along with CPU hotplug online/offline
functions specific to the nest PMU. A new state,
"CPUHP_AP_PERF_POWERPC_NEST_IMC_ONLINE", is added for the CPU hotplug
callbacks. The error-handling path frees the memory and unregisters the
CPU hotplug callbacks.
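
The counting model behind all of this is simple: each event is just an
offset into per-chip counter memory that the nest engine keeps updating, so
the PMU snapshots the 64-bit value and reports deltas to perf. A minimal
sketch, condensed from imc_read_counter() and imc_event_update() in the new
file below (illustration only):

	static u64 example_read(struct perf_event *event)
	{
		/* event_base = per-chip vbase + (event config & offset mask) */
		u64 *addr = (u64 *)event->hw.event_base;

		/* Counters are free running; just snapshot the current value. */
		return be64_to_cpu(READ_ONCE(*addr));
	}

	static void example_update(struct perf_event *event)
	{
		u64 prev = local64_read(&event->hw.prev_count);
		u64 now = example_read(event);

		local64_set(&event->hw.prev_count, now);
		local64_add(now - prev, &event->count);
	}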

Signed-off-by: Anju T Sudhakar <[hidden email]>
Signed-off-by: Hemant Kumar <[hidden email]>
Signed-off-by: Madhavan Srinivasan <[hidden email]>
Signed-off-by: Michael Ellerman <[hidden email]>
(cherry-picked from 885dcd709ba9120b9935415b8b0f9d1b94e5826b)
Signed-off-by: Gustavo Walbon <[hidden email]>
---
 arch/powerpc/perf/Makefile                |   1 +
 arch/powerpc/perf/imc-pmu.c               | 749 ++++++++++++++++++++++++++++++
 arch/powerpc/platforms/powernv/opal-imc.c |   5 +
 include/linux/cpuhotplug.h                |   1 +
 4 files changed, 756 insertions(+)
 create mode 100644 arch/powerpc/perf/imc-pmu.c

diff --git a/arch/powerpc/perf/Makefile b/arch/powerpc/perf/Makefile
index 4d606b99a5cb..3f3a5ce66495 100644
--- a/arch/powerpc/perf/Makefile
+++ b/arch/powerpc/perf/Makefile
@@ -8,6 +8,7 @@ obj64-$(CONFIG_PPC_PERF_CTRS) += power4-pmu.o ppc970-pmu.o power5-pmu.o \
    isa207-common.o power8-pmu.o power9-pmu.o
 obj32-$(CONFIG_PPC_PERF_CTRS) += mpc7450-pmu.o
 
+obj-$(CONFIG_PPC_POWERNV) += imc-pmu.o
 obj-$(CONFIG_FSL_EMB_PERF_EVENT) += core-fsl-emb.o
 obj-$(CONFIG_FSL_EMB_PERF_EVENT_E500) += e500-pmu.o e6500-pmu.o
 
diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
new file mode 100644
index 000000000000..4543faa1bb0d
--- /dev/null
+++ b/arch/powerpc/perf/imc-pmu.c
@@ -0,0 +1,749 @@
+/*
+ * In-Memory Collection (IMC) Performance Monitor counter support.
+ *
+ * Copyright (C) 2017 Madhavan Srinivasan, IBM Corporation.
+ *           (C) 2017 Anju T Sudhakar, IBM Corporation.
+ *           (C) 2017 Hemant K Shaw, IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or later version.
+ */
+#include <linux/perf_event.h>
+#include <linux/slab.h>
+#include <asm/opal.h>
+#include <asm/imc-pmu.h>
+#include <asm/cputhreads.h>
+#include <asm/smp.h>
+#include <linux/string.h>
+
+/* Nest IMC data structures and variables */
+
+/*
+ * Used to avoid races in counting the nest-pmu units during hotplug
+ * register and unregister
+ */
+static DEFINE_MUTEX(nest_init_lock);
+static DEFINE_PER_CPU(struct imc_pmu_ref *, local_nest_imc_refc);
+static struct imc_pmu *per_nest_pmu_arr[IMC_MAX_PMUS];
+static cpumask_t nest_imc_cpumask;
+struct imc_pmu_ref *nest_imc_refc;
+static int nest_pmus;
+
+struct imc_pmu *imc_event_to_pmu(struct perf_event *event)
+{
+ return container_of(event->pmu, struct imc_pmu, pmu);
+}
+
+PMU_FORMAT_ATTR(event, "config:0-40");
+PMU_FORMAT_ATTR(offset, "config:0-31");
+PMU_FORMAT_ATTR(rvalue, "config:32");
+PMU_FORMAT_ATTR(mode, "config:33-40");
+static struct attribute *imc_format_attrs[] = {
+ &format_attr_event.attr,
+ &format_attr_offset.attr,
+ &format_attr_rvalue.attr,
+ &format_attr_mode.attr,
+ NULL,
+};
+
+static struct attribute_group imc_format_group = {
+ .name = "format",
+ .attrs = imc_format_attrs,
+};
+
+/* Get the cpumask printed to a buffer "buf" */
+static ssize_t imc_pmu_cpumask_get_attr(struct device *dev,
+ struct device_attribute *attr,
+ char *buf)
+{
+ struct pmu *pmu = dev_get_drvdata(dev);
+ struct imc_pmu *imc_pmu = container_of(pmu, struct imc_pmu, pmu);
+ cpumask_t *active_mask;
+
+ /* Subsequenct patch will add more pmu types here */
+ switch(imc_pmu->domain){
+ case IMC_DOMAIN_NEST:
+ active_mask = &nest_imc_cpumask;
+ break;
+ default:
+ return 0;
+ }
+
+ return cpumap_print_to_pagebuf(true, buf, active_mask);
+}
+
+static DEVICE_ATTR(cpumask, S_IRUGO, imc_pmu_cpumask_get_attr, NULL);
+
+static struct attribute *imc_pmu_cpumask_attrs[] = {
+ &dev_attr_cpumask.attr,
+ NULL,
+};
+
+static struct attribute_group imc_pmu_cpumask_attr_group = {
+ .attrs = imc_pmu_cpumask_attrs,
+};
+
+/* device_str_attr_create : Populate event "name" and string "str" in attribute */
+static struct attribute *device_str_attr_create(const char *name, const char *str)
+{
+ struct perf_pmu_events_attr *attr;
+
+ attr = kzalloc(sizeof(*attr), GFP_KERNEL);
+ if (!attr)
+ return NULL;
+ sysfs_attr_init(&attr->attr.attr);
+
+ attr->event_str = str;
+ attr->attr.attr.name = name;
+ attr->attr.attr.mode = 0444;
+ attr->attr.show = perf_event_sysfs_show;
+
+ return &attr->attr.attr;
+}
+
+struct imc_events *imc_parse_event(struct device_node *np, const char *scale,
+  const char *unit, const char *prefix, u32 base)
+{
+ struct imc_events *event;
+ const char *s;
+ u32 reg;
+
+ event = kzalloc(sizeof(struct imc_events), GFP_KERNEL);
+ if (!event)
+ return NULL;
+
+ if (of_property_read_u32(np, "reg", &reg))
+ goto error;
+ /* Add the base_reg value to the "reg" */
+ event->value = base + reg;
+
+ if (of_property_read_string(np, "event-name", &s))
+ goto error;
+
+ event->name = kasprintf(GFP_KERNEL, "%s%s", prefix, s);
+ if (!event->name)
+ goto error;
+
+ if (of_property_read_string(np, "scale", &s))
+ s = scale;
+
+ if (s) {
+ event->scale = kstrdup(s, GFP_KERNEL);
+ if (!event->scale)
+ goto error;
+ }
+
+ if (of_property_read_string(np, "unit", &s))
+ s = unit;
+
+ if (s) {
+ event->unit = kstrdup(s, GFP_KERNEL);
+ if (!event->unit)
+ goto error;
+ }
+
+ return event;
+error:
+ kfree(event->unit);
+ kfree(event->scale);
+ kfree(event->name);
+ kfree(event);
+
+ return NULL;
+}
+
+/*
+ * update_events_in_group: Update the "events" information in an attr_group
+ *                         and assign the attr_group to the pmu "pmu".
+ */
+static int update_events_in_group(struct device_node *node, struct imc_pmu *pmu)
+{
+ struct attribute_group *attr_group;
+ struct attribute **attrs, *dev_str;
+ struct device_node *np, *pmu_events;
+ struct imc_events *ev;
+ u32 handle, base_reg;
+ int i=0, j=0, ct;
+ const char *prefix, *g_scale, *g_unit;
+ const char *ev_val_str, *ev_scale_str, *ev_unit_str;
+
+ if (!of_property_read_u32(node, "events", &handle))
+ pmu_events = of_find_node_by_phandle(handle);
+ else
+ return 0;
+
+ /* Did not find any node with a given phandle */
+ if (!pmu_events)
+ return 0;
+
+ /* Get a count of number of child nodes */
+ ct = of_get_child_count(pmu_events);
+
+ /* Get the event prefix */
+ if (of_property_read_string(node, "events-prefix", &prefix))
+ return 0;
+
+ /* Get a global unit and scale data if available */
+ if (of_property_read_string(node, "scale", &g_scale))
+ g_scale = NULL;
+
+ if (of_property_read_string(node, "unit", &g_unit))
+ g_unit = NULL;
+
+ /* "reg" property gives out the base offset of the counters data */
+ of_property_read_u32(node, "reg", &base_reg);
+
+ /* Allocate memory for the events */
+ pmu->events = kcalloc(ct, sizeof(struct imc_events), GFP_KERNEL);
+ if (!pmu->events)
+ return -ENOMEM;
+
+ ct = 0;
+ /* Parse the events and update the struct */
+ for_each_child_of_node(pmu_events, np) {
+ ev = imc_parse_event(np, g_scale, g_unit, prefix, base_reg);
+ if (ev)
+ pmu->events[ct++] = ev;
+ }
+
+ /* Allocate memory for attribute group */
+ attr_group = kzalloc(sizeof(*attr_group), GFP_KERNEL);
+ if (!attr_group)
+ return -ENOMEM;
+
+ /*
+ * Allocate memory for attributes.
+ * Since we have count of events for this pmu, we also allocate
+ * memory for the scale and unit attribute for now.
+ * "ct" has the total event structs added from the events-parent node.
+ * So allocate three times the "ct" (this includes event, event_scale and
+ * event_unit).
+ */
+ attrs = kcalloc(((ct * 3) + 1), sizeof(struct attribute *), GFP_KERNEL);
+ if (!attrs) {
+ kfree(attr_group);
+ kfree(pmu->events);
+ return -ENOMEM;
+ }
+
+ attr_group->name = "events";
+ attr_group->attrs = attrs;
+ do {
+ ev_val_str = kasprintf(GFP_KERNEL, "event=0x%x", pmu->events[i]->value);
+ dev_str = device_str_attr_create(pmu->events[i]->name, ev_val_str);
+ if (!dev_str)
+ continue;
+
+ attrs[j++] = dev_str;
+ if (pmu->events[i]->scale) {
+ ev_scale_str = kasprintf(GFP_KERNEL, "%s.scale",pmu->events[i]->name);
+ dev_str = device_str_attr_create(ev_scale_str, pmu->events[i]->scale);
+ if (!dev_str)
+ continue;
+
+ attrs[j++] = dev_str;
+ }
+
+ if (pmu->events[i]->unit) {
+ ev_unit_str = kasprintf(GFP_KERNEL, "%s.unit",pmu->events[i]->name);
+ dev_str = device_str_attr_create(ev_unit_str, pmu->events[i]->unit);
+ if (!dev_str)
+ continue;
+
+ attrs[j++] = dev_str;
+ }
+ } while (++i < ct);
+
+ /* Save the event attribute */
+ pmu->attr_groups[IMC_EVENT_ATTR] = attr_group;
+
+ kfree(pmu->events);
+ return 0;
+}
+
+/* get_nest_pmu_ref: Return the imc_pmu_ref struct for the given node */
+static struct imc_pmu_ref *get_nest_pmu_ref(int cpu)
+{
+ return per_cpu(local_nest_imc_refc, cpu);
+}
+
+static void nest_change_cpu_context(int old_cpu, int new_cpu)
+{
+ struct imc_pmu **pn = per_nest_pmu_arr;
+ int i;
+
+ if (old_cpu < 0 || new_cpu < 0)
+ return;
+
+ for (i = 0; *pn && i < IMC_MAX_PMUS; i++, pn++)
+ perf_pmu_migrate_context(&(*pn)->pmu, old_cpu, new_cpu);
+}
+
+static int ppc_nest_imc_cpu_offline(unsigned int cpu)
+{
+ int nid, target = -1;
+ const struct cpumask *l_cpumask;
+ struct imc_pmu_ref *ref;
+
+ /*
+ * Check in the designated list for this cpu. Dont bother
+ * if not one of them.
+ */
+ if (!cpumask_test_and_clear_cpu(cpu, &nest_imc_cpumask))
+ return 0;
+
+ /*
+ * Now that this cpu is one of the designated,
+ * find a next cpu a) which is online and b) in same chip.
+ */
+ nid = cpu_to_node(cpu);
+ l_cpumask = cpumask_of_node(nid);
+ target = cpumask_any_but(l_cpumask, cpu);
+
+ /*
+ * Update the cpumask with the target cpu and
+ * migrate the context if needed
+ */
+ if (target >= 0 && target < nr_cpu_ids) {
+ cpumask_set_cpu(target, &nest_imc_cpumask);
+ nest_change_cpu_context(cpu, target);
+ } else {
+ opal_imc_counters_stop(OPAL_IMC_COUNTERS_NEST,
+       get_hard_smp_processor_id(cpu));
+ /*
+ * If this is the last cpu in this chip then, skip the reference
+ * count mutex lock and make the reference count on this chip zero.
+ */
+ ref = get_nest_pmu_ref(cpu);
+ if (!ref)
+ return -EINVAL;
+
+ ref->refc = 0;
+ }
+ return 0;
+}
+
+static int ppc_nest_imc_cpu_online(unsigned int cpu)
+{
+ const struct cpumask *l_cpumask;
+ static struct cpumask tmp_mask;
+ int res;
+
+ /* Get the cpumask of this node */
+ l_cpumask = cpumask_of_node(cpu_to_node(cpu));
+
+ /*
+ * If this is not the first online CPU on this node, then
+ * just return.
+ */
+ if (cpumask_and(&tmp_mask, l_cpumask, &nest_imc_cpumask))
+ return 0;
+
+ /*
+ * If this is the first online cpu on this node
+ * disable the nest counters by making an OPAL call.
+ */
+ res = opal_imc_counters_stop(OPAL_IMC_COUNTERS_NEST,
+     get_hard_smp_processor_id(cpu));
+ if (res)
+ return res;
+
+ /* Make this CPU the designated target for counter collection */
+ cpumask_set_cpu(cpu, &nest_imc_cpumask);
+ return 0;
+}
+
+static int nest_pmu_cpumask_init(void)
+{
+ return cpuhp_setup_state(CPUHP_AP_PERF_POWERPC_NEST_IMC_ONLINE,
+ "perf/powerpc/imc:online",
+ ppc_nest_imc_cpu_online,
+ ppc_nest_imc_cpu_offline);
+}
+
+static void nest_imc_counters_release(struct perf_event *event)
+{
+ int rc, node_id;
+ struct imc_pmu_ref *ref;
+
+ if (event->cpu < 0)
+ return;
+
+ node_id = cpu_to_node(event->cpu);
+
+ /*
+ * See if we need to disable the nest PMU.
+ * If no events are currently in use, then we have to take a
+ * mutex to ensure that we don't race with another task doing
+ * enable or disable the nest counters.
+ */
+ ref = get_nest_pmu_ref(event->cpu);
+ if (!ref)
+ return;
+
+ /* Take the mutex lock for this node and then decrement the reference count */
+ mutex_lock(&ref->lock);
+ ref->refc--;
+ if (ref->refc == 0) {
+ rc = opal_imc_counters_stop(OPAL_IMC_COUNTERS_NEST,
+    get_hard_smp_processor_id(event->cpu));
+ if (rc) {
+ mutex_unlock(&nest_imc_refc[node_id].lock);
+ pr_err("nest-imc: Unable to stop the counters for core %d\n", node_id);
+ return;
+ }
+ } else if (ref->refc < 0) {
+ WARN(1, "nest-imc: Invalid event reference count\n");
+ ref->refc = 0;
+ }
+ mutex_unlock(&ref->lock);
+}
+
+static int nest_imc_event_init(struct perf_event *event)
+{
+ int chip_id, rc, node_id;
+ u32 l_config, config = event->attr.config;
+ struct imc_mem_info *pcni;
+ struct imc_pmu *pmu;
+ struct imc_pmu_ref *ref;
+ bool flag = false;
+
+ if (event->attr.type != event->pmu->type)
+ return -ENOENT;
+
+ /* Sampling not supported */
+ if (event->hw.sample_period)
+ return -EINVAL;
+
+ /* unsupported modes and filters */
+ if (event->attr.exclude_user   ||
+    event->attr.exclude_kernel ||
+    event->attr.exclude_hv     ||
+    event->attr.exclude_idle   ||
+    event->attr.exclude_host   ||
+    event->attr.exclude_guest)
+ return -EINVAL;
+
+ if (event->cpu < 0)
+ return -EINVAL;
+
+ pmu = imc_event_to_pmu(event);
+
+ /* Sanity check for config (event offset) */
+ if ((config & IMC_EVENT_OFFSET_MASK) > pmu->counter_mem_size)
+ return -EINVAL;
+
+ /*
+ * Nest HW counter memory resides in a per-chip reserve-memory (HOMER).
+ * Get the base memory addresss for this cpu.
+ */
+ chip_id = topology_physical_package_id(event->cpu);
+ pcni = pmu->mem_info;
+ do {
+ if (pcni->id == chip_id) {
+ flag = true;
+ break;
+ }
+ pcni++;
+ } while (pcni);
+
+ if (!flag)
+ return -ENODEV;
+
+ /*
+ * Add the event offset to the base address.
+ */
+ l_config = config & IMC_EVENT_OFFSET_MASK;
+ event->hw.event_base = (u64)pcni->vbase + l_config;
+ node_id = cpu_to_node(event->cpu);
+
+ /*
+ * Get the imc_pmu_ref struct for this node.
+ * Take the mutex lock and then increment the count of nest pmu events
+ * inited.
+ */
+ ref = get_nest_pmu_ref(event->cpu);
+ if (!ref)
+ return -EINVAL;
+
+ mutex_lock(&ref->lock);
+ if (ref->refc == 0) {
+ rc = opal_imc_counters_start(OPAL_IMC_COUNTERS_NEST,
+     get_hard_smp_processor_id(event->cpu));
+ if (rc) {
+ mutex_unlock(&nest_imc_refc[node_id].lock);
+ pr_err("nest-imc: Unable to start the counters for node %d\n",
+ node_id);
+ return rc;
+ }
+ }
+ ++ref->refc;
+ mutex_unlock(&ref->lock);
+
+ event->destroy = nest_imc_counters_release;
+ return 0;
+}
+
+static u64 * get_event_base_addr(struct perf_event *event)
+{
+ /*
+ * Subsequent patch will add code to detect caller imc pmu
+ * and return accordingly.
+ */
+ return (u64 *)event->hw.event_base;
+}
+
+static u64 imc_read_counter(struct perf_event *event)
+{
+ u64 *addr, data;
+
+ /*
+ * In-Memory Collection (IMC) counters are free flowing counters.
+ * So we take a snapshot of the counter value on enable and save it
+ * to calculate the delta at later stage to present the event counter
+ * value.
+ */
+ addr = get_event_base_addr(event);
+ data = be64_to_cpu(READ_ONCE(*addr));
+ local64_set(&event->hw.prev_count, data);
+
+ return data;
+}
+
+static void imc_event_update(struct perf_event *event)
+{
+ u64 counter_prev, counter_new, final_count;
+
+ counter_prev = local64_read(&event->hw.prev_count);
+ counter_new = imc_read_counter(event);
+ final_count = counter_new - counter_prev;
+
+ /* Update the delta to the event count */
+ local64_add(final_count, &event->count);
+}
+
+static void imc_event_start(struct perf_event *event, int flags)
+{
+ /*
+ * In Memory Counters are free flowing counters. HW or the microcode
+ * keeps adding to the counter offset in memory. To get event
+ * counter value, we snapshot the value here and we calculate
+ * delta at later point.
+ */
+ imc_read_counter(event);
+}
+
+static void imc_event_stop(struct perf_event *event, int flags)
+{
+ /*
+ * Take a snapshot and calculate the delta and update
+ * the event counter values.
+ */
+ imc_event_update(event);
+}
+
+static int imc_event_add(struct perf_event *event, int flags)
+{
+ if (flags & PERF_EF_START)
+ imc_event_start(event, flags);
+
+ return 0;
+}
+
+/* update_pmu_ops : Populate the appropriate operations for "pmu" */
+static int update_pmu_ops(struct imc_pmu *pmu)
+{
+ pmu->pmu.task_ctx_nr = perf_invalid_context;
+ pmu->pmu.add = imc_event_add;
+ pmu->pmu.del = imc_event_stop;
+ pmu->pmu.start = imc_event_start;
+ pmu->pmu.stop = imc_event_stop;
+ pmu->pmu.read = imc_event_update;
+ pmu->pmu.attr_groups = pmu->attr_groups;
+ pmu->attr_groups[IMC_FORMAT_ATTR] = &imc_format_group;
+
+ /* Subsequenct patch will add more pmu types here */
+ switch (pmu->domain) {
+ case IMC_DOMAIN_NEST:
+ pmu->pmu.event_init = nest_imc_event_init;
+ pmu->attr_groups[IMC_CPUMASK_ATTR] = &imc_pmu_cpumask_attr_group;
+ break;
+ default:
+ break;
+ }
+
+ return 0;
+}
+
+/* init_nest_pmu_ref: Initialize the imc_pmu_ref struct for all the nodes */
+static int init_nest_pmu_ref(void)
+{
+ int nid, i, cpu;
+
+ nest_imc_refc = kcalloc(num_possible_nodes(), sizeof(*nest_imc_refc),
+ GFP_KERNEL);
+
+ if (!nest_imc_refc)
+ return -ENOMEM;
+
+ i = 0;
+ for_each_node(nid) {
+ /*
+ * Mutex lock to avoid races while tracking the number of
+ * sessions using the chip's nest pmu units.
+ */
+ mutex_init(&nest_imc_refc[i].lock);
+
+ /*
+ * Loop to init the "id" with the node_id. Variable "i" initialized to
+ * 0 and will be used as index to the array. "i" will not go off the
+ * end of the array since the "for_each_node" loops for "N_POSSIBLE"
+ * nodes only.
+ */
+ nest_imc_refc[i++].id = nid;
+ }
+
+ /*
+ * Loop to init the per_cpu "local_nest_imc_refc" with the proper
+ * "nest_imc_refc" index. This makes get_nest_pmu_ref() alot simple.
+ */
+ for_each_possible_cpu(cpu) {
+ nid = cpu_to_node(cpu);
+ for_each_online_node(i) {
+ if (nest_imc_refc[i].id == nid) {
+ per_cpu(local_nest_imc_refc, cpu) = &nest_imc_refc[i];
+ break;
+ }
+ }
+ }
+ return 0;
+}
+
+/*
+ * Common function to unregister cpu hotplug callback and
+ * free the memory.
+ * TODO: Need to handle pmu unregistering, which will be
+ * done in followup series.
+ */
+static void imc_common_cpuhp_mem_free(struct imc_pmu *pmu_ptr)
+{
+ if (pmu_ptr->domain == IMC_DOMAIN_NEST) {
+ mutex_unlock(&nest_init_lock);
+ if (nest_pmus == 1) {
+ cpuhp_remove_state(CPUHP_AP_PERF_POWERPC_NEST_IMC_ONLINE);
+ kfree(nest_imc_refc);
+ }
+
+ if (nest_pmus > 0)
+ nest_pmus--;
+ mutex_unlock(&nest_init_lock);
+ }
+
+ /* Only free the attr_groups which are dynamically allocated  */
+ kfree(pmu_ptr->attr_groups[IMC_EVENT_ATTR]->attrs);
+ kfree(pmu_ptr->attr_groups[IMC_EVENT_ATTR]);
+ kfree(pmu_ptr);
+ return;
+}
+
+
+/*
+ * imc_mem_init : Function to support memory allocation for core imc.
+ */
+static int imc_mem_init(struct imc_pmu *pmu_ptr, struct device_node *parent,
+ int pmu_index)
+{
+ const char *s;
+
+ if (of_property_read_string(parent, "name", &s))
+ return -ENODEV;
+
+ /* Subsequenct patch will add more pmu types here */
+ switch (pmu_ptr->domain) {
+ case IMC_DOMAIN_NEST:
+ /* Update the pmu name */
+ pmu_ptr->pmu.name = kasprintf(GFP_KERNEL, "%s%s_imc", "nest_", s);
+ if (!pmu_ptr->pmu.name)
+ return -ENOMEM;
+
+ /* Needed for hotplug/migration */
+ per_nest_pmu_arr[pmu_index] = pmu_ptr;
+ break;
+ default:
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+/*
+ * init_imc_pmu : Setup and register the IMC pmu device.
+ *
+ * @parent: Device tree unit node
+ * @pmu_ptr: memory allocated for this pmu
+ * @pmu_idx: Count of nest pmc registered
+ *
+ * init_imc_pmu() setup pmu cpumask and registers for a cpu hotplug callback.
+ * Handles failure cases and accordingly frees memory.
+ */
+int init_imc_pmu(struct device_node *parent, struct imc_pmu *pmu_ptr, int pmu_idx)
+{
+ int ret;
+
+ ret = imc_mem_init(pmu_ptr, parent, pmu_idx);
+ if (ret)
+ goto err_free;
+
+ /* Subsequenct patch will add more pmu types here */
+ switch (pmu_ptr->domain) {
+ case IMC_DOMAIN_NEST:
+ /*
+ * Nest imc pmu need only one cpu per chip, we initialize the
+ * cpumask for the first nest imc pmu and use the same for the
+ * rest. To handle the cpuhotplug callback unregister, we track
+ * the number of nest pmus in "nest_pmus".
+ */
+ mutex_lock(&nest_init_lock);
+ if (nest_pmus == 0) {
+ ret = init_nest_pmu_ref();
+ if (ret) {
+ mutex_unlock(&nest_init_lock);
+ goto err_free;
+ }
+ /* Register for cpu hotplug notification. */
+ ret = nest_pmu_cpumask_init();
+ if (ret) {
+ mutex_unlock(&nest_init_lock);
+ goto err_free;
+ }
+ }
+ nest_pmus++;
+ mutex_unlock(&nest_init_lock);
+ break;
+ default:
+ return  -1; /* Unknown domain */
+ }
+
+ ret = update_events_in_group(parent, pmu_ptr);
+ if (ret)
+ goto err_free;
+
+ ret = update_pmu_ops(pmu_ptr);
+ if (ret)
+ goto err_free;
+
+ ret = perf_pmu_register(&pmu_ptr->pmu, pmu_ptr->pmu.name, -1);
+ if (ret)
+ goto err_free;
+
+ pr_info("%s performance monitor hardware support registered\n",
+ pmu_ptr->pmu.name);
+
+ return 0;
+
+err_free:
+ imc_common_cpuhp_mem_free(pmu_ptr);
+ return ret;
+}
diff --git a/arch/powerpc/platforms/powernv/opal-imc.c b/arch/powerpc/platforms/powernv/opal-imc.c
index f57a6fbd3f57..b903bf5e6006 100644
--- a/arch/powerpc/platforms/powernv/opal-imc.c
+++ b/arch/powerpc/platforms/powernv/opal-imc.c
@@ -108,6 +108,11 @@ static int imc_pmu_create(struct device_node *parent, int pmu_index, int domain)
  }
  }
 
+ /* Function to register IMC pmu */
+ ret = init_imc_pmu(parent, pmu_ptr, pmu_index);
+ if (ret)
+ pr_err("IMC PMU %s Register failed\n", pmu_ptr->pmu.name);
+
  return 0;
 
 free_pmu:
diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index 82b30e638430..05d1907713e9 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -137,6 +137,7 @@ enum cpuhp_state {
  CPUHP_AP_PERF_ARM_L2X0_ONLINE,
  CPUHP_AP_PERF_ARM_QCOM_L2_ONLINE,
  CPUHP_AP_PERF_ARM_QCOM_L3_ONLINE,
+ CPUHP_AP_PERF_POWERPC_NEST_IMC_ONLINE,
  CPUHP_AP_WORKQUEUE_ONLINE,
  CPUHP_AP_RCUTREE_ONLINE,
  CPUHP_AP_ONLINE_DYN,
--
2.13.3



[Artful][PATCH 04/12] powerpc/perf: Add core IMC PMU support

Gustavo Walbon
From: Anju T Sudhakar <[hidden email]>

BugLink: https://bugs.launchpad.net/bugs/1481347

Add support to register Core In-Memory Collection PMU counters.
The patch adds core-IMC-specific data structures, along with the memory
init functions and CPU hotplug support.
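
Unlike nest counters, whose memory lives in the firmware-reserved HOMER
region, core counters land in memory the kernel allocates itself: one
buffer per core, whose physical address is handed to OPAL. A minimal sketch
of that init step, condensed from core_imc_mem_init() below (illustration
only; error handling trimmed):

	/* Allocate this core's counter buffer on its local node ... */
	mem_info->vbase = page_address(alloc_pages_node(phys_id,
				GFP_KERNEL | __GFP_ZERO | __GFP_THISNODE,
				get_order(size)));
	if (!mem_info->vbase)
		return -ENOMEM;

	/* ... and tell OPAL where this core's counters should be written. */
	rc = opal_imc_counters_init(OPAL_IMC_COUNTERS_CORE,
				    __pa((void *)mem_info->vbase),
				    get_hard_smp_processor_id(cpu));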

Signed-off-by: Anju T Sudhakar <[hidden email]>
Signed-off-by: Hemant Kumar <[hidden email]>
Signed-off-by: Madhavan Srinivasan <[hidden email]>
Signed-off-by: Michael Ellerman <[hidden email]>
(cherry-picked from 39a846db1d574a498511ffccd75223a35cdcb059)
Signed-off-by: Gustavo Walbon <[hidden email]>
---
 arch/powerpc/perf/imc-pmu.c | 303 +++++++++++++++++++++++++++++++++++++++++++-
 include/linux/cpuhotplug.h  |   1 +
 2 files changed, 300 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
index 4543faa1bb0d..482f8d6d5e65 100644
--- a/arch/powerpc/perf/imc-pmu.c
+++ b/arch/powerpc/perf/imc-pmu.c
@@ -31,6 +31,12 @@ static cpumask_t nest_imc_cpumask;
 struct imc_pmu_ref *nest_imc_refc;
 static int nest_pmus;
 
+/* Core IMC data structures and variables */
+
+static cpumask_t core_imc_cpumask;
+struct imc_pmu_ref *core_imc_refc;
+static struct imc_pmu *core_imc_pmu;
+
 struct imc_pmu *imc_event_to_pmu(struct perf_event *event)
 {
  return container_of(event->pmu, struct imc_pmu, pmu);
@@ -62,11 +68,13 @@ static ssize_t imc_pmu_cpumask_get_attr(struct device *dev,
  struct imc_pmu *imc_pmu = container_of(pmu, struct imc_pmu, pmu);
  cpumask_t *active_mask;
 
- /* Subsequenct patch will add more pmu types here */
  switch(imc_pmu->domain){
  case IMC_DOMAIN_NEST:
  active_mask = &nest_imc_cpumask;
  break;
+ case IMC_DOMAIN_CORE:
+ active_mask = &core_imc_cpumask;
+ break;
  default:
  return 0;
  }
@@ -486,6 +494,240 @@ static int nest_imc_event_init(struct perf_event *event)
  return 0;
 }
 
+/*
+ * core_imc_mem_init : Initializes memory for the current core.
+ *
+ * Uses alloc_pages_node() and uses the returned address as an argument to
+ * an opal call to configure the pdbar. The address sent as an argument is
+ * converted to physical address before the opal call is made. This is the
+ * base address at which the core imc counters are populated.
+ */
+static int core_imc_mem_init(int cpu, int size)
+{
+ int phys_id, rc = 0, core_id = (cpu / threads_per_core);
+ struct imc_mem_info *mem_info;
+
+ /*
+ * alloc_pages_node() will allocate memory for core in the
+ * local node only.
+ */
+ phys_id = topology_physical_package_id(cpu);
+ mem_info = &core_imc_pmu->mem_info[core_id];
+ mem_info->id = core_id;
+
+ /* We need only vbase for core counters */
+ mem_info->vbase = page_address(alloc_pages_node(phys_id,
+  GFP_KERNEL | __GFP_ZERO | __GFP_THISNODE,
+  get_order(size)));
+ if (!mem_info->vbase)
+ return -ENOMEM;
+
+ /* Init the mutex */
+ core_imc_refc[core_id].id = core_id;
+ mutex_init(&core_imc_refc[core_id].lock);
+
+ rc = opal_imc_counters_init(OPAL_IMC_COUNTERS_CORE,
+ __pa((void *)mem_info->vbase),
+ get_hard_smp_processor_id(cpu));
+ if (rc) {
+ free_pages((u64)mem_info->vbase, get_order(size));
+ mem_info->vbase = NULL;
+ }
+
+ return rc;
+}
+
+static bool is_core_imc_mem_inited(int cpu)
+{
+ struct imc_mem_info *mem_info;
+ int core_id = (cpu / threads_per_core);
+
+ mem_info = &core_imc_pmu->mem_info[core_id];
+ if (!mem_info->vbase)
+ return false;
+
+ return true;
+}
+
+static int ppc_core_imc_cpu_online(unsigned int cpu)
+{
+ const struct cpumask *l_cpumask;
+ static struct cpumask tmp_mask;
+ int ret = 0;
+
+ /* Get the cpumask for this core */
+ l_cpumask = cpu_sibling_mask(cpu);
+
+ /* If a cpu for this core is already set, then, don't do anything */
+ if (cpumask_and(&tmp_mask, l_cpumask, &core_imc_cpumask))
+ return 0;
+
+ if (!is_core_imc_mem_inited(cpu)) {
+ ret = core_imc_mem_init(cpu, core_imc_pmu->counter_mem_size);
+ if (ret) {
+ pr_info("core_imc memory allocation for cpu %d failed\n", cpu);
+ return ret;
+ }
+ }
+
+ /* set the cpu in the mask */
+ cpumask_set_cpu(cpu, &core_imc_cpumask);
+ return 0;
+}
+
+static int ppc_core_imc_cpu_offline(unsigned int cpu)
+{
+ unsigned int ncpu, core_id;
+ struct imc_pmu_ref *ref;
+
+ /*
+ * clear this cpu out of the mask, if not present in the mask,
+ * don't bother doing anything.
+ */
+ if (!cpumask_test_and_clear_cpu(cpu, &core_imc_cpumask))
+ return 0;
+
+ /* Find any online cpu in that core except the current "cpu" */
+ ncpu = cpumask_any_but(cpu_sibling_mask(cpu), cpu);
+
+ if (ncpu >= 0 && ncpu < nr_cpu_ids) {
+ cpumask_set_cpu(ncpu, &core_imc_cpumask);
+ perf_pmu_migrate_context(&core_imc_pmu->pmu, cpu, ncpu);
+ } else {
+ /*
+ * If this is the last cpu in this core then, skip taking refernce
+ * count mutex lock for this core and directly zero "refc" for
+ * this core.
+ */
+ opal_imc_counters_stop(OPAL_IMC_COUNTERS_CORE,
+       get_hard_smp_processor_id(cpu));
+ core_id = cpu / threads_per_core;
+ ref = &core_imc_refc[core_id];
+ if (!ref)
+ return -EINVAL;
+
+ ref->refc = 0;
+ }
+ return 0;
+}
+
+static int core_imc_pmu_cpumask_init(void)
+{
+ return cpuhp_setup_state(CPUHP_AP_PERF_POWERPC_CORE_IMC_ONLINE,
+ "perf/powerpc/imc_core:online",
+ ppc_core_imc_cpu_online,
+ ppc_core_imc_cpu_offline);
+}
+
+static void core_imc_counters_release(struct perf_event *event)
+{
+ int rc, core_id;
+ struct imc_pmu_ref *ref;
+
+ if (event->cpu < 0)
+ return;
+ /*
+ * See if we need to disable the IMC PMU.
+ * If no events are currently in use, then we have to take a
+ * mutex to ensure that we don't race with another task doing
+ * enable or disable the core counters.
+ */
+ core_id = event->cpu / threads_per_core;
+
+ /* Take the mutex lock and decrement the refernce count for this core */
+ ref = &core_imc_refc[core_id];
+ if (!ref)
+ return;
+
+ mutex_lock(&ref->lock);
+ ref->refc--;
+ if (ref->refc == 0) {
+ rc = opal_imc_counters_stop(OPAL_IMC_COUNTERS_CORE,
+    get_hard_smp_processor_id(event->cpu));
+ if (rc) {
+ mutex_unlock(&ref->lock);
+ pr_err("IMC: Unable to stop the counters for core %d\n", core_id);
+ return;
+ }
+ } else if (ref->refc < 0) {
+ WARN(1, "core-imc: Invalid event reference count\n");
+ ref->refc = 0;
+ }
+ mutex_unlock(&ref->lock);
+}
+
+static int core_imc_event_init(struct perf_event *event)
+{
+ int core_id, rc;
+ u64 config = event->attr.config;
+ struct imc_mem_info *pcmi;
+ struct imc_pmu *pmu;
+ struct imc_pmu_ref *ref;
+
+ if (event->attr.type != event->pmu->type)
+ return -ENOENT;
+
+ /* Sampling not supported */
+ if (event->hw.sample_period)
+ return -EINVAL;
+
+ /* unsupported modes and filters */
+ if (event->attr.exclude_user   ||
+    event->attr.exclude_kernel ||
+    event->attr.exclude_hv     ||
+    event->attr.exclude_idle   ||
+    event->attr.exclude_host   ||
+    event->attr.exclude_guest)
+ return -EINVAL;
+
+ if (event->cpu < 0)
+ return -EINVAL;
+
+ event->hw.idx = -1;
+ pmu = imc_event_to_pmu(event);
+
+ /* Sanity check for config (event offset) */
+ if (((config & IMC_EVENT_OFFSET_MASK) > pmu->counter_mem_size))
+ return -EINVAL;
+
+ if (!is_core_imc_mem_inited(event->cpu))
+ return -ENODEV;
+
+ core_id = event->cpu / threads_per_core;
+ pcmi = &core_imc_pmu->mem_info[core_id];
+ if ((!pcmi->vbase))
+ return -ENODEV;
+
+ /* Get the core_imc mutex for this core */
+ ref = &core_imc_refc[core_id];
+ if (!ref)
+ return -EINVAL;
+
+ /*
+ * Core pmu units are enabled only when it is used.
+ * See if this is triggered for the first time.
+ * If yes, take the mutex lock and enable the core counters.
+ * If not, just increment the count in core_imc_refc struct.
+ */
+ mutex_lock(&ref->lock);
+ if (ref->refc == 0) {
+ rc = opal_imc_counters_start(OPAL_IMC_COUNTERS_CORE,
+     get_hard_smp_processor_id(event->cpu));
+ if (rc) {
+ mutex_unlock(&ref->lock);
+ pr_err("core-imc: Unable to start the counters for core %d\n",
+ core_id);
+ return rc;
+ }
+ }
+ ++ref->refc;
+ mutex_unlock(&ref->lock);
+
+ event->hw.event_base = (u64)pcmi->vbase + (config & IMC_EVENT_OFFSET_MASK);
+ event->destroy = core_imc_counters_release;
+ return 0;
+}
+
 static u64 * get_event_base_addr(struct perf_event *event)
 {
  /*
@@ -564,12 +806,15 @@ static int update_pmu_ops(struct imc_pmu *pmu)
  pmu->pmu.attr_groups = pmu->attr_groups;
  pmu->attr_groups[IMC_FORMAT_ATTR] = &imc_format_group;
 
- /* Subsequenct patch will add more pmu types here */
  switch (pmu->domain) {
  case IMC_DOMAIN_NEST:
  pmu->pmu.event_init = nest_imc_event_init;
  pmu->attr_groups[IMC_CPUMASK_ATTR] = &imc_pmu_cpumask_attr_group;
  break;
+ case IMC_DOMAIN_CORE:
+ pmu->pmu.event_init = core_imc_event_init;
+ pmu->attr_groups[IMC_CPUMASK_ATTR] = &imc_pmu_cpumask_attr_group;
+ break;
  default:
  break;
  }
@@ -621,6 +866,22 @@ static int init_nest_pmu_ref(void)
  return 0;
 }
 
+static void cleanup_all_core_imc_memory(void)
+{
+ int i, nr_cores = num_present_cpus() / threads_per_core;
+ struct imc_mem_info *ptr = core_imc_pmu->mem_info;
+ int size = core_imc_pmu->counter_mem_size;
+
+ /* mem_info will never be NULL */
+ for (i = 0; i < nr_cores; i++) {
+ if (ptr[i].vbase)
+ free_pages((u64)ptr->vbase, get_order(size));
+ }
+
+ kfree(ptr);
+ kfree(core_imc_refc);
+}
+
 /*
  * Common function to unregister cpu hotplug callback and
  * free the memory.
@@ -641,6 +902,12 @@ static void imc_common_cpuhp_mem_free(struct imc_pmu *pmu_ptr)
  mutex_unlock(&nest_init_lock);
  }
 
+ /* Free core_imc memory */
+ if (pmu_ptr->domain == IMC_DOMAIN_CORE) {
+ cpuhp_remove_state(CPUHP_AP_PERF_POWERPC_CORE_IMC_ONLINE);
+ cleanup_all_core_imc_memory();
+ }
+
  /* Only free the attr_groups which are dynamically allocated  */
  kfree(pmu_ptr->attr_groups[IMC_EVENT_ATTR]->attrs);
  kfree(pmu_ptr->attr_groups[IMC_EVENT_ATTR]);
@@ -656,11 +923,11 @@ static int imc_mem_init(struct imc_pmu *pmu_ptr, struct device_node *parent,
  int pmu_index)
 {
  const char *s;
+ int nr_cores;
 
  if (of_property_read_string(parent, "name", &s))
  return -ENODEV;
 
- /* Subsequenct patch will add more pmu types here */
  switch (pmu_ptr->domain) {
  case IMC_DOMAIN_NEST:
  /* Update the pmu name */
@@ -671,6 +938,27 @@ static int imc_mem_init(struct imc_pmu *pmu_ptr, struct device_node *parent,
  /* Needed for hotplug/migration */
  per_nest_pmu_arr[pmu_index] = pmu_ptr;
  break;
+ case IMC_DOMAIN_CORE:
+ /* Update the pmu name */
+ pmu_ptr->pmu.name = kasprintf(GFP_KERNEL, "%s%s", s, "_imc");
+ if (!pmu_ptr->pmu.name)
+ return -ENOMEM;
+
+ nr_cores = num_present_cpus() / threads_per_core;
+ pmu_ptr->mem_info = kcalloc(nr_cores, sizeof(struct imc_mem_info),
+ GFP_KERNEL);
+
+ if (!pmu_ptr->mem_info)
+ return -ENOMEM;
+
+ core_imc_refc = kcalloc(nr_cores, sizeof(struct imc_pmu_ref),
+ GFP_KERNEL);
+
+ if (!core_imc_refc)
+ return -ENOMEM;
+
+ core_imc_pmu = pmu_ptr;
+ break;
  default:
  return -EINVAL;
  }
@@ -696,7 +984,6 @@ int init_imc_pmu(struct device_node *parent, struct imc_pmu *pmu_ptr, int pmu_id
  if (ret)
  goto err_free;
 
- /* Subsequenct patch will add more pmu types here */
  switch (pmu_ptr->domain) {
  case IMC_DOMAIN_NEST:
  /*
@@ -722,6 +1009,14 @@ int init_imc_pmu(struct device_node *parent, struct imc_pmu *pmu_ptr, int pmu_id
  nest_pmus++;
  mutex_unlock(&nest_init_lock);
  break;
+ case IMC_DOMAIN_CORE:
+ ret = core_imc_pmu_cpumask_init();
+ if (ret) {
+ cleanup_all_core_imc_memory();
+ return ret;
+ }
+
+ break;
  default:
  return  -1; /* Unknown domain */
  }
diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index 05d1907713e9..47eb6a1f2db9 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -138,6 +138,7 @@ enum cpuhp_state {
  CPUHP_AP_PERF_ARM_QCOM_L2_ONLINE,
  CPUHP_AP_PERF_ARM_QCOM_L3_ONLINE,
  CPUHP_AP_PERF_POWERPC_NEST_IMC_ONLINE,
+ CPUHP_AP_PERF_POWERPC_CORE_IMC_ONLINE,
  CPUHP_AP_WORKQUEUE_ONLINE,
  CPUHP_AP_RCUTREE_ONLINE,
  CPUHP_AP_ONLINE_DYN,
--
2.13.3



[Artful][PATCH 05/12] powerpc/perf: Add thread IMC PMU support

Gustavo Walbon
From: Anju T Sudhakar <[hidden email]>

BugLink: https://bugs.launchpad.net/bugs/1481347

Add support to register Thread In-Memory Collection PMU counters.
The patch adds thread-IMC-specific data structures, along with the memory
init functions and CPU hotplug support.
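
Thread counters take a third approach: each CPU gets one page of counter
memory, and its address (plus an enable bit) is programmed into the
per-thread LDBAR SPR so that the hardware writes that thread's counts into
the page. A minimal sketch, condensed from thread_imc_mem_alloc() below and
the LDBAR macros in imc-pmu.h (illustration only):

	/* One counter page per CPU, allocated on its local node. */
	local_mem = page_address(alloc_pages_node(phys_id,
				GFP_KERNEL | __GFP_ZERO | __GFP_THISNODE,
				get_order(size)));
	if (!local_mem)
		return -ENOMEM;

	/* Counter address goes into LDBAR bits 8:50, plus the enable bit. */
	ldbar_value = ((u64)local_mem & THREAD_IMC_LDBAR_MASK) | THREAD_IMC_ENABLE;
	mtspr(SPRN_LDBAR, ldbar_value);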

Signed-off-by: Anju T Sudhakar <[hidden email]>
Signed-off-by: Hemant Kumar <[hidden email]>
Signed-off-by: Madhavan Srinivasan <[hidden email]>
Signed-off-by: Michael Ellerman <[hidden email]>
(cherry-picked from f74c89bd80fb3f1328fdf4a44eeba793cdce4222)
Signed-off-by: Gustavo Walbon <[hidden email]>
---
 arch/powerpc/perf/imc-pmu.c | 270 +++++++++++++++++++++++++++++++++++++++++++-
 include/linux/cpuhotplug.h  |   1 +
 2 files changed, 267 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
index 482f8d6d5e65..46cd912af060 100644
--- a/arch/powerpc/perf/imc-pmu.c
+++ b/arch/powerpc/perf/imc-pmu.c
@@ -37,6 +37,12 @@ static cpumask_t core_imc_cpumask;
 struct imc_pmu_ref *core_imc_refc;
 static struct imc_pmu *core_imc_pmu;
 
+/* Thread IMC data structures and variables */
+
+static DEFINE_PER_CPU(u64 *, thread_imc_mem);
+static struct imc_pmu *thread_imc_pmu;
+static int thread_imc_mem_size;
+
 struct imc_pmu *imc_event_to_pmu(struct perf_event *event)
 {
  return container_of(event->pmu, struct imc_pmu, pmu);
@@ -728,15 +734,188 @@ static int core_imc_event_init(struct perf_event *event)
  return 0;
 }
 
-static u64 * get_event_base_addr(struct perf_event *event)
+/*
+ * Allocates a page of memory for each of the online cpus, and write the
+ * physical base address of that page to the LDBAR for that cpu.
+ *
+ * LDBAR Register Layout:
+ *
+ *  0          4         8         12        16        20        24        28
+ * | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - |
+ *   | |       [   ]    [                   Counter Address [8:50]
+ *   | * Mode    |
+ *   |           * PB Scope
+ *   * Enable/Disable
+ *
+ *  32        36        40        44        48        52        56        60
+ * | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - |
+ *           Counter Address [8:50]              ]
+ *
+ */
+static int thread_imc_mem_alloc(int cpu_id, int size)
+{
+ u64 ldbar_value, *local_mem = per_cpu(thread_imc_mem, cpu_id);
+ int phys_id = topology_physical_package_id(cpu_id);
+
+ if (!local_mem) {
+ /*
+ * This case could happen only once at start, since we dont
+ * free the memory in cpu offline path.
+ */
+ local_mem = page_address(alloc_pages_node(phys_id,
+  GFP_KERNEL | __GFP_ZERO | __GFP_THISNODE,
+  get_order(size)));
+ if (!local_mem)
+ return -ENOMEM;
+
+ per_cpu(thread_imc_mem, cpu_id) = local_mem;
+ }
+
+ ldbar_value = ((u64)local_mem & THREAD_IMC_LDBAR_MASK) | THREAD_IMC_ENABLE;
+
+ mtspr(SPRN_LDBAR, ldbar_value);
+ return 0;
+}
+
+static int ppc_thread_imc_cpu_online(unsigned int cpu)
 {
+ return thread_imc_mem_alloc(cpu, thread_imc_mem_size);
+}
+
+static int ppc_thread_imc_cpu_offline(unsigned int cpu)
+{
+ mtspr(SPRN_LDBAR, 0);
+ return 0;
+}
+
+static int thread_imc_cpu_init(void)
+{
+ return cpuhp_setup_state(CPUHP_AP_PERF_POWERPC_THREAD_IMC_ONLINE,
+  "perf/powerpc/imc_thread:online",
+  ppc_thread_imc_cpu_online,
+  ppc_thread_imc_cpu_offline);
+}
+
+void thread_imc_pmu_sched_task(struct perf_event_context *ctx,
+      bool sched_in)
+{
+ int core_id;
+ struct imc_pmu_ref *ref;
+
+ if (!is_core_imc_mem_inited(smp_processor_id()))
+ return;
+
+ core_id = smp_processor_id() / threads_per_core;
  /*
- * Subsequent patch will add code to detect caller imc pmu
- * and return accordingly.
+ * imc pmus are enabled only when it is used.
+ * See if this is triggered for the first time.
+ * If yes, take the mutex lock and enable the counters.
+ * If not, just increment the count in ref count struct.
  */
+ ref = &core_imc_refc[core_id];
+ if (!ref)
+ return;
+
+ if (sched_in) {
+ mutex_lock(&ref->lock);
+ if (ref->refc == 0) {
+ if (opal_imc_counters_start(OPAL_IMC_COUNTERS_CORE,
+     get_hard_smp_processor_id(smp_processor_id()))) {
+ mutex_unlock(&ref->lock);
+ pr_err("thread-imc: Unable to start the counter\
+ for core %d\n", core_id);
+ return;
+ }
+ }
+ ++ref->refc;
+ mutex_unlock(&ref->lock);
+ } else {
+ mutex_lock(&ref->lock);
+ ref->refc--;
+ if (ref->refc == 0) {
+ if (opal_imc_counters_stop(OPAL_IMC_COUNTERS_CORE,
+    get_hard_smp_processor_id(smp_processor_id()))) {
+ mutex_unlock(&ref->lock);
+ pr_err("thread-imc: Unable to stop the counters\
+ for core %d\n", core_id);
+ return;
+ }
+ } else if (ref->refc < 0) {
+ ref->refc = 0;
+ }
+ mutex_unlock(&ref->lock);
+ }
+
+ return;
+}
+
+static int thread_imc_event_init(struct perf_event *event)
+{
+ u32 config = event->attr.config;
+ struct task_struct *target;
+ struct imc_pmu *pmu;
+
+ if (event->attr.type != event->pmu->type)
+ return -ENOENT;
+
+ /* Sampling not supported */
+ if (event->hw.sample_period)
+ return -EINVAL;
+
+ event->hw.idx = -1;
+ pmu = imc_event_to_pmu(event);
+
+ /* Sanity check for config offset */
+ if (((config & IMC_EVENT_OFFSET_MASK) > pmu->counter_mem_size))
+ return -EINVAL;
+
+ target = event->hw.target;
+ if (!target)
+ return -EINVAL;
+
+ event->pmu->task_ctx_nr = perf_sw_context;
+ return 0;
+}
+
+static bool is_thread_imc_pmu(struct perf_event *event)
+{
+ if (!strncmp(event->pmu->name, "thread_imc", strlen("thread_imc")))
+ return true;
+
+ return false;
+}
+
+static u64 * get_event_base_addr(struct perf_event *event)
+{
+ u64 addr;
+
+ if (is_thread_imc_pmu(event)) {
+ addr = (u64)per_cpu(thread_imc_mem, smp_processor_id());
+ return (u64 *)(addr + (event->attr.config & IMC_EVENT_OFFSET_MASK));
+ }
+
  return (u64 *)event->hw.event_base;
 }
 
+static void thread_imc_pmu_start_txn(struct pmu *pmu,
+     unsigned int txn_flags)
+{
+ if (txn_flags & ~PERF_PMU_TXN_ADD)
+ return;
+ perf_pmu_disable(pmu);
+}
+
+static void thread_imc_pmu_cancel_txn(struct pmu *pmu)
+{
+ perf_pmu_enable(pmu);
+}
+
+static int thread_imc_pmu_commit_txn(struct pmu *pmu)
+{
+ perf_pmu_enable(pmu);
+ return 0;
+}
+
 static u64 imc_read_counter(struct perf_event *event)
 {
  u64 *addr, data;
@@ -794,6 +973,26 @@ static int imc_event_add(struct perf_event *event, int flags)
  return 0;
 }
 
+static int thread_imc_event_add(struct perf_event *event, int flags)
+{
+ if (flags & PERF_EF_START)
+ imc_event_start(event, flags);
+
+ /* Enable the sched_task to start the engine */
+ perf_sched_cb_inc(event->ctx->pmu);
+ return 0;
+}
+
+static void thread_imc_event_del(struct perf_event *event, int flags)
+{
+ /*
+ * Take a snapshot and calculate the delta and update
+ * the event counter values.
+ */
+ imc_event_update(event);
+ perf_sched_cb_dec(event->ctx->pmu);
+}
+
 /* update_pmu_ops : Populate the appropriate operations for "pmu" */
 static int update_pmu_ops(struct imc_pmu *pmu)
 {
@@ -815,6 +1014,15 @@ static int update_pmu_ops(struct imc_pmu *pmu)
  pmu->pmu.event_init = core_imc_event_init;
  pmu->attr_groups[IMC_CPUMASK_ATTR] = &imc_pmu_cpumask_attr_group;
  break;
+ case IMC_DOMAIN_THREAD:
+ pmu->pmu.event_init = thread_imc_event_init;
+ pmu->pmu.sched_task = thread_imc_pmu_sched_task;
+ pmu->pmu.add = thread_imc_event_add;
+ pmu->pmu.del = thread_imc_event_del;
+ pmu->pmu.start_txn = thread_imc_pmu_start_txn;
+ pmu->pmu.cancel_txn = thread_imc_pmu_cancel_txn;
+ pmu->pmu.commit_txn = thread_imc_pmu_commit_txn;
+ break;
  default:
  break;
  }
@@ -882,6 +1090,31 @@ static void cleanup_all_core_imc_memory(void)
  kfree(core_imc_refc);
 }
 
+static void thread_imc_ldbar_disable(void *dummy)
+{
+ /*
+ * By Zeroing LDBAR, we disable thread-imc
+ * updates.
+ */
+ mtspr(SPRN_LDBAR, 0);
+}
+
+void thread_imc_disable(void)
+{
+ on_each_cpu(thread_imc_ldbar_disable, NULL, 1);
+}
+
+static void cleanup_all_thread_imc_memory(void)
+{
+ int i, order = get_order(thread_imc_mem_size);
+
+ for_each_online_cpu(i) {
+ if (per_cpu(thread_imc_mem, i))
+ free_pages((u64)per_cpu(thread_imc_mem, i), order);
+
+ }
+}
+
 /*
  * Common function to unregister cpu hotplug callback and
  * free the memory.
@@ -908,6 +1141,12 @@ static void imc_common_cpuhp_mem_free(struct imc_pmu *pmu_ptr)
  cleanup_all_core_imc_memory();
  }
 
+ /* Free thread_imc memory */
+ if (pmu_ptr->domain == IMC_DOMAIN_THREAD) {
+ cpuhp_remove_state(CPUHP_AP_PERF_POWERPC_THREAD_IMC_ONLINE);
+ cleanup_all_thread_imc_memory();
+ }
+
  /* Only free the attr_groups which are dynamically allocated  */
  kfree(pmu_ptr->attr_groups[IMC_EVENT_ATTR]->attrs);
  kfree(pmu_ptr->attr_groups[IMC_EVENT_ATTR]);
@@ -923,7 +1162,7 @@ static int imc_mem_init(struct imc_pmu *pmu_ptr, struct device_node *parent,
  int pmu_index)
 {
  const char *s;
- int nr_cores;
+ int nr_cores, cpu, res;
 
  if (of_property_read_string(parent, "name", &s))
  return -ENODEV;
@@ -959,6 +1198,21 @@ static int imc_mem_init(struct imc_pmu *pmu_ptr, struct device_node *parent,
 
  core_imc_pmu = pmu_ptr;
  break;
+ case IMC_DOMAIN_THREAD:
+ /* Update the pmu name */
+ pmu_ptr->pmu.name = kasprintf(GFP_KERNEL, "%s%s", s, "_imc");
+ if (!pmu_ptr->pmu.name)
+ return -ENOMEM;
+
+ thread_imc_mem_size = pmu_ptr->counter_mem_size;
+ for_each_online_cpu(cpu) {
+ res = thread_imc_mem_alloc(cpu, pmu_ptr->counter_mem_size);
+ if (res)
+ return res;
+ }
+
+ thread_imc_pmu = pmu_ptr;
+ break;
  default:
  return -EINVAL;
  }
@@ -1017,6 +1271,14 @@ int init_imc_pmu(struct device_node *parent, struct imc_pmu *pmu_ptr, int pmu_id
  }
 
  break;
+ case IMC_DOMAIN_THREAD:
+ ret = thread_imc_cpu_init();
+ if (ret) {
+ cleanup_all_thread_imc_memory();
+ return ret;
+ }
+
+ break;
  default:
  return  -1; /* Unknown domain */
  }
diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index 47eb6a1f2db9..f24bfb2b9a2d 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -139,6 +139,7 @@ enum cpuhp_state {
  CPUHP_AP_PERF_ARM_QCOM_L3_ONLINE,
  CPUHP_AP_PERF_POWERPC_NEST_IMC_ONLINE,
  CPUHP_AP_PERF_POWERPC_CORE_IMC_ONLINE,
+ CPUHP_AP_PERF_POWERPC_THREAD_IMC_ONLINE,
  CPUHP_AP_WORKQUEUE_ONLINE,
  CPUHP_AP_RCUTREE_ONLINE,
  CPUHP_AP_ONLINE_DYN,
--
2.13.3



[Artful][PATCH 06/12] powerpc/perf: Fix double unlock in imc_common_cpuhp_mem_free()

Gustavo Walbon
In reply to this post by Gustavo Walbon
From: Dan Carpenter <[hidden email]>

BugLink: https://bugs.launchpad.net/bugs/1481347

This function is not called with the nest_init_lock held, and it also
unlocks the nest_init_lock immediately below, so it's fairly clear
that this is a typo and should be locking the lock.
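
For illustration, this is roughly what the corrected critical section looks like with the one-line change applied (a sketch, not the verbatim kernel code; the identifiers are the ones used elsewhere in this series):

        if (pmu_ptr->domain == IMC_DOMAIN_NEST) {
                mutex_lock(&nest_init_lock);    /* the typo called mutex_unlock() here */
                if (nest_pmus == 1) {
                        cpuhp_remove_state(CPUHP_AP_PERF_POWERPC_NEST_IMC_ONLINE);
                        kfree(nest_imc_refc);
                }
                /* ... remaining nest teardown ... */
                mutex_unlock(&nest_init_lock);  /* the pre-existing unlock mentioned above */
        }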

Fixes: 885dcd709ba9 ("powerpc/perf: Add nest IMC PMU support")
Signed-off-by: Dan Carpenter <[hidden email]>
Signed-off-by: Michael Ellerman <[hidden email]>
(cherry-picked from b3376dcc6c62452fe24e76d8fc35bb13eb7ee178)
Signed-off-by: Gustavo Walbon <[hidden email]>
---
 arch/powerpc/perf/imc-pmu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
index 46cd912af060..52017f6eafd9 100644
--- a/arch/powerpc/perf/imc-pmu.c
+++ b/arch/powerpc/perf/imc-pmu.c
@@ -1124,7 +1124,7 @@ static void cleanup_all_thread_imc_memory(void)
 static void imc_common_cpuhp_mem_free(struct imc_pmu *pmu_ptr)
 {
  if (pmu_ptr->domain == IMC_DOMAIN_NEST) {
- mutex_unlock(&nest_init_lock);
+ mutex_lock(&nest_init_lock);
  if (nest_pmus == 1) {
  cpuhp_remove_state(CPUHP_AP_PERF_POWERPC_NEST_IMC_ONLINE);
  kfree(nest_imc_refc);
--
2.13.3



[Artful][PATCH 07/12] powerpc/perf/imc: Fix nest events on muti socket system

Gustavo Walbon
In reply to this post by Gustavo Walbon
From: Anju T <[hidden email]>

BugLink: https://bugs.launchpad.net/bugs/1481347

In a multi node system with discontiguous node ids, nest event values
are not showing up properly. eg. lscpu output:

  NUMA node0 CPU(s): 0-15
  NUMA node8 CPU(s): 16-31

Nest event values on such systems can be counted on CPUs <= 15:

  $./perf stat -e 'nest_powerbus0_imc/PM_PB_CYC/' -C 0-14 -I 1000 sleep 1000
  #           time             counts unit events
       1.000294577    30,17,24,42,880 nest_powerbus0_imc/PM_PB_CYC/

But not on CPUs >= 16:

  $./perf stat -e 'nest_powerbus0_imc/PM_PB_CYC/' -C 16-28 -I 1000 sleep 1000
  #           time             counts unit events
       1.000049902    <not supported> nest_powerbus0_imc/PM_PB_CYC/

This is because, when fetching the reference count, the node id (which
may be sparse) is used as the array index, not the node number (which
is 0 based and contiguous).

Fix it by using the node number as the array index.

  $./perf stat -e 'nest_powerbus0_imc/PM_PB_CYC/' -C 16-28 -I 1000 sleep 1000
  #           time             counts unit events
       1.000241961    26,12,35,28,704 nest_powerbus0_imc/PM_PB_CYC/
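
To make the failure mode concrete, here is a rough sketch of the fixed lookup with the two-node layout from the lscpu output above (illustrative only; the array sizing follows the nest-IMC code added earlier in this series):

        /*
         * nest_imc_refc[] is sized by node number: num_possible_nodes()
         * entries, indices 0..N-1.  On the box above N == 2, while the
         * node *ids* are 0 and 8.
         *
         * The buggy lookup iterated node ids, so a CPU on node 8 probed
         * nest_imc_refc[8], which is past the end of the array and never
         * matched - hence "<not supported>" for CPUs 16-31.
         */
        for (i = 0; i < num_possible_nodes(); i++) {
                if (nest_imc_refc[i].id == nid) {       /* match on the stored node id */
                        per_cpu(local_nest_imc_refc, cpu) = &nest_imc_refc[i];
                        break;
                }
        }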

Signed-off-by: Anju T Sudhakar <[hidden email]>
[mpe: Change log tweaks for clarity and brevity]
Signed-off-by: Michael Ellerman <[hidden email]>
(cherry-picked from 7efbae90892b7858f1d4873d34ffffbeb460ed8b)
Signed-off-by: Gustavo Walbon <[hidden email]>
---
 arch/powerpc/perf/imc-pmu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
index 52017f6eafd9..a8f95f96d54b 100644
--- a/arch/powerpc/perf/imc-pmu.c
+++ b/arch/powerpc/perf/imc-pmu.c
@@ -1064,7 +1064,7 @@ static int init_nest_pmu_ref(void)
  */
  for_each_possible_cpu(cpu) {
  nid = cpu_to_node(cpu);
- for_each_online_node(i) {
+ for (i = 0; i < num_possible_nodes(); i++) {
  if (nest_imc_refc[i].id == nid) {
  per_cpu(local_nest_imc_refc, cpu) = &nest_imc_refc[i];
  break;
--
2.13.3



[Artful][PATCH 08/12] powerpc/powernv: Fix build error in opal-imc.c when NUMA=n

Gustavo Walbon
In reply to this post by Gustavo Walbon
From: LABBE Corentin <[hidden email]>

BugLink: https://bugs.launchpad.net/bugs/1481347

When building a random powerpc kernel I hit this build error:

  arch/powerpc/platforms/powernv/opal-imc.c:130:13: error : assignment
  discards « const » qualifier from pointer target type
  [-Werror=discarded-qualifiers]
     l_cpumask = cpumask_of_node(nid);
             ^

This happens because when CONFIG_NUMA=n cpumask_of_node() returns a
const pointer.

This patch simply adds const to l_cpumask to fix this issue.
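
For background, the type difference comes from the CONFIG_NUMA=n fallback; paraphrased (exact definitions vary by kernel version), it boils down to:

        /* CONFIG_NUMA=n: every "node" maps to the online mask ... */
        #define cpumask_of_node(node)   (cpu_online_mask)

        /* ... and cpu_online_mask is const: */
        #define cpu_online_mask ((const struct cpumask *)&__cpu_online_mask)

        /* so the local variable in disable_nest_pmu_counters() must be: */
        const struct cpumask *l_cpumask;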

Signed-off-by: Corentin Labbe <[hidden email]>
Reviewed-by: Madhavan Srinivasan <[hidden email]>
[mpe: Flesh out change log]
Signed-off-by: Michael Ellerman <[hidden email]>
(cherry-picked from 6538ac30841638b2fd345725a9fd155c6fe1768a)
Signed-off-by: Gustavo Walbon <[hidden email]>
---
 arch/powerpc/platforms/powernv/opal-imc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/powernv/opal-imc.c b/arch/powerpc/platforms/powernv/opal-imc.c
index b903bf5e6006..21f6531fae20 100644
--- a/arch/powerpc/platforms/powernv/opal-imc.c
+++ b/arch/powerpc/platforms/powernv/opal-imc.c
@@ -123,7 +123,7 @@ static int imc_pmu_create(struct device_node *parent, int pmu_index, int domain)
 static void disable_nest_pmu_counters(void)
 {
  int nid, cpu;
- struct cpumask *l_cpumask;
+ const struct cpumask *l_cpumask;
 
  get_online_cpus();
  for_each_online_node(nid) {
--
2.13.3



[Artful][PATCH 09/12] powerpc/perf: Fix usage of nest_imc_refc

Gustavo Walbon
In reply to this post by Gustavo Walbon
From: Madhavan Srinivasan <[hidden email]>

BugLink: https://bugs.launchpad.net/bugs/1481347

nest_imc_refc is a reference count struct, used to track number of
active perf sessions using the nest units.

Currently the code accesses nest_imc_refc using node_id, which is
incorrect, the array is indexed by node number. Meaning in the case of
sparse node ids we index off the end of the array.

Fix it to use get_nest_pmu_ref() which uses the existing per-cpu
variable local_nest_imc_refc.
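
For reference, a sketch of the helper the fix switches to (approximate, based on the nest-IMC support added earlier in this series rather than quoted verbatim):

        /* per-CPU pointer populated by init_nest_pmu_ref(), one ref per node */
        static DEFINE_PER_CPU(struct imc_pmu_ref *, local_nest_imc_refc);

        static struct imc_pmu_ref *get_nest_pmu_ref(int cpu)
        {
                return per_cpu(local_nest_imc_refc, cpu);
        }

The error paths below then unlock &ref->lock, i.e. the mutex that was actually taken, instead of indexing nest_imc_refc[] by the possibly sparse node id.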

Fixes: 885dcd709ba91 ('powerpc/perf: Add nest IMC PMU support')
Reported-by: Dan Carpenter <[hidden email]>
Signed-off-by: Madhavan Srinivasan <[hidden email]>
[mpe: Tweak change log]
Signed-off-by: Michael Ellerman <[hidden email]>
(cherry-picked from 711bd207a233141308d0aea0d2e286ee6b4b23cd)
Signed-off-by: Gustavo Walbon <[hidden email]>
---
 arch/powerpc/perf/imc-pmu.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
index a8f95f96d54b..9ccac86f3463 100644
--- a/arch/powerpc/perf/imc-pmu.c
+++ b/arch/powerpc/perf/imc-pmu.c
@@ -404,7 +404,7 @@ static void nest_imc_counters_release(struct perf_event *event)
  rc = opal_imc_counters_stop(OPAL_IMC_COUNTERS_NEST,
     get_hard_smp_processor_id(event->cpu));
  if (rc) {
- mutex_unlock(&nest_imc_refc[node_id].lock);
+ mutex_unlock(&ref->lock);
  pr_err("nest-imc: Unable to stop the counters for core %d\n", node_id);
  return;
  }
@@ -487,7 +487,7 @@ static int nest_imc_event_init(struct perf_event *event)
  rc = opal_imc_counters_start(OPAL_IMC_COUNTERS_NEST,
      get_hard_smp_processor_id(event->cpu));
  if (rc) {
- mutex_unlock(&nest_imc_refc[node_id].lock);
+ mutex_unlock(&ref->lock);
  pr_err("nest-imc: Unable to start the counters for node %d\n",
  node_id);
  return rc;
--
2.13.3



[Artful][PATCH 10/12] powerpc/perf: Fix for core/nest imc call trace on cpuhotplug

Gustavo Walbon
In reply to this post by Gustavo Walbon
From: Anju T Sudhakar <[hidden email]>

BugLink: https://bugs.launchpad.net/bugs/1481347

Nest/core PMU units are enabled only when they are in use. A reference count
is maintained for the events which use the nest/core PMU units. Currently, the
*_imc_counters_release functions use a WARN() to flag any underflow of the
ref count.

The ref count can go negative when a perf session is started and then all cpus
in a given core are offlined. In the cpuhotplug offline path,
ppc_core_imc_cpu_offline() sets ref->count to zero if the cpu about to go
offline is the last cpu in that core, and makes an OPAL call to disable the
engine in that core. On perf session termination, perf->destroy
(core_imc_counters_release) first decrements ref->count for this core and,
based on the resulting value, makes an OPAL call to disable the core-imc
engine. Since the cpuhotplug path has already cleared ref->count and disabled
the engine, the extra decrement at event termination makes the count negative,
which in turn fires the WARN_ON. The same happens for the nest units.

Add a check that the reference count is not already zero before decrementing
it, so that the ref count can never go negative.
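
A condensed sketch of the interaction being fixed (illustrative; the real functions appear earlier in this series and in the hunks below):

        /* cpuhotplug: last CPU of a core goes offline */
        ref->refc = 0;                          /* forget outstanding users */
        opal_imc_counters_stop(OPAL_IMC_COUNTERS_CORE,
                               get_hard_smp_processor_id(cpu));

        /* later, perf tears the event down (core_imc_counters_release) */
        mutex_lock(&ref->lock);
        if (ref->refc == 0) {                   /* new guard: already stopped */
                mutex_unlock(&ref->lock);
                return;
        }
        ref->refc--;                            /* without the guard this went */
                                                /* negative and fired the WARN */
        if (ref->refc == 0)                     /* stop the engine on last user */
                opal_imc_counters_stop(OPAL_IMC_COUNTERS_CORE,
                                       get_hard_smp_processor_id(event->cpu));
        mutex_unlock(&ref->lock);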

Signed-off-by: Anju T Sudhakar <[hidden email]>
Reviewed-by: Santosh Sivaraj <[hidden email]>
Signed-off-by: Michael Ellerman <[hidden email]>
(cherry-picked from 0d923820c6db1644c27c2d0a5af8920fc0f8cd81)
Signed-off-by: Gustavo Walbon <[hidden email]>
---
 arch/powerpc/perf/imc-pmu.c | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
index 9ccac86f3463..e3a1f65933b5 100644
--- a/arch/powerpc/perf/imc-pmu.c
+++ b/arch/powerpc/perf/imc-pmu.c
@@ -399,6 +399,20 @@ static void nest_imc_counters_release(struct perf_event *event)
 
  /* Take the mutex lock for this node and then decrement the reference count */
  mutex_lock(&ref->lock);
+ if (ref->refc == 0) {
+ /*
+ * The scenario where this is true is, when perf session is
+ * started, followed by offlining of all cpus in a given node.
+ *
+ * In the cpuhotplug offline path, ppc_nest_imc_cpu_offline()
+ * function set the ref->count to zero, if the cpu which is
+ * about to offline is the last cpu in a given node and make
+ * an OPAL call to disable the engine in that node.
+ *
+ */
+ mutex_unlock(&ref->lock);
+ return;
+ }
  ref->refc--;
  if (ref->refc == 0) {
  rc = opal_imc_counters_stop(OPAL_IMC_COUNTERS_NEST,
@@ -646,6 +660,20 @@ static void core_imc_counters_release(struct perf_event *event)
  return;
 
  mutex_lock(&ref->lock);
+ if (ref->refc == 0) {
+ /*
+ * The scenario where this is true is, when perf session is
+ * started, followed by offlining of all cpus in a given core.
+ *
+ * In the cpuhotplug offline path, ppc_core_imc_cpu_offline()
+ * function set the ref->count to zero, if the cpu which is
+ * about to offline is the last cpu in a given core and make
+ * an OPAL call to disable the engine in that core.
+ *
+ */
+ mutex_unlock(&ref->lock);
+ return;
+ }
  ref->refc--;
  if (ref->refc == 0) {
  rc = opal_imc_counters_stop(OPAL_IMC_COUNTERS_CORE,
--
2.13.3



[Artful][PATCH 11/12] powerpc/perf: Add ___GFP_NOWARN flag to alloc_pages_node()

Gustavo Walbon
In reply to this post by Gustavo Walbon
From: Anju T Sudhakar <[hidden email]>

BugLink: https://bugs.launchpad.net/bugs/1481347

Stack trace output during a stress test:
 [    4.310049] Freeing initrd memory: 22592K
[    4.310646] rtas_flash: no firmware flash support
[    4.313341] cpuhp/64: page allocation failure: order:0, mode:0x14480c0(GFP_KERNEL|__GFP_ZERO|__GFP_THISNODE), nodemask=(null)
[    4.313465] cpuhp/64 cpuset=/ mems_allowed=0
[    4.313521] CPU: 64 PID: 392 Comm: cpuhp/64 Not tainted 4.11.0-39.el7a.ppc64le #1
[    4.313588] Call Trace:
[    4.313622] [c000000f1fb1b8e0] [c000000000c09388] dump_stack+0xb0/0xf0 (unreliable)
[    4.313694] [c000000f1fb1b920] [c00000000030ef6c] warn_alloc+0x12c/0x1c0
[    4.313753] [c000000f1fb1b9c0] [c00000000030ff68] __alloc_pages_nodemask+0xea8/0x1000
[    4.313823] [c000000f1fb1bbb0] [c000000000113a8c] core_imc_mem_init+0xbc/0x1c0
[    4.313892] [c000000f1fb1bc00] [c000000000113cdc] ppc_core_imc_cpu_online+0x14c/0x170
[    4.313962] [c000000f1fb1bc90] [c000000000125758] cpuhp_invoke_callback+0x198/0x5d0
[    4.314031] [c000000f1fb1bd00] [c00000000012782c] cpuhp_thread_fun+0x8c/0x3d0
[    4.314101] [c000000f1fb1bd60] [c0000000001678d0] smpboot_thread_fn+0x290/0x2a0
[    4.314169] [c000000f1fb1bdc0] [c00000000015ee78] kthread+0x168/0x1b0
[    4.314229] [c000000f1fb1be30] [c00000000000b368] ret_from_kernel_thread+0x5c/0x74
[    4.314313] Mem-Info:
[    4.314356] active_anon:0 inactive_anon:0 isolated_anon:0

core_imc_mem_init() uses alloc_pages_node() at system boot to get memory, and
alloc_pages_node() emits this stack dump when asked to allocate memory from a
node that has no memory behind it. Add the __GFP_NOWARN flag to the allocation
request as a fix.
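
The resulting allocation call, with the role of each flag spelled out (the comments here are illustrative; the actual hunks follow below):

        mem_info->vbase = page_address(alloc_pages_node(phys_id,
                                GFP_KERNEL | __GFP_ZERO /* zeroed counter pages */
                                | __GFP_THISNODE        /* allocate only on this chip's node,
                                                         * which fails on memoryless nodes */
                                | __GFP_NOWARN,         /* ... so skip the page-allocation
                                                         * warning; -ENOMEM is enough */
                                get_order(size)));
        if (!mem_info->vbase)
                return -ENOMEM;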

Signed-off-by: Anju T Sudhakar <[hidden email]>
Reported-by: Michael Ellerman <[hidden email]>
Reported-by: Venkat R.B <[hidden email]>
Signed-off-by: Michael Ellerman <[hidden email]>
(cherry-picked from cd4f2b30e5ef7d4bde61eb515372d96e8aec1690)
Signed-off-by: Gustavo Walbon <[hidden email]>
---
 arch/powerpc/perf/imc-pmu.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
index e3a1f65933b5..39a0203fd8a5 100644
--- a/arch/powerpc/perf/imc-pmu.c
+++ b/arch/powerpc/perf/imc-pmu.c
@@ -537,8 +537,8 @@ static int core_imc_mem_init(int cpu, int size)
 
  /* We need only vbase for core counters */
  mem_info->vbase = page_address(alloc_pages_node(phys_id,
-  GFP_KERNEL | __GFP_ZERO | __GFP_THISNODE,
-  get_order(size)));
+  GFP_KERNEL | __GFP_ZERO | __GFP_THISNODE |
+  __GFP_NOWARN, get_order(size)));
  if (!mem_info->vbase)
  return -ENOMEM;
 
@@ -791,8 +791,8 @@ static int thread_imc_mem_alloc(int cpu_id, int size)
  * free the memory in cpu offline path.
  */
  local_mem = page_address(alloc_pages_node(phys_id,
-  GFP_KERNEL | __GFP_ZERO | __GFP_THISNODE,
-  get_order(size)));
+  GFP_KERNEL | __GFP_ZERO | __GFP_THISNODE |
+  __GFP_NOWARN, get_order(size)));
  if (!local_mem)
  return -ENOMEM;
 
--
2.13.3



[Artful][PATCH 12/12] powerpc/perf: Fix IMC initialization crash

Gustavo Walbon
In reply to this post by Gustavo Walbon
From: Anju T Sudhakar <[hidden email]>

BugLink: https://bugs.launchpad.net/bugs/1481347

Panic observed with latest firmware, and upstream kernel:

 NIP init_imc_pmu+0x8c/0xcf0
 LR  init_imc_pmu+0x2f8/0xcf0
 Call Trace:
   init_imc_pmu+0x2c8/0xcf0 (unreliable)
   opal_imc_counters_probe+0x300/0x400
   platform_drv_probe+0x64/0x110
   driver_probe_device+0x3d8/0x580
   __driver_attach+0x14c/0x1a0
   bus_for_each_dev+0x8c/0xf0
   driver_attach+0x34/0x50
   bus_add_driver+0x298/0x350
   driver_register+0x9c/0x180
   __platform_driver_register+0x5c/0x70
   opal_imc_driver_init+0x2c/0x40
   do_one_initcall+0x64/0x1d0
   kernel_init_freeable+0x280/0x374
   kernel_init+0x24/0x160
   ret_from_kernel_thread+0x5c/0x74

While registering nest imc at init, the cpu-hotplug callback
nest_pmu_cpumask_init() makes an OPAL call to stop the engine. If the OPAL
call fails, imc_common_cpuhp_mem_free() is invoked to clean up memory and the
cpuhotplug setup.

But when cleaning up the attribute group, we dereference the attribute element
array without checking whether the backing element is NULL. This causes a
kernel panic.

Add a check for the backing element prior to dereferencing the attribute
element, to handle the failure case gracefully.
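
In sketch form, the cleanup path this hardens (the actual one-line change is in the hunk below):

        /* If init_imc_pmu() failed before the event attributes were built,
         * attr_groups[IMC_EVENT_ATTR] is still NULL and the old unconditional
         * ->attrs dereference oopsed here. */
        if (pmu_ptr->attr_groups[IMC_EVENT_ATTR])
                kfree(pmu_ptr->attr_groups[IMC_EVENT_ATTR]->attrs);
        kfree(pmu_ptr->attr_groups[IMC_EVENT_ATTR]);    /* kfree(NULL) is a no-op */
        kfree(pmu_ptr);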

Signed-off-by: Anju T Sudhakar <[hidden email]>
Reported-by: Pridhiviraj Paidipeddi <[hidden email]>
[mpe: Trim change log]
Signed-off-by: Michael Ellerman <[hidden email]>
(cherry-picked from 0d8ba16278ec30a262d931875018abee332f926f)
Signed-off-by: Gustavo Walbon <[hidden email]>
---
 arch/powerpc/perf/imc-pmu.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
index 39a0203fd8a5..88126245881b 100644
--- a/arch/powerpc/perf/imc-pmu.c
+++ b/arch/powerpc/perf/imc-pmu.c
@@ -1176,7 +1176,8 @@ static void imc_common_cpuhp_mem_free(struct imc_pmu *pmu_ptr)
  }
 
  /* Only free the attr_groups which are dynamically allocated  */
- kfree(pmu_ptr->attr_groups[IMC_EVENT_ATTR]->attrs);
+ if (pmu_ptr->attr_groups[IMC_EVENT_ATTR])
+ kfree(pmu_ptr->attr_groups[IMC_EVENT_ATTR]->attrs);
  kfree(pmu_ptr->attr_groups[IMC_EVENT_ATTR]);
  kfree(pmu_ptr);
  return;
--
2.13.3



ACK/cmnt: [Artful][PATCH 00/12] Backport for Power9 Nest PMU Instrumentation

Stefan Bader-2
In reply to this post by Gustavo Walbon
On 02.11.2017 15:04, Gustavo Walbon wrote:

> BugLink: https://bugs.launchpad.net/bugs/1481347
>
> This patchset enable the Power9 Nest PMU Counter to enable it for perf tool.
>
> These Nest PMU Counters provide per-chip metrics such memory bandwidth, Powerbus
> bandwidth and so on.
>
> For test it needs Power9 machine with latest version of BMC and the minimum
> skiboot version v5.9-rc5.
>
> Anju T (1):
>   powerpc/perf/imc: Fix nest events on muti socket system
>
> Anju T Sudhakar (6):
>   powerpc/perf: Add nest IMC PMU support
>   powerpc/perf: Add core IMC PMU support
>   powerpc/perf: Add thread IMC PMU support
>   powerpc/perf: Fix for core/nest imc call trace on cpuhotplug
>   powerpc/perf: Add ___GFP_NOWARN flag to alloc_pages_node()
>   powerpc/perf: Fix IMC initialization crash
>
> Dan Carpenter (1):
>   powerpc/perf: Fix double unlock in imc_common_cpuhp_mem_free()
>
> LABBE Corentin (1):
>   powerpc/powernv: Fix build error in opal-imc.c when NUMA=n
>
> Madhavan Srinivasan (3):
>   powerpc/powernv: Add IMC OPAL APIs
>   powerpc/powernv: Detect and create IMC device
>   powerpc/perf: Fix usage of nest_imc_refc
>
>  arch/powerpc/include/asm/imc-pmu.h             |  128 +++
>  arch/powerpc/include/asm/opal-api.h            |   12 +-
>  arch/powerpc/include/asm/opal.h                |    5 +
>  arch/powerpc/perf/Makefile                     |    1 +
>  arch/powerpc/perf/imc-pmu.c                    | 1335 ++++++++++++++++++++++++
>  arch/powerpc/platforms/powernv/Makefile        |    1 +
>  arch/powerpc/platforms/powernv/opal-imc.c      |  226 ++++
>  arch/powerpc/platforms/powernv/opal-wrappers.S |    3 +
>  arch/powerpc/platforms/powernv/opal.c          |   14 +
>  include/linux/cpuhotplug.h                     |    3 +
>  10 files changed, 1727 insertions(+), 1 deletion(-)
>  create mode 100644 arch/powerpc/include/asm/imc-pmu.h
>  create mode 100644 arch/powerpc/perf/imc-pmu.c
>  create mode 100644 arch/powerpc/platforms/powernv/opal-imc.c
>
As Artful is released now, the bug report needs SRU justification info added.
Also, is this part of upstream 4.14 or would Bionic still need to pick up the set?

Generally looks isolated and testable, so

Acked-by: Stefan Bader <[hidden email]>

-Stefan



ACK/cmnt: [Artful][PATCH 00/12] Backport for Power9 Nest PMU Instrumentation

Kleber Souza
In reply to this post by Gustavo Walbon
On 11/02/17 15:04, Gustavo Walbon wrote:

> BugLink: https://bugs.launchpad.net/bugs/1481347
>
> This patchset enable the Power9 Nest PMU Counter to enable it for perf tool.
>
> These Nest PMU Counters provide per-chip metrics such memory bandwidth, Powerbus
> bandwidth and so on.
>
> For test it needs Power9 machine with latest version of BMC and the minimum
> skiboot version v5.9-rc5.
>
> Anju T (1):
>   powerpc/perf/imc: Fix nest events on muti socket system
>
> Anju T Sudhakar (6):
>   powerpc/perf: Add nest IMC PMU support
>   powerpc/perf: Add core IMC PMU support
>   powerpc/perf: Add thread IMC PMU support
>   powerpc/perf: Fix for core/nest imc call trace on cpuhotplug
>   powerpc/perf: Add ___GFP_NOWARN flag to alloc_pages_node()
>   powerpc/perf: Fix IMC initialization crash
>
> Dan Carpenter (1):
>   powerpc/perf: Fix double unlock in imc_common_cpuhp_mem_free()
>
> LABBE Corentin (1):
>   powerpc/powernv: Fix build error in opal-imc.c when NUMA=n
>
> Madhavan Srinivasan (3):
>   powerpc/powernv: Add IMC OPAL APIs
>   powerpc/powernv: Detect and create IMC device
>   powerpc/perf: Fix usage of nest_imc_refc
>
>  arch/powerpc/include/asm/imc-pmu.h             |  128 +++
>  arch/powerpc/include/asm/opal-api.h            |   12 +-
>  arch/powerpc/include/asm/opal.h                |    5 +
>  arch/powerpc/perf/Makefile                     |    1 +
>  arch/powerpc/perf/imc-pmu.c                    | 1335 ++++++++++++++++++++++++
>  arch/powerpc/platforms/powernv/Makefile        |    1 +
>  arch/powerpc/platforms/powernv/opal-imc.c      |  226 ++++
>  arch/powerpc/platforms/powernv/opal-wrappers.S |    3 +
>  arch/powerpc/platforms/powernv/opal.c          |   14 +
>  include/linux/cpuhotplug.h                     |    3 +
>  10 files changed, 1727 insertions(+), 1 deletion(-)
>  create mode 100644 arch/powerpc/include/asm/imc-pmu.h
>  create mode 100644 arch/powerpc/perf/imc-pmu.c
>  create mode 100644 arch/powerpc/platforms/powernv/opal-imc.c
>

As Stefan pointed out, please provide the SRU justification info in the bug.

Apart from that, the changes are limited to powerpc and powerpc/perf and
are clean cherry picks, so little regression potential.

Acked-by: Kleber Sacilotto de Souza <[hidden email]>


Re: ACK/cmnt: [Artful][PATCH 00/12] Backport for Power9 Nest PMU Instrumentation

Gustavo Walbon


On 15/11/17 11:12, Kleber Souza wrote:

> On 11/02/17 15:04, Gustavo Walbon wrote:
>> BugLink: https://bugs.launchpad.net/bugs/1481347
>>
>> This patchset enable the Power9 Nest PMU Counter to enable it for perf tool.
>>
>> These Nest PMU Counters provide per-chip metrics such memory bandwidth, Powerbus
>> bandwidth and so on.
>>
>> For test it needs Power9 machine with latest version of BMC and the minimum
>> skiboot version v5.9-rc5.
>>
>> Anju T (1):
>>    powerpc/perf/imc: Fix nest events on muti socket system
>>
>> Anju T Sudhakar (6):
>>    powerpc/perf: Add nest IMC PMU support
>>    powerpc/perf: Add core IMC PMU support
>>    powerpc/perf: Add thread IMC PMU support
>>    powerpc/perf: Fix for core/nest imc call trace on cpuhotplug
>>    powerpc/perf: Add ___GFP_NOWARN flag to alloc_pages_node()
>>    powerpc/perf: Fix IMC initialization crash
>>
>> Dan Carpenter (1):
>>    powerpc/perf: Fix double unlock in imc_common_cpuhp_mem_free()
>>
>> LABBE Corentin (1):
>>    powerpc/powernv: Fix build error in opal-imc.c when NUMA=n
>>
>> Madhavan Srinivasan (3):
>>    powerpc/powernv: Add IMC OPAL APIs
>>    powerpc/powernv: Detect and create IMC device
>>    powerpc/perf: Fix usage of nest_imc_refc
>>
>>   arch/powerpc/include/asm/imc-pmu.h             |  128 +++
>>   arch/powerpc/include/asm/opal-api.h            |   12 +-
>>   arch/powerpc/include/asm/opal.h                |    5 +
>>   arch/powerpc/perf/Makefile                     |    1 +
>>   arch/powerpc/perf/imc-pmu.c                    | 1335 ++++++++++++++++++++++++
>>   arch/powerpc/platforms/powernv/Makefile        |    1 +
>>   arch/powerpc/platforms/powernv/opal-imc.c      |  226 ++++
>>   arch/powerpc/platforms/powernv/opal-wrappers.S |    3 +
>>   arch/powerpc/platforms/powernv/opal.c          |   14 +
>>   include/linux/cpuhotplug.h                     |    3 +
>>   10 files changed, 1727 insertions(+), 1 deletion(-)
>>   create mode 100644 arch/powerpc/include/asm/imc-pmu.h
>>   create mode 100644 arch/powerpc/perf/imc-pmu.c
>>   create mode 100644 arch/powerpc/platforms/powernv/opal-imc.c
>>
> As Stefan pointed out, please provide the SRU justification info in the bug.
>
> Apart from that, the changes are limited to powerpc and powerpc/perf and
> are clean cherry picks, so little regression potential.
>
> Acked-by: Kleber Sacilotto de Souza <[hidden email]>
>
[SRU Justification]
This feature provides the latest core and nest PMU events, which are exposed
by skiboot on the Power9 architecture.

These patches are available in the mainline kernel tree at
tags/powerpc-4.14-1~314.

The commits are clean cherry picks into Artful.

[Testcase]
1 - Verify that perf sees the PMU events exported by the latest version of
skiboot:
# perf list | grep core_imc
# perf list | grep nest_
# perf list | grep thread_imc

2 - Test a couple of them, for example:
# ./perf stat -e nest_mcs0/PM_MCS_UP_128B_DATA_XFER_MC2/ -a -A sleep 5
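
3 - (Optional) The core and thread domains can be sanity-checked the same way;
event names come from the IMC catalog exported by skiboot, so substitute an
event taken from the perf list output above (shown here as a placeholder):
# perf stat -e core_imc/<event>/ -C 0 -I 1000 sleep 5
# perf stat -e thread_imc/<event>/ taskset -c 0 sleep 5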



APPLIED: [Artful][PATCH 00/12] Backport for Power9 Nest PMU Instrumentation

Thadeu Lima de Souza Cascardo-3
In reply to this post by Gustavo Walbon
Applied to artful master-next branch.

Thanks.
Cascardo.

Applied-to: artful/master-next
