[Bionic][PATCH 00/12] Enable NVLink2 devices on guests

Jose Ricardo Ziviani-2
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1819989
Based on master-next (currently UBUNTU: Ubuntu-4.15.0-47.50)

This patchset enables QEMU/KVM guests to use passed through NVLink2 devices.

The host side runs a custom version of Ubuntu Bionic (kernel/QEMU) but it is very important for clients to use the image that they get from Canonical's website. To accomplish that we've worked on a small patchset to cover only the guest side - avoiding changes beyond that. All patches are upstream.

Thank you very much,

Jose R. Ziviani

Alexey Kardashevskiy (9):
  powerpc/powernv/npu: Do not try invalidating 32bit table when 64bit
    table is enabled
  powerpc/powernv/idoa: Remove unnecessary pcidev from pci_dn
  powerpc/powernv: Move npu struct from pnv_phb to pci_controller
  powerpc/powernv/npu: Move OPAL calls away from context manipulation
  powerpc/pseries/iommu: Use memory@ nodes in max RAM address
    calculation
  powerpc/pseries/npu: Enable platform support
  powerpc/pseries: Remove IOMMU API support for non-LPAR systems
  powerpc/powernv/npu: Check mmio_atsd array bounds when populating
  powerpc/powernv/npu: Fault user page into the hypervisor's pagetable

Haren Myneni (1):
  powerpc/powernv: Export opal_check_token symbol

Nicholas Piggin (1):
  powerpc/powernv: call OPAL_QUIESCE before OPAL_SIGNAL_SYSTEM_RESET

Vaibhav Jain (1):
  powerpc/powernv: Make possible for user to force a full ipl cec reboot

 arch/powerpc/include/asm/opal-api.h           |   8 +
 arch/powerpc/include/asm/opal.h               |   1 +
 arch/powerpc/include/asm/pci-bridge.h         |   3 +-
 arch/powerpc/include/asm/pci.h                |   3 +
 arch/powerpc/platforms/powernv/npu-dma.c      | 198 +++++++++++-------
 .../powerpc/platforms/powernv/opal-wrappers.S |   1 +
 arch/powerpc/platforms/powernv/opal.c         |   1 +
 arch/powerpc/platforms/powernv/pci-ioda.c     |  45 +++-
 arch/powerpc/platforms/powernv/pci.h          |  19 +-
 arch/powerpc/platforms/powernv/setup.c        |  36 +++-
 arch/powerpc/platforms/powernv/smp.c          |  17 +-
 arch/powerpc/platforms/pseries/iommu.c        |  42 +++-
 arch/powerpc/platforms/pseries/pci.c          |  23 ++
 13 files changed, 282 insertions(+), 115 deletions(-)

--
2.20.1


--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team

[Bionic][PATCH 01/12] powerpc/powernv/npu: Do not try invalidating 32bit table when 64bit table is enabled

Jose Ricardo Ziviani-2
From: Alexey Kardashevskiy <[hidden email]>

BugLink: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1819989

GPUs and the corresponding NVLink bridges get different PEs as they
have separate translation validation entries (TVEs). We put these PEs
to the same IOMMU group so they cannot be passed through separately.
So the iommu_table_group_ops::set_window/unset_window for GPUs do set
tables to the NPU PEs as well which means that iommu_table's list of
attached PEs (iommu_table_group_link) has both GPU and NPU PEs linked.
This list is used for TCE cache invalidation.

The problem is that NPU PE has just a single TVE and can be programmed
to point to 32bit or 64bit windows while GPU PE has two (as any other
PCI device). So we end up having a 32bit iommu_table struct linked to
both PEs even though only the 64bit TCE table cache can be invalidated
on NPU. And a relatively recent skiboot detects this and prints
errors.

This changes GPU's iommu_table_group_ops::set_window/unset_window to
make sure that NPU PE is only linked to the table actually used by the
hardware. If there are two tables used by an IOMMU group, the NPU PE
will use the last programmed one which with the current use scenarios
is expected to be a 64bit one.

Signed-off-by: Alexey Kardashevskiy <[hidden email]>
Signed-off-by: Michael Ellerman <[hidden email]>
(cherry picked from commit d41ce7b1bcc3e1d02cc9da3b83c0fe355fcb68e0)
Signed-off-by: Jose Ricardo Ziviani <[hidden email]>
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 27 ++++++++++++++++++++---
 1 file changed, 24 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index d32397523cd9..f2f3e8b612e5 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -2676,14 +2676,23 @@ static struct pnv_ioda_pe *gpe_table_group_to_npe(
 static long pnv_pci_ioda2_npu_set_window(struct iommu_table_group *table_group,
  int num, struct iommu_table *tbl)
 {
+ struct pnv_ioda_pe *npe = gpe_table_group_to_npe(table_group);
+ int num2 = (num == 0) ? 1 : 0;
  long ret = pnv_pci_ioda2_set_window(table_group, num, tbl);
 
  if (ret)
  return ret;
 
- ret = pnv_npu_set_window(gpe_table_group_to_npe(table_group), num, tbl);
- if (ret)
+ if (table_group->tables[num2])
+ pnv_npu_unset_window(npe, num2);
+
+ ret = pnv_npu_set_window(npe, num, tbl);
+ if (ret) {
  pnv_pci_ioda2_unset_window(table_group, num);
+ if (table_group->tables[num2])
+ pnv_npu_set_window(npe, num2,
+ table_group->tables[num2]);
+ }
 
  return ret;
 }
@@ -2692,12 +2701,24 @@ static long pnv_pci_ioda2_npu_unset_window(
  struct iommu_table_group *table_group,
  int num)
 {
+ struct pnv_ioda_pe *npe = gpe_table_group_to_npe(table_group);
+ int num2 = (num == 0) ? 1 : 0;
  long ret = pnv_pci_ioda2_unset_window(table_group, num);
 
  if (ret)
  return ret;
 
- return pnv_npu_unset_window(gpe_table_group_to_npe(table_group), num);
+ if (!npe->table_group.tables[num])
+ return 0;
+
+ ret = pnv_npu_unset_window(npe, num);
+ if (ret)
+ return ret;
+
+ if (table_group->tables[num2])
+ ret = pnv_npu_set_window(npe, num2, table_group->tables[num2]);
+
+ return ret;
 }
 
 static void pnv_ioda2_npu_take_ownership(struct iommu_table_group *table_group)
--
2.20.1
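
The window handling described in this patch can be illustrated with a small user-space model. This is only a sketch under stated assumptions: `struct table`, `struct group`, `group_set_window()` and `group_unset_window()` are hypothetical stand-ins, not the kernel's types or functions, and the error rollback paths of the real code are left out. It shows the GPU PE keeping both windows while the NPU PE, which has a single TVE, is linked to at most one table at a time.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures (illustration only). */
struct table {
	int bits;			/* 32 or 64 */
};

struct group {
	struct table *gpu_win[2];	/* GPU PE: two TVEs, one per window */
	struct table *npu_win[2];	/* NPU PE: at most one entry is non-NULL */
};

/* Program window `num` on the GPU PE and move the NPU PE onto that table. */
int group_set_window(struct group *g, int num, struct table *tbl)
{
	int other = (num == 0) ? 1 : 0;

	g->gpu_win[num] = tbl;

	/* The NPU PE has one TVE, so drop its link to the other window. */
	if (g->gpu_win[other])
		g->npu_win[other] = NULL;

	g->npu_win[num] = tbl;
	return 0;
}

/* Remove window `num`; if the NPU PE was using it, fall back to the other. */
int group_unset_window(struct group *g, int num)
{
	int other = (num == 0) ? 1 : 0;

	g->gpu_win[num] = NULL;

	/* Nothing more to do if the NPU PE was not linked to this window. */
	if (!g->npu_win[num])
		return 0;

	g->npu_win[num] = NULL;

	/* Re-link the NPU PE to the window that is still programmed. */
	if (g->gpu_win[other])
		g->npu_win[other] = g->gpu_win[other];

	return 0;
}
```

In this model, programming a 32bit window and then a 64bit one leaves the NPU PE linked only to the 64bit table, which matches the "last programmed one" behaviour the commit message describes.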



[Bionic][PATCH 02/12] powerpc/powernv: call OPAL_QUIESCE before OPAL_SIGNAL_SYSTEM_RESET

Jose Ricardo Ziviani-2
From: Nicholas Piggin <[hidden email]>

BugLink: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1819989

Although it is often possible to recover a CPU that was interrupted
from OPAL with a system reset NMI, it's undesirable to interrupt them
for a few reasons. Firstly because dump/debug code itself needs to
call firmware, so it could hang on a lock or possibly corrupt a
per-cpu data structure if it or another CPU was interrupted from
OPAL. Secondly, the kexec crash dump code will not return from
interrupt to unwind the OPAL call.

Call OPAL_QUIESCE with QUIESCE_HOLD before sending an NMI IPI to
another CPU, which waits for it to leave firmware (or time out) to
avoid this problem in normal conditions. Firmware bugs may still
result in a timeout and interrupting OPAL, but that is the best
option (stops the CPU, and possibly allows firmware to be debugged).

Signed-off-by: Nicholas Piggin <[hidden email]>
Signed-off-by: Michael Ellerman <[hidden email]>
(cherry picked from commit ee03b9b4479d1302d01cebedda3518dc967697b7)
Signed-off-by: Jose Ricardo Ziviani <[hidden email]>
---
 arch/powerpc/include/asm/opal-api.h            |  7 +++++++
 arch/powerpc/include/asm/opal.h                |  1 +
 arch/powerpc/platforms/powernv/opal-wrappers.S |  1 +
 arch/powerpc/platforms/powernv/smp.c           | 17 ++++++++++++++++-
 4 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/opal-api.h b/arch/powerpc/include/asm/opal-api.h
index 789223f2a095..402f2cd80f9e 100644
--- a/arch/powerpc/include/asm/opal-api.h
+++ b/arch/powerpc/include/asm/opal-api.h
@@ -201,6 +201,7 @@
 #define OPAL_SET_POWER_SHIFT_RATIO 155
 #define OPAL_SENSOR_GROUP_CLEAR 156
 #define OPAL_PCI_SET_P2P 157
+#define OPAL_QUIESCE 158
 #define OPAL_NPU_SPA_SETUP 159
 #define OPAL_NPU_SPA_CLEAR_CACHE 160
 #define OPAL_NPU_TL_SET 161
@@ -208,6 +209,12 @@
 #define OPAL_PCI_SET_PBCQ_TUNNEL_BAR 165
 #define OPAL_LAST 165
 
+#define QUIESCE_HOLD 1 /* Spin all calls at entry */
+#define QUIESCE_REJECT 2 /* Fail all calls with OPAL_BUSY */
+#define QUIESCE_LOCK_BREAK 3 /* Set to ignore locks. */
+#define QUIESCE_RESUME 4 /* Un-quiesce */
+#define QUIESCE_RESUME_FAST_REBOOT 5 /* Un-quiesce, fast reboot */
+
 /* Device tree flags */
 
 /*
diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index c3f0ee833319..5e3dd3bf7602 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -293,6 +293,7 @@ int opal_set_power_shift_ratio(u32 handle, int token, u32 psr);
 int opal_sensor_group_clear(u32 group_hndl, int token);
 
 s64 opal_signal_system_reset(s32 cpu);
+s64 opal_quiesce(u64 shutdown_type, s32 cpu);
 
 /* Internal functions */
 extern int early_init_dt_scan_opal(unsigned long node, const char *uname,
diff --git a/arch/powerpc/platforms/powernv/opal-wrappers.S b/arch/powerpc/platforms/powernv/opal-wrappers.S
index 3da30c2f26b4..5eb1466665a4 100644
--- a/arch/powerpc/platforms/powernv/opal-wrappers.S
+++ b/arch/powerpc/platforms/powernv/opal-wrappers.S
@@ -320,6 +320,7 @@ OPAL_CALL(opal_set_powercap, OPAL_SET_POWERCAP);
 OPAL_CALL(opal_get_power_shift_ratio, OPAL_GET_POWER_SHIFT_RATIO);
 OPAL_CALL(opal_set_power_shift_ratio, OPAL_SET_POWER_SHIFT_RATIO);
 OPAL_CALL(opal_sensor_group_clear, OPAL_SENSOR_GROUP_CLEAR);
+OPAL_CALL(opal_quiesce, OPAL_QUIESCE);
 OPAL_CALL(opal_npu_spa_setup, OPAL_NPU_SPA_SETUP);
 OPAL_CALL(opal_npu_spa_clear_cache, OPAL_NPU_SPA_CLEAR_CACHE);
 OPAL_CALL(opal_npu_tl_set, OPAL_NPU_TL_SET);
diff --git a/arch/powerpc/platforms/powernv/smp.c b/arch/powerpc/platforms/powernv/smp.c
index 9664c8461f03..09c156203898 100644
--- a/arch/powerpc/platforms/powernv/smp.c
+++ b/arch/powerpc/platforms/powernv/smp.c
@@ -334,7 +334,16 @@ static int pnv_cause_nmi_ipi(int cpu)
  int64_t rc;
 
  if (cpu >= 0) {
- rc = opal_signal_system_reset(get_hard_smp_processor_id(cpu));
+ int h = get_hard_smp_processor_id(cpu);
+
+ if (opal_check_token(OPAL_QUIESCE))
+ opal_quiesce(QUIESCE_HOLD, h);
+
+ rc = opal_signal_system_reset(h);
+
+ if (opal_check_token(OPAL_QUIESCE))
+ opal_quiesce(QUIESCE_RESUME, h);
+
  if (rc != OPAL_SUCCESS)
  return 0;
  return 1;
@@ -343,6 +352,8 @@ static int pnv_cause_nmi_ipi(int cpu)
  bool success = true;
  int c;
 
+ if (opal_check_token(OPAL_QUIESCE))
+ opal_quiesce(QUIESCE_HOLD, -1);
 
  /*
  * We do not use broadcasts (yet), because it's not clear
@@ -358,6 +369,10 @@ static int pnv_cause_nmi_ipi(int cpu)
  if (rc != OPAL_SUCCESS)
  success = false;
  }
+
+ if (opal_check_token(OPAL_QUIESCE))
+ opal_quiesce(QUIESCE_RESUME, -1);
+
  if (success)
  return 1;
 
--
2.20.1
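
The hold/reset/resume ordering used by this patch can be sketched outside the kernel as follows. Everything here is an assumption made for illustration: `struct fw`, `fw_quiesce()`, `fw_signal_reset()` and `cause_nmi()` are hypothetical names that model the firmware, and only the `HOLD` and `RESUME` values mirror the `QUIESCE_HOLD` and `QUIESCE_RESUME` constants from the diff. The point is that the reset is sent while the hold is active, and that firmware without the call is handled the old way.

```c
#include <stdbool.h>

/* Same numeric values as QUIESCE_HOLD / QUIESCE_RESUME in the patch. */
enum { HOLD = 1, RESUME = 4 };

/* Hypothetical firmware model used only for this sketch. */
struct fw {
	bool has_quiesce;	/* does firmware implement the quiesce call? */
	int held;		/* nonzero while a hold is active */
	int resets_while_held;	/* resets delivered under a hold */
	int resets_total;	/* all resets delivered */
};

void fw_quiesce(struct fw *fw, int mode)
{
	if (mode == HOLD)
		fw->held = 1;
	else if (mode == RESUME)
		fw->held = 0;
}

int fw_signal_reset(struct fw *fw, int cpu)
{
	(void)cpu;		/* the model does not track individual CPUs */
	fw->resets_total++;
	if (fw->held)
		fw->resets_while_held++;
	return 0;		/* 0 stands for success in this model */
}

/* Returns 1 if the reset was delivered, 0 otherwise. */
int cause_nmi(struct fw *fw, int cpu)
{
	int rc;

	/* Only use the call when firmware advertises it. */
	if (fw->has_quiesce)
		fw_quiesce(fw, HOLD);

	rc = fw_signal_reset(fw, cpu);

	/* Always undo the hold, whether or not the reset succeeded. */
	if (fw->has_quiesce)
		fw_quiesce(fw, RESUME);

	return rc == 0;
}
```

Checking for the call on every invocation, as the patch does with `opal_check_token()`, keeps older firmware working without a separate code path.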



[Bionic][PATCH 03/12] powerpc/powernv: Export opal_check_token symbol

Jose Ricardo Ziviani-2
From: Haren Myneni <[hidden email]>

BugLink: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1819989

Export opal_check_token symbol for modules to check the availability
of OPAL calls before using them.

Signed-off-by: Haren Myneni <[hidden email]>
Signed-off-by: Michael Ellerman <[hidden email]>
(cherry picked from commit 6e708000ec2c93c2bde6a46aa2d6c3e80d4eaeb9)
Signed-off-by: Jose Ricardo Ziviani <[hidden email]>
---
 arch/powerpc/platforms/powernv/opal.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/platforms/powernv/opal.c b/arch/powerpc/platforms/powernv/opal.c
index 1dc4f9f65a36..4bd365fad701 100644
--- a/arch/powerpc/platforms/powernv/opal.c
+++ b/arch/powerpc/platforms/powernv/opal.c
@@ -930,6 +930,7 @@ EXPORT_SYMBOL_GPL(opal_flash_read);
 EXPORT_SYMBOL_GPL(opal_flash_write);
 EXPORT_SYMBOL_GPL(opal_flash_erase);
 EXPORT_SYMBOL_GPL(opal_prd_msg);
+EXPORT_SYMBOL_GPL(opal_check_token);
 
 /* Convert a region of vmalloc memory to an opal sg list */
 struct opal_sg_list *opal_vmalloc_to_sg_list(void *vmalloc_addr,
--
2.20.1



[Bionic][PATCH 04/12] powerpc/powernv: Make possible for user to force a full ipl cec reboot

Jose Ricardo Ziviani-2
From: Vaibhav Jain <[hidden email]>

BugLink: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1819989

Ever since fast reboot is enabled by default in opal,
opal_cec_reboot() will use fast-reset instead of full IPL to perform
system reboot. This leaves the user with no direct way to force a full
IPL reboot except changing an nvram setting that persistently disables
fast-reset for all subsequent reboots.

This patch provides a more direct way for the user to force a one-shot
full IPL reboot by passing the command line argument 'full' to the
reboot command. So the user will be able to tweak the reboot behavior
via:

  $ sudo reboot full # Force a full ipl reboot skipping fast-reset

  or
  $ sudo reboot   # default reboot path (usually fast-reset)

The reboot command passes the un-parsed command argument to the kernel
via the 'Reboot' syscall which is then passed on to the arch function
pnv_restart(). The patch updates pnv_restart() to handle this cmd-arg
and issues opal_cec_reboot2 with OPAL_REBOOT_FULL_IPL to force a full
IPL reset.

Signed-off-by: Vaibhav Jain <[hidden email]>
Acked-by: Andrew Donnellan <[hidden email]>
Signed-off-by: Michael Ellerman <[hidden email]>
(cherry picked from commit 8139046a5a34787849df81f4a5875cf4b404a7a1)
Signed-off-by: Jose Ricardo Ziviani <[hidden email]>
---
 arch/powerpc/include/asm/opal-api.h    |  1 +
 arch/powerpc/platforms/powernv/setup.c | 36 +++++++++++++++++++++-----
 2 files changed, 31 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/opal-api.h b/arch/powerpc/include/asm/opal-api.h
index 402f2cd80f9e..e2515843b37e 100644
--- a/arch/powerpc/include/asm/opal-api.h
+++ b/arch/powerpc/include/asm/opal-api.h
@@ -1047,6 +1047,7 @@ enum OpalSysCooling {
 enum {
  OPAL_REBOOT_NORMAL = 0,
  OPAL_REBOOT_PLATFORM_ERROR = 1,
+ OPAL_REBOOT_FULL_IPL = 2,
 };
 
 /* Argument to OPAL_PCI_TCE_KILL */
diff --git a/arch/powerpc/platforms/powernv/setup.c b/arch/powerpc/platforms/powernv/setup.c
index 34e36f91a38e..be7eac602402 100644
--- a/arch/powerpc/platforms/powernv/setup.c
+++ b/arch/powerpc/platforms/powernv/setup.c
@@ -217,17 +217,41 @@ static void pnv_prepare_going_down(void)
 
 static void  __noreturn pnv_restart(char *cmd)
 {
- long rc = OPAL_BUSY;
+ long rc;
 
  pnv_prepare_going_down();
 
- while (rc == OPAL_BUSY || rc == OPAL_BUSY_EVENT) {
- rc = opal_cec_reboot();
- if (rc == OPAL_BUSY_EVENT)
- opal_poll_events(NULL);
+ do {
+ if (!cmd)
+ rc = opal_cec_reboot();
+ else if (strcmp(cmd, "full") == 0)
+ rc = opal_cec_reboot2(OPAL_REBOOT_FULL_IPL, NULL);
  else
+ rc = OPAL_UNSUPPORTED;
+
+ if (rc == OPAL_BUSY || rc == OPAL_BUSY_EVENT) {
+ /* Opal is busy wait for some time and retry */
+ opal_poll_events(NULL);
  mdelay(10);
- }
+
+ } else if (cmd && rc) {
+ /* Unknown error while issuing reboot */
+ if (rc == OPAL_UNSUPPORTED)
+ pr_err("Unsupported '%s' reboot.\n", cmd);
+ else
+ pr_err("Unable to issue '%s' reboot. Err=%ld\n",
+       cmd, rc);
+ pr_info("Forcing a cec-reboot\n");
+ cmd = NULL;
+ rc = OPAL_BUSY;
+
+ } else if (rc != OPAL_SUCCESS) {
+ /* Unknown error while issuing cec-reboot */
+ pr_err("Unable to reboot. Err=%ld\n", rc);
+ }
+
+ } while (rc == OPAL_BUSY || rc == OPAL_BUSY_EVENT);
+
  for (;;)
  opal_poll_events(NULL);
 }
--
2.20.1
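
The retry-and-fallback flow of the reworked pnv_restart() can be modelled in user space as below. This is a sketch under assumptions: `struct fw`, `fw_reboot()`, `fw_reboot_full()` and `restart()` are hypothetical, the return codes are illustrative values rather than the OPAL definitions, and the model returns instead of rebooting so that the control flow can be observed. The event polling and delay that the kernel performs while firmware is busy are omitted.

```c
#include <stdio.h>
#include <string.h>

/* Illustrative return codes for the model, not the OPAL header values. */
enum { RC_SUCCESS = 0, RC_BUSY = -2, RC_UNSUPPORTED = -7 };

/* Hypothetical firmware model. */
struct fw {
	int busy_left;		/* busy replies left before a call succeeds */
	int full_supported;	/* is the full-IPL variant available? */
	int normal_calls;	/* count of default reboot requests */
	int full_calls;		/* count of full-IPL reboot requests */
};

static int fw_busy_or_ok(struct fw *fw)
{
	if (fw->busy_left > 0) {
		fw->busy_left--;
		return RC_BUSY;
	}
	return RC_SUCCESS;
}

int fw_reboot(struct fw *fw)
{
	fw->normal_calls++;
	return fw_busy_or_ok(fw);
}

int fw_reboot_full(struct fw *fw)
{
	fw->full_calls++;
	if (!fw->full_supported)
		return RC_UNSUPPORTED;
	return fw_busy_or_ok(fw);
}

/* Returns the final firmware status instead of rebooting. */
int restart(struct fw *fw, const char *cmd)
{
	int rc;

	do {
		if (!cmd)
			rc = fw_reboot(fw);
		else if (strcmp(cmd, "full") == 0)
			rc = fw_reboot_full(fw);
		else
			rc = RC_UNSUPPORTED;

		if (rc == RC_BUSY) {
			/* Firmware busy: loop and ask again. */
		} else if (cmd && rc) {
			/* Named reboot failed: fall back to the default path. */
			fprintf(stderr, "'%s' reboot failed (%d), using default\n",
				cmd, rc);
			cmd = NULL;
			rc = RC_BUSY;	/* forces one more pass through the loop */
		} else if (rc != RC_SUCCESS) {
			fprintf(stderr, "unable to reboot (%d)\n", rc);
		}
	} while (rc == RC_BUSY);

	return rc;
}
```

Setting `rc` back to the busy code after a failed named reboot is what lets a single loop cover both the retry case and the fallback case.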



[Bionic][PATCH 05/12] powerpc/powernv/idoa: Remove unnecessary pcidev from pci_dn

Jose Ricardo Ziviani-2
From: Alexey Kardashevskiy <[hidden email]>

BugLink: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1819989

The pcidev value stored in pci_dn is only used for NPU/NPU2
initialization. We can easily drop the cached pointer and
use an ancient helper - pci_get_domain_bus_and_slot() instead in order
to reduce complexity.

Signed-off-by: Alexey Kardashevskiy <[hidden email]>
Acked-by: Russell Currey <[hidden email]>
Signed-off-by: Michael Ellerman <[hidden email]>
(cherry picked from commit 902bdc57451c2c64aa139bbe24067f70a186db0a)
Signed-off-by: Jose Ricardo Ziviani <[hidden email]>
---
 arch/powerpc/include/asm/pci-bridge.h     | 2 --
 arch/powerpc/platforms/powernv/npu-dma.c  | 5 ++++-
 arch/powerpc/platforms/powernv/pci-ioda.c | 3 ---
 3 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
index 62ed83db04ae..7c5b0b867d3a 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -197,8 +197,6 @@ struct pci_dn {
  struct iommu_table_group *table_group; /* for phb's or bridges */
 
  int pci_ext_config_space; /* for pci devices */
-
- struct pci_dev *pcidev; /* back-pointer to the pci device */
 #ifdef CONFIG_EEH
  struct eeh_dev *edev; /* eeh device */
 #endif
diff --git a/arch/powerpc/platforms/powernv/npu-dma.c b/arch/powerpc/platforms/powernv/npu-dma.c
index 18226895681e..d5ed73d1614e 100644
--- a/arch/powerpc/platforms/powernv/npu-dma.c
+++ b/arch/powerpc/platforms/powernv/npu-dma.c
@@ -52,7 +52,10 @@ static DEFINE_SPINLOCK(npu_context_lock);
  */
 static struct pci_dev *get_pci_dev(struct device_node *dn)
 {
- return PCI_DN(dn)->pcidev;
+ struct pci_dn *pdn = PCI_DN(dn);
+
+ return pci_get_domain_bus_and_slot(pci_domain_nr(pdn->phb->bus),
+   pdn->busno, pdn->devfn);
 }
 
 /* Given a NPU device get the associated PCI device. */
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index f2f3e8b612e5..abe5245edaa2 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1073,7 +1073,6 @@ static struct pnv_ioda_pe *pnv_ioda_setup_dev_PE(struct pci_dev *dev)
  * At some point we want to remove the PDN completely anyways
  */
  pci_dev_get(dev);
- pdn->pcidev = dev;
  pdn->pe_number = pe->pe_number;
  pe->flags = PNV_IODA_PE_DEV;
  pe->pdev = dev;
@@ -1120,7 +1119,6 @@ static void pnv_ioda_setup_same_PE(struct pci_bus *bus, struct pnv_ioda_pe *pe)
  continue;
 
  pe->device_count++;
- pdn->pcidev = dev;
  pdn->pe_number = pe->pe_number;
  if ((pe->flags & PNV_IODA_PE_BUS_ALL) && dev->subordinate)
  pnv_ioda_setup_same_PE(dev->subordinate, pe);
@@ -1235,7 +1233,6 @@ static struct pnv_ioda_pe *pnv_ioda_setup_npu_PE(struct pci_dev *npu_pdev)
  pci_dev_get(npu_pdev);
  npu_pdn = pci_get_pdn(npu_pdev);
  rid = npu_pdev->bus->number << 8 | npu_pdn->devfn;
- npu_pdn->pcidev = npu_pdev;
  npu_pdn->pe_number = pe_num;
  phb->ioda.pe_rmap[rid] = pe->pe_number;
 
--
2.20.1



[Bionic][PATCH 06/12] powerpc/powernv: Move npu struct from pnv_phb to pci_controller

Jose Ricardo Ziviani-2
From: Alexey Kardashevskiy <[hidden email]>

BugLink: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1819989

The powernv PCI code stores NPU data in the pnv_phb struct. The latter
is referenced by pci_controller::private_data. We are going to have NPU2
support in the pseries platform as well but it does not store any
private_data in the pci_controller struct; and even if it did,
it would be a different data structure.

This makes npu a pointer and stores it one level higher in
the pci_controller struct.

Signed-off-by: Alexey Kardashevskiy <[hidden email]>
Signed-off-by: Michael Ellerman <[hidden email]>
(backported from commit 46a1449d9e39478a35d35d9d9025776f6cee24fb)
Signed-off-by: Jose Ricardo Ziviani <[hidden email]>
---
 arch/powerpc/include/asm/pci-bridge.h     |  1 +
 arch/powerpc/platforms/powernv/npu-dma.c  | 74 +++++++++++++++++------
 arch/powerpc/platforms/powernv/pci-ioda.c |  2 +-
 arch/powerpc/platforms/powernv/pci.h      | 17 +-----
 4 files changed, 59 insertions(+), 35 deletions(-)

diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
index 7c5b0b867d3a..4354187195dc 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -129,6 +129,7 @@ struct pci_controller {
 #endif /* CONFIG_PPC64 */
 
  void *private_data;
+ struct npu *npu;
 };
 
 /* These are used for config access before all the PCI probing
diff --git a/arch/powerpc/platforms/powernv/npu-dma.c b/arch/powerpc/platforms/powernv/npu-dma.c
index d5ed73d1614e..a1d5e4905f6e 100644
--- a/arch/powerpc/platforms/powernv/npu-dma.c
+++ b/arch/powerpc/platforms/powernv/npu-dma.c
@@ -400,6 +400,25 @@ struct pnv_ioda_pe *pnv_pci_npu_setup_iommu(struct pnv_ioda_pe *npe)
  return gpe;
 }
 
+/*
+ * NPU2 ATS
+ */
+/* Maximum possible number of ATSD MMIO registers per NPU */
+#define NV_NMMU_ATSD_REGS 8
+
+/* An NPU descriptor, valid for POWER9 only */
+struct npu {
+ int index;
+ __be64 *mmio_atsd_regs[NV_NMMU_ATSD_REGS];
+ unsigned int mmio_atsd_count;
+
+ /* Bitmask for MMIO register usage */
+ unsigned long mmio_atsd_usage;
+
+ /* Do we need to explicitly flush the nest mmu? */
+ bool nmmu_flush;
+};
+
 /* Maximum number of nvlinks per npu */
 #define NV_MAX_LINKS 6
 
@@ -558,7 +577,6 @@ static void acquire_atsd_reg(struct npu_context *npu_context,
  int i, j;
  struct npu *npu;
  struct pci_dev *npdev;
- struct pnv_phb *nphb;
 
  for (i = 0; i <= max_npu2_index; i++) {
  mmio_atsd_reg[i].reg = -1;
@@ -573,8 +591,7 @@ static void acquire_atsd_reg(struct npu_context *npu_context,
  if (!npdev)
  continue;
 
- nphb = pci_bus_to_host(npdev->bus)->private_data;
- npu = &nphb->npu;
+ npu = pci_bus_to_host(npdev->bus)->npu;
  mmio_atsd_reg[i].npu = npu;
  mmio_atsd_reg[i].reg = get_mmio_atsd_reg(npu);
  while (mmio_atsd_reg[i].reg < 0) {
@@ -733,6 +750,7 @@ struct npu_context *pnv_npu2_init_context(struct pci_dev *gpdev,
  struct pnv_phb *nphb;
  struct npu *npu;
  struct npu_context *npu_context;
+ struct pci_controller *hose;
 
  /*
  * At present we don't support GPUs connected to multiple NPUs and I'm
@@ -760,8 +778,9 @@ struct npu_context *pnv_npu2_init_context(struct pci_dev *gpdev,
  return ERR_PTR(-EINVAL);
  }
 
- nphb = pci_bus_to_host(npdev->bus)->private_data;
- npu = &nphb->npu;
+ hose = pci_bus_to_host(npdev->bus);
+ nphb = hose->private_data;
+ npu = hose->npu;
 
  /*
  * Setup the NPU context table for a particular GPU. These need to be
@@ -835,7 +854,7 @@ struct npu_context *pnv_npu2_init_context(struct pci_dev *gpdev,
  */
  WRITE_ONCE(npu_context->npdev[npu->index][nvlink_index], npdev);
 
- if (!nphb->npu.nmmu_flush) {
+ if (!npu->nmmu_flush) {
  /*
  * If we're not explicitly flushing ourselves we need to mark
  * the thread for global flushes
@@ -873,6 +892,7 @@ void pnv_npu2_destroy_context(struct npu_context *npu_context,
  struct pci_dev *npdev = pnv_pci_get_npu_dev(gpdev, 0);
  struct device_node *nvlink_dn;
  u32 nvlink_index;
+ struct pci_controller *hose;
 
  if (WARN_ON(!npdev))
  return;
@@ -880,8 +900,9 @@ void pnv_npu2_destroy_context(struct npu_context *npu_context,
  if (!firmware_has_feature(FW_FEATURE_OPAL))
  return;
 
- nphb = pci_bus_to_host(npdev->bus)->private_data;
- npu = &nphb->npu;
+ hose = pci_bus_to_host(npdev->bus);
+ nphb = hose->private_data;
+ npu = hose->npu;
  nvlink_dn = of_parse_phandle(npdev->dev.of_node, "ibm,nvlink", 0);
  if (WARN_ON(of_property_read_u32(nvlink_dn, "ibm,npu-link-index",
  &nvlink_index)))
@@ -959,9 +980,15 @@ int pnv_npu2_init(struct pnv_phb *phb)
  struct pci_dev *gpdev;
  static int npu_index;
  uint64_t rc = 0;
+ struct pci_controller *hose = phb->hose;
+ struct npu *npu;
+ int ret;
 
- phb->npu.nmmu_flush =
- of_property_read_bool(phb->hose->dn, "ibm,nmmu-flush");
+ npu = kzalloc(sizeof(*npu), GFP_KERNEL);
+ if (!npu)
+ return -ENOMEM;
+
+ npu->nmmu_flush = of_property_read_bool(hose->dn, "ibm,nmmu-flush");
  for_each_child_of_node(phb->hose->dn, dn) {
  gpdev = pnv_pci_get_gpu_dev(get_pci_dev(dn));
  if (gpdev) {
@@ -975,18 +1002,29 @@ int pnv_npu2_init(struct pnv_phb *phb)
  }
  }
 
- for (i = 0; !of_property_read_u64_index(phb->hose->dn, "ibm,mmio-atsd",
+ for (i = 0; !of_property_read_u64_index(hose->dn, "ibm,mmio-atsd",
  i, &mmio_atsd); i++)
- phb->npu.mmio_atsd_regs[i] = ioremap(mmio_atsd, 32);
+ npu->mmio_atsd_regs[i] = ioremap(mmio_atsd, 32);
 
- pr_info("NPU%lld: Found %d MMIO ATSD registers", phb->opal_id, i);
- phb->npu.mmio_atsd_count = i;
- phb->npu.mmio_atsd_usage = 0;
+ pr_info("NPU%d: Found %d MMIO ATSD registers", hose->global_number, i);
+ npu->mmio_atsd_count = i;
+ npu->mmio_atsd_usage = 0;
  npu_index++;
- if (WARN_ON(npu_index >= NV_MAX_NPUS))
- return -ENOSPC;
+ if (WARN_ON(npu_index >= NV_MAX_NPUS)) {
+ ret = -ENOSPC;
+ goto fail_exit;
+ }
  max_npu2_index = npu_index;
- phb->npu.index = npu_index;
+ npu->index = npu_index;
+ hose->npu = npu;
 
  return 0;
+
+fail_exit:
+ for (i = 0; i < npu->mmio_atsd_count; ++i)
+ iounmap(npu->mmio_atsd_regs[i]);
+
+ kfree(npu);
+
+ return ret;
 }
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index abe5245edaa2..690a41ffa693 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1280,7 +1280,7 @@ static void pnv_pci_ioda_setup_PEs(void)
  pnv_ioda_reserve_pe(phb, 0);
  pnv_ioda_setup_npu_PEs(hose->bus);
  if (phb->model == PNV_PHB_MODEL_NPU2)
- pnv_npu2_init(phb);
+ WARN_ON_ONCE(pnv_npu2_init(phb));
  }
  if (phb->type == PNV_PHB_NPU_OCAPI) {
  bus = hose->bus;
diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index eada4b6068cb..6225f906dc46 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -8,9 +8,6 @@
 
 struct pci_dn;
 
-/* Maximum possible number of ATSD MMIO registers per NPU */
-#define NV_NMMU_ATSD_REGS 8
-
 enum pnv_phb_type {
  PNV_PHB_IODA1 = 0,
  PNV_PHB_IODA2 = 1,
@@ -181,22 +178,10 @@ struct pnv_phb {
  unsigned int diag_data_size;
  u8 *diag_data;
 
- /* Nvlink2 data */
- struct npu {
- int index;
- __be64 *mmio_atsd_regs[NV_NMMU_ATSD_REGS];
- unsigned int mmio_atsd_count;
-
- /* Bitmask for MMIO register usage */
- unsigned long mmio_atsd_usage;
-
- /* Do we need to explicitly flush the nest mmu? */
- bool nmmu_flush;
- } npu;
-
 #ifdef CONFIG_CXL_BASE
  struct cxl_afu *cxl_afu;
 #endif
+
  int p2p_target_count;
 };
 
--
2.20.1
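
The structural change in this patch, turning the embedded npu struct into a separately allocated object hung off the controller, with a single cleanup label for the error path, can be sketched in plain C. The names below (`struct controller`, `npu_init()`, `MAX_REGS`, `MAX_NPUS`, `npu_count`) are hypothetical, `calloc()`/`malloc()`/`free()` stand in for `kzalloc()`/`ioremap()`/`iounmap()`/`kfree()`, and the index bookkeeping is simplified compared to the real function.

```c
#include <errno.h>
#include <stdlib.h>

#define MAX_REGS 8	/* plays the role of NV_NMMU_ATSD_REGS */
#define MAX_NPUS 8	/* hypothetical limit for this sketch */

struct npu {
	int index;
	void *regs[MAX_REGS];
	unsigned int reg_count;
};

/* The descriptor lives next to private_data, not inside it. */
struct controller {
	void *private_data;
	struct npu *npu;
};

int npu_count;		/* number of descriptors handed out so far */

int npu_init(struct controller *hose, unsigned int nregs)
{
	struct npu *npu;
	unsigned int i;
	int ret;

	npu = calloc(1, sizeof(*npu));
	if (!npu)
		return -ENOMEM;

	/* malloc() stands in for mapping each register. */
	for (i = 0; i < nregs && i < MAX_REGS; i++)
		npu->regs[i] = malloc(32);
	npu->reg_count = i;

	if (npu_count >= MAX_NPUS) {
		ret = -ENOSPC;
		goto fail_exit;
	}

	npu->index = npu_count++;
	hose->npu = npu;	/* publish only after setup has succeeded */
	return 0;

fail_exit:
	/* Undo the mappings before releasing the descriptor itself. */
	for (i = 0; i < npu->reg_count; i++)
		free(npu->regs[i]);
	free(npu);
	return ret;
}
```

Because callers only ever see `hose->npu`, a NULL pointer doubles as "no NPU support configured", which later patches in the series rely on.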



[Bionic][PATCH 07/12] powerpc/powernv/npu: Move OPAL calls away from context manipulation

Jose Ricardo Ziviani-2
From: Alexey Kardashevskiy <[hidden email]>

BugLink: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1819989

When introduced, the NPU context init/destroy helpers called OPAL which
enabled/disabled PID (a userspace memory context ID) filtering in an NPU
per GPU; this was a requirement for P9 DD1.0. However, newer chip
revisions added PID wildcard support, so there is no longer any need to
call OPAL every time a new context is initialized. Also, since the PID
wildcard support was added, skiboot does not clear wildcard entries
in the NPU so these remain in the hardware till the system reboot.

This moves LPID and wildcard programming to the PE setup code which
executes once during the booting process so NPU2 context init/destroy
won't need to do additional configuration.

This replaces the check for FW_FEATURE_OPAL with a check for npu!=NULL as
this is the way to tell if the NPU support is present and configured.

This moves pnv_npu2_init() declaration as pseries should be able to use it.
This keeps pnv_npu2_map_lpar() in powernv as pseries is not allowed to
call that. This exports pnv_npu2_map_lpar_dev() as following patches
will use it from the VFIO driver.

While at it, replace redundant list_for_each_entry_safe() with
a simpler list_for_each_entry().

Signed-off-by: Alexey Kardashevskiy <[hidden email]>
Signed-off-by: Michael Ellerman <[hidden email]>
(cherry picked from commit 0e759bd75285e96fbb4013d1303b08fdb8ba58e1)
Signed-off-by: Jose Ricardo Ziviani <[hidden email]>
---
 arch/powerpc/include/asm/pci.h            |   3 +
 arch/powerpc/platforms/powernv/npu-dma.c  | 111 ++++++++++++----------
 arch/powerpc/platforms/powernv/pci-ioda.c |  15 ++-
 arch/powerpc/platforms/powernv/pci.h      |   2 +-
 4 files changed, 77 insertions(+), 54 deletions(-)

diff --git a/arch/powerpc/include/asm/pci.h b/arch/powerpc/include/asm/pci.h
index 8dc32eacc97c..7efbabb1f465 100644
--- a/arch/powerpc/include/asm/pci.h
+++ b/arch/powerpc/include/asm/pci.h
@@ -144,5 +144,8 @@ extern void pcibios_scan_phb(struct pci_controller *hose);
 
 extern struct pci_dev *pnv_pci_get_gpu_dev(struct pci_dev *npdev);
 extern struct pci_dev *pnv_pci_get_npu_dev(struct pci_dev *gpdev, int index);
+extern int pnv_npu2_init(struct pci_controller *hose);
+extern int pnv_npu2_map_lpar_dev(struct pci_dev *gpdev, unsigned int lparid,
+ unsigned long msr);
 
 #endif /* __ASM_POWERPC_PCI_H */
diff --git a/arch/powerpc/platforms/powernv/npu-dma.c b/arch/powerpc/platforms/powernv/npu-dma.c
index a1d5e4905f6e..53713ff439a9 100644
--- a/arch/powerpc/platforms/powernv/npu-dma.c
+++ b/arch/powerpc/platforms/powernv/npu-dma.c
@@ -592,6 +592,9 @@ static void acquire_atsd_reg(struct npu_context *npu_context,
  continue;
 
  npu = pci_bus_to_host(npdev->bus)->npu;
+ if (!npu)
+ continue;
+
  mmio_atsd_reg[i].npu = npu;
  mmio_atsd_reg[i].reg = get_mmio_atsd_reg(npu);
  while (mmio_atsd_reg[i].reg < 0) {
@@ -747,7 +750,6 @@ struct npu_context *pnv_npu2_init_context(struct pci_dev *gpdev,
  u32 nvlink_index;
  struct device_node *nvlink_dn;
  struct mm_struct *mm = current->mm;
- struct pnv_phb *nphb;
  struct npu *npu;
  struct npu_context *npu_context;
  struct pci_controller *hose;
@@ -758,13 +760,14 @@ struct npu_context *pnv_npu2_init_context(struct pci_dev *gpdev,
  */
  struct pci_dev *npdev = pnv_pci_get_npu_dev(gpdev, 0);
 
- if (!firmware_has_feature(FW_FEATURE_OPAL))
- return ERR_PTR(-ENODEV);
-
  if (!npdev)
  /* No nvlink associated with this GPU device */
  return ERR_PTR(-ENODEV);
 
+ /* We only support DR/PR/HV in pnv_npu2_map_lpar_dev() */
+ if (flags & ~(MSR_DR | MSR_PR | MSR_HV))
+ return ERR_PTR(-EINVAL);
+
  nvlink_dn = of_parse_phandle(npdev->dev.of_node, "ibm,nvlink", 0);
  if (WARN_ON(of_property_read_u32(nvlink_dn, "ibm,npu-link-index",
  &nvlink_index)))
@@ -779,20 +782,9 @@ struct npu_context *pnv_npu2_init_context(struct pci_dev *gpdev,
  }
 
  hose = pci_bus_to_host(npdev->bus);
- nphb = hose->private_data;
  npu = hose->npu;
-
- /*
- * Setup the NPU context table for a particular GPU. These need to be
- * per-GPU as we need the tables to filter ATSDs when there are no
- * active contexts on a particular GPU. It is safe for these to be
- * called concurrently with destroy as the OPAL call takes appropriate
- * locks and refcounts on init/destroy.
- */
- rc = opal_npu_init_context(nphb->opal_id, mm->context.id, flags,
- PCI_DEVID(gpdev->bus->number, gpdev->devfn));
- if (rc < 0)
- return ERR_PTR(-ENOSPC);
+ if (!npu)
+ return ERR_PTR(-ENODEV);
 
  /*
  * We store the npu pci device so we can more easily get at the
@@ -804,9 +796,6 @@ struct npu_context *pnv_npu2_init_context(struct pci_dev *gpdev,
  if (npu_context->release_cb != cb ||
  npu_context->priv != priv) {
  spin_unlock(&npu_context_lock);
- opal_npu_destroy_context(nphb->opal_id, mm->context.id,
- PCI_DEVID(gpdev->bus->number,
- gpdev->devfn));
  return ERR_PTR(-EINVAL);
  }
 
@@ -832,9 +821,6 @@ struct npu_context *pnv_npu2_init_context(struct pci_dev *gpdev,
 
  if (rc) {
  kfree(npu_context);
- opal_npu_destroy_context(nphb->opal_id, mm->context.id,
- PCI_DEVID(gpdev->bus->number,
- gpdev->devfn));
  return ERR_PTR(rc);
  }
 
@@ -887,7 +873,6 @@ void pnv_npu2_destroy_context(struct npu_context *npu_context,
  struct pci_dev *gpdev)
 {
  int removed;
- struct pnv_phb *nphb;
  struct npu *npu;
  struct pci_dev *npdev = pnv_pci_get_npu_dev(gpdev, 0);
  struct device_node *nvlink_dn;
@@ -897,19 +882,15 @@ void pnv_npu2_destroy_context(struct npu_context *npu_context,
  if (WARN_ON(!npdev))
  return;
 
- if (!firmware_has_feature(FW_FEATURE_OPAL))
- return;
-
  hose = pci_bus_to_host(npdev->bus);
- nphb = hose->private_data;
  npu = hose->npu;
+ if (!npu)
+ return;
  nvlink_dn = of_parse_phandle(npdev->dev.of_node, "ibm,nvlink", 0);
  if (WARN_ON(of_property_read_u32(nvlink_dn, "ibm,npu-link-index",
  &nvlink_index)))
  return;
  WRITE_ONCE(npu_context->npdev[npu->index][nvlink_index], NULL);
- opal_npu_destroy_context(nphb->opal_id, npu_context->mm->context.id,
- PCI_DEVID(gpdev->bus->number, gpdev->devfn));
  spin_lock(&npu_context_lock);
  removed = kref_put(&npu_context->kref, pnv_npu2_release_context);
  spin_unlock(&npu_context_lock);
@@ -941,9 +922,6 @@ int pnv_npu2_handle_fault(struct npu_context *context, uintptr_t *ea,
  /* mmap_sem should be held so the struct_mm must be present */
  struct mm_struct *mm = context->mm;
 
- if (!firmware_has_feature(FW_FEATURE_OPAL))
- return -ENODEV;
-
  WARN_ON(!rwsem_is_locked(&mm->mmap_sem));
 
  for (i = 0; i < count; i++) {
@@ -972,15 +950,11 @@ int pnv_npu2_handle_fault(struct npu_context *context, uintptr_t *ea,
 }
 EXPORT_SYMBOL(pnv_npu2_handle_fault);
 
-int pnv_npu2_init(struct pnv_phb *phb)
+int pnv_npu2_init(struct pci_controller *hose)
 {
  unsigned int i;
  u64 mmio_atsd;
- struct device_node *dn;
- struct pci_dev *gpdev;
  static int npu_index;
- uint64_t rc = 0;
- struct pci_controller *hose = phb->hose;
  struct npu *npu;
  int ret;
 
@@ -989,18 +963,6 @@ int pnv_npu2_init(struct pnv_phb *phb)
  return -ENOMEM;
 
  npu->nmmu_flush = of_property_read_bool(hose->dn, "ibm,nmmu-flush");
- for_each_child_of_node(phb->hose->dn, dn) {
- gpdev = pnv_pci_get_gpu_dev(get_pci_dev(dn));
- if (gpdev) {
- rc = opal_npu_map_lpar(phb->opal_id,
- PCI_DEVID(gpdev->bus->number, gpdev->devfn),
- 0, 0);
- if (rc)
- dev_err(&gpdev->dev,
- "Error %lld mapping device to LPAR\n",
- rc);
- }
- }
 
  for (i = 0; !of_property_read_u64_index(hose->dn, "ibm,mmio-atsd",
  i, &mmio_atsd); i++)
@@ -1028,3 +990,52 @@ int pnv_npu2_init(struct pnv_phb *phb)
 
  return ret;
 }
+
+int pnv_npu2_map_lpar_dev(struct pci_dev *gpdev, unsigned int lparid,
+ unsigned long msr)
+{
+ int ret;
+ struct pci_dev *npdev = pnv_pci_get_npu_dev(gpdev, 0);
+ struct pci_controller *hose;
+ struct pnv_phb *nphb;
+
+ if (!npdev)
+ return -ENODEV;
+
+ hose = pci_bus_to_host(npdev->bus);
+ nphb = hose->private_data;
+
+ dev_dbg(&gpdev->dev, "Map LPAR opalid=%llu lparid=%u\n",
+ nphb->opal_id, lparid);
+ /*
+ * Currently we only support radix and non-zero LPCR only makes sense
+ * for hash tables so skiboot expects the LPCR parameter to be a zero.
+ */
+ ret = opal_npu_map_lpar(nphb->opal_id,
+ PCI_DEVID(gpdev->bus->number, gpdev->devfn), lparid,
+ 0 /* LPCR bits */);
+ if (ret) {
+ dev_err(&gpdev->dev, "Error %d mapping device to LPAR\n", ret);
+ return ret;
+ }
+
+ dev_dbg(&gpdev->dev, "init context opalid=%llu msr=%lx\n",
+ nphb->opal_id, msr);
+ ret = opal_npu_init_context(nphb->opal_id, 0/*__unused*/, msr,
+ PCI_DEVID(gpdev->bus->number, gpdev->devfn));
+ if (ret < 0)
+ dev_err(&gpdev->dev, "Failed to init context: %d\n", ret);
+ else
+ ret = 0;
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(pnv_npu2_map_lpar_dev);
+
+void pnv_npu2_map_lpar(struct pnv_ioda_pe *gpe, unsigned long msr)
+{
+ struct pci_dev *gpdev;
+
+ list_for_each_entry(gpdev, &gpe->pbus->devices, bus_list)
+ pnv_npu2_map_lpar_dev(gpdev, 0, msr);
+}
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 690a41ffa693..c162253a405b 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1268,19 +1268,20 @@ static void pnv_ioda_setup_npu_PEs(struct pci_bus *bus)
 
 static void pnv_pci_ioda_setup_PEs(void)
 {
- struct pci_controller *hose, *tmp;
+ struct pci_controller *hose;
  struct pnv_phb *phb;
  struct pci_bus *bus;
  struct pci_dev *pdev;
+ struct pnv_ioda_pe *pe;
 
- list_for_each_entry_safe(hose, tmp, &hose_list, list_node) {
+ list_for_each_entry(hose, &hose_list, list_node) {
  phb = hose->private_data;
  if (phb->type == PNV_PHB_NPU_NVLINK) {
  /* PE#0 is needed for error reporting */
  pnv_ioda_reserve_pe(phb, 0);
  pnv_ioda_setup_npu_PEs(hose->bus);
  if (phb->model == PNV_PHB_MODEL_NPU2)
- WARN_ON_ONCE(pnv_npu2_init(phb));
+ WARN_ON_ONCE(pnv_npu2_init(hose));
  }
  if (phb->type == PNV_PHB_NPU_OCAPI) {
  bus = hose->bus;
@@ -1288,6 +1289,14 @@ static void pnv_pci_ioda_setup_PEs(void)
  pnv_ioda_setup_dev_PE(pdev);
  }
  }
+ list_for_each_entry(hose, &hose_list, list_node) {
+ phb = hose->private_data;
+ if (phb->type != PNV_PHB_IODA2)
+ continue;
+
+ list_for_each_entry(pe, &phb->ioda.pe_list, list)
+ pnv_npu2_map_lpar(pe, MSR_DR | MSR_PR | MSR_HV);
+ }
 }
 
 #ifdef CONFIG_PCI_IOV
diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index 6225f906dc46..3b2a43f64ab0 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -214,6 +214,7 @@ extern void pnv_pci_init_ioda_hub(struct device_node *np);
 extern void pnv_pci_init_ioda2_phb(struct device_node *np);
 extern void pnv_pci_init_npu_phb(struct device_node *np);
 extern void pnv_pci_init_npu2_opencapi_phb(struct device_node *np);
+extern void pnv_npu2_map_lpar(struct pnv_ioda_pe *gpe, unsigned long msr);
 extern void pnv_pci_reset_secondary_bus(struct pci_dev *dev);
 extern int pnv_eeh_phb_reset(struct pci_controller *hose, int option);
 
@@ -245,7 +246,6 @@ extern long pnv_npu_set_window(struct pnv_ioda_pe *npe, int num,
 extern long pnv_npu_unset_window(struct pnv_ioda_pe *npe, int num);
 extern void pnv_npu_take_ownership(struct pnv_ioda_pe *npe);
 extern void pnv_npu_release_ownership(struct pnv_ioda_pe *npe);
-extern int pnv_npu2_init(struct pnv_phb *phb);
 
 /* cxl functions */
 extern bool pnv_cxl_enable_device_hook(struct pci_dev *dev);
--
2.20.1
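
Note on the flags check added to pnv_npu2_init_context() above: it rejects any bit outside DR/PR/HV before any other work is done. Below is a minimal stand-alone sketch of that mask test. The SK_* names and check_context_flags() are hypothetical and exist only for this illustration; the bit positions are believed to mirror MSR_DR, MSR_PR and MSR_HV in asm/reg.h but should be treated as illustrative values, not as the kernel definitions.

```c
#include <assert.h>   /* only needed by callers that assert on the result */
#include <errno.h>
#include <stdint.h>

/* Illustrative bit positions (assumed, see note above). */
#define SK_MSR_DR (1ULL << 4)
#define SK_MSR_PR (1ULL << 14)
#define SK_MSR_HV (1ULL << 60)
#define SK_SUPPORTED (SK_MSR_DR | SK_MSR_PR | SK_MSR_HV)

/*
 * Hypothetical helper: return 0 when every set bit in 'flags' is one of
 * the supported bits, -EINVAL otherwise. This is the same
 * "flags & ~allowed" test the patch adds.
 */
static int check_context_flags(uint64_t flags)
{
	if (flags & ~SK_SUPPORTED)
		return -EINVAL;
	return 0;
}
```

With these definitions, check_context_flags(SK_MSR_DR | SK_MSR_HV) returns 0, while any value with an extra bit set returns -EINVAL.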


--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team

[Bionic][PATCH 08/12] powerpc/pseries/iommu: Use memory@ nodes in max RAM address calculation

Jose Ricardo Ziviani-2
From: Alexey Kardashevskiy <[hidden email]>

BugLink: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1819989

We might have memory@ nodes with "linux,usable-memory" set to zero
(for example, to replicate powernv's behaviour for GPU coherent memory),
which means that the memory needs extra initialization before use. Since
it can be used afterwards, the pseries platform will try to map it for
DMA, so the DMA window needs to cover those memory regions too; if the
window cannot cover the new memory regions, memory onlining fails.

This walks through the memory nodes to find the highest RAM address so
that a huge DMA window covers that memory too, in case it gets onlined
later.

Signed-off-by: Alexey Kardashevskiy <[hidden email]>
Signed-off-by: Michael Ellerman <[hidden email]>
(cherry picked from commit 68c0449ea16d775e762b532afddb4d6a5f161877)
Signed-off-by: Jose Ricardo Ziviani <[hidden email]>
---
 arch/powerpc/platforms/pseries/iommu.c | 33 +++++++++++++++++++++++++-
 1 file changed, 32 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index 69921f72e2da..fcb8e7f5736e 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -964,6 +964,37 @@ struct failed_ddw_pdn {
 
 static LIST_HEAD(failed_ddw_pdn_list);
 
+static phys_addr_t ddw_memory_hotplug_max(void)
+{
+ phys_addr_t max_addr = memory_hotplug_max();
+ struct device_node *memory;
+
+ for_each_node_by_type(memory, "memory") {
+ unsigned long start, size;
+ int ranges, n_mem_addr_cells, n_mem_size_cells, len;
+ const __be32 *memcell_buf;
+
+ memcell_buf = of_get_property(memory, "reg", &len);
+ if (!memcell_buf || len <= 0)
+ continue;
+
+ n_mem_addr_cells = of_n_addr_cells(memory);
+ n_mem_size_cells = of_n_size_cells(memory);
+
+ /* ranges in cell */
+ ranges = (len >> 2) / (n_mem_addr_cells + n_mem_size_cells);
+
+ start = of_read_number(memcell_buf, n_mem_addr_cells);
+ memcell_buf += n_mem_addr_cells;
+ size = of_read_number(memcell_buf, n_mem_size_cells);
+ memcell_buf += n_mem_size_cells;
+
+ max_addr = max_t(phys_addr_t, max_addr, start + size);
+ }
+
+ return max_addr;
+}
+
 /*
  * If the PE supports dynamic dma windows, and there is space for a table
  * that can map all pages in a linear offset, then setup such a table,
@@ -1053,7 +1084,7 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
  }
  /* verify the window * number of ptes will map the partition */
  /* check largest block * page size > max memory hotplug addr */
- max_addr = memory_hotplug_max();
+ max_addr = ddw_memory_hotplug_max();
  if (query.largest_available_block < (max_addr >> page_shift)) {
  dev_dbg(&dev->dev, "can't map partition max 0x%llx with %u "
   "%llu-sized pages\n", max_addr,  query.largest_available_block,
--
2.20.1
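
Note on the calculation in ddw_memory_hotplug_max() above: each memory@ node's "reg" property is a sequence of big-endian 32-bit cells, and the function takes the maximum of start + size over all nodes. Below is a minimal user-space sketch of that idea. struct mem_node, read_cells() and max_ram_address() are hypothetical stand-ins written for this illustration (read_cells() plays the role of the kernel's of_read_number()); like the patch, the sketch only looks at the first (start, size) pair of each node.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one memory@ node and its raw "reg" bytes. */
struct mem_node {
	const uint8_t *reg; /* big-endian 32-bit cells */
	size_t len;         /* length of 'reg' in bytes */
};

/* Fold 'cells' big-endian 32-bit cells into a single 64-bit number. */
static uint64_t read_cells(const uint8_t *p, int cells)
{
	uint64_t v = 0;

	for (int i = 0; i < cells * 4; i++)
		v = (v << 8) | p[i];
	return v;
}

/*
 * Return the larger of 'base_max' and the highest start + size found in
 * the first range of each node. Nodes with a missing or too-short "reg"
 * are skipped.
 */
static uint64_t max_ram_address(uint64_t base_max,
				const struct mem_node *nodes, size_t n,
				int addr_cells, int size_cells)
{
	uint64_t max_addr = base_max;
	size_t need = (size_t)(addr_cells + size_cells) * 4;

	assert(addr_cells >= 1 && addr_cells <= 2);
	assert(size_cells >= 1 && size_cells <= 2);

	for (size_t i = 0; i < n; i++) {
		uint64_t start, size;

		if (!nodes[i].reg || nodes[i].len < need)
			continue;

		start = read_cells(nodes[i].reg, addr_cells);
		size = read_cells(nodes[i].reg + addr_cells * 4, size_cells);
		if (start + size > max_addr)
			max_addr = start + size;
	}
	return max_addr;
}
```

The base_max argument corresponds to what memory_hotplug_max() would return; a node placed far above ordinary RAM raises the result, which is why the DMA window has to be sized from this value.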



[Bionic][PATCH 09/12] powerpc/pseries/npu: Enable platform support

Jose Ricardo Ziviani-2
From: Alexey Kardashevskiy <[hidden email]>

BugLink: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1819989

We already changed the NPU API for GPUs to not call OPAL; the remaining
bit is initializing NPU structures.

This searches for POWER9 NVLinks attached to any device on a PHB and
initializes an NPU structure if any are found.

Signed-off-by: Alexey Kardashevskiy <[hidden email]>
Signed-off-by: Michael Ellerman <[hidden email]>
(backported from commit 3be2df00e299821ad255498ac4411906a8d59cfa)
Signed-off-by: Jose Ricardo Ziviani <[hidden email]>
---
 arch/powerpc/platforms/pseries/pci.c | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/arch/powerpc/platforms/pseries/pci.c b/arch/powerpc/platforms/pseries/pci.c
index 09eba5a9929a..6a0e1dcd4e79 100644
--- a/arch/powerpc/platforms/pseries/pci.c
+++ b/arch/powerpc/platforms/pseries/pci.c
@@ -29,6 +29,7 @@
 #include <asm/pci-bridge.h>
 #include <asm/prom.h>
 #include <asm/ppc-pci.h>
+#include <asm/pci.h>
 #include "pseries.h"
 
 #if 0
@@ -73,9 +74,31 @@ static void __init pSeries_request_regions(void)
 
 void __init pSeries_final_fixup(void)
 {
+ struct pci_controller *hose;
+
  pSeries_request_regions();
 
  eeh_addr_cache_build();
+
+ list_for_each_entry(hose, &hose_list, list_node) {
+ struct device_node *dn = hose->dn, *nvdn;
+
+ while (1) {
+ dn = of_find_all_nodes(dn);
+ if (!dn)
+ break;
+ nvdn = of_parse_phandle(dn, "ibm,nvlink", 0);
+ if (!nvdn)
+ continue;
+ if (!of_device_is_compatible(nvdn, "ibm,npu-link"))
+ continue;
+ if (!of_device_is_compatible(nvdn->parent,
+ "ibm,power9-npu"))
+ continue;
+ WARN_ON_ONCE(pnv_npu2_init(hose));
+ break;
+ }
+ }
 }
 
 /*
--
2.20.1
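
Note on the search in pSeries_final_fixup() above: a device counts as NVLink-attached when its "ibm,nvlink" phandle points at a node compatible with "ibm,npu-link" whose parent is compatible with "ibm,power9-npu"; the first match triggers NPU initialization for that PHB. Below is a minimal user-space sketch of that predicate. struct dt_node and both functions are hypothetical, each node carries a single compatible string, and the walk is over a flat array of devices instead of of_find_all_nodes(), so this is a simplification and not the kernel's device-tree API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily simplified device-tree node. */
struct dt_node {
	const char *compatible;       /* single compatible string or NULL */
	const struct dt_node *parent; /* parent node or NULL */
	const struct dt_node *nvlink; /* target of "ibm,nvlink" or NULL */
};

static bool is_compatible(const struct dt_node *n, const char *compat)
{
	return n && n->compatible && strcmp(n->compatible, compat) == 0;
}

/*
 * Return true as soon as one device links to a POWER9 NPU link,
 * mirroring the "init once, then break" structure of the patch.
 */
static bool phb_has_power9_nvlink(const struct dt_node *const *devs, size_t n)
{
	assert(devs || n == 0);

	for (size_t i = 0; i < n; i++) {
		const struct dt_node *nvdn = devs[i] ? devs[i]->nvlink : NULL;

		if (!nvdn)
			continue;
		if (!is_compatible(nvdn, "ibm,npu-link"))
			continue;
		if (!is_compatible(nvdn->parent, "ibm,power9-npu"))
			continue;
		return true;
	}
	return false;
}
```

A caller would run the NPU setup for the PHB only when this returns true, which matches the patch calling pnv_npu2_init() at most once per hose.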



[Bionic][PATCH 10/12] powerpc/pseries: Remove IOMMU API support for non-LPAR systems

Jose Ricardo Ziviani-2
From: Alexey Kardashevskiy <[hidden email]>

BugLink: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1819989

The pci_dma_bus_setup_pSeries and pci_dma_dev_setup_pSeries hooks are
registered for pseries platforms which do not have FW_FEATURE_LPAR;
these would be pre-powernv platforms for which we never supported PCI
passthrough anyway, so remove the IOMMU API support from these hooks.

Signed-off-by: Alexey Kardashevskiy <[hidden email]>
Reviewed-by: David Gibson <[hidden email]>
Signed-off-by: Michael Ellerman <[hidden email]>
(cherry picked from commit c409c6316166993163e29312aeaaf1c0c300a04a)
Signed-off-by: Jose Ricardo Ziviani <[hidden email]>
---
 arch/powerpc/platforms/pseries/iommu.c | 9 ++-------
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index fcb8e7f5736e..7f3790d3ee58 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -645,7 +645,6 @@ static void pci_dma_bus_setup_pSeries(struct pci_bus *bus)
  iommu_table_setparms(pci->phb, dn, tbl);
  tbl->it_ops = &iommu_table_pseries_ops;
  iommu_init_table(tbl, pci->phb->node);
- iommu_register_group(pci->table_group, pci_domain_nr(bus), 0);
 
  /* Divide the rest (1.75GB) among the children */
  pci->phb->dma_window_size = 0x80000000ul;
@@ -756,10 +755,7 @@ static void pci_dma_dev_setup_pSeries(struct pci_dev *dev)
  iommu_table_setparms(phb, dn, tbl);
  tbl->it_ops = &iommu_table_pseries_ops;
  iommu_init_table(tbl, phb->node);
- iommu_register_group(PCI_DN(dn)->table_group,
- pci_domain_nr(phb->bus), 0);
  set_iommu_table_base(&dev->dev, tbl);
- iommu_add_device(&dev->dev);
  return;
  }
 
@@ -770,11 +766,10 @@ static void pci_dma_dev_setup_pSeries(struct pci_dev *dev)
  while (dn && PCI_DN(dn) && PCI_DN(dn)->table_group == NULL)
  dn = dn->parent;
 
- if (dn && PCI_DN(dn)) {
+ if (dn && PCI_DN(dn))
  set_iommu_table_base(&dev->dev,
  PCI_DN(dn)->table_group->tables[0]);
- iommu_add_device(&dev->dev);
- } else
+ else
  printk(KERN_WARNING "iommu: Device %s has no iommu table\n",
        pci_name(dev));
 }
--
2.20.1



[Bionic][PATCH 11/12] powerpc/powernv/npu: Check mmio_atsd array bounds when populating

Jose Ricardo Ziviani-2
From: Alexey Kardashevskiy <[hidden email]>

BugLink: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1819989

A broken device tree might contain more than 8 values and introduce a
hard-to-debug memory corruption bug. This adds a boundary check.

Signed-off-by: Alexey Kardashevskiy <[hidden email]>
Signed-off-by: Michael Ellerman <[hidden email]>
(cherry picked from commit 135ef954051b102870a8d47a8eb822af1f1b1ec1)
Signed-off-by: Jose Ricardo Ziviani <[hidden email]>
---
 arch/powerpc/platforms/powernv/npu-dma.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/npu-dma.c b/arch/powerpc/platforms/powernv/npu-dma.c
index 53713ff439a9..0440d0c01142 100644
--- a/arch/powerpc/platforms/powernv/npu-dma.c
+++ b/arch/powerpc/platforms/powernv/npu-dma.c
@@ -964,8 +964,9 @@ int pnv_npu2_init(struct pci_controller *hose)
 
  npu->nmmu_flush = of_property_read_bool(hose->dn, "ibm,nmmu-flush");
 
- for (i = 0; !of_property_read_u64_index(hose->dn, "ibm,mmio-atsd",
- i, &mmio_atsd); i++)
+ for (i = 0; i < ARRAY_SIZE(npu->mmio_atsd_regs) &&
+ !of_property_read_u64_index(hose->dn, "ibm,mmio-atsd",
+ i, &mmio_atsd); i++)
  npu->mmio_atsd_regs[i] = ioremap(mmio_atsd, 32);
 
  pr_info("NPU%d: Found %d MMIO ATSD registers", hose->global_number, i);
--
2.20.1
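
Note on the loop condition changed above: the array bound is tested first, so the property reader is never asked for an index that has no slot to land in. Below is a minimal user-space sketch of the same pattern. MAX_REGS, struct prop, struct npu_sketch, prop_read_u64_index() and populate_regs() are hypothetical names for this illustration; the reader only imitates the "0 on success" convention of of_property_read_u64_index() and the capacity of 8 is taken from the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))
#define MAX_REGS 8 /* assumed capacity, per "more than 8 values" */

/* Hypothetical property: a list of 64-bit values from a device tree. */
struct prop {
	const uint64_t *vals;
	size_t count;
};

struct npu_sketch {
	uint64_t regs[MAX_REGS];
};

/* Return 0 and store the value on success, -1 when idx is out of range. */
static int prop_read_u64_index(const struct prop *p, size_t idx, uint64_t *out)
{
	if (idx >= p->count)
		return -1;
	*out = p->vals[idx];
	return 0;
}

/*
 * Copy at most ARRAY_SIZE(npu->regs) values. The bound check comes
 * before the read, so an oversized property cannot write past the array.
 * Returns the number of entries stored.
 */
static size_t populate_regs(struct npu_sketch *npu, const struct prop *p)
{
	size_t i;
	uint64_t v;

	assert(npu && p);

	for (i = 0; i < ARRAY_SIZE(npu->regs) &&
		    !prop_read_u64_index(p, i, &v); i++)
		npu->regs[i] = v;

	return i;
}
```

A property with ten values fills only the eight available slots; the extra two are ignored.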



[Bionic][PATCH 12/12] powerpc/powernv/npu: Fault user page into the hypervisor's pagetable

Jose Ricardo Ziviani-2
From: Alexey Kardashevskiy <[hidden email]>

BugLink: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1819989

When a page fault happens in a GPU, the GPU signals the OS and the GPU
driver calls the fault handler, which populates a page table; this allows
the GPU to complete an ATS request.

On bare metal, get_user_pages() is enough as it adds a pte to
the kernel page table, but under KVM the partition-scoped tree does not
get updated, so ATS will still fail.

This reads a byte from an effective address, which causes an HV storage
interrupt, and KVM updates the partition-scoped tree.

Signed-off-by: Alexey Kardashevskiy <[hidden email]>
Signed-off-by: Michael Ellerman <[hidden email]>
(cherry picked from commit 58629c0dc34904d135af944d120eb23165ec3b61)
Signed-off-by: Jose Ricardo Ziviani <[hidden email]>
---
 arch/powerpc/platforms/powernv/npu-dma.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/npu-dma.c b/arch/powerpc/platforms/powernv/npu-dma.c
index 0440d0c01142..362e31c99f5c 100644
--- a/arch/powerpc/platforms/powernv/npu-dma.c
+++ b/arch/powerpc/platforms/powernv/npu-dma.c
@@ -918,6 +918,8 @@ int pnv_npu2_handle_fault(struct npu_context *context, uintptr_t *ea,
  u64 rc = 0, result = 0;
  int i, is_write;
  struct page *page[1];
+ const char __user *u;
+ char c;
 
  /* mmap_sem should be held so the struct_mm must be present */
  struct mm_struct *mm = context->mm;
@@ -930,18 +932,17 @@ int pnv_npu2_handle_fault(struct npu_context *context, uintptr_t *ea,
  is_write ? FOLL_WRITE : 0,
  page, NULL, NULL);
 
- /*
- * To support virtualised environments we will have to do an
- * access to the page to ensure it gets faulted into the
- * hypervisor. For the moment virtualisation is not supported in
- * other areas so leave the access out.
- */
  if (rc != 1) {
  status[i] = rc;
  result = -EFAULT;
  continue;
  }
 
+ /* Make sure partition scoped tree gets a pte */
+ u = page_address(page[0]);
+ if (__get_user(c, u))
+ result = -EFAULT;
+
  status[i] = 0;
  put_page(page[0]);
  }
--
2.20.1

