[SRU][Xenial][PATCH 0/2] rfi-flush: Switch to new linear fallback flush (LP #1744173)

[SRU][Xenial][PATCH 0/2] rfi-flush: Switch to new linear fallback flush (LP #1744173)

Juerg Haefliger
BugLink: https://bugs.launchpad.net/bugs/1744173

[Impact]
Change the flush method from "congruence-first with dependencies" to "linear with no
dependencies", which improves flush performance by 8x on P8 and 3x on P9, measured
with a null syscall loop (which keeps the flush area in the L2 cache).

The flush also becomes simpler and more adaptable to different cache
geometries.
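For context, the old and new access patterns can be contrasted in C. This is a minimal, illustrative sketch only (the kernel implements the flush in assembly in exceptions-64s.S); it assumes 128-byte cache lines and the old recipe's 8-way associativity assumption:

```c
#include <assert.h>
#include <stddef.h>

#define LINE 128  /* POWER L1D cache-line size */
#define WAYS 8    /* associativity assumed by the old recipe */

/* Old recipe: congruence-first -- for each set, load one line from each of
 * the 8 ways; in the real asm each load fed the next address, serializing
 * the chain. */
static unsigned long flush_congruence_first(const unsigned char *area,
					    size_t l1d_size)
{
	size_t way_stride = l1d_size / WAYS;  /* distance between ways */
	unsigned long sum = 0;
	size_t set, way;

	for (set = 0; set < way_stride; set += LINE)
		for (way = 0; way < WAYS; way++)
			sum += area[set + way * way_stride];
	return sum;
}

/* New recipe: a plain linear sweep with no inter-load dependencies, which
 * the core can pipeline freely. */
static unsigned long flush_linear(const unsigned char *area, size_t l1d_size)
{
	unsigned long sum = 0;
	size_t off;

	for (off = 0; off < l1d_size; off += LINE)
		sum += area[off];
	return sum;
}
```

Both variants touch every cache line of the area exactly once; only the order and the dependency structure differ, which is where the speedup comes from.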

[Test Case]
TBD.

[Regression Potential]
The risk is deemed low since the changes are confined to POWER and the provided test kernels have been tested by IBM.

Signed-off-by: Juerg Haefliger <[hidden email]>


Michael Ellerman (1):
  UBUNTU: SAUCE: rfi-flush: Make it possible to call setup_rfi_flush()
    again

Nicholas Piggin (1):
  powerpc/64s: Improve RFI L1-D cache flush fallback

 arch/powerpc/include/asm/paca.h      |  3 +-
 arch/powerpc/kernel/asm-offsets.c    |  3 +-
 arch/powerpc/kernel/exceptions-64s.S | 76 +++++++++++++---------------
 arch/powerpc/kernel/setup_64.c       | 35 +++++++------
 4 files changed, 57 insertions(+), 60 deletions(-)

--
2.17.0


--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team

[SRU][Xenial][PATCH 1/2] powerpc/64s: Improve RFI L1-D cache flush fallback

Juerg Haefliger
From: Nicholas Piggin <[hidden email]>

BugLink: https://bugs.launchpad.net/bugs/1744173

The fallback RFI flush is used when firmware does not provide a way
to flush the cache. It's a "displacement flush" that evicts useful
data by displacing it with an uninteresting buffer.

The flush has to take care to work with implementation-specific cache
replacement policies, so the recipe has been in flux. The initial,
slow but conservative approach was to touch all lines of a congruence
class, with dependencies between each load. It has since been
determined that a linear pattern of loads without dependencies is
sufficient, and is significantly faster.

Measuring the speed of a null syscall with RFI fallback flush enabled
gives the relative improvement:

P8 - 1.83x
P9 - 1.75x

The flush also becomes simpler and more adaptable to different cache
geometries.
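The loop count in the new sequence follows directly from the cache geometry; a small sketch of the arithmetic behind the `srdi r11,r11,(7 + 3)` in the diff (2^7 for 128-byte lines, 2^3 for the 8x unroll, so each iteration covers 2^10 bytes of the fallback area):

```c
#include <assert.h>
#include <stdint.h>

/* Iteration count for the new linear flush: each loop body issues 8 loads,
 * one per 128-byte (2^7) line, so one iteration covers 2^(7+3) = 1024 bytes. */
static uint64_t flush_loop_count(uint64_t l1d_flush_size)
{
	return l1d_flush_size >> (7 + 3);
}
```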

Signed-off-by: Nicholas Piggin <[hidden email]>
Signed-off-by: Michael Ellerman <[hidden email]>
(backported from bdcb1aefc5b3f7d0f1dc8b02673602bca2ff7a4b)
Signed-off-by: Gustavo Walbon <[hidden email]>
Signed-off-by: Juerg Haefliger <[hidden email]>
---
 arch/powerpc/include/asm/paca.h      |  3 +-
 arch/powerpc/kernel/asm-offsets.c    |  3 +-
 arch/powerpc/kernel/exceptions-64s.S | 76 +++++++++++++---------------
 arch/powerpc/kernel/setup_64.c       | 13 +----
 4 files changed, 39 insertions(+), 56 deletions(-)

diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
index 9e76e27d96c7..08ea3b49cfed 100644
--- a/arch/powerpc/include/asm/paca.h
+++ b/arch/powerpc/include/asm/paca.h
@@ -208,8 +208,7 @@ struct paca_struct {
  */
  u64 exrfi[13] __aligned(0x80);
  void *rfi_flush_fallback_area;
- u64 l1d_flush_congruence;
- u64 l1d_flush_sets;
+ u64 l1d_flush_size;
 #endif
 };
 
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index 46acc17dfed1..ec3ec682072c 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -245,8 +245,7 @@ int main(void)
  DEFINE(PACA_IN_MCE, offsetof(struct paca_struct, in_mce));
  DEFINE(PACA_RFI_FLUSH_FALLBACK_AREA, offsetof(struct paca_struct, rfi_flush_fallback_area));
  DEFINE(PACA_EXRFI, offsetof(struct paca_struct, exrfi));
- DEFINE(PACA_L1D_FLUSH_CONGRUENCE, offsetof(struct paca_struct, l1d_flush_congruence));
- DEFINE(PACA_L1D_FLUSH_SETS, offsetof(struct paca_struct, l1d_flush_sets));
+ DEFINE(PACA_L1D_FLUSH_SIZE, offsetof(struct paca_struct, l1d_flush_size));
 #endif
  DEFINE(PACAHWCPUID, offsetof(struct paca_struct, hw_cpu_id));
  DEFINE(PACAKEXECSTATE, offsetof(struct paca_struct, kexec_state));
diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index 218bff6ea243..088c930f5554 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -1608,39 +1608,37 @@ rfi_flush_fallback:
  std r9,PACA_EXRFI+EX_R9(r13)
  std r10,PACA_EXRFI+EX_R10(r13)
  std r11,PACA_EXRFI+EX_R11(r13)
- std r12,PACA_EXRFI+EX_R12(r13)
- std r8,PACA_EXRFI+EX_R13(r13)
  mfctr r9
  ld r10,PACA_RFI_FLUSH_FALLBACK_AREA(r13)
- ld r11,PACA_L1D_FLUSH_SETS(r13)
- ld r12,PACA_L1D_FLUSH_CONGRUENCE(r13)
- /*
- * The load adresses are at staggered offsets within cachelines,
- * which suits some pipelines better (on others it should not
- * hurt).
- */
- addi r12,r12,8
+ ld r11,PACA_L1D_FLUSH_SIZE(r13)
+ srdi r11,r11,(7 + 3) /* 128 byte lines, unrolled 8x */
  mtctr r11
  DCBT_STOP_ALL_STREAM_IDS(r11) /* Stop prefetch streams */
 
  /* order ld/st prior to dcbt stop all streams with flushing */
  sync
-1: li r8,0
- .rept 8 /* 8-way set associative */
- ldx r11,r10,r8
- add r8,r8,r12
- xor r11,r11,r11 // Ensure r11 is 0 even if fallback area is not
- add r8,r8,r11 // Add 0, this creates a dependency on the ldx
- .endr
- addi r10,r10,128 /* 128 byte cache line */
+
+ /*
+ * The load adresses are at staggered offsets within cachelines,
+ * which suits some pipelines better (on others it should not
+ * hurt).
+ */
+1:
+ ld r11,(0x80 + 8)*0(r10)
+ ld r11,(0x80 + 8)*1(r10)
+ ld r11,(0x80 + 8)*2(r10)
+ ld r11,(0x80 + 8)*3(r10)
+ ld r11,(0x80 + 8)*4(r10)
+ ld r11,(0x80 + 8)*5(r10)
+ ld r11,(0x80 + 8)*6(r10)
+ ld r11,(0x80 + 8)*7(r10)
+ addi r10,r10,0x80*8
  bdnz 1b
 
  mtctr r9
  ld r9,PACA_EXRFI+EX_R9(r13)
  ld r10,PACA_EXRFI+EX_R10(r13)
  ld r11,PACA_EXRFI+EX_R11(r13)
- ld r12,PACA_EXRFI+EX_R12(r13)
- ld r8,PACA_EXRFI+EX_R13(r13)
  GET_SCRATCH0(r13);
  rfid
 
@@ -1651,39 +1649,37 @@ hrfi_flush_fallback:
  std r9,PACA_EXRFI+EX_R9(r13)
  std r10,PACA_EXRFI+EX_R10(r13)
  std r11,PACA_EXRFI+EX_R11(r13)
- std r12,PACA_EXRFI+EX_R12(r13)
- std r8,PACA_EXRFI+EX_R13(r13)
  mfctr r9
  ld r10,PACA_RFI_FLUSH_FALLBACK_AREA(r13)
- ld r11,PACA_L1D_FLUSH_SETS(r13)
- ld r12,PACA_L1D_FLUSH_CONGRUENCE(r13)
- /*
- * The load adresses are at staggered offsets within cachelines,
- * which suits some pipelines better (on others it should not
- * hurt).
- */
- addi r12,r12,8
+ ld r11,PACA_L1D_FLUSH_SIZE(r13)
+ srdi r11,r11,(7 + 3) /* 128 byte lines, unrolled 8x */
  mtctr r11
  DCBT_STOP_ALL_STREAM_IDS(r11) /* Stop prefetch streams */
 
  /* order ld/st prior to dcbt stop all streams with flushing */
  sync
-1: li r8,0
- .rept 8 /* 8-way set associative */
- ldx r11,r10,r8
- add r8,r8,r12
- xor r11,r11,r11 // Ensure r11 is 0 even if fallback area is not
- add r8,r8,r11 // Add 0, this creates a dependency on the ldx
- .endr
- addi r10,r10,128 /* 128 byte cache line */
+
+ /*
+ * The load adresses are at staggered offsets within cachelines,
+ * which suits some pipelines better (on others it should not
+ * hurt).
+ */
+1:
+ ld r11,(0x80 + 8)*0(r10)
+ ld r11,(0x80 + 8)*1(r10)
+ ld r11,(0x80 + 8)*2(r10)
+ ld r11,(0x80 + 8)*3(r10)
+ ld r11,(0x80 + 8)*4(r10)
+ ld r11,(0x80 + 8)*5(r10)
+ ld r11,(0x80 + 8)*6(r10)
+ ld r11,(0x80 + 8)*7(r10)
+ addi r10,r10,0x80*8
  bdnz 1b
 
  mtctr r9
  ld r9,PACA_EXRFI+EX_R9(r13)
  ld r10,PACA_EXRFI+EX_R10(r13)
  ld r11,PACA_EXRFI+EX_R11(r13)
- ld r12,PACA_EXRFI+EX_R12(r13)
- ld r8,PACA_EXRFI+EX_R13(r13)
  GET_SCRATCH0(r13);
  hrfid
 
diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index 8a5d92f12d1a..70dfe49868e1 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -982,19 +982,8 @@ static void init_fallback_flush(void)
  memset(l1d_flush_fallback_area, 0, l1d_size * 2);
 
  for_each_possible_cpu(cpu) {
- /*
- * The fallback flush is currently coded for 8-way
- * associativity. Different associativity is possible, but it
- * will be treated as 8-way and may not evict the lines as
- * effectively.
- *
- * 128 byte lines are mandatory.
- */
- u64 c = l1d_size / 8;
-
  paca[cpu].rfi_flush_fallback_area = l1d_flush_fallback_area;
- paca[cpu].l1d_flush_congruence = c;
- paca[cpu].l1d_flush_sets = c / 128;
+ paca[cpu].l1d_flush_size = l1d_size;
  }
 }
 
--
2.17.0



[SRU][Xenial][PATCH 2/2] UBUNTU: SAUCE: rfi-flush: Make it possible to call setup_rfi_flush() again

Juerg Haefliger
In reply to this post by Juerg Haefliger
From: Michael Ellerman <[hidden email]>

BugLink: https://bugs.launchpad.net/bugs/1744173

For PowerVM migration we want to be able to call setup_rfi_flush()
again after we've migrated the partition.

To support that we need to check that we're not trying to allocate the
fallback flush area after memblock has gone away. If so, we just fail;
we don't support migrating from a patched to an unpatched machine. Or
rather, we do support it, but there will be no RFI flush enabled on the
destination.
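The control flow of the new guard can be sketched in plain C. The real code calls memblock_alloc_base() and slab_is_available(); both are modelled here with hypothetical stand-in variables so the three outcomes can be followed in isolation:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

static void *fallback_area;   /* models l1d_flush_fallback_area */
static bool slab_up;          /* models slab_is_available()     */
static char boot_buffer[16];  /* stand-in for the memblock allocation */

static bool init_fallback_flush_sketch(void)
{
	if (fallback_area)    /* already allocated at boot: nothing to do */
		return true;
	if (slab_up)          /* memblock is gone; allocation must fail */
		return false;
	fallback_area = boot_buffer;  /* boot-time allocation succeeds */
	return true;
}
```

A second call after boot succeeds only because the area already exists; a first call after boot (the migration-to-patched-destination case) is the one that fails and disables the fallback flush.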

Signed-off-by: Michael Ellerman <[hidden email]>
Signed-off-by: Breno Leitao <[hidden email]>
Signed-off-by: Juerg Haefliger <[hidden email]>
---
 arch/powerpc/kernel/setup_64.c | 22 ++++++++++++++++++----
 1 file changed, 18 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index 70dfe49868e1..efc6371d62b3 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -961,14 +961,22 @@ void setup_stf_barrier(void)
  stf_barrier_enable(enable);
 }
 
-static void init_fallback_flush(void)
+static bool init_fallback_flush(void)
 {
  u64 l1d_size, limit;
  int cpu;
 
  /* Only allocate the fallback flush area once (at boot time). */
  if (l1d_flush_fallback_area)
- return;
+ return true;
+
+ /*
+ * Once the slab allocator is up it's too late to allocate the fallback
+ * flush area, so return an error. This could happen if we migrated from
+ * a patched machine to an unpatched machine.
+ */
+ if (slab_is_available())
+ return false;
 
  l1d_size = ppc64_caches.dsize;
  limit = min(safe_stack_limit(), ppc64_rma_size);
@@ -985,13 +993,19 @@ static void init_fallback_flush(void)
  paca[cpu].rfi_flush_fallback_area = l1d_flush_fallback_area;
  paca[cpu].l1d_flush_size = l1d_size;
  }
+
+ return true;
 }
 
 void setup_rfi_flush(enum l1d_flush_type types, bool enable)
 {
  if (types & L1D_FLUSH_FALLBACK) {
- pr_info("rfi-flush: fallback displacement flush available\n");
- init_fallback_flush();
+ if (init_fallback_flush())
+ pr_info("rfi-flush: Using fallback displacement flush\n");
+ else {
+ pr_warn("rfi-flush: Error unable to use fallback displacement flush!\n");
+ types &= ~L1D_FLUSH_FALLBACK;
+ }
  }
 
  if (types & L1D_FLUSH_ORI)
--
2.17.0



[Acked] [SRU][Xenial][PATCH 0/2] rfi-flush: Switch to new linear fallback flush (LP #1744173)

Andy Whitcroft-3
In reply to this post by Juerg Haefliger
On Wed, May 30, 2018 at 11:35:37AM +0200, Juerg Haefliger wrote:


Essentially h/w specific magic :).  Looks to be changing things as
claimed and testing is good.

Acked-by: Andy Whitcroft <[hidden email]>

-apw


Re: [SRU][Xenial][PATCH 0/2] rfi-flush: Switch to new linear fallback flush (LP #1744173)

Juerg Haefliger
In reply to this post by Juerg Haefliger
Ping. Can I get a second review, please?

On 05/30/2018 11:35 AM, Juerg Haefliger wrote:




ACK: [SRU][Xenial][PATCH 0/2] rfi-flush: Switch to new linear fallback flush (LP #1744173)

Khaled Elmously
In reply to this post by Juerg Haefliger
On 2018-05-30 11:35:37 , Juerg Haefliger wrote:


Acked-by: Khalid Elmously <[hidden email]>



APPLIED: [SRU][Xenial][PATCH 0/2] rfi-flush: Switch to new linear fallback flush (LP #1744173)

Juerg Haefliger
In reply to this post by Juerg Haefliger
Applied to xenial/master-next.

...Juerg

On 05/30/2018 11:35 AM, Juerg Haefliger wrote:


