[SRU][Bionic][PATCH 0/2] Fixes for LP:1773964

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|

[SRU][Bionic][PATCH 0/2] Fixes for LP:1773964

Joseph Salisbury-3
BugLink: http://bugs.launchpad.net/bugs/1773964

== SRU Justification ==
IBM is requesting these two commits in powerpc to fix an error they are
seeing:
IPL: ppc64_cpu --frequency hang with INFO: rcu_sched detected stalls on
CPUs/tasks on w34 and wsbmc016 with 920.1714.20170330n

The first commit reduces some existing high latency delays.  The second
commit fixes the fact that the OPAL RTC driver does not sleep in case it gets OPAL_BUSY or
OPAL_BUSY_EVENT from firmware.

== Fixes ==
34dd25de9fe3 ("powerpc/powernv: define a standard delay for OPAL_BUSY type retry loops")
682e6b4da5cb ("rtc: opal: Fix OPAL RTC driver OPAL_BUSY loops")

== Regression Potential ==
Low.  The first commit is specific to powerpc.  The second commit is, commit 628daa8d5abf
and it been cc'd to upstream stable to fix a regression, so it has had
additional upstream review.

== Test Case ==
A test kernel was built with these patches and tested by the original bug reporter.
The bug reporter states the test kernel resolved the bug.


Nicholas Piggin (2):
  powerpc/powernv: define a standard delay for OPAL_BUSY type retry
    loops
  rtc: opal: Fix OPAL RTC driver OPAL_BUSY loops

 arch/powerpc/include/asm/opal.h           |  3 +++
 arch/powerpc/platforms/powernv/opal-rtc.c |  8 ++++---
 drivers/rtc/rtc-opal.c                    | 37 +++++++++++++++++++------------
 3 files changed, 31 insertions(+), 17 deletions(-)

--
2.7.4


--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team
Reply | Threaded
Open this post in threaded view
|

[SRU][Bionic][PATCH 1/2] powerpc/powernv: define a standard delay for OPAL_BUSY type retry loops

Joseph Salisbury-3
From: Nicholas Piggin <[hidden email]>

BugLink: http://bugs.launchpad.net/bugs/1773964

This is the start of an effort to tidy up and standardise all the
delays. Existing loops have a range of delay/sleep periods from 1ms
to 20ms, and some have no delay. They all loop forever except rtc,
which times out after 10 retries, and that uses 10ms delays. So use
10ms as our standard delay. The OPAL maintainer agrees 10ms is a
reasonable starting point.

The idea is to use the same recipe everywhere, once this is proven to
work then it will be documented as an OPAL API standard. Then both
firmware and OS can agree, and if a particular call needs something
else, then that can be documented with reasoning.

This is not the end-all of this effort, it's just a relatively easy
change that fixes some existing high latency delays. There should be
provision for standardising timeouts and/or interruptible loops where
possible, so non-fatal firmware errors don't cause hangs.

Signed-off-by: Nicholas Piggin <[hidden email]>
Signed-off-by: Michael Ellerman <[hidden email]>
(cherry picked from commit 34dd25de9fe3f60bfdb31b473bf04b28262d0896)
Signed-off-by: Joseph Salisbury <[hidden email]>
---
 arch/powerpc/include/asm/opal.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index dde6008..c3f0ee8 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -21,6 +21,9 @@
 /* We calculate number of sg entries based on PAGE_SIZE */
 #define SG_ENTRIES_PER_NODE ((PAGE_SIZE - 16) / sizeof(struct opal_sg_entry))
 
+/* Default time to sleep or delay between OPAL_BUSY/OPAL_BUSY_EVENT loops */
+#define OPAL_BUSY_DELAY_MS 10
+
 /* /sys/firmware/opal */
 extern struct kobject *opal_kobj;
 
--
2.7.4


--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team
Reply | Threaded
Open this post in threaded view
|

[SRU][Bionic][PATCH 2/2] rtc: opal: Fix OPAL RTC driver OPAL_BUSY loops

Joseph Salisbury-3
In reply to this post by Joseph Salisbury-3
From: Nicholas Piggin <[hidden email]>

BugLink: http://bugs.launchpad.net/bugs/1773964

The OPAL RTC driver does not sleep in case it gets OPAL_BUSY or
OPAL_BUSY_EVENT from firmware, which causes large scheduling
latencies, up to 50 seconds have been observed here when RTC stops
responding (BMC reboot can do it).

Fix this by converting it to the standard form OPAL_BUSY loop that
sleeps.

Fixes: 628daa8d5abf ("powerpc/powernv: Add RTC and NVRAM support plus RTAS fallbacks")
Cc: [hidden email] # v3.2+
Signed-off-by: Nicholas Piggin <[hidden email]>
Acked-by: Alexandre Belloni <[hidden email]>
Signed-off-by: Michael Ellerman <[hidden email]>
(cherry picked from commit 682e6b4da5cbe8e9a53f979a58c2a9d7dc997175)
Signed-off-by: Joseph Salisbury <[hidden email]>
---
 arch/powerpc/platforms/powernv/opal-rtc.c |  8 ++++---
 drivers/rtc/rtc-opal.c                    | 37 +++++++++++++++++++------------
 2 files changed, 28 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/opal-rtc.c b/arch/powerpc/platforms/powernv/opal-rtc.c
index f886886..aa2a513 100644
--- a/arch/powerpc/platforms/powernv/opal-rtc.c
+++ b/arch/powerpc/platforms/powernv/opal-rtc.c
@@ -48,10 +48,12 @@ unsigned long __init opal_get_boot_time(void)
 
  while (rc == OPAL_BUSY || rc == OPAL_BUSY_EVENT) {
  rc = opal_rtc_read(&__y_m_d, &__h_m_s_ms);
- if (rc == OPAL_BUSY_EVENT)
+ if (rc == OPAL_BUSY_EVENT) {
+ mdelay(OPAL_BUSY_DELAY_MS);
  opal_poll_events(NULL);
- else if (rc == OPAL_BUSY)
- mdelay(10);
+ } else if (rc == OPAL_BUSY) {
+ mdelay(OPAL_BUSY_DELAY_MS);
+ }
  }
  if (rc != OPAL_SUCCESS)
  return 0;
diff --git a/drivers/rtc/rtc-opal.c b/drivers/rtc/rtc-opal.c
index 304e891..60f2250 100644
--- a/drivers/rtc/rtc-opal.c
+++ b/drivers/rtc/rtc-opal.c
@@ -57,7 +57,7 @@ static void tm_to_opal(struct rtc_time *tm, u32 *y_m_d, u64 *h_m_s_ms)
 
 static int opal_get_rtc_time(struct device *dev, struct rtc_time *tm)
 {
- long rc = OPAL_BUSY;
+ s64 rc = OPAL_BUSY;
  int retries = 10;
  u32 y_m_d;
  u64 h_m_s_ms;
@@ -66,13 +66,17 @@ static int opal_get_rtc_time(struct device *dev, struct rtc_time *tm)
 
  while (rc == OPAL_BUSY || rc == OPAL_BUSY_EVENT) {
  rc = opal_rtc_read(&__y_m_d, &__h_m_s_ms);
- if (rc == OPAL_BUSY_EVENT)
+ if (rc == OPAL_BUSY_EVENT) {
+ msleep(OPAL_BUSY_DELAY_MS);
  opal_poll_events(NULL);
- else if (retries-- && (rc == OPAL_HARDWARE
-       || rc == OPAL_INTERNAL_ERROR))
- msleep(10);
- else if (rc != OPAL_BUSY && rc != OPAL_BUSY_EVENT)
- break;
+ } else if (rc == OPAL_BUSY) {
+ msleep(OPAL_BUSY_DELAY_MS);
+ } else if (rc == OPAL_HARDWARE || rc == OPAL_INTERNAL_ERROR) {
+ if (retries--) {
+ msleep(10); /* Wait 10ms before retry */
+ rc = OPAL_BUSY; /* go around again */
+ }
+ }
  }
 
  if (rc != OPAL_SUCCESS)
@@ -87,21 +91,26 @@ static int opal_get_rtc_time(struct device *dev, struct rtc_time *tm)
 
 static int opal_set_rtc_time(struct device *dev, struct rtc_time *tm)
 {
- long rc = OPAL_BUSY;
+ s64 rc = OPAL_BUSY;
  int retries = 10;
  u32 y_m_d = 0;
  u64 h_m_s_ms = 0;
 
  tm_to_opal(tm, &y_m_d, &h_m_s_ms);
+
  while (rc == OPAL_BUSY || rc == OPAL_BUSY_EVENT) {
  rc = opal_rtc_write(y_m_d, h_m_s_ms);
- if (rc == OPAL_BUSY_EVENT)
+ if (rc == OPAL_BUSY_EVENT) {
+ msleep(OPAL_BUSY_DELAY_MS);
  opal_poll_events(NULL);
- else if (retries-- && (rc == OPAL_HARDWARE
-       || rc == OPAL_INTERNAL_ERROR))
- msleep(10);
- else if (rc != OPAL_BUSY && rc != OPAL_BUSY_EVENT)
- break;
+ } else if (rc == OPAL_BUSY) {
+ msleep(OPAL_BUSY_DELAY_MS);
+ } else if (rc == OPAL_HARDWARE || rc == OPAL_INTERNAL_ERROR) {
+ if (retries--) {
+ msleep(10); /* Wait 10ms before retry */
+ rc = OPAL_BUSY; /* go around again */
+ }
+ }
  }
 
  return rc == OPAL_SUCCESS ? 0 : -EIO;
--
2.7.4


--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team
Reply | Threaded
Open this post in threaded view
|

ACK: [SRU][Bionic][PATCH 0/2] Fixes for LP:1773964

Stefan Bader-2
In reply to this post by Joseph Salisbury-3
On 13.06.2018 17:10, Joseph Salisbury wrote:

> BugLink: http://bugs.launchpad.net/bugs/1773964
>
> == SRU Justification ==
> IBM is requesting these two commits in powerpc to fix an error they are
> seeing:
> IPL: ppc64_cpu --frequency hang with INFO: rcu_sched detected stalls on
> CPUs/tasks on w34 and wsbmc016 with 920.1714.20170330n
>
> The first commit reduces some existing high latency delays.  The second
> commit fixes the fact that the OPAL RTC driver does not sleep in case it gets OPAL_BUSY or
> OPAL_BUSY_EVENT from firmware.
>
> == Fixes ==
> 34dd25de9fe3 ("powerpc/powernv: define a standard delay for OPAL_BUSY type retry loops")
> 682e6b4da5cb ("rtc: opal: Fix OPAL RTC driver OPAL_BUSY loops")
>
> == Regression Potential ==
> Low.  The first commit is specific to powerpc.  The second commit is, commit 628daa8d5abf
> and it been cc'd to upstream stable to fix a regression, so it has had
> additional upstream review.
>
> == Test Case ==
> A test kernel was built with these patches and tested by the original bug reporter.
> The bug reporter states the test kernel resolved the bug.
>
>
> Nicholas Piggin (2):
>   powerpc/powernv: define a standard delay for OPAL_BUSY type retry
>     loops
>   rtc: opal: Fix OPAL RTC driver OPAL_BUSY loops
>
>  arch/powerpc/include/asm/opal.h           |  3 +++
>  arch/powerpc/platforms/powernv/opal-rtc.c |  8 ++++---
>  drivers/rtc/rtc-opal.c                    | 37 +++++++++++++++++++------------
>  3 files changed, 31 insertions(+), 17 deletions(-)
>
Acked-by: Stefan Bader <[hidden email]>


--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team

signature.asc (836 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

cmnt: [SRU][Bionic][PATCH 2/2] rtc: opal: Fix OPAL RTC driver OPAL_BUSY loops

Khaled Elmously
In reply to this post by Joseph Salisbury-3
On 2018-06-13 11:10:08 , Joseph Salisbury wrote:

> From: Nicholas Piggin <[hidden email]>
>
> BugLink: http://bugs.launchpad.net/bugs/1773964
>
> The OPAL RTC driver does not sleep in case it gets OPAL_BUSY or
> OPAL_BUSY_EVENT from firmware, which causes large scheduling
> latencies, up to 50 seconds have been observed here when RTC stops
> responding (BMC reboot can do it).
>
> Fix this by converting it to the standard form OPAL_BUSY loop that
> sleeps.
>
> Fixes: 628daa8d5abf ("powerpc/powernv: Add RTC and NVRAM support plus RTAS fallbacks")
> Cc: [hidden email] # v3.2+
> Signed-off-by: Nicholas Piggin <[hidden email]>
> Acked-by: Alexandre Belloni <[hidden email]>
> Signed-off-by: Michael Ellerman <[hidden email]>
> (cherry picked from commit 682e6b4da5cbe8e9a53f979a58c2a9d7dc997175)
> Signed-off-by: Joseph Salisbury <[hidden email]>
> ---
>  arch/powerpc/platforms/powernv/opal-rtc.c |  8 ++++---
>  drivers/rtc/rtc-opal.c                    | 37 +++++++++++++++++++------------
>  2 files changed, 28 insertions(+), 17 deletions(-)
>
> diff --git a/arch/powerpc/platforms/powernv/opal-rtc.c b/arch/powerpc/platforms/powernv/opal-rtc.c
> index f886886..aa2a513 100644
> --- a/arch/powerpc/platforms/powernv/opal-rtc.c
> +++ b/arch/powerpc/platforms/powernv/opal-rtc.c
> @@ -48,10 +48,12 @@ unsigned long __init opal_get_boot_time(void)
>  
>   while (rc == OPAL_BUSY || rc == OPAL_BUSY_EVENT) {
>   rc = opal_rtc_read(&__y_m_d, &__h_m_s_ms);
> - if (rc == OPAL_BUSY_EVENT)
> + if (rc == OPAL_BUSY_EVENT) {
> + mdelay(OPAL_BUSY_DELAY_MS);
>   opal_poll_events(NULL);
> - else if (rc == OPAL_BUSY)
> - mdelay(10);
> + } else if (rc == OPAL_BUSY) {
> + mdelay(OPAL_BUSY_DELAY_MS);
> + }
>   }
>   if (rc != OPAL_SUCCESS)
>   return 0;
> diff --git a/drivers/rtc/rtc-opal.c b/drivers/rtc/rtc-opal.c
> index 304e891..60f2250 100644
> --- a/drivers/rtc/rtc-opal.c
> +++ b/drivers/rtc/rtc-opal.c
> @@ -57,7 +57,7 @@ static void tm_to_opal(struct rtc_time *tm, u32 *y_m_d, u64 *h_m_s_ms)
>  
>  static int opal_get_rtc_time(struct device *dev, struct rtc_time *tm)
>  {
> - long rc = OPAL_BUSY;
> + s64 rc = OPAL_BUSY;
>   int retries = 10;
>   u32 y_m_d;
>   u64 h_m_s_ms;
> @@ -66,13 +66,17 @@ static int opal_get_rtc_time(struct device *dev, struct rtc_time *tm)
>  
>   while (rc == OPAL_BUSY || rc == OPAL_BUSY_EVENT) {
>   rc = opal_rtc_read(&__y_m_d, &__h_m_s_ms);
> - if (rc == OPAL_BUSY_EVENT)
> + if (rc == OPAL_BUSY_EVENT) {
> + msleep(OPAL_BUSY_DELAY_MS);
>   opal_poll_events(NULL);
> - else if (retries-- && (rc == OPAL_HARDWARE
> -       || rc == OPAL_INTERNAL_ERROR))
> - msleep(10);
> - else if (rc != OPAL_BUSY && rc != OPAL_BUSY_EVENT)
> - break;
> + } else if (rc == OPAL_BUSY) {
> + msleep(OPAL_BUSY_DELAY_MS);
> + } else if (rc == OPAL_HARDWARE || rc == OPAL_INTERNAL_ERROR) {
> + if (retries--) {
> + msleep(10); /* Wait 10ms before retry */

I was expecting this msleep() to use the new OPAL_BUSY_DELAY_MS instead of a hardcoded 10 .. wasn't that the whole point?

> + rc = OPAL_BUSY; /* go around again */
> + }
> + }
>   }
>  
>   if (rc != OPAL_SUCCESS)
> @@ -87,21 +91,26 @@ static int opal_get_rtc_time(struct device *dev, struct rtc_time *tm)
>  
>  static int opal_set_rtc_time(struct device *dev, struct rtc_time *tm)
>  {
> - long rc = OPAL_BUSY;
> + s64 rc = OPAL_BUSY;
>   int retries = 10;
>   u32 y_m_d = 0;
>   u64 h_m_s_ms = 0;
>  
>   tm_to_opal(tm, &y_m_d, &h_m_s_ms);
> +
>   while (rc == OPAL_BUSY || rc == OPAL_BUSY_EVENT) {
>   rc = opal_rtc_write(y_m_d, h_m_s_ms);
> - if (rc == OPAL_BUSY_EVENT)
> + if (rc == OPAL_BUSY_EVENT) {
> + msleep(OPAL_BUSY_DELAY_MS);
>   opal_poll_events(NULL);
> - else if (retries-- && (rc == OPAL_HARDWARE
> -       || rc == OPAL_INTERNAL_ERROR))
> - msleep(10);
> - else if (rc != OPAL_BUSY && rc != OPAL_BUSY_EVENT)
> - break;
> + } else if (rc == OPAL_BUSY) {
> + msleep(OPAL_BUSY_DELAY_MS);
> + } else if (rc == OPAL_HARDWARE || rc == OPAL_INTERNAL_ERROR) {
> + if (retries--) {
> + msleep(10); /* Wait 10ms before retry */

Same as above.

> + rc = OPAL_BUSY; /* go around again */
> + }
> + }
>   }
>  
>   return rc == OPAL_SUCCESS ? 0 : -EIO;
> --
> 2.7.4
>
>
> --
> kernel-team mailing list
> [hidden email]
> https://lists.ubuntu.com/mailman/listinfo/kernel-team

--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team
Reply | Threaded
Open this post in threaded view
|

ACK/cmnt: [SRU][Bionic][PATCH 0/2] Fixes for LP:1773964

Khaled Elmously
In reply to this post by Joseph Salisbury-3
On 2018-06-13 11:10:06 , Joseph Salisbury wrote:

> BugLink: http://bugs.launchpad.net/bugs/1773964
>
> == SRU Justification ==
> IBM is requesting these two commits in powerpc to fix an error they are
> seeing:
> IPL: ppc64_cpu --frequency hang with INFO: rcu_sched detected stalls on
> CPUs/tasks on w34 and wsbmc016 with 920.1714.20170330n
>
> The first commit reduces some existing high latency delays.  The second
> commit fixes the fact that the OPAL RTC driver does not sleep in case it gets OPAL_BUSY or
> OPAL_BUSY_EVENT from firmware.
>
> == Fixes ==
> 34dd25de9fe3 ("powerpc/powernv: define a standard delay for OPAL_BUSY type retry loops")
> 682e6b4da5cb ("rtc: opal: Fix OPAL RTC driver OPAL_BUSY loops")
>
> == Regression Potential ==
> Low.  The first commit is specific to powerpc.  The second commit is, commit 628daa8d5abf
> and it been cc'd to upstream stable to fix a regression, so it has had
> additional upstream review.
>
> == Test Case ==
> A test kernel was built with these patches and tested by the original bug reporter.
> The bug reporter states the test kernel resolved the bug.
>
>
> Nicholas Piggin (2):
>   powerpc/powernv: define a standard delay for OPAL_BUSY type retry
>     loops
>   rtc: opal: Fix OPAL RTC driver OPAL_BUSY loops
>
>  arch/powerpc/include/asm/opal.h           |  3 +++
>  arch/powerpc/platforms/powernv/opal-rtc.c |  8 ++++---
>  drivers/rtc/rtc-opal.c                    | 37 +++++++++++++++++++------------
>  3 files changed, 31 insertions(+), 17 deletions(-)
>

Minor comment on patch #2 - but I'm OK with it as-is.

Acked-by: Khalid Elmously <[hidden email]>


--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team
Reply | Threaded
Open this post in threaded view
|

APPLIED: [SRU][Bionic][PATCH 0/2] Fixes for LP:1773964

Khaled Elmously
In reply to this post by Joseph Salisbury-3
Applied to Bionic

On 2018-06-13 11:10:06 , Joseph Salisbury wrote:

> BugLink: http://bugs.launchpad.net/bugs/1773964
>
> == SRU Justification ==
> IBM is requesting these two commits in powerpc to fix an error they are
> seeing:
> IPL: ppc64_cpu --frequency hang with INFO: rcu_sched detected stalls on
> CPUs/tasks on w34 and wsbmc016 with 920.1714.20170330n
>
> The first commit reduces some existing high latency delays.  The second
> commit fixes the fact that the OPAL RTC driver does not sleep in case it gets OPAL_BUSY or
> OPAL_BUSY_EVENT from firmware.
>
> == Fixes ==
> 34dd25de9fe3 ("powerpc/powernv: define a standard delay for OPAL_BUSY type retry loops")
> 682e6b4da5cb ("rtc: opal: Fix OPAL RTC driver OPAL_BUSY loops")
>
> == Regression Potential ==
> Low.  The first commit is specific to powerpc.  The second commit is, commit 628daa8d5abf
> and it been cc'd to upstream stable to fix a regression, so it has had
> additional upstream review.
>
> == Test Case ==
> A test kernel was built with these patches and tested by the original bug reporter.
> The bug reporter states the test kernel resolved the bug.
>
>
> Nicholas Piggin (2):
>   powerpc/powernv: define a standard delay for OPAL_BUSY type retry
>     loops
>   rtc: opal: Fix OPAL RTC driver OPAL_BUSY loops
>
>  arch/powerpc/include/asm/opal.h           |  3 +++
>  arch/powerpc/platforms/powernv/opal-rtc.c |  8 ++++---
>  drivers/rtc/rtc-opal.c                    | 37 +++++++++++++++++++------------
>  3 files changed, 31 insertions(+), 17 deletions(-)
>
> --
> 2.7.4
>
>
> --
> kernel-team mailing list
> [hidden email]
> https://lists.ubuntu.com/mailman/listinfo/kernel-team

--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team