[SRU][PATCH 0/2][Bionic] Fix aggressive CFS throttling

classic Classic list List threaded Threaded
14 messages Options
Reply | Threaded
Open this post in threaded view
|

[SRU][PATCH 0/2][Bionic] Fix aggressive CFS throttling

Khaled Elmously
BugLink: https://bugs.launchpad.net/bugs/1832151

"sched/fair: Fix bandwidth timer clock drift condition" enabled CPU bandwidth "slice expiry" behaviour in cpu-local silos (something that should have been in effect all along but was basically completely broken). The undesired side-effect of this expirty being enabled is that threads of highly-threaded non-CPU-bound applications get throttled even when the application isn't using its full quota.

This fix eliminates the problem by removing cpu-local slice expiry altogether. A small pre-requisite patch makes the fix apply nicely.

A derivative 4.15 cloud kernel was tested with this fix and approved by the cloud provider, and I've tested this fix on the master 4.15 with positive results.

More info in launchpad and salesforce.



Dave Chiluk (1):
  sched/fair: Fix low cpu usage with high throttling by removing
    expiration of cpu-local slices

Patrick Bellasi (1):
  sched/fair: Add lsub_positive() and use it consistently

 Documentation/scheduler/sched-bwc.txt | 74 +++++++++++++++++++-----
 kernel/sched/fair.c                   | 83 ++++++---------------------
 kernel/sched/sched.h                  |  4 --
 3 files changed, 79 insertions(+), 82 deletions(-)

--
2.17.1


--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team
Reply | Threaded
Open this post in threaded view
|

[SRU][PATCH 1/2][Bionic] sched/fair: Add lsub_positive() and use it consistently

Khaled Elmously
From: Patrick Bellasi <[hidden email]>

BugLink: https://bugs.launchpad.net/bugs/1832151

The following pattern:

   var -= min_t(typeof(var), var, val);

is used multiple times in fair.c.

The existing sub_positive() already captures that pattern, but it also
adds an explicit load-store to properly support lockless observations.
In other cases the pattern above is used to update local, and/or not
concurrently accessed, variables.

Let's add a simpler version of sub_positive(), targeted at local variables
updates, which gives the same readability benefits at calling sites,
without enforcing {READ,WRITE}_ONCE() barriers.

Signed-off-by: Patrick Bellasi <[hidden email]>
Signed-off-by: Peter Zijlstra (Intel) <[hidden email]>
Cc: Dietmar Eggemann <[hidden email]>
Cc: Juri Lelli <[hidden email]>
Cc: Linus Torvalds <[hidden email]>
Cc: Morten Rasmussen <[hidden email]>
Cc: Peter Zijlstra <[hidden email]>
Cc: Quentin Perret <[hidden email]>
Cc: Steve Muckle <[hidden email]>
Cc: Suren Baghdasaryan <[hidden email]>
Cc: Thomas Gleixner <[hidden email]>
Cc: Todd Kjos <[hidden email]>
Cc: Vincent Guittot <[hidden email]>
Link: https://lore.kernel.org/lkml/20181031184527.GA3178@...
Signed-off-by: Ingo Molnar <[hidden email]>
(backported picked from commit b5c0ce7bd1848892e2930f481828b6d7750231ed)
[ kmously: The changes to cpu_util_without() aren't applicable ]
Signed-off-by: Khalid Elmously <[hidden email]>
---
 kernel/sched/fair.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 09401d5eb471..c3dfd4e28c46 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2743,6 +2743,17 @@ account_entity_dequeue(struct cfs_rq *cfs_rq, struct sched_entity *se)
  WRITE_ONCE(*ptr, res); \
 } while (0)
 
+/*
+ * Remove and clamp on negative, from a local variable.
+ *
+ * A variant of sub_positive(), which does not use explicit load-store
+ * and is thus optimized for local variable updates.
+ */
+#define lsub_positive(_ptr, _val) do { \
+ typeof(_ptr) ptr = (_ptr); \
+ *ptr -= min_t(typeof(*ptr), *ptr, _val); \
+} while (0)
+
 #ifdef CONFIG_SMP
 /*
  * XXX we want to get rid of these helpers and use the full load resolution.
@@ -4814,7 +4825,7 @@ static int do_sched_cfs_period_timer(struct cfs_bandwidth *cfs_b, int overrun)
  cfs_b->distribute_running = 0;
  throttled = !list_empty(&cfs_b->throttled_cfs_rq);
 
- cfs_b->runtime -= min(runtime, cfs_b->runtime);
+ lsub_positive(&cfs_b->runtime, runtime);
  }
 
  /*
@@ -4948,7 +4959,7 @@ static void do_sched_cfs_slack_timer(struct cfs_bandwidth *cfs_b)
 
  raw_spin_lock(&cfs_b->lock);
  if (expires == cfs_b->runtime_expires)
- cfs_b->runtime -= min(runtime, cfs_b->runtime);
+ lsub_positive(&cfs_b->runtime, runtime);
  cfs_b->distribute_running = 0;
  raw_spin_unlock(&cfs_b->lock);
 }
--
2.17.1


--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team
Reply | Threaded
Open this post in threaded view
|

[SRU][PATCH 2/2][Bionic] sched/fair: Fix low cpu usage with high throttling by removing expiration of cpu-local slices

Khaled Elmously
In reply to this post by Khaled Elmously
From: Dave Chiluk <[hidden email]>

BugLink: https://bugs.launchpad.net/bugs/1832151

It has been observed, that highly-threaded, non-cpu-bound applications
running under cpu.cfs_quota_us constraints can hit a high percentage of
periods throttled while simultaneously not consuming the allocated
amount of quota. This use case is typical of user-interactive non-cpu
bound applications, such as those running in kubernetes or mesos when
run on multiple cpu cores.

This has been root caused to cpu-local run queue being allocated per cpu
bandwidth slices, and then not fully using that slice within the period.
At which point the slice and quota expires. This expiration of unused
slice results in applications not being able to utilize the quota for
which they are allocated.

The non-expiration of per-cpu slices was recently fixed by
'commit 512ac999d275 ("sched/fair: Fix bandwidth timer clock drift
condition")'. Prior to that it appears that this had been broken since
at least 'commit 51f2176d74ac ("sched/fair: Fix unlocked reads of some
cfs_b->quota/period")' which was introduced in v3.16-rc1 in 2014. That
added the following conditional which resulted in slices never being
expired.

if (cfs_rq->runtime_expires != cfs_b->runtime_expires) {
        /* extend local deadline, drift is bounded above by 2 ticks */
        cfs_rq->runtime_expires += TICK_NSEC;

Because this was broken for nearly 5 years, and has recently been fixed
and is now being noticed by many users running kubernetes
(https://github.com/kubernetes/kubernetes/issues/67577) it is my opinion
that the mechanisms around expiring runtime should be removed
altogether.

This allows quota already allocated to per-cpu run-queues to live longer
than the period boundary. This allows threads on runqueues that do not
use much CPU to continue to use their remaining slice over a longer
period of time than cpu.cfs_period_us. However, this helps prevent the
above condition of hitting throttling while also not fully utilizing
your cpu quota.

This theoretically allows a machine to use slightly more than its
allotted quota in some periods. This overflow would be bounded by the
remaining quota left on each per-cpu runqueueu. This is typically no
more than min_cfs_rq_runtime=1ms per cpu. For CPU bound tasks this will
change nothing, as they should theoretically fully utilize all of their
quota in each period. For user-interactive tasks as described above this
provides a much better user/application experience as their cpu
utilization will more closely match the amount they requested when they
hit throttling. This means that cpu limits no longer strictly apply per
period for non-cpu bound applications, but that they are still accurate
over longer timeframes.

This greatly improves performance of high-thread-count, non-cpu bound
applications with low cfs_quota_us allocation on high-core-count
machines. In the case of an artificial testcase (10ms/100ms of quota on
80 CPU machine), this commit resulted in almost 30x performance
improvement, while still maintaining correct cpu quota restrictions.
That testcase is available at https://github.com/indeedeng/fibtest.

Fixes: 512ac999d275 ("sched/fair: Fix bandwidth timer clock drift condition")
Signed-off-by: Dave Chiluk <[hidden email]>
Signed-off-by: Peter Zijlstra (Intel) <[hidden email]>
Reviewed-by: Phil Auld <[hidden email]>
Reviewed-by: Ben Segall <[hidden email]>
Cc: Ingo Molnar <[hidden email]>
Cc: John Hammond <[hidden email]>
Cc: Jonathan Corbet <[hidden email]>
Cc: Kyle Anderson <[hidden email]>
Cc: Gabriel Munos <[hidden email]>
Cc: Peter Oskolkov <[hidden email]>
Cc: Cong Wang <[hidden email]>
Cc: Brendan Gregg <[hidden email]>
Link: https://lkml.kernel.org/r/1563900266-19734-2-git-send-email-chiluk+linux@...
(backported from commit de53fd7aedb100f03e5d2231cfce0e4993282425)
[ kmously: A few context adjustments. Mostly because the patch expects some
raw_spin_lockirqsave() functions in fair.c but the bionic code still uses raw_spin_lock().
Also 2 lines that needed to be removed from start_cfs_bandwidth() are non-existent in bionic.
Furthermore, needed to adjust for different whitespace/context in the structs in sched.h ]
Signed-off-by: Khalid Elmously <[hidden email]>
---
 Documentation/scheduler/sched-bwc.txt | 74 ++++++++++++++++++++++-----
 kernel/sched/fair.c                   | 70 +++----------------------
 kernel/sched/sched.h                  |  4 --
 3 files changed, 67 insertions(+), 81 deletions(-)

diff --git a/Documentation/scheduler/sched-bwc.txt b/Documentation/scheduler/sched-bwc.txt
index f6b1873f68ab..3397995168bb 100644
--- a/Documentation/scheduler/sched-bwc.txt
+++ b/Documentation/scheduler/sched-bwc.txt
@@ -8,15 +8,16 @@ CFS bandwidth control is a CONFIG_FAIR_GROUP_SCHED extension which allows the
 specification of the maximum CPU bandwidth available to a group or hierarchy.
 
 The bandwidth allowed for a group is specified using a quota and period. Within
-each given "period" (microseconds), a group is allowed to consume only up to
-"quota" microseconds of CPU time.  When the CPU bandwidth consumption of a
-group exceeds this limit (for that period), the tasks belonging to its
-hierarchy will be throttled and are not allowed to run again until the next
-period.
-
-A group's unused runtime is globally tracked, being refreshed with quota units
-above at each period boundary.  As threads consume this bandwidth it is
-transferred to cpu-local "silos" on a demand basis.  The amount transferred
+each given "period" (microseconds), a task group is allocated up to "quota"
+microseconds of CPU time. That quota is assigned to per-cpu run queues in
+slices as threads in the cgroup become runnable. Once all quota has been
+assigned any additional requests for quota will result in those threads being
+throttled. Throttled threads will not be able to run again until the next
+period when the quota is replenished.
+
+A group's unassigned quota is globally tracked, being refreshed back to
+cfs_quota units at each period boundary. As threads consume this bandwidth it
+is transferred to cpu-local "silos" on a demand basis. The amount transferred
 within each of these updates is tunable and described as the "slice".
 
 Management
@@ -33,12 +34,12 @@ The default values are:
 
 A value of -1 for cpu.cfs_quota_us indicates that the group does not have any
 bandwidth restriction in place, such a group is described as an unconstrained
-bandwidth group.  This represents the traditional work-conserving behavior for
+bandwidth group. This represents the traditional work-conserving behavior for
 CFS.
 
 Writing any (valid) positive value(s) will enact the specified bandwidth limit.
-The minimum quota allowed for the quota or period is 1ms.  There is also an
-upper bound on the period length of 1s.  Additional restrictions exist when
+The minimum quota allowed for the quota or period is 1ms. There is also an
+upper bound on the period length of 1s. Additional restrictions exist when
 bandwidth limits are used in a hierarchical fashion, these are explained in
 more detail below.
 
@@ -51,8 +52,8 @@ unthrottled if it is in a constrained state.
 System wide settings
 --------------------
 For efficiency run-time is transferred between the global pool and CPU local
-"silos" in a batch fashion.  This greatly reduces global accounting pressure
-on large systems.  The amount transferred each time such an update is required
+"silos" in a batch fashion. This greatly reduces global accounting pressure
+on large systems. The amount transferred each time such an update is required
 is described as the "slice".
 
 This is tunable via procfs:
@@ -90,6 +91,51 @@ There are two ways in which a group may become throttled:
 In case b) above, even though the child may have runtime remaining it will not
 be allowed to until the parent's runtime is refreshed.
 
+CFS Bandwidth Quota Caveats
+---------------------------
+Once a slice is assigned to a cpu it does not expire.  However all but 1ms of
+the slice may be returned to the global pool if all threads on that cpu become
+unrunnable. This is configured at compile time by the min_cfs_rq_runtime
+variable. This is a performance tweak that helps prevent added contention on
+the global lock.
+
+The fact that cpu-local slices do not expire results in some interesting corner
+cases that should be understood.
+
+For cgroup cpu constrained applications that are cpu limited this is a
+relatively moot point because they will naturally consume the entirety of their
+quota as well as the entirety of each cpu-local slice in each period. As a
+result it is expected that nr_periods roughly equal nr_throttled, and that
+cpuacct.usage will increase roughly equal to cfs_quota_us in each period.
+
+For highly-threaded, non-cpu bound applications this non-expiration nuance
+allows applications to briefly burst past their quota limits by the amount of
+unused slice on each cpu that the task group is running on (typically at most
+1ms per cpu or as defined by min_cfs_rq_runtime).  This slight burst only
+applies if quota had been assigned to a cpu and then not fully used or returned
+in previous periods. This burst amount will not be transferred between cores.
+As a result, this mechanism still strictly limits the task group to quota
+average usage, albeit over a longer time window than a single period.  This
+also limits the burst ability to no more than 1ms per cpu.  This provides
+better more predictable user experience for highly threaded applications with
+small quota limits on high core count machines. It also eliminates the
+propensity to throttle these applications while simultanously using less than
+quota amounts of cpu. Another way to say this, is that by allowing the unused
+portion of a slice to remain valid across periods we have decreased the
+possibility of wastefully expiring quota on cpu-local silos that don't need a
+full slice's amount of cpu time.
+
+The interaction between cpu-bound and non-cpu-bound-interactive applications
+should also be considered, especially when single core usage hits 100%. If you
+gave each of these applications half of a cpu-core and they both got scheduled
+on the same CPU it is theoretically possible that the non-cpu bound application
+will use up to 1ms additional quota in some periods, thereby preventing the
+cpu-bound application from fully using its quota by that same amount. In these
+instances it will be up to the CFS algorithm (see sched-design-CFS.rst) to
+decide which application is chosen to run, as they will both be runnable and
+have remaining quota. This runtime discrepancy will be made up in the following
+periods when the interactive application idles.
+
 Examples
 --------
 1. Limit a group to 1 CPU worth of runtime.
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c3dfd4e28c46..b3e0159442d2 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4444,8 +4444,6 @@ void __refill_cfs_bandwidth_runtime(struct cfs_bandwidth *cfs_b)
 
  now = sched_clock_cpu(smp_processor_id());
  cfs_b->runtime = cfs_b->quota;
- cfs_b->runtime_expires = now + ktime_to_ns(cfs_b->period);
- cfs_b->expires_seq++;
 }
 
 static inline struct cfs_bandwidth *tg_cfs_bandwidth(struct task_group *tg)
@@ -4467,8 +4465,7 @@ static int assign_cfs_rq_runtime(struct cfs_rq *cfs_rq)
 {
  struct task_group *tg = cfs_rq->tg;
  struct cfs_bandwidth *cfs_b = tg_cfs_bandwidth(tg);
- u64 amount = 0, min_amount, expires;
- int expires_seq;
+ u64 amount = 0, min_amount;
 
  /* note: this is a positive sum as runtime_remaining <= 0 */
  min_amount = sched_cfs_bandwidth_slice() - cfs_rq->runtime_remaining;
@@ -4485,61 +4482,17 @@ static int assign_cfs_rq_runtime(struct cfs_rq *cfs_rq)
  cfs_b->idle = 0;
  }
  }
- expires_seq = cfs_b->expires_seq;
- expires = cfs_b->runtime_expires;
  raw_spin_unlock(&cfs_b->lock);
 
  cfs_rq->runtime_remaining += amount;
- /*
- * we may have advanced our local expiration to account for allowed
- * spread between our sched_clock and the one on which runtime was
- * issued.
- */
- if (cfs_rq->expires_seq != expires_seq) {
- cfs_rq->expires_seq = expires_seq;
- cfs_rq->runtime_expires = expires;
- }
 
  return cfs_rq->runtime_remaining > 0;
 }
 
-/*
- * Note: This depends on the synchronization provided by sched_clock and the
- * fact that rq->clock snapshots this value.
- */
-static void expire_cfs_rq_runtime(struct cfs_rq *cfs_rq)
-{
- struct cfs_bandwidth *cfs_b = tg_cfs_bandwidth(cfs_rq->tg);
-
- /* if the deadline is ahead of our clock, nothing to do */
- if (likely((s64)(rq_clock(rq_of(cfs_rq)) - cfs_rq->runtime_expires) < 0))
- return;
-
- if (cfs_rq->runtime_remaining < 0)
- return;
-
- /*
- * If the local deadline has passed we have to consider the
- * possibility that our sched_clock is 'fast' and the global deadline
- * has not truly expired.
- *
- * Fortunately we can check determine whether this the case by checking
- * whether the global deadline(cfs_b->expires_seq) has advanced.
- */
- if (cfs_rq->expires_seq == cfs_b->expires_seq) {
- /* extend local deadline, drift is bounded above by 2 ticks */
- cfs_rq->runtime_expires += TICK_NSEC;
- } else {
- /* global deadline is ahead, expiration has passed */
- cfs_rq->runtime_remaining = 0;
- }
-}
-
 static void __account_cfs_rq_runtime(struct cfs_rq *cfs_rq, u64 delta_exec)
 {
  /* dock delta_exec before expiring quota (as it could span periods) */
  cfs_rq->runtime_remaining -= delta_exec;
- expire_cfs_rq_runtime(cfs_rq);
 
  if (likely(cfs_rq->runtime_remaining > 0))
  return;
@@ -4725,8 +4678,7 @@ void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
  resched_curr(rq);
 }
 
-static u64 distribute_cfs_runtime(struct cfs_bandwidth *cfs_b,
- u64 remaining, u64 expires)
+static u64 distribute_cfs_runtime(struct cfs_bandwidth *cfs_b, u64 remaining)
 {
  struct cfs_rq *cfs_rq;
  u64 runtime;
@@ -4751,7 +4703,6 @@ static u64 distribute_cfs_runtime(struct cfs_bandwidth *cfs_b,
  remaining -= runtime;
 
  cfs_rq->runtime_remaining += runtime;
- cfs_rq->runtime_expires = expires;
 
  /* we check whether we're throttled above */
  if (cfs_rq->runtime_remaining > 0)
@@ -4776,7 +4727,7 @@ static u64 distribute_cfs_runtime(struct cfs_bandwidth *cfs_b,
  */
 static int do_sched_cfs_period_timer(struct cfs_bandwidth *cfs_b, int overrun)
 {
- u64 runtime, runtime_expires;
+ u64 runtime;
  int throttled;
 
  /* no need to continue the timer with no bandwidth constraint */
@@ -4804,8 +4755,6 @@ static int do_sched_cfs_period_timer(struct cfs_bandwidth *cfs_b, int overrun)
  /* account preceding periods in which throttling occurred */
  cfs_b->nr_throttled += overrun;
 
- runtime_expires = cfs_b->runtime_expires;
-
  /*
  * This check is repeated as we are holding onto the new bandwidth while
  * we unthrottle. This can potentially race with an unthrottled group
@@ -4818,8 +4767,7 @@ static int do_sched_cfs_period_timer(struct cfs_bandwidth *cfs_b, int overrun)
  cfs_b->distribute_running = 1;
  raw_spin_unlock(&cfs_b->lock);
  /* we can't nest cfs_b->lock while distributing bandwidth */
- runtime = distribute_cfs_runtime(cfs_b, runtime,
- runtime_expires);
+ runtime = distribute_cfs_runtime(cfs_b, runtime);
  raw_spin_lock(&cfs_b->lock);
 
  cfs_b->distribute_running = 0;
@@ -4896,8 +4844,7 @@ static void __return_cfs_rq_runtime(struct cfs_rq *cfs_rq)
  return;
 
  raw_spin_lock(&cfs_b->lock);
- if (cfs_b->quota != RUNTIME_INF &&
-    cfs_rq->runtime_expires == cfs_b->runtime_expires) {
+ if (cfs_b->quota != RUNTIME_INF) {
  cfs_b->runtime += slack_runtime;
 
  /* we are under rq->lock, defer unthrottling using a timer */
@@ -4929,7 +4876,6 @@ static __always_inline void return_cfs_rq_runtime(struct cfs_rq *cfs_rq)
 static void do_sched_cfs_slack_timer(struct cfs_bandwidth *cfs_b)
 {
  u64 runtime = 0, slice = sched_cfs_bandwidth_slice();
- u64 expires;
 
  /* confirm we're still not at a refresh boundary */
  raw_spin_lock(&cfs_b->lock);
@@ -4946,7 +4892,6 @@ static void do_sched_cfs_slack_timer(struct cfs_bandwidth *cfs_b)
  if (cfs_b->quota != RUNTIME_INF && cfs_b->runtime > slice)
  runtime = cfs_b->runtime;
 
- expires = cfs_b->runtime_expires;
  if (runtime)
  cfs_b->distribute_running = 1;
 
@@ -4955,11 +4900,10 @@ static void do_sched_cfs_slack_timer(struct cfs_bandwidth *cfs_b)
  if (!runtime)
  return;
 
- runtime = distribute_cfs_runtime(cfs_b, runtime, expires);
+ runtime = distribute_cfs_runtime(cfs_b, runtime);
 
  raw_spin_lock(&cfs_b->lock);
- if (expires == cfs_b->runtime_expires)
- lsub_positive(&cfs_b->runtime, runtime);
+ lsub_positive(&cfs_b->runtime, runtime);
  cfs_b->distribute_running = 0;
  raw_spin_unlock(&cfs_b->lock);
 }
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 5e47c67382a1..40fb09c91dc3 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -279,8 +279,6 @@ struct cfs_bandwidth {
  ktime_t period;
  u64 quota, runtime;
  s64 hierarchical_quota;
- u64 runtime_expires;
- int expires_seq;
 
  short idle;
  short period_active;
@@ -494,8 +492,6 @@ struct cfs_rq {
 
 #ifdef CONFIG_CFS_BANDWIDTH
  int runtime_enabled;
- int expires_seq;
- u64 runtime_expires;
  s64 runtime_remaining;
 
  u64 throttled_clock, throttled_clock_task;
--
2.17.1


--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team
Reply | Threaded
Open this post in threaded view
|

Re: [SRU][PATCH 1/2][Bionic] sched/fair: Add lsub_positive() and use it consistently

Kleber Souza
In reply to this post by Khaled Elmously
On 16.10.19 23:15, Khalid Elmously wrote:

> From: Patrick Bellasi <[hidden email]>
>
> BugLink: https://bugs.launchpad.net/bugs/1832151
>
> The following pattern:
>
>    var -= min_t(typeof(var), var, val);
>
> is used multiple times in fair.c.
>
> The existing sub_positive() already captures that pattern, but it also
> adds an explicit load-store to properly support lockless observations.
> In other cases the pattern above is used to update local, and/or not
> concurrently accessed, variables.
>
> Let's add a simpler version of sub_positive(), targeted at local variables
> updates, which gives the same readability benefits at calling sites,
> without enforcing {READ,WRITE}_ONCE() barriers.
>
> Signed-off-by: Patrick Bellasi <[hidden email]>
> Signed-off-by: Peter Zijlstra (Intel) <[hidden email]>
> Cc: Dietmar Eggemann <[hidden email]>
> Cc: Juri Lelli <[hidden email]>
> Cc: Linus Torvalds <[hidden email]>
> Cc: Morten Rasmussen <[hidden email]>
> Cc: Peter Zijlstra <[hidden email]>
> Cc: Quentin Perret <[hidden email]>
> Cc: Steve Muckle <[hidden email]>
> Cc: Suren Baghdasaryan <[hidden email]>
> Cc: Thomas Gleixner <[hidden email]>
> Cc: Todd Kjos <[hidden email]>
> Cc: Vincent Guittot <[hidden email]>
> Link: https://lore.kernel.org/lkml/20181031184527.GA3178@...
> Signed-off-by: Ingo Molnar <[hidden email]>
> (backported picked from commit b5c0ce7bd1848892e2930f481828b6d7750231ed)

^ (backported from commit ...)

> [ kmously: The changes to cpu_util_without() aren't applicable ]
> Signed-off-by: Khalid Elmously <[hidden email]>
> ---
>  kernel/sched/fair.c | 15 +++++++++++++--
>  1 file changed, 13 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 09401d5eb471..c3dfd4e28c46 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -2743,6 +2743,17 @@ account_entity_dequeue(struct cfs_rq *cfs_rq, struct sched_entity *se)
>   WRITE_ONCE(*ptr, res); \
>  } while (0)
>  
> +/*
> + * Remove and clamp on negative, from a local variable.
> + *
> + * A variant of sub_positive(), which does not use explicit load-store
> + * and is thus optimized for local variable updates.
> + */
> +#define lsub_positive(_ptr, _val) do { \
> + typeof(_ptr) ptr = (_ptr); \
> + *ptr -= min_t(typeof(*ptr), *ptr, _val); \
> +} while (0)
> +
>  #ifdef CONFIG_SMP
>  /*
>   * XXX we want to get rid of these helpers and use the full load resolution.
> @@ -4814,7 +4825,7 @@ static int do_sched_cfs_period_timer(struct cfs_bandwidth *cfs_b, int overrun)
>   cfs_b->distribute_running = 0;
>   throttled = !list_empty(&cfs_b->throttled_cfs_rq);
>  
> - cfs_b->runtime -= min(runtime, cfs_b->runtime);
> + lsub_positive(&cfs_b->runtime, runtime);
>   }
>  
>   /*
> @@ -4948,7 +4959,7 @@ static void do_sched_cfs_slack_timer(struct cfs_bandwidth *cfs_b)
>  
>   raw_spin_lock(&cfs_b->lock);
>   if (expires == cfs_b->runtime_expires)
> - cfs_b->runtime -= min(runtime, cfs_b->runtime);
> + lsub_positive(&cfs_b->runtime, runtime);
>   cfs_b->distribute_running = 0;
>   raw_spin_unlock(&cfs_b->lock);
>  }
>


--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team
Reply | Threaded
Open this post in threaded view
|

ACK/cmnt: [SRU][PATCH 0/2][Bionic] Fix aggressive CFS throttling

Kleber Souza
In reply to this post by Khaled Elmously
On 16.10.19 23:15, Khalid Elmously wrote:
> BugLink: https://bugs.launchpad.net/bugs/1832151

The bug report is outdated it seems. It says that 512ac999d275 (sched/fair: Fix bandwidth
timer clock drift condition) is the fix for the bug, but the patches submitted are follow
up fixes for it. Maybe it needs some update? It would be nice to also have it with the
SRU template.

>
> "sched/fair: Fix bandwidth timer clock drift condition" enabled CPU bandwidth "slice expiry" behaviour in cpu-local silos (something that should have been in effect all along but was basically completely broken). The undesired side-effect of this expirty being enabled is that threads of highly-threaded non-CPU-bound applications get throttled even when the application isn't using its full quota.
>
> This fix eliminates the problem by removing cpu-local slice expiry altogether. A small pre-requisite patch makes the fix apply nicely.
>
> A derivative 4.15 cloud kernel was tested with this fix and approved by the cloud provider, and I've tested this fix on the master 4.15 with positive results.

Do you have some more information on how this was tested for regressions on bionic
master kernel? What are the regression potentials?

Would this fix be needed for all newer series as well?
 

>
> More info in launchpad and salesforce.
>
>
>
> Dave Chiluk (1):
>   sched/fair: Fix low cpu usage with high throttling by removing
>     expiration of cpu-local slices
>
> Patrick Bellasi (1):
>   sched/fair: Add lsub_positive() and use it consistently
>
>  Documentation/scheduler/sched-bwc.txt | 74 +++++++++++++++++++-----
>  kernel/sched/fair.c                   | 83 ++++++---------------------
>  kernel/sched/sched.h                  |  4 --
>  3 files changed, 79 insertions(+), 82 deletions(-)
>

The backport looks good, fix is upstream and has been tested to fix
the issue.

I'm ACK'ing it for now, but I would like to get some more information about
regression tests and potential before applying it.

Acked-by: Kleber Sacilotto de Souza <[hidden email]>

--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team
Reply | Threaded
Open this post in threaded view
|

Re: ACK/cmnt: [SRU][PATCH 0/2][Bionic] Fix aggressive CFS throttling

Khaled Elmously

On 2019-10-17 14:57:00 , Kleber Souza wrote:

> On 16.10.19 23:15, Khalid Elmously wrote:
> > BugLink: https://bugs.launchpad.net/bugs/1832151
>
> The bug report is outdated it seems. It says that 512ac999d275 (sched/fair: Fix bandwidth
> timer clock drift condition) is the fix for the bug, but the patches submitted are follow
> up fixes for it. Maybe it needs some update? It would be nice to also have it with the
> SRU template.
>
> >
> > "sched/fair: Fix bandwidth timer clock drift condition" enabled CPU bandwidth "slice expiry" behaviour in cpu-local silos (something that should have been in effect all along but was basically completely broken). The undesired side-effect of this expirty being enabled is that threads of highly-threaded non-CPU-bound applications get throttled even when the application isn't using its full quota.
> >
> > This fix eliminates the problem by removing cpu-local slice expiry altogether. A small pre-requisite patch makes the fix apply nicely.
> >
> > A derivative 4.15 cloud kernel was tested with this fix and approved by the cloud provider, and I've tested this fix on the master 4.15 with positive results.
>
> Do you have some more information on how this was tested for regressions on bionic
> master kernel? What are the regression potentials?
>
> Would this fix be needed for all newer series as well?
>

Thanks for the review. I've updated the bug with more information.

I have not been able to reproduce this problem on the disco kernel, though I'm not entirely sure why. I haven't heard problems of it on any later kernels either, so I just left it at that.

I have now targeted the bug to E and D to make sure to remember to investigate if they need the fix too.

For now, the patch is nominated for Bionic only.



 

> >
> > More info in launchpad and salesforce.
> >
> >
> >
> > Dave Chiluk (1):
> >   sched/fair: Fix low cpu usage with high throttling by removing
> >     expiration of cpu-local slices
> >
> > Patrick Bellasi (1):
> >   sched/fair: Add lsub_positive() and use it consistently
> >
> >  Documentation/scheduler/sched-bwc.txt | 74 +++++++++++++++++++-----
> >  kernel/sched/fair.c                   | 83 ++++++---------------------
> >  kernel/sched/sched.h                  |  4 --
> >  3 files changed, 79 insertions(+), 82 deletions(-)
> >
>
> The backport looks good, fix is upstream and has been tested to fix
> the issue.
>
> I'm ACK'ing it for now, but I would like to get some more information about
> regression tests and potential before applying it.
>
> Acked-by: Kleber Sacilotto de Souza <[hidden email]>

--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team
Reply | Threaded
Open this post in threaded view
|

Re: [SRU][PATCH 2/2][Bionic] sched/fair: Fix low cpu usage with high throttling by removing expiration of cpu-local slices

Juerg Haefliger
In reply to this post by Khaled Elmously
There is an additional follow-on fix for this commit:

commit 763a9ec06c409dcde2a761aac4bb83ff3938e0b3
Author: Qian Cai <[hidden email]>
Date:   Tue Aug 20 14:40:55 2019 -0400

    sched/fair: Fix -Wunused-but-set-variable warnings
   
    Commit:
   
       de53fd7aedb1 ("sched/fair: Fix low cpu usage with high throttling by removing expiration of cpu-local slices")
   
    introduced a few compilation warnings:
   
      kernel/sched/fair.c: In function '__refill_cfs_bandwidth_runtime':
      kernel/sched/fair.c:4365:6: warning: variable 'now' set but not used [-Wunused-but-set-variable]
      kernel/sched/fair.c: In function 'start_cfs_bandwidth':
      kernel/sched/fair.c:4992:6: warning: variable 'overrun' set but not used [-Wunused-but-set-variable]
   
    Also, __refill_cfs_bandwidth_runtime() does no longer update the
    expiration time, so fix the comments accordingly.
   
    Signed-off-by: Qian Cai <[hidden email]>
    Signed-off-by: Peter Zijlstra (Intel) <[hidden email]>
    Reviewed-by: Ben Segall <[hidden email]>
    Reviewed-by: Dave Chiluk <[hidden email]>
    Cc: Linus Torvalds <[hidden email]>
    Cc: Peter Zijlstra <[hidden email]>
    Cc: Thomas Gleixner <[hidden email]>
    Cc: [hidden email]
    Fixes: de53fd7aedb1 ("sched/fair: Fix low cpu usage with high throttling by removing expiration of cpu-local slices")
    Link: https://lkml.kernel.org/r/1566326455-8038-1-git-send-email-cai@...
    Signed-off-by: Ingo Molnar <[hidden email]>

It's benign and from my limited testing doesn't seem to be needed in
Bionic since we're compiling with -Wno-unused-but-set-variable but certainly
won't hurt to pull in as well. In general, when cherry-picking or backporting
patches, one always should check if there are follow-on fixes for those new
commits.

Patch coming up soon.

...Juerg


> From: Dave Chiluk <[hidden email]>
>
> BugLink: https://bugs.launchpad.net/bugs/1832151
>
> It has been observed, that highly-threaded, non-cpu-bound applications
> running under cpu.cfs_quota_us constraints can hit a high percentage of
> periods throttled while simultaneously not consuming the allocated
> amount of quota. This use case is typical of user-interactive non-cpu
> bound applications, such as those running in kubernetes or mesos when
> run on multiple cpu Fix low cpu usage with high throttling cores.
>
> This has been root caused to cpu-local run queue being allocated per cpu
> bandwidth slices, and then not fully using that slice within the period.
> At which point the slice and quota expires. This expiration of unused
> slice results in applications not being able to utilize the quota for
> which they are allocated.
>
> The non-expiration of per-cpu slices was recently fixed by
> 'commit 512ac999d275 ("sched/fair: Fix bandwidth timer clock drift
> condition")'. Prior to that it appears that this had been broken since
> at least 'commit 51f2176d74ac ("sched/fair: Fix unlocked reads of some
> cfs_b->quota/period")' which was introduced in v3.16-rc1 in 2014. That
> added the following conditional which resulted in slices never being
> expired.
>
> if (cfs_rq->runtime_expires != cfs_b->runtime_expires) {
> /* extend local deadline, drift is bounded above by 2 ticks */
> cfs_rq->runtime_expires += TICK_NSEC;
>
> Because this was broken for nearly 5 years, and has recently been fixed
> and is now being noticed by many users running kubernetes
> (https://github.com/kubernetes/kubernetes/issues/67577) it is my opinion
> that the mechanisms around expiring runtime should be removed
> altogether.
>
> This allows quota already allocated to per-cpu run-queues to live longer
> than the period boundary. This allows threads on runqueues that do not
> use much CPU to continue to use their remaining slice over a longer
> period of time than cpu.cfs_period_us. However, this helps prevent the
> above condition of hitting throttling while also not fully utilizing
> your cpu quota.
>
> This theoretically allows a machine to use slightly more than its
> allotted quota in some periods. This overflow would be bounded by the
> remaining quota left on each per-cpu runqueueu. This is typically no
> more than min_cfs_rq_runtime=1ms per cpu. For CPU bound tasks this will
> change nothing, as they should theoretically fully utilize all of their
> quota in each period. For user-interactive tasks as described above this
> provides a much better user/application experience as their cpu
> utilization will more closely match the amount they requested when they
> hit throttling. ThisFix low cpu usage with high throttling  means that cpu limits no longer strictly apply per
> period for non-cpu bound applications, but that they are still accurate
> over longer timeframes.
>
> This greatly improves performance of high-thread-count, non-cpu bound
> applications with low cfs_quota_us allocation on high-core-count
> machines. In the case of an artificial testcase (10ms/100ms of quota on
> 80 CPU machine), this commit resulted in almost 30x performance
> improvement, while still maintaining correct cpu quota restrictions.
> That testcase is available at https://github.com/indeedeng/fibtest.
>
> Fixes: 512ac999d275 ("sched/fair: Fix bandwidth timer clock drift condition")
> Signed-off-by: Dave Chiluk <[hidden email]>
> Signed-off-by: Peter Zijlstra (Intel) <[hidden email]>
> Reviewed-by: Phil Auld <[hidden email]>
> Reviewed-by: Ben Segall <[hidden email]>
> Cc: Ingo Molnar <[hidden email]>
> Cc: John Hammond <[hidden email]>
> Cc: Jonathan Corbet <[hidden email]>
> Cc: Kyle Anderson <[hidden email]>
> Cc: Gabriel Munos <[hidden email]>
> Cc: Peter Oskolkov <[hidden email]>
> Cc: Cong Wang <[hidden email]>
> Cc: Brendan Gregg <[hidden email]>
> Link: https://lkml.kernel.org/r/1563900266-19734-2-git-send-email-chiluk+linux@...
> (backported from commit de53fd7aedb100f03e5d2231cfce0e4993282425)
> [ kmously: A few context adjustments. Mostly because the patch expects some
> raw_spin_lockirqsave() functions in fair.c but the bionic code still uses raw_spin_lock().
> Also 2 lines that needed to be removed from start_cfs_bandwidth() are non-existent in bionic.
> Furthermore, needed to adjust for different whitespace/context in the structs in sched.h ]
> Signed-off-by: Khalid Elmously <[hidden email]>
> ---
>  Documentation/scheduler/sched-bwc.txt | 74 ++++++++++++++++++++++-----
>  kernel/sched/fair.c                   | 70 +++----------------------
>  kernel/sched/sched.h                  |  4 --
>  3 files changed, 67 insertions(+), 81 deletions(-)
>
> diff --git a/Documentation/scheduler/sched-bwc.txt b/Documentation/scheduler/sched-bwc.txt
> index f6b1873f68ab..3397995168bb 100644
> --- a/Documentation/scheduler/sched-bwc.txt
> +++ b/Documentation/scheduler/sched-bwc.txt
> @@ -8,15 +8,16 @@ CFS bandwidth control is a CONFIG_FAIR_GROUP_SCHED extension which allows the
>  specification of the maximum CPU bandwidth available to a group or hierarchy.
>  
>  The bandwidth allowed for a group is specified using a quota and period. Within
> -each given "period" (microseconds), a group is allowed to consume only up to
> -"quota" microseconds of CPU time.  When the CPU bandwidth consumption of a
> -group exceeds this limit (for that period), the tasks belonging to its
> -hierarchy will be throttled and are not allowed to run again until the next
> -period.
> -
> -A group's unused runtime is globally tracked, being refreshed with quota units
> -above at each period boundary.  As threads consume this bandwidth it is
> -transferred to cpu-local "silos" on a demand basis.  The amount transferred
> +each given "period" (microseconds), a task group is allocated up to "quota"
> +microseconds of CPU time. That quota is assigned to per-cpu run queues in
> +slices as threads in the cgroup become runnable. Once all quota has been
> +assigned any additional requests for quota will result in those threads being
> +throttled. Throttled threads will not be able to run again until the next
> +period when the quota is replenished.
> +
> +A group's unassigned quota is globally tracked, being refreshed back to
> +cfs_quota units at each period boundary. As threads consume this bandwidth it
> +is transferred to cpu-local "silos" on a demand basis. The amount transferred
>  within each of these updates is tunable and described as the "slice".
>  
>  Management
> @@ -33,12 +34,12 @@ The default values are:
>  
>  A value of -1 for cpu.cfs_quota_us indicates that the group does not have any
>  bandwidth restriction in place, such a group is described as an unconstrained
> -bandwidth group.  This represents the traditional work-conserving behavior for
> +bandwidth group. This represents the traditional work-conserving behavior for
>  CFS.
>  
>  Writing any (valid) positive value(s) will enact the specified bandwidth limit.
> -The minimum quota allowed for the quota or period is 1ms.  There is also an
> -upper bound on the period length of 1s.  Additional restrictions exist when
> +The minimum quota allowed for the quota or period is 1ms. There is also an
> +upper bound on the period length of 1s. Additional restrictions exist when
>  bandwidth limits are used in a hierarchical fashion, these are explained in
>  more detail below.
>  
> @@ -51,8 +52,8 @@ unthrottled if it is in a constrained state.
>  System wide settings
>  --------------------
>  For efficiency run-time is transferred between the global pool and CPU local
> -"silos" in a batch fashion.  This greatly reduces global accounting pressure
> -on large systems.  The amount transferred each time such an update is required
> +"silos" in a batch fashion. This greatly reduces global accounting pressure
> +on large systems. The amount transferred each time such an update is required
>  is described as the "slice".
>  
>  This is tunable via procfs:
> @@ -90,6 +91,51 @@ There are two ways in which a group may become throttled:
>  In case b) above, even though the child may have runtime remaining it will not
>  be allowed to until the parent's runtime is refreshed.
>  
> +CFS Bandwidth Quota Caveats
> +---------------------------
> +Once a slice is assigned to a cpu it does not expire.  However all but 1ms of
> +the slice may be returned to the global pool if all threads on that cpu become
> +unrunnable. This is configured at compile time by the min_cfs_rq_runtime
> +variable. This is a performance tweak that helps prevent added contention on
> +the global lock.
> +
> +The fact that cpu-local slices do not expire results in some interesting corner
> +cases that should be understood.
> +
> +For cgroup cpu constrained applications that are cpu limited this is a
> +relatively moot point because they will naturally consume the entirety of their
> +quota as well as the entirety of each cpu-local slice in each period. As a
> +result it is expected that nr_periods roughly equal nr_throttled, and that
> +cpuacct.usage will increase roughly equal to cfs_quota_us in each period.
> +
> +For highly-threaded, non-cpu bound applications this non-expiration nuance
> +allows applications to briefly burst past their quota limits by the amount of
> +unused slice on each cpu that the task group is running on (typically at most
> +1ms per cpu or as defined by min_cfs_rq_runtime).  This slight burst only
> +applies if quota had been assigned to a cpu and then not fully used or returned
> +in previous periods. This burst amount will not be transferred between cores.
> +As a result, this mechanism still strictly limits the task group to quota
> +average usage, albeit over a longer time window than a single period.  This
> +also limits the burst ability to no more than 1ms per cpu.  This provides
> +better more predictable user experience for highly threaded applications with
> +small quota limits on high core count machines. It also eliminates the
> +propensity to throttle these applications while simultanously using less than
> +quota amounts of cpu. Another way to say this, is that by allowing the unused
> +portion of a slice to remain valid across periods we have decreased the
> +possibility of wastefully expiring quota on cpu-local silos that don't need a
> +full slice's amount of cpu time.
> +
> +The interaction between cpu-bound and non-cpu-bound-interactive applications
> +should also be considered, especially when single core usage hits 100%. If you
> +gave each of these applications half of a cpu-core and they both got scheduled
> +on the same CPU it is theoretically possible that the non-cpu bound application
> +will use up to 1ms additional quota in some periods, thereby preventing the
> +cpu-bound application from fully using its quota by that same amount. In these
> +instances it will be up to the CFS algorithm (see sched-design-CFS.rst) to
> +decide which application is chosen to run, as they will both be runnable and
> +have remaining quota. This runtime discrepancy will be made up in the following
> +periods when the interactive application idles.
> +
>  Examples
>  --------
>  1. Limit a group to 1 CPU worth of runtime.
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index c3dfd4e28c46..b3e0159442d2 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -4444,8 +4444,6 @@ void __refill_cfs_bandwidth_runtime(struct cfs_bandwidth *cfs_b)
>  
>   now = sched_clock_cpu(smp_processor_id());
>   cfs_b->runtime = cfs_b->quota;
> - cfs_b->runtime_expires = now + ktime_to_ns(cfs_b->period);
> - cfs_b->expires_seq++;
>  }
>  
>  static inline struct cfs_bandwidth *tg_cfs_bandwidth(struct task_group *tg)
> @@ -4467,8 +4465,7 @@ static int assign_cfs_rq_runtime(struct cfs_rq *cfs_rq)
>  {
>   struct task_group *tg = cfs_rq->tg;
>   struct cfs_bandwidth *cfs_b = tg_cfs_bandwidth(tg);
> - u64 amount = 0, min_amount, expires;
> - int expires_seq;
> + u64 amount = 0, min_amount;
>  
>   /* note: this is a positive sum as runtime_remaining <= 0 */
>   min_amount = sched_cfs_bandwidth_slice() - cfs_rq->runtime_remaining;
> @@ -4485,61 +4482,17 @@ static int assign_cfs_rq_runtime(struct cfs_rq *cfs_rq)
>   cfs_b->idle = 0;
>   }
>   }
> - expires_seq = cfs_b->expires_seq;
> - expires = cfs_b->runtime_expires;
>   raw_spin_unlock(&cfs_b->lock);
>  
>   cfs_rq->runtime_remaining += amount;
> - /*
> - * we may have advanced our local expiration to account for allowed
> - * spread between our sched_clock and the one on which runtime was
> - * issued.
> - */
> - if (cfs_rq->expires_seq != expires_seq) {
> - cfs_rq->expires_seq = expires_seq;
> - cfs_rq->runtime_expires = expires;
> - }
>  
>   return cfs_rq->runtime_remaining > 0;
>  }
>  
> -/*
> - * Note: This depends on the synchronization provided by sched_clock and the
> - * fact that rq->clock snapshots this value.
> - */
> -static void expire_cfs_rq_runtime(struct cfs_rq *cfs_rq)
> -{
> - struct cfs_bandwidth *cfs_b = tg_cfs_bandwidth(cfs_rq->tg);
> -
> - /* if the deadline is ahead of our clock, nothing to do */
> - if (likely((s64)(rq_clock(rq_of(cfs_rq)) - cfs_rq->runtime_expires) < 0))
> - return;
> -
> - if (cfs_rq->runtime_remaining < 0)
> - return;
> -
> - /*
> - * If the local deadline has passed we have to consider the
> - * possibility that our sched_clock is 'fast' and the global deadline
> - * has not truly expired.
> - *
> - * Fortunately we can check determine whether this the case by checking
> - * whether the global deadline(cfs_b->expires_seq) has advanced.
> - */
> - if (cfs_rq->expires_seq == cfs_b->expires_seq) {
> - /* extend local deadline, drift is bounded above by 2 ticks */
> - cfs_rq->runtime_expires += TICK_NSEC;
> - } else {
> - /* global deadline is ahead, expiration has passed */
> - cfs_rq->runtime_remaining = 0;
> - }
> -}
> -
>  static void __account_cfs_rq_runtime(struct cfs_rq *cfs_rq, u64 delta_exec)
>  {
>   /* dock delta_exec before expiring quota (as it could span periods) */
>   cfs_rq->runtime_remaining -= delta_exec;
> - expire_cfs_rq_runtime(cfs_rq);
>  
>   if (likely(cfs_rq->runtime_remaining > 0))
>   return;
> @@ -4725,8 +4678,7 @@ void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
>   resched_curr(rq);
>  }
>  
> -static u64 distribute_cfs_runtime(struct cfs_bandwidth *cfs_b,
> - u64 remaining, u64 expires)
> +static u64 distribute_cfs_runtime(struct cfs_bandwidth *cfs_b, u64 remaining)
>  {
>   struct cfs_rq *cfs_rq;
>   u64 runtime;
> @@ -4751,7 +4703,6 @@ static u64 distribute_cfs_runtime(struct cfs_bandwidth *cfs_b,
>   remaining -= runtime;
>  
>   cfs_rq->runtime_remaining += runtime;
> - cfs_rq->runtime_expires = expires;
>  
>   /* we check whether we're throttled above */
>   if (cfs_rq->runtime_remaining > 0)
> @@ -4776,7 +4727,7 @@ static u64 distribute_cfs_runtime(struct cfs_bandwidth *cfs_b,
>   */
>  static int do_sched_cfs_period_timer(struct cfs_bandwidth *cfs_b, int overrun)
>  {
> - u64 runtime, runtime_expires;
> + u64 runtime;
>   int throttled;
>  
>   /* no need to continue the timer with no bandwidth constraint */
> @@ -4804,8 +4755,6 @@ static int do_sched_cfs_period_timer(struct cfs_bandwidth *cfs_b, int overrun)
>   /* account preceding periods in which throttling occurred */
>   cfs_b->nr_throttled += overrun;
>  
> - runtime_expires = cfs_b->runtime_expires;
> -
>   /*
>   * This check is repeated as we are holding onto the new bandwidth while
>   * we unthrottle. This can potentially race with an unthrottled group
> @@ -4818,8 +4767,7 @@ static int do_sched_cfs_period_timer(struct cfs_bandwidth *cfs_b, int overrun)
>   cfs_b->distribute_running = 1;
>   raw_spin_unlock(&cfs_b->lock);
>   /* we can't nest cfs_b->lock while distributing bandwidth */
> - runtime = distribute_cfs_runtime(cfs_b, runtime,
> - runtime_expires);
> + runtime = distribute_cfs_runtime(cfs_b, runtime);
>   raw_spin_lock(&cfs_b->lock);
>  
>   cfs_b->distribute_running = 0;
> @@ -4896,8 +4844,7 @@ static void __return_cfs_rq_runtime(struct cfs_rq *cfs_rq)
>   return;
>  
>   raw_spin_lock(&cfs_b->lock);
> - if (cfs_b->quota != RUNTIME_INF &&
> -    cfs_rq->runtime_expires == cfs_b->runtime_expires) {
> + if (cfs_b->quota != RUNTIME_INF) {
>   cfs_b->runtime += slack_runtime;
>  
>   /* we are under rq->lock, defer unthrottling using a timer */
> @@ -4929,7 +4876,6 @@ static __always_inline void return_cfs_rq_runtime(struct cfs_rq *cfs_rq)
>  static void do_sched_cfs_slack_timer(struct cfs_bandwidth *cfs_b)
>  {
>   u64 runtime = 0, slice = sched_cfs_bandwidth_slice();
> - u64 expires;
>  
>   /* confirm we're still not at a refresh boundary */
>   raw_spin_lock(&cfs_b->lock);
> @@ -4946,7 +4892,6 @@ static void do_sched_cfs_slack_timer(struct cfs_bandwidth *cfs_b)
>   if (cfs_b->quota != RUNTIME_INF && cfs_b->runtime > slice)
>   runtime = cfs_b->runtime;
>  
> - expires = cfs_b->runtime_expires;
>   if (runtime)
>   cfs_b->distribute_running = 1;
>  
> @@ -4955,11 +4900,10 @@ static void do_sched_cfs_slack_timer(struct cfs_bandwidth *cfs_b)
>   if (!runtime)
>   return;
>  
> - runtime = distribute_cfs_runtime(cfs_b, runtime, expires);
> + runtime = distribute_cfs_runtime(cfs_b, runtime);
>  
>   raw_spin_lock(&cfs_b->lock);
> - if (expires == cfs_b->runtime_expires)
> - lsub_positive(&cfs_b->runtime, runtime);
> + lsub_positive(&cfs_b->runtime, runtime);
>   cfs_b->distribute_running = 0;
>   raw_spin_unlock(&cfs_b->lock);
>  }
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 5e47c67382a1..40fb09c91dc3 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -279,8 +279,6 @@ struct cfs_bandwidth {
>   ktime_t period;
>   u64 quota, runtime;
>   s64 hierarchical_quota;
> - u64 runtime_expires;
> - int expires_seq;
>  
>   short idle;
>   short period_active;
> @@ -494,8 +492,6 @@ struct cfs_rq {
>  
>  #ifdef CONFIG_CFS_BANDWIDTH
>   int runtime_enabled;
> - int expires_seq;
> - u64 runtime_expires;
>   s64 runtime_remaining;
>  
>   u64 throttled_clock, throttled_clock_task;

--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team

attachment0 (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: ACK/cmnt: [SRU][PATCH 0/2][Bionic] Fix aggressive CFS throttling

Juerg Haefliger
In reply to this post by Khaled Elmously
On Fri, 18 Oct 2019 01:34:54 -0400
Khalid Elmously <[hidden email]> wrote:

> On 2019-10-17 14:57:00 , Kleber Souza wrote:
> > On 16.10.19 23:15, Khalid Elmously wrote:  
> > > BugLink: https://bugs.launchpad.net/bugs/1832151 
> >
> > The bug report is outdated it seems. It says that 512ac999d275 (sched/fair: Fix bandwidth
> > timer clock drift condition) is the fix for the bug, but the patches submitted are follow
> > up fixes for it. Maybe it needs some update? It would be nice to also have it with the
> > SRU template.
> >  
> > >
> > > "sched/fair: Fix bandwidth timer clock drift condition" enabled CPU bandwidth "slice expiry" behaviour in cpu-local silos (something that should have been in effect all along but was basically completely broken). The undesired side-effect of this expirty being enabled is that threads of highly-threaded non-CPU-bound applications get throttled even when the application isn't using its full quota.
> > >
> > > This fix eliminates the problem by removing cpu-local slice expiry altogether. A small pre-requisite patch makes the fix apply nicely.
> > >
> > > A derivative 4.15 cloud kernel was tested with this fix and approved by the cloud provider, and I've tested this fix on the master 4.15 with positive results.  
> >
> > Do you have some more information on how this was tested for regressions on bionic
> > master kernel? What are the regression potentials?
> >
> > Would this fix be needed for all newer series as well?
> >  
>
> Thanks for the review. I've updated the bug with more information.
>
> I have not been able to reproduce this problem on the disco kernel, though I'm not entirely sure why. I haven't heard problems of it on any later kernels either, so I just left it at that.
>
> I have now targeted the bug to E and D to make sure to remember to investigate if they need the fix too.
>
> For now, the patch is nominated for Bionic only.
The commit contains a 'Fixes' tag for a commit that is in both E and D so we
should apply it there as well.

...Juerg

>
>
>  
> > >
> > > More info in launchpad and salesforce.
> > >
> > >
> > >
> > > Dave Chiluk (1):
> > >   sched/fair: Fix low cpu usage with high throttling by removing
> > >     expiration of cpu-local slices
> > >
> > > Patrick Bellasi (1):
> > >   sched/fair: Add lsub_positive() and use it consistently
> > >
> > >  Documentation/scheduler/sched-bwc.txt | 74 +++++++++++++++++++-----
> > >  kernel/sched/fair.c                   | 83 ++++++---------------------
> > >  kernel/sched/sched.h                  |  4 --
> > >  3 files changed, 79 insertions(+), 82 deletions(-)
> > >  
> >
> > The backport looks good, fix is upstream and has been tested to fix
> > the issue.
> >
> > I'm ACK'ing it for now, but I would like to get some more information about
> > regression tests and potential before applying it.
> >
> > Acked-by: Kleber Sacilotto de Souza <[hidden email]>  
>

--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team

attachment0 (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

[SRU][Bionic][PATCH] sched/fair: Fix -Wunused-but-set-variable warnings

Juerg Haefliger
In reply to this post by Khaled Elmously
From: Qian Cai <[hidden email]>

BugLink: https://bugs.launchpad.net/bugs/1832151

Commit:

   de53fd7aedb1 ("sched/fair: Fix low cpu usage with high throttling by removing expiration of cpu-local slices")

introduced a few compilation warnings:

  kernel/sched/fair.c: In function '__refill_cfs_bandwidth_runtime':
  kernel/sched/fair.c:4365:6: warning: variable 'now' set but not used [-Wunused-but-set-variable]
  kernel/sched/fair.c: In function 'start_cfs_bandwidth':
  kernel/sched/fair.c:4992:6: warning: variable 'overrun' set but not used [-Wunused-but-set-variable]

Also, __refill_cfs_bandwidth_runtime() does no longer update the
expiration time, so fix the comments accordingly.

Signed-off-by: Qian Cai <[hidden email]>
Signed-off-by: Peter Zijlstra (Intel) <[hidden email]>
Reviewed-by: Ben Segall <[hidden email]>
Reviewed-by: Dave Chiluk <[hidden email]>
Cc: Linus Torvalds <[hidden email]>
Cc: Peter Zijlstra <[hidden email]>
Cc: Thomas Gleixner <[hidden email]>
Cc: [hidden email]
Fixes: de53fd7aedb1 ("sched/fair: Fix low cpu usage with high throttling by removing expiration of cpu-local slices")
Link: https://lkml.kernel.org/r/1566326455-8038-1-git-send-email-cai@...
Signed-off-by: Ingo Molnar <[hidden email]>
(backported from commit 763a9ec06c409dcde2a761aac4bb83ff3938e0b3)
[juergh: Dropped modifications of start_cfs_bandwidth() (not applicable for
 Bionic 4.15).]
Signed-off-by: Juerg Haefliger <[hidden email]>
---
 kernel/sched/fair.c | 15 +++++----------
 1 file changed, 5 insertions(+), 10 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 7c797c8c8071..6c5766f9e924 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4429,21 +4429,16 @@ static inline u64 sched_cfs_bandwidth_slice(void)
 }
 
 /*
- * Replenish runtime according to assigned quota and update expiration time.
- * We use sched_clock_cpu directly instead of rq->clock to avoid adding
- * additional synchronization around rq->lock.
+ * Replenish runtime according to assigned quota. We use sched_clock_cpu
+ * directly instead of rq->clock to avoid adding additional synchronization
+ * around rq->lock.
  *
  * requires cfs_b->lock
  */
 void __refill_cfs_bandwidth_runtime(struct cfs_bandwidth *cfs_b)
 {
- u64 now;
-
- if (cfs_b->quota == RUNTIME_INF)
- return;
-
- now = sched_clock_cpu(smp_processor_id());
- cfs_b->runtime = cfs_b->quota;
+ if (cfs_b->quota != RUNTIME_INF)
+ cfs_b->runtime = cfs_b->quota;
 }
 
 static inline struct cfs_bandwidth *tg_cfs_bandwidth(struct task_group *tg)
--
2.20.1


--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team
Reply | Threaded
Open this post in threaded view
|

ACK/cmnt: [SRU][PATCH 0/2][Bionic] Fix aggressive CFS throttling

Juerg Haefliger
In reply to this post by Khaled Elmously
On Wed, 16 Oct 2019 17:15:36 -0400
Khalid Elmously <[hidden email]> wrote:

> BugLink: https://bugs.launchpad.net/bugs/1832151
>
> "sched/fair: Fix bandwidth timer clock drift condition" enabled CPU bandwidth "slice expiry" behaviour in cpu-local silos (something that should have been in effect all along but was basically completely broken). The undesired side-effect of this expirty being enabled is that threads of highly-threaded non-CPU-bound applications get throttled even when the application isn't using its full quota.
>
> This fix eliminates the problem by removing cpu-local slice expiry altogether. A small pre-requisite patch makes the fix apply nicely.
>
> A derivative 4.15 cloud kernel was tested with this fix and approved by the cloud provider, and I've tested this fix on the master 4.15 with positive results.
>
> More info in launchpad and salesforce.

On condition that Kleber's feedback is addressed:
 
Acked-by: Juerg Haefliger <[hidden email]>


>
> Dave Chiluk (1):
>   sched/fair: Fix low cpu usage with high throttling by removing
>     expiration of cpu-local slices
>
> Patrick Bellasi (1):
>   sched/fair: Add lsub_positive() and use it consistently
>
>  Documentation/scheduler/sched-bwc.txt | 74 +++++++++++++++++++-----
>  kernel/sched/fair.c                   | 83 ++++++---------------------
>  kernel/sched/sched.h                  |  4 --
>  3 files changed, 79 insertions(+), 82 deletions(-)
>

--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team

attachment0 (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: ACK/cmnt: [SRU][PATCH 0/2][Bionic] Fix aggressive CFS throttling

Khaled Elmously
In reply to this post by Juerg Haefliger
On 2019-10-21 11:05:17 , Juerg Haefliger wrote:

> On Fri, 18 Oct 2019 01:34:54 -0400
> Khalid Elmously <[hidden email]> wrote:
>
> > On 2019-10-17 14:57:00 , Kleber Souza wrote:
> > > On 16.10.19 23:15, Khalid Elmously wrote:  
> > > > BugLink: https://bugs.launchpad.net/bugs/1832151 
> > >
> > > The bug report is outdated it seems. It says that 512ac999d275 (sched/fair: Fix bandwidth
> > > timer clock drift condition) is the fix for the bug, but the patches submitted are follow
> > > up fixes for it. Maybe it needs some update? It would be nice to also have it with the
> > > SRU template.
> > >  
> > > >
> > > > "sched/fair: Fix bandwidth timer clock drift condition" enabled CPU bandwidth "slice expiry" behaviour in cpu-local silos (something that should have been in effect all along but was basically completely broken). The undesired side-effect of this expirty being enabled is that threads of highly-threaded non-CPU-bound applications get throttled even when the application isn't using its full quota.
> > > >
> > > > This fix eliminates the problem by removing cpu-local slice expiry altogether. A small pre-requisite patch makes the fix apply nicely.
> > > >
> > > > A derivative 4.15 cloud kernel was tested with this fix and approved by the cloud provider, and I've tested this fix on the master 4.15 with positive results.  
> > >
> > > Do you have some more information on how this was tested for regressions on bionic
> > > master kernel? What are the regression potentials?
> > >
> > > Would this fix be needed for all newer series as well?
> > >  
> >
> > Thanks for the review. I've updated the bug with more information.
> >
> > I have not been able to reproduce this problem on the disco kernel, though I'm not entirely sure why. I haven't heard problems of it on any later kernels either, so I just left it at that.
> >
> > I have now targeted the bug to E and D to make sure to remember to investigate if they need the fix too.
> >
> > For now, the patch is nominated for Bionic only.
>
> The commit contains a 'Fixes' tag for a commit that is in both E and D so we
> should apply it there as well.
>

Sure. I've targeted the bug for D and E and will apply the fix there and test it.

But I don't see a reason to delay this patch/SRU for that. Reviewing the patch for just B for now would be helpful.

Thanks
Khaled


> ...Juerg
>
> >
> >
> >  
> > > >
> > > > More info in launchpad and salesforce.
> > > >
> > > >
> > > >
> > > > Dave Chiluk (1):
> > > >   sched/fair: Fix low cpu usage with high throttling by removing
> > > >     expiration of cpu-local slices
> > > >
> > > > Patrick Bellasi (1):
> > > >   sched/fair: Add lsub_positive() and use it consistently
> > > >
> > > >  Documentation/scheduler/sched-bwc.txt | 74 +++++++++++++++++++-----
> > > >  kernel/sched/fair.c                   | 83 ++++++---------------------
> > > >  kernel/sched/sched.h                  |  4 --
> > > >  3 files changed, 79 insertions(+), 82 deletions(-)
> > > >  
> > >
> > > The backport looks good, fix is upstream and has been tested to fix
> > > the issue.
> > >
> > > I'm ACK'ing it for now, but I would like to get some more information about
> > > regression tests and potential before applying it.
> > >
> > > Acked-by: Kleber Sacilotto de Souza <[hidden email]>  
> >
>



--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team
Reply | Threaded
Open this post in threaded view
|

ACK: [SRU][Bionic][PATCH] sched/fair: Fix -Wunused-but-set-variable warnings

Kleber Souza
In reply to this post by Juerg Haefliger
On 10/21/19 11:06 AM, Juerg Haefliger wrote:

> From: Qian Cai <[hidden email]>
>
> BugLink: https://bugs.launchpad.net/bugs/1832151
>
> Commit:
>
>    de53fd7aedb1 ("sched/fair: Fix low cpu usage with high throttling by removing expiration of cpu-local slices")
>
> introduced a few compilation warnings:
>
>   kernel/sched/fair.c: In function '__refill_cfs_bandwidth_runtime':
>   kernel/sched/fair.c:4365:6: warning: variable 'now' set but not used [-Wunused-but-set-variable]
>   kernel/sched/fair.c: In function 'start_cfs_bandwidth':
>   kernel/sched/fair.c:4992:6: warning: variable 'overrun' set but not used [-Wunused-but-set-variable]
>
> Also, __refill_cfs_bandwidth_runtime() does no longer update the
> expiration time, so fix the comments accordingly.
>
> Signed-off-by: Qian Cai <[hidden email]>
> Signed-off-by: Peter Zijlstra (Intel) <[hidden email]>
> Reviewed-by: Ben Segall <[hidden email]>
> Reviewed-by: Dave Chiluk <[hidden email]>
> Cc: Linus Torvalds <[hidden email]>
> Cc: Peter Zijlstra <[hidden email]>
> Cc: Thomas Gleixner <[hidden email]>
> Cc: [hidden email]
> Fixes: de53fd7aedb1 ("sched/fair: Fix low cpu usage with high throttling by removing expiration of cpu-local slices")
> Link: https://lkml.kernel.org/r/1566326455-8038-1-git-send-email-cai@...
> Signed-off-by: Ingo Molnar <[hidden email]>
> (backported from commit 763a9ec06c409dcde2a761aac4bb83ff3938e0b3)
> [juergh: Dropped modifications of start_cfs_bandwidth() (not applicable for
>  Bionic 4.15).]
> Signed-off-by: Juerg Haefliger <[hidden email]>

Acked-by: Kleber Sacilotto de Souza <[hidden email]>

> ---
>  kernel/sched/fair.c | 15 +++++----------
>  1 file changed, 5 insertions(+), 10 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 7c797c8c8071..6c5766f9e924 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -4429,21 +4429,16 @@ static inline u64 sched_cfs_bandwidth_slice(void)
>  }
>  
>  /*
> - * Replenish runtime according to assigned quota and update expiration time.
> - * We use sched_clock_cpu directly instead of rq->clock to avoid adding
> - * additional synchronization around rq->lock.
> + * Replenish runtime according to assigned quota. We use sched_clock_cpu
> + * directly instead of rq->clock to avoid adding additional synchronization
> + * around rq->lock.
>   *
>   * requires cfs_b->lock
>   */
>  void __refill_cfs_bandwidth_runtime(struct cfs_bandwidth *cfs_b)
>  {
> - u64 now;
> -
> - if (cfs_b->quota == RUNTIME_INF)
> - return;
> -
> - now = sched_clock_cpu(smp_processor_id());
> - cfs_b->runtime = cfs_b->quota;
> + if (cfs_b->quota != RUNTIME_INF)
> + cfs_b->runtime = cfs_b->quota;
>  }
>  
>  static inline struct cfs_bandwidth *tg_cfs_bandwidth(struct task_group *tg)
>


--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team
Reply | Threaded
Open this post in threaded view
|

ACK: [SRU][Bionic][PATCH] sched/fair: Fix -Wunused-but-set-variable warnings

Khaled Elmously
In reply to this post by Juerg Haefliger



On 2019-10-21 11:06:40 , Juerg Haefliger wrote:

> From: Qian Cai <[hidden email]>
>
> BugLink: https://bugs.launchpad.net/bugs/1832151
>
> Commit:
>
>    de53fd7aedb1 ("sched/fair: Fix low cpu usage with high throttling by removing expiration of cpu-local slices")
>
> introduced a few compilation warnings:
>
>   kernel/sched/fair.c: In function '__refill_cfs_bandwidth_runtime':
>   kernel/sched/fair.c:4365:6: warning: variable 'now' set but not used [-Wunused-but-set-variable]
>   kernel/sched/fair.c: In function 'start_cfs_bandwidth':
>   kernel/sched/fair.c:4992:6: warning: variable 'overrun' set but not used [-Wunused-but-set-variable]
>
> Also, __refill_cfs_bandwidth_runtime() does no longer update the
> expiration time, so fix the comments accordingly.
>
> Signed-off-by: Qian Cai <[hidden email]>
> Signed-off-by: Peter Zijlstra (Intel) <[hidden email]>
> Reviewed-by: Ben Segall <[hidden email]>
> Reviewed-by: Dave Chiluk <[hidden email]>
> Cc: Linus Torvalds <[hidden email]>
> Cc: Peter Zijlstra <[hidden email]>
> Cc: Thomas Gleixner <[hidden email]>
> Cc: [hidden email]
> Fixes: de53fd7aedb1 ("sched/fair: Fix low cpu usage with high throttling by removing expiration of cpu-local slices")
> Link: https://lkml.kernel.org/r/1566326455-8038-1-git-send-email-cai@...
> Signed-off-by: Ingo Molnar <[hidden email]>
> (backported from commit 763a9ec06c409dcde2a761aac4bb83ff3938e0b3)
> [juergh: Dropped modifications of start_cfs_bandwidth() (not applicable for
>  Bionic 4.15).]
> Signed-off-by: Juerg Haefliger <[hidden email]>
> ---
>  kernel/sched/fair.c | 15 +++++----------
>  1 file changed, 5 insertions(+), 10 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 7c797c8c8071..6c5766f9e924 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -4429,21 +4429,16 @@ static inline u64 sched_cfs_bandwidth_slice(void)
>  }
>  
>  /*
> - * Replenish runtime according to assigned quota and update expiration time.
> - * We use sched_clock_cpu directly instead of rq->clock to avoid adding
> - * additional synchronization around rq->lock.
> + * Replenish runtime according to assigned quota. We use sched_clock_cpu
> + * directly instead of rq->clock to avoid adding additional synchronization
> + * around rq->lock.
>   *
>   * requires cfs_b->lock
>   */
>  void __refill_cfs_bandwidth_runtime(struct cfs_bandwidth *cfs_b)
>  {
> - u64 now;
> -
> - if (cfs_b->quota == RUNTIME_INF)
> - return;
> -
> - now = sched_clock_cpu(smp_processor_id());
> - cfs_b->runtime = cfs_b->quota;
> + if (cfs_b->quota != RUNTIME_INF)
> + cfs_b->runtime = cfs_b->quota;
>  }
>  
>  static inline struct cfs_bandwidth *tg_cfs_bandwidth(struct task_group *tg)
> --
> 2.20.1
>
>

Thanks for catching that Juerg!
 
Acked-by: Khalid Elmously <[hidden email]>



--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team
Reply | Threaded
Open this post in threaded view
|

APPLIED/cmt: [SRU][PATCH 0/2][Bionic] Fix aggressive CFS throttling

Khaled Elmously
In reply to this post by Khaled Elmously
Applied 3 patches - thanks Juerge and Kleber

Also fixed the "backported picked from" -> "backported from"

Thanks


On 2019-10-16 17:15:36 , Khalid Elmously wrote:

> BugLink: https://bugs.launchpad.net/bugs/1832151
>
> "sched/fair: Fix bandwidth timer clock drift condition" enabled CPU bandwidth "slice expiry" behaviour in cpu-local silos (something that should have been in effect all along but was basically completely broken). The undesired side-effect of this expirty being enabled is that threads of highly-threaded non-CPU-bound applications get throttled even when the application isn't using its full quota.
>
> This fix eliminates the problem by removing cpu-local slice expiry altogether. A small pre-requisite patch makes the fix apply nicely.
>
> A derivative 4.15 cloud kernel was tested with this fix and approved by the cloud provider, and I've tested this fix on the master 4.15 with positive results.
>
> More info in launchpad and salesforce.
>
>
>
> Dave Chiluk (1):
>   sched/fair: Fix low cpu usage with high throttling by removing
>     expiration of cpu-local slices
>
> Patrick Bellasi (1):
>   sched/fair: Add lsub_positive() and use it consistently
>
>  Documentation/scheduler/sched-bwc.txt | 74 +++++++++++++++++++-----
>  kernel/sched/fair.c                   | 83 ++++++---------------------
>  kernel/sched/sched.h                  |  4 --
>  3 files changed, 79 insertions(+), 82 deletions(-)
>
> --
> 2.17.1
>

--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team