Yakkety/Zesty SRU - Ubuntu 16.10 KVM: Big configuration with multiple guests running SR-IOV VFs caused a KVM host hang and brought all KVM guests down.


Yakkety/Zesty SRU - Ubuntu 16.10 KVM: Big configuration with multiple guests running SR-IOV VFs caused a KVM host hang and brought all KVM guests down.

Tim Gardner-2
https://bugs.launchpad.net/bugs/1651248

[PATCH 1/5] KVM: PPC: Book 3S: XICS cleanup: remove XICS_RM_REJECT
[PATCH 2/5] KVM: PPC: Book 3S: XICS: correct the real mode ICP
[PATCH 3/5] KVM: PPC: Book 3S: XICS: Fix potential issue with
[PATCH 4/5] KVM: PPC: Book 3S: XICS: Implement ICS P/Q states
[PATCH 5/5] KVM: PPC: Book 3S: XICS: Don't lock twice when checking

rtg

--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team

[PATCH 1/5] KVM: PPC: Book 3S: XICS cleanup: remove XICS_RM_REJECT

Tim Gardner-2
From: Li Zhong <[hidden email]>

BugLink: http://bugs.launchpad.net/bugs/1651248

Commit b0221556dbd3 ("KVM: PPC: Book3S HV: Move virtual mode ICP functions
to real-mode") removed the setting of the XICS_RM_REJECT flag. Since that
commit, nothing else sets the flag any more, so we can remove the flag and
the remaining code that handles it, including the counter that counts how
many times it gets set.

Signed-off-by: Li Zhong <[hidden email]>
Signed-off-by: Paul Mackerras <[hidden email]>
(cherry picked from linux-next commit 5efa6605151b84029edeb2e07f2d2d74b52c106f)
Signed-off-by: Tim Gardner <[hidden email]>
---
 arch/powerpc/kvm/book3s_xics.c | 12 +++---------
 arch/powerpc/kvm/book3s_xics.h |  2 --
 2 files changed, 3 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_xics.c b/arch/powerpc/kvm/book3s_xics.c
index 3bdc639..6f14401 100644
--- a/arch/powerpc/kvm/book3s_xics.c
+++ b/arch/powerpc/kvm/book3s_xics.c
@@ -832,10 +832,6 @@ int kvmppc_xics_rm_complete(struct kvm_vcpu *vcpu, u32 hcall)
  icp->n_rm_check_resend++;
  icp_check_resend(xics, icp->rm_resend_icp);
  }
- if (icp->rm_action & XICS_RM_REJECT) {
- icp->n_rm_reject++;
- icp_deliver_irq(xics, icp, icp->rm_reject);
- }
  if (icp->rm_action & XICS_RM_NOTIFY_EOI) {
  icp->n_rm_notify_eoi++;
  kvm_notify_acked_irq(vcpu->kvm, 0, icp->rm_eoied_irq);
@@ -920,7 +916,7 @@ static int xics_debug_show(struct seq_file *m, void *private)
  int icsid, i;
  unsigned long flags;
  unsigned long t_rm_kick_vcpu, t_rm_check_resend;
- unsigned long t_rm_reject, t_rm_notify_eoi;
+ unsigned long t_rm_notify_eoi;
  unsigned long t_reject, t_check_resend;
 
  if (!kvm)
@@ -929,7 +925,6 @@ static int xics_debug_show(struct seq_file *m, void *private)
  t_rm_kick_vcpu = 0;
  t_rm_notify_eoi = 0;
  t_rm_check_resend = 0;
- t_rm_reject = 0;
  t_check_resend = 0;
  t_reject = 0;
 
@@ -952,14 +947,13 @@ static int xics_debug_show(struct seq_file *m, void *private)
  t_rm_kick_vcpu += icp->n_rm_kick_vcpu;
  t_rm_notify_eoi += icp->n_rm_notify_eoi;
  t_rm_check_resend += icp->n_rm_check_resend;
- t_rm_reject += icp->n_rm_reject;
  t_check_resend += icp->n_check_resend;
  t_reject += icp->n_reject;
  }
 
- seq_printf(m, "ICP Guest->Host totals: kick_vcpu=%lu check_resend=%lu reject=%lu notify_eoi=%lu\n",
+ seq_printf(m, "ICP Guest->Host totals: kick_vcpu=%lu check_resend=%lu notify_eoi=%lu\n",
  t_rm_kick_vcpu, t_rm_check_resend,
- t_rm_reject, t_rm_notify_eoi);
+ t_rm_notify_eoi);
  seq_printf(m, "ICP Real Mode totals: check_resend=%lu resend=%lu\n",
  t_check_resend, t_reject);
  for (icsid = 0; icsid <= KVMPPC_XICS_MAX_ICS_ID; icsid++) {
diff --git a/arch/powerpc/kvm/book3s_xics.h b/arch/powerpc/kvm/book3s_xics.h
index 2a50320..1d5fac8 100644
--- a/arch/powerpc/kvm/book3s_xics.h
+++ b/arch/powerpc/kvm/book3s_xics.h
@@ -73,7 +73,6 @@ struct kvmppc_icp {
  */
 #define XICS_RM_KICK_VCPU 0x1
 #define XICS_RM_CHECK_RESEND 0x2
-#define XICS_RM_REJECT 0x4
 #define XICS_RM_NOTIFY_EOI 0x8
  u32 rm_action;
  struct kvm_vcpu *rm_kick_target;
@@ -84,7 +83,6 @@ struct kvmppc_icp {
  /* Counters for each reason we exited real mode */
  unsigned long n_rm_kick_vcpu;
  unsigned long n_rm_check_resend;
- unsigned long n_rm_reject;
  unsigned long n_rm_notify_eoi;
  /* Counters for handling ICP processing in real mode */
  unsigned long n_check_resend;
--
2.7.4



[PATCH 2/5] KVM: PPC: Book 3S: XICS: correct the real mode ICP rejecting counter

Tim Gardner-2
In reply to this post by Tim Gardner-2
From: Li Zhong <[hidden email]>

BugLink: http://bugs.launchpad.net/bugs/1651248

Some counters were added in commit 6e0365b78273 ("KVM: PPC: Book3S HV:
Add ICP real mode counters") to provide performance statistics for
deciding whether the real mode functions need further optimization.

The n_reject counter counts how many times the ICP rejects an irq because
of priority in real mode. The redelivery of an LSI that is still asserted
after EOI doesn't fall into this category, so the increment there is
removed.

Also, the counter needs to be increased in icp_rm_deliver_irq() if it
rejects another irq.

Signed-off-by: Li Zhong <[hidden email]>
Signed-off-by: Paul Mackerras <[hidden email]>
(cherry picked from linux-next commit 37451bc95dee0e666927d6ffdda302dbbaaae6fa)
Signed-off-by: Tim Gardner <[hidden email]>
---
 arch/powerpc/kvm/book3s_hv_rm_xics.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rm_xics.c b/arch/powerpc/kvm/book3s_hv_rm_xics.c
index 06edc43..9f6c8fe 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_xics.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_xics.c
@@ -380,6 +380,7 @@ static void icp_rm_deliver_irq(struct kvmppc_xics *xics, struct kvmppc_icp *icp,
  */
  if (reject && reject != XICS_IPI) {
  arch_spin_unlock(&ics->lock);
+ icp->n_reject++;
  new_irq = reject;
  goto again;
  }
@@ -711,10 +712,8 @@ int kvmppc_rm_h_eoi(struct kvm_vcpu *vcpu, unsigned long xirr)
  state = &ics->irq_state[src];
 
  /* Still asserted, resend it */
- if (state->asserted) {
- icp->n_reject++;
+ if (state->asserted)
  icp_rm_deliver_irq(xics, icp, irq);
- }
 
  if (!hlist_empty(&vcpu->kvm->irq_ack_notifier_list)) {
  icp->rm_action |= XICS_RM_NOTIFY_EOI;
--
2.7.4



[PATCH 3/5] KVM: PPC: Book 3S: XICS: Fix potential issue with duplicate IRQ resends

Tim Gardner-2
In reply to this post by Tim Gardner-2
From: Li Zhong <[hidden email]>

BugLink: http://bugs.launchpad.net/bugs/1651248

In the following sequence of events, it is possible for one irq to be resent twice:

        CPU 1                                   CPU 2

ics_check_resend()
  lock ics_lock
    see resend set
  unlock ics_lock
                                       /* change affinity of the irq */
                                       kvmppc_xics_set_xive()
                                         write_xive()
                                           lock ics_lock
                                             see resend set
                                           unlock ics_lock

                                         icp_deliver_irq() /* resend */

  icp_deliver_irq() /* resend again */

This has no user-visible effect at present, but needs to be avoided once
the following patch implementing the P/Q states is applied.

This patch clears the resend flag before releasing the ics lock whenever
we know a re-delivery will follow, either after checking the flag or
after setting it.

Signed-off-by: Li Zhong <[hidden email]>
Signed-off-by: Paul Mackerras <[hidden email]>
(cherry picked from linux-next commit bf5a71d53835110d46d33eb5335713ffdbff9ab6)
Signed-off-by: Tim Gardner <[hidden email]>
---
 arch/powerpc/kvm/book3s_hv_rm_xics.c | 3 +++
 arch/powerpc/kvm/book3s_xics.c       | 4 ++++
 2 files changed, 7 insertions(+)

diff --git a/arch/powerpc/kvm/book3s_hv_rm_xics.c b/arch/powerpc/kvm/book3s_hv_rm_xics.c
index 9f6c8fe..16349c9 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_xics.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_xics.c
@@ -52,6 +52,8 @@ static void ics_rm_check_resend(struct kvmppc_xics *xics,
  if (!state->resend)
  continue;
 
+ state->resend = 0;
+
  arch_spin_unlock(&ics->lock);
  icp_rm_deliver_irq(xics, icp, state->number);
  arch_spin_lock(&ics->lock);
@@ -400,6 +402,7 @@ static void icp_rm_deliver_irq(struct kvmppc_xics *xics, struct kvmppc_icp *icp,
  */
  smp_mb();
  if (!icp->state.need_resend) {
+ state->resend = 0;
  arch_spin_unlock(&ics->lock);
  goto again;
  }
diff --git a/arch/powerpc/kvm/book3s_xics.c b/arch/powerpc/kvm/book3s_xics.c
index 6f14401..44fda52 100644
--- a/arch/powerpc/kvm/book3s_xics.c
+++ b/arch/powerpc/kvm/book3s_xics.c
@@ -125,6 +125,8 @@ static void ics_check_resend(struct kvmppc_xics *xics, struct kvmppc_ics *ics,
  if (!state->resend)
  continue;
 
+ state->resend = 0;
+
  XICS_DBG("resend %#x prio %#x\n", state->number,
       state->priority);
 
@@ -155,6 +157,7 @@ static bool write_xive(struct kvmppc_xics *xics, struct kvmppc_ics *ics,
  deliver = false;
  if ((state->masked_pending || state->resend) && priority != MASKED) {
  state->masked_pending = 0;
+ state->resend = 0;
  deliver = true;
  }
 
@@ -488,6 +491,7 @@ static void icp_deliver_irq(struct kvmppc_xics *xics, struct kvmppc_icp *icp,
  */
  smp_mb();
  if (!icp->state.need_resend) {
+ state->resend = 0;
  arch_spin_unlock(&ics->lock);
  local_irq_restore(flags);
  goto again;
--
2.7.4



[PATCH 4/5] KVM: PPC: Book 3S: XICS: Implement ICS P/Q states

Tim Gardner-2
In reply to this post by Tim Gardner-2
From: Li Zhong <[hidden email]>

BugLink: http://bugs.launchpad.net/bugs/1651248

This patch implements P (Presented) and Q (Queued) states for ICS irqs.

When an interrupt arrives, set P, and present it only if P was not
already set. If P is already set, don't present again; set Q instead.
When the interrupt is EOI'ed, move Q into P (and clear Q). If P is then
set, re-present the interrupt.

The asserted flag used by LSIs is also folded into the P bit.

When the irq state is saved, the P/Q bits are saved as well; QEMU needs
some modifications to recognize them and pass them around for restore. A
saved KVM_XICS_PENDING bit should also imply a saved KVM_XICS_PRESENTED
bit, but it is possible that some old code doesn't have or recognize the
P bit, so on restore we also set P when PENDING is set.

The idea and much of the code come from Ben.

Signed-off-by: Li Zhong <[hidden email]>
Signed-off-by: Paul Mackerras <[hidden email]>
(cherry picked from linux-next commit 17d48610ae0fa218aa386b16a538c792991a3652)
Signed-off-by: Tim Gardner <[hidden email]>
---
 arch/powerpc/include/uapi/asm/kvm.h  |   2 +
 arch/powerpc/kvm/book3s_hv_rm_xics.c | 100 +++++++++++++++++++---------
 arch/powerpc/kvm/book3s_xics.c       | 125 ++++++++++++++++++++++++-----------
 arch/powerpc/kvm/book3s_xics.h       |   5 +-
 4 files changed, 161 insertions(+), 71 deletions(-)

diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h
index 3603b6f..e3db3a5 100644
--- a/arch/powerpc/include/uapi/asm/kvm.h
+++ b/arch/powerpc/include/uapi/asm/kvm.h
@@ -613,5 +613,7 @@ struct kvm_get_htab_header {
 #define  KVM_XICS_LEVEL_SENSITIVE (1ULL << 40)
 #define  KVM_XICS_MASKED (1ULL << 41)
 #define  KVM_XICS_PENDING (1ULL << 42)
+#define  KVM_XICS_PRESENTED (1ULL << 43)
+#define  KVM_XICS_QUEUED (1ULL << 44)
 
 #endif /* __LINUX_KVM_POWERPC_H */
diff --git a/arch/powerpc/kvm/book3s_hv_rm_xics.c b/arch/powerpc/kvm/book3s_hv_rm_xics.c
index 16349c9..30f82c7 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_xics.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_xics.c
@@ -672,51 +672,39 @@ int kvmppc_rm_h_cppr(struct kvm_vcpu *vcpu, unsigned long cppr)
  return check_too_hard(xics, icp);
 }
 
-int kvmppc_rm_h_eoi(struct kvm_vcpu *vcpu, unsigned long xirr)
+static int ics_rm_eoi(struct kvm_vcpu *vcpu, u32 irq)
 {
  struct kvmppc_xics *xics = vcpu->kvm->arch.xics;
  struct kvmppc_icp *icp = vcpu->arch.icp;
  struct kvmppc_ics *ics;
  struct ics_irq_state *state;
- u32 irq = xirr & 0x00ffffff;
  u16 src;
-
- if (!xics || !xics->real_mode)
- return H_TOO_HARD;
+ u32 pq_old, pq_new;
 
  /*
- * ICP State: EOI
- *
- * Note: If EOI is incorrectly used by SW to lower the CPPR
- * value (ie more favored), we do not check for rejection of
- * a pending interrupt, this is a SW error and PAPR sepcifies
- * that we don't have to deal with it.
+ * ICS EOI handling: For LSI, if P bit is still set, we need to
+ * resend it.
  *
- * The sending of an EOI to the ICS is handled after the
- * CPPR update
- *
- * ICP State: Down_CPPR which we handle
- * in a separate function as it's shared with H_CPPR.
+ * For MSI, we move Q bit into P (and clear Q). If it is set,
+ * resend it.
  */
- icp_rm_down_cppr(xics, icp, xirr >> 24);
 
- /* IPIs have no EOI */
- if (irq == XICS_IPI)
- goto bail;
- /*
- * EOI handling: If the interrupt is still asserted, we need to
- * resend it. We can take a lockless "peek" at the ICS state here.
- *
- * "Message" interrupts will never have "asserted" set
- */
  ics = kvmppc_xics_find_ics(xics, irq, &src);
  if (!ics)
  goto bail;
+
  state = &ics->irq_state[src];
 
- /* Still asserted, resend it */
- if (state->asserted)
- icp_rm_deliver_irq(xics, icp, irq);
+ if (state->lsi)
+ pq_new = state->pq_state;
+ else
+ do {
+ pq_old = state->pq_state;
+ pq_new = pq_old >> 1;
+ } while (cmpxchg(&state->pq_state, pq_old, pq_new) != pq_old);
+
+ if (pq_new & PQ_PRESENTED)
+ icp_rm_deliver_irq(xics, NULL, irq);
 
  if (!hlist_empty(&vcpu->kvm->irq_ack_notifier_list)) {
  icp->rm_action |= XICS_RM_NOTIFY_EOI;
@@ -737,10 +725,43 @@ int kvmppc_rm_h_eoi(struct kvm_vcpu *vcpu, unsigned long xirr)
  state->intr_cpu = -1;
  }
  }
+
  bail:
  return check_too_hard(xics, icp);
 }
 
+int kvmppc_rm_h_eoi(struct kvm_vcpu *vcpu, unsigned long xirr)
+{
+ struct kvmppc_xics *xics = vcpu->kvm->arch.xics;
+ struct kvmppc_icp *icp = vcpu->arch.icp;
+ u32 irq = xirr & 0x00ffffff;
+
+ if (!xics || !xics->real_mode)
+ return H_TOO_HARD;
+
+ /*
+ * ICP State: EOI
+ *
+ * Note: If EOI is incorrectly used by SW to lower the CPPR
+ * value (ie more favored), we do not check for rejection of
+ * a pending interrupt, this is a SW error and PAPR specifies
+ * that we don't have to deal with it.
+ *
+ * The sending of an EOI to the ICS is handled after the
+ * CPPR update
+ *
+ * ICP State: Down_CPPR which we handle
+ * in a separate function as it's shared with H_CPPR.
+ */
+ icp_rm_down_cppr(xics, icp, xirr >> 24);
+
+ /* IPIs have no EOI */
+ if (irq == XICS_IPI)
+ return check_too_hard(xics, icp);
+
+ return ics_rm_eoi(vcpu, irq);
+}
+
 unsigned long eoi_rc;
 
 static void icp_eoi(struct irq_chip *c, u32 hwirq, __be32 xirr, bool *again)
@@ -827,14 +848,33 @@ long kvmppc_deliver_irq_passthru(struct kvm_vcpu *vcpu,
 {
  struct kvmppc_xics *xics;
  struct kvmppc_icp *icp;
+ struct kvmppc_ics *ics;
+ struct ics_irq_state *state;
  u32 irq;
+ u16 src;
+ u32 pq_old, pq_new;
 
  irq = irq_map->v_hwirq;
  xics = vcpu->kvm->arch.xics;
  icp = vcpu->arch.icp;
 
  kvmppc_rm_handle_irq_desc(irq_map->desc);
- icp_rm_deliver_irq(xics, icp, irq);
+
+ ics = kvmppc_xics_find_ics(xics, irq, &src);
+ if (!ics)
+ return 2;
+
+ state = &ics->irq_state[src];
+
+ /* only MSIs register bypass producers, so it must be MSI here */
+ do {
+ pq_old = state->pq_state;
+ pq_new = ((pq_old << 1) & 3) | PQ_PRESENTED;
+ } while (cmpxchg(&state->pq_state, pq_old, pq_new) != pq_old);
+
+ /* Test P=1, Q=0, this is the only case where we present */
+ if (pq_new == PQ_PRESENTED)
+ icp_rm_deliver_irq(xics, icp, irq);
 
  /* EOI the interrupt */
  icp_eoi(irq_desc_get_chip(irq_map->desc), irq_map->r_hwirq, xirr,
diff --git a/arch/powerpc/kvm/book3s_xics.c b/arch/powerpc/kvm/book3s_xics.c
index 44fda52..e9ba11e 100644
--- a/arch/powerpc/kvm/book3s_xics.c
+++ b/arch/powerpc/kvm/book3s_xics.c
@@ -75,6 +75,7 @@ static int ics_deliver_irq(struct kvmppc_xics *xics, u32 irq, u32 level)
  struct ics_irq_state *state;
  struct kvmppc_ics *ics;
  u16 src;
+ u32 pq_old, pq_new;
 
  XICS_DBG("ics deliver %#x (level: %d)\n", irq, level);
 
@@ -87,25 +88,41 @@ static int ics_deliver_irq(struct kvmppc_xics *xics, u32 irq, u32 level)
  if (!state->exists)
  return -EINVAL;
 
+ if (level == KVM_INTERRUPT_SET_LEVEL || level == KVM_INTERRUPT_SET)
+ level = 1;
+ else if (level == KVM_INTERRUPT_UNSET)
+ level = 0;
  /*
- * We set state->asserted locklessly. This should be fine as
- * we are the only setter, thus concurrent access is undefined
- * to begin with.
+ * Take other values the same as 1, consistent with original code.
+ * maybe WARN here?
  */
- if ((level == 1 && state->lsi) || level == KVM_INTERRUPT_SET_LEVEL)
- state->asserted = 1;
- else if (level == 0 || level == KVM_INTERRUPT_UNSET) {
- state->asserted = 0;
+
+ if (!state->lsi && level == 0) /* noop for MSI */
  return 0;
- }
+
+ do {
+ pq_old = state->pq_state;
+ if (state->lsi) {
+ if (level) {
+ if (pq_old & PQ_PRESENTED)
+ /* Setting already set LSI ... */
+ return 0;
+
+ pq_new = PQ_PRESENTED;
+ } else
+ pq_new = 0;
+ } else
+ pq_new = ((pq_old << 1) & 3) | PQ_PRESENTED;
+ } while (cmpxchg(&state->pq_state, pq_old, pq_new) != pq_old);
+
+ /* Test P=1, Q=0, this is the only case where we present */
+ if (pq_new == PQ_PRESENTED)
+ icp_deliver_irq(xics, NULL, irq);
 
  /* Record which CPU this arrived on for passed-through interrupts */
  if (state->host_irq)
  state->intr_cpu = raw_smp_processor_id();
 
- /* Attempt delivery */
- icp_deliver_irq(xics, NULL, irq);
-
  return 0;
 }
 
@@ -768,14 +785,51 @@ static noinline void kvmppc_h_cppr(struct kvm_vcpu *vcpu, unsigned long cppr)
  icp_deliver_irq(xics, icp, reject);
 }
 
-static noinline int kvmppc_h_eoi(struct kvm_vcpu *vcpu, unsigned long xirr)
+static int ics_eoi(struct kvm_vcpu *vcpu, u32 irq)
 {
  struct kvmppc_xics *xics = vcpu->kvm->arch.xics;
  struct kvmppc_icp *icp = vcpu->arch.icp;
  struct kvmppc_ics *ics;
  struct ics_irq_state *state;
- u32 irq = xirr & 0x00ffffff;
  u16 src;
+ u32 pq_old, pq_new;
+
+ /*
+ * ICS EOI handling: For LSI, if P bit is still set, we need to
+ * resend it.
+ *
+ * For MSI, we move Q bit into P (and clear Q). If it is set,
+ * resend it.
+ */
+
+ ics = kvmppc_xics_find_ics(xics, irq, &src);
+ if (!ics) {
+ XICS_DBG("ios_eoi: IRQ 0x%06x not found !\n", irq);
+ return H_PARAMETER;
+ }
+ state = &ics->irq_state[src];
+
+ if (state->lsi)
+ pq_new = state->pq_state;
+ else
+ do {
+ pq_old = state->pq_state;
+ pq_new = pq_old >> 1;
+ } while (cmpxchg(&state->pq_state, pq_old, pq_new) != pq_old);
+
+ if (pq_new & PQ_PRESENTED)
+ icp_deliver_irq(xics, icp, irq);
+
+ kvm_notify_acked_irq(vcpu->kvm, 0, irq);
+
+ return H_SUCCESS;
+}
+
+static noinline int kvmppc_h_eoi(struct kvm_vcpu *vcpu, unsigned long xirr)
+{
+ struct kvmppc_xics *xics = vcpu->kvm->arch.xics;
+ struct kvmppc_icp *icp = vcpu->arch.icp;
+ u32 irq = xirr & 0x00ffffff;
 
  XICS_DBG("h_eoi vcpu %d eoi %#lx\n", vcpu->vcpu_id, xirr);
 
@@ -798,26 +852,8 @@ static noinline int kvmppc_h_eoi(struct kvm_vcpu *vcpu, unsigned long xirr)
  /* IPIs have no EOI */
  if (irq == XICS_IPI)
  return H_SUCCESS;
- /*
- * EOI handling: If the interrupt is still asserted, we need to
- * resend it. We can take a lockless "peek" at the ICS state here.
- *
- * "Message" interrupts will never have "asserted" set
- */
- ics = kvmppc_xics_find_ics(xics, irq, &src);
- if (!ics) {
- XICS_DBG("h_eoi: IRQ 0x%06x not found !\n", irq);
- return H_PARAMETER;
- }
- state = &ics->irq_state[src];
-
- /* Still asserted, resend it */
- if (state->asserted)
- icp_deliver_irq(xics, icp, irq);
-
- kvm_notify_acked_irq(vcpu->kvm, 0, irq);
 
- return H_SUCCESS;
+ return ics_eoi(vcpu, irq);
 }
 
 int kvmppc_xics_rm_complete(struct kvm_vcpu *vcpu, u32 hcall)
@@ -975,9 +1011,9 @@ static int xics_debug_show(struct seq_file *m, void *private)
  for (i = 0; i < KVMPPC_XICS_IRQ_PER_ICS; i++) {
  struct ics_irq_state *irq = &ics->irq_state[i];
 
- seq_printf(m, "irq 0x%06x: server %#x prio %#x save prio %#x asserted %d resend %d masked pending %d\n",
+ seq_printf(m, "irq 0x%06x: server %#x prio %#x save prio %#x pq_state %d resend %d masked pending %d\n",
    irq->number, irq->server, irq->priority,
-   irq->saved_priority, irq->asserted,
+   irq->saved_priority, irq->pq_state,
    irq->resend, irq->masked_pending);
 
  }
@@ -1196,10 +1232,17 @@ static int xics_get_source(struct kvmppc_xics *xics, long irq, u64 addr)
  val |= prio << KVM_XICS_PRIORITY_SHIFT;
  if (irqp->lsi) {
  val |= KVM_XICS_LEVEL_SENSITIVE;
- if (irqp->asserted)
+ if (irqp->pq_state & PQ_PRESENTED)
  val |= KVM_XICS_PENDING;
  } else if (irqp->masked_pending || irqp->resend)
  val |= KVM_XICS_PENDING;
+
+ if (irqp->pq_state & PQ_PRESENTED)
+ val |= KVM_XICS_PRESENTED;
+
+ if (irqp->pq_state & PQ_QUEUED)
+ val |= KVM_XICS_QUEUED;
+
  ret = 0;
  }
  arch_spin_unlock(&ics->lock);
@@ -1251,12 +1294,14 @@ static int xics_set_source(struct kvmppc_xics *xics, long irq, u64 addr)
  irqp->resend = 0;
  irqp->masked_pending = 0;
  irqp->lsi = 0;
- irqp->asserted = 0;
- if (val & KVM_XICS_LEVEL_SENSITIVE) {
+ irqp->pq_state = 0;
+ if (val & KVM_XICS_LEVEL_SENSITIVE)
  irqp->lsi = 1;
- if (val & KVM_XICS_PENDING)
- irqp->asserted = 1;
- }
+ /* If PENDING, set P in case P is not saved because of old code */
+ if (val & KVM_XICS_PRESENTED || val & KVM_XICS_PENDING)
+ irqp->pq_state |= PQ_PRESENTED;
+ if (val & KVM_XICS_QUEUED)
+ irqp->pq_state |= PQ_QUEUED;
  irqp->exists = 1;
  arch_spin_unlock(&ics->lock);
  local_irq_restore(flags);
diff --git a/arch/powerpc/kvm/book3s_xics.h b/arch/powerpc/kvm/book3s_xics.h
index 1d5fac8..ec5474c 100644
--- a/arch/powerpc/kvm/book3s_xics.h
+++ b/arch/powerpc/kvm/book3s_xics.h
@@ -31,16 +31,19 @@
 /* Priority value to use for disabling an interrupt */
 #define MASKED 0xff
 
+#define PQ_PRESENTED 1
+#define PQ_QUEUED 2
+
 /* State for one irq source */
 struct ics_irq_state {
  u32 number;
  u32 server;
+ u32 pq_state;
  u8  priority;
  u8  saved_priority;
  u8  resend;
  u8  masked_pending;
  u8  lsi; /* level-sensitive interrupt */
- u8  asserted; /* Only for LSI */
  u8  exists;
  int intr_cpu;
  u32 host_irq;
--
2.7.4



[PATCH 5/5] KVM: PPC: Book 3S: XICS: Don't lock twice when checking for resend

Tim Gardner-2
In reply to this post by Tim Gardner-2
From: Li Zhong <[hidden email]>

BugLink: http://bugs.launchpad.net/bugs/1651248

This patch improves the code that previously took the lock twice, once to
check the resend flag and once to do the actual resend. It now checks the
resend flag locklessly and adds a boolean parameter check_resend to
icp_[rm_]deliver_irq(), so the resend flag can be re-checked under the
lock when doing the delivery.

We need to make sure that when we clear an ICS's bit in the ICP's
resend_map, we don't miss the resend flags of the irqs that set that bit.
This ordering is guaranteed by the barrier in test_and_clear_bit() and a
newly added wmb between setting an irq's resend flag and setting the
ICP's resend_map bit.

Signed-off-by: Li Zhong <[hidden email]>
Signed-off-by: Paul Mackerras <[hidden email]>
(cherry picked from linux-next commit 21acd0e4df04f02176e773468658c3cebff096bb)
Signed-off-by: Tim Gardner <[hidden email]>
---
 arch/powerpc/kvm/book3s_hv_rm_xics.c | 40 ++++++++++++------------
 arch/powerpc/kvm/book3s_xics.c       | 59 +++++++++++++++++-------------------
 2 files changed, 48 insertions(+), 51 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rm_xics.c b/arch/powerpc/kvm/book3s_hv_rm_xics.c
index 30f82c7..44cfdd2 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_xics.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_xics.c
@@ -35,7 +35,7 @@ int kvm_irq_bypass = 1;
 EXPORT_SYMBOL(kvm_irq_bypass);
 
 static void icp_rm_deliver_irq(struct kvmppc_xics *xics, struct kvmppc_icp *icp,
-    u32 new_irq);
+    u32 new_irq, bool check_resend);
 static int xics_opal_rm_set_server(unsigned int hw_irq, int server_cpu);
 
 /* -- ICS routines -- */
@@ -44,22 +44,12 @@ static void ics_rm_check_resend(struct kvmppc_xics *xics,
 {
  int i;
 
- arch_spin_lock(&ics->lock);
-
  for (i = 0; i < KVMPPC_XICS_IRQ_PER_ICS; i++) {
  struct ics_irq_state *state = &ics->irq_state[i];
-
- if (!state->resend)
- continue;
-
- state->resend = 0;
-
- arch_spin_unlock(&ics->lock);
- icp_rm_deliver_irq(xics, icp, state->number);
- arch_spin_lock(&ics->lock);
+ if (state->resend)
+ icp_rm_deliver_irq(xics, icp, state->number, true);
  }
 
- arch_spin_unlock(&ics->lock);
 }
 
 /* -- ICP routines -- */
@@ -292,7 +282,7 @@ static bool icp_rm_try_to_deliver(struct kvmppc_icp *icp, u32 irq, u8 priority,
 }
 
 static void icp_rm_deliver_irq(struct kvmppc_xics *xics, struct kvmppc_icp *icp,
-    u32 new_irq)
+    u32 new_irq, bool check_resend)
 {
  struct ics_irq_state *state;
  struct kvmppc_ics *ics;
@@ -337,6 +327,10 @@ static void icp_rm_deliver_irq(struct kvmppc_xics *xics, struct kvmppc_icp *icp,
  }
  }
 
+ if (check_resend)
+ if (!state->resend)
+ goto out;
+
  /* Clear the resend bit of that interrupt */
  state->resend = 0;
 
@@ -384,6 +378,7 @@ static void icp_rm_deliver_irq(struct kvmppc_xics *xics, struct kvmppc_icp *icp,
  arch_spin_unlock(&ics->lock);
  icp->n_reject++;
  new_irq = reject;
+ check_resend = 0;
  goto again;
  }
  } else {
@@ -391,10 +386,16 @@ static void icp_rm_deliver_irq(struct kvmppc_xics *xics, struct kvmppc_icp *icp,
  * We failed to deliver the interrupt we need to set the
  * resend map bit and mark the ICS state as needing a resend
  */
- set_bit(ics->icsid, icp->resend_map);
  state->resend = 1;
 
  /*
+ * Make sure when checking resend, we don't miss the resend
+ * if resend_map bit is seen and cleared.
+ */
+ smp_wmb();
+ set_bit(ics->icsid, icp->resend_map);
+
+ /*
  * If the need_resend flag got cleared in the ICP some time
  * between icp_rm_try_to_deliver() atomic update and now, then
  * we know it might have missed the resend_map bit. So we
@@ -404,6 +405,7 @@ static void icp_rm_deliver_irq(struct kvmppc_xics *xics, struct kvmppc_icp *icp,
  if (!icp->state.need_resend) {
  state->resend = 0;
  arch_spin_unlock(&ics->lock);
+ check_resend = 0;
  goto again;
  }
  }
@@ -598,7 +600,7 @@ int kvmppc_rm_h_ipi(struct kvm_vcpu *vcpu, unsigned long server,
  /* Handle reject in real mode */
  if (reject && reject != XICS_IPI) {
  this_icp->n_reject++;
- icp_rm_deliver_irq(xics, icp, reject);
+ icp_rm_deliver_irq(xics, icp, reject, false);
  }
 
  /* Handle resends in real mode */
@@ -666,7 +668,7 @@ int kvmppc_rm_h_cppr(struct kvm_vcpu *vcpu, unsigned long cppr)
  */
  if (reject && reject != XICS_IPI) {
  icp->n_reject++;
- icp_rm_deliver_irq(xics, icp, reject);
+ icp_rm_deliver_irq(xics, icp, reject, false);
  }
  bail:
  return check_too_hard(xics, icp);
@@ -704,7 +706,7 @@ static int ics_rm_eoi(struct kvm_vcpu *vcpu, u32 irq)
  } while (cmpxchg(&state->pq_state, pq_old, pq_new) != pq_old);
 
  if (pq_new & PQ_PRESENTED)
- icp_rm_deliver_irq(xics, NULL, irq);
+ icp_rm_deliver_irq(xics, NULL, irq, false);
 
  if (!hlist_empty(&vcpu->kvm->irq_ack_notifier_list)) {
  icp->rm_action |= XICS_RM_NOTIFY_EOI;
@@ -874,7 +876,7 @@ long kvmppc_deliver_irq_passthru(struct kvm_vcpu *vcpu,
 
  /* Test P=1, Q=0, this is the only case where we present */
  if (pq_new == PQ_PRESENTED)
- icp_rm_deliver_irq(xics, icp, irq);
+ icp_rm_deliver_irq(xics, icp, irq, false);
 
  /* EOI the interrupt */
  icp_eoi(irq_desc_get_chip(irq_map->desc), irq_map->r_hwirq, xirr,
diff --git a/arch/powerpc/kvm/book3s_xics.c b/arch/powerpc/kvm/book3s_xics.c
index e9ba11e..c931fdc 100644
--- a/arch/powerpc/kvm/book3s_xics.c
+++ b/arch/powerpc/kvm/book3s_xics.c
@@ -63,7 +63,7 @@
 /* -- ICS routines -- */
 
 static void icp_deliver_irq(struct kvmppc_xics *xics, struct kvmppc_icp *icp,
-    u32 new_irq);
+    u32 new_irq, bool check_resend);
 
 /*
  * Return value ideally indicates how the interrupt was handled, but no
@@ -117,7 +117,7 @@ static int ics_deliver_irq(struct kvmppc_xics *xics, u32 irq, u32 level)
 
  /* Test P=1, Q=0, this is the only case where we present */
  if (pq_new == PQ_PRESENTED)
- icp_deliver_irq(xics, NULL, irq);
+ icp_deliver_irq(xics, NULL, irq, false);
 
  /* Record which CPU this arrived on for passed-through interrupts */
  if (state->host_irq)
@@ -131,31 +131,14 @@ static void ics_check_resend(struct kvmppc_xics *xics, struct kvmppc_ics *ics,
 {
  int i;
 
- unsigned long flags;
-
- local_irq_save(flags);
- arch_spin_lock(&ics->lock);
-
  for (i = 0; i < KVMPPC_XICS_IRQ_PER_ICS; i++) {
  struct ics_irq_state *state = &ics->irq_state[i];
-
- if (!state->resend)
- continue;
-
- state->resend = 0;
-
- XICS_DBG("resend %#x prio %#x\n", state->number,
-      state->priority);
-
- arch_spin_unlock(&ics->lock);
- local_irq_restore(flags);
- icp_deliver_irq(xics, icp, state->number);
- local_irq_save(flags);
- arch_spin_lock(&ics->lock);
+ if (state->resend) {
+ XICS_DBG("resend %#x prio %#x\n", state->number,
+      state->priority);
+ icp_deliver_irq(xics, icp, state->number, true);
+ }
  }
-
- arch_spin_unlock(&ics->lock);
- local_irq_restore(flags);
 }
 
 static bool write_xive(struct kvmppc_xics *xics, struct kvmppc_ics *ics,
@@ -209,7 +192,7 @@ int kvmppc_xics_set_xive(struct kvm *kvm, u32 irq, u32 server, u32 priority)
  state->masked_pending, state->resend);
 
  if (write_xive(xics, ics, state, server, priority, priority))
- icp_deliver_irq(xics, icp, irq);
+ icp_deliver_irq(xics, icp, irq, false);
 
  return 0;
 }
@@ -262,7 +245,7 @@ int kvmppc_xics_int_on(struct kvm *kvm, u32 irq)
 
  if (write_xive(xics, ics, state, state->server, state->saved_priority,
        state->saved_priority))
- icp_deliver_irq(xics, icp, irq);
+ icp_deliver_irq(xics, icp, irq, false);
 
  return 0;
 }
@@ -396,7 +379,7 @@ static bool icp_try_to_deliver(struct kvmppc_icp *icp, u32 irq, u8 priority,
 }
 
 static void icp_deliver_irq(struct kvmppc_xics *xics, struct kvmppc_icp *icp,
-    u32 new_irq)
+    u32 new_irq, bool check_resend)
 {
  struct ics_irq_state *state;
  struct kvmppc_ics *ics;
@@ -442,6 +425,10 @@ static void icp_deliver_irq(struct kvmppc_xics *xics, struct kvmppc_icp *icp,
  }
  }
 
+ if (check_resend)
+ if (!state->resend)
+ goto out;
+
  /* Clear the resend bit of that interrupt */
  state->resend = 0;
 
@@ -490,6 +477,7 @@ static void icp_deliver_irq(struct kvmppc_xics *xics, struct kvmppc_icp *icp,
  arch_spin_unlock(&ics->lock);
  local_irq_restore(flags);
  new_irq = reject;
+ check_resend = 0;
  goto again;
  }
  } else {
@@ -497,10 +485,16 @@ static void icp_deliver_irq(struct kvmppc_xics *xics, struct kvmppc_icp *icp,
  * We failed to deliver the interrupt we need to set the
  * resend map bit and mark the ICS state as needing a resend
  */
- set_bit(ics->icsid, icp->resend_map);
  state->resend = 1;
 
  /*
+ * Make sure when checking resend, we don't miss the resend
+ * if resend_map bit is seen and cleared.
+ */
+ smp_wmb();
+ set_bit(ics->icsid, icp->resend_map);
+
+ /*
  * If the need_resend flag got cleared in the ICP some time
  * between icp_try_to_deliver() atomic update and now, then
  * we know it might have missed the resend_map bit. So we
@@ -511,6 +505,7 @@ static void icp_deliver_irq(struct kvmppc_xics *xics, struct kvmppc_icp *icp,
  state->resend = 0;
  arch_spin_unlock(&ics->lock);
  local_irq_restore(flags);
+ check_resend = 0;
  goto again;
  }
  }
@@ -702,7 +697,7 @@ static noinline int kvmppc_h_ipi(struct kvm_vcpu *vcpu, unsigned long server,
 
  /* Handle reject */
  if (reject && reject != XICS_IPI)
- icp_deliver_irq(xics, icp, reject);
+ icp_deliver_irq(xics, icp, reject, false);
 
  /* Handle resend */
  if (resend)
@@ -782,7 +777,7 @@ static noinline void kvmppc_h_cppr(struct kvm_vcpu *vcpu, unsigned long cppr)
  * attempt (see comments in icp_deliver_irq).
  */
  if (reject && reject != XICS_IPI)
- icp_deliver_irq(xics, icp, reject);
+ icp_deliver_irq(xics, icp, reject, false);
 }
 
 static int ics_eoi(struct kvm_vcpu *vcpu, u32 irq)
@@ -818,7 +813,7 @@ static int ics_eoi(struct kvm_vcpu *vcpu, u32 irq)
  } while (cmpxchg(&state->pq_state, pq_old, pq_new) != pq_old);
 
  if (pq_new & PQ_PRESENTED)
- icp_deliver_irq(xics, icp, irq);
+ icp_deliver_irq(xics, icp, irq, false);
 
  kvm_notify_acked_irq(vcpu->kvm, 0, irq);
 
@@ -1307,7 +1302,7 @@ static int xics_set_source(struct kvmppc_xics *xics, long irq, u64 addr)
  local_irq_restore(flags);
 
  if (val & KVM_XICS_PENDING)
- icp_deliver_irq(xics, NULL, irqp->number);
+ icp_deliver_irq(xics, NULL, irqp->number, false);
 
  return 0;
 }
--
2.7.4



ACK/cmnt: Yakkety/Zesty SRU - Ubuntu 16.10 KVM: Big configuration with multiple guests running SR-IOV VFs caused a KVM host hang and brought all KVM guests down.

Stefan Bader-2
In reply to this post by Tim Gardner-2
On 08.02.2017 23:00, Tim Gardner wrote:

> https://bugs.launchpad.net/bugs/1651248
>
> [PATCH 1/5] KVM: PPC: Book 3S: XICS cleanup: remove XICS_RM_REJECT
> [PATCH 2/5] KVM: PPC: Book 3S: XICS: correct the real mode ICP
> [PATCH 3/5] KVM: PPC: Book 3S: XICS: Fix potential issue with
> [PATCH 4/5] KVM: PPC: Book 3S: XICS: Implement ICS P/Q states
> [PATCH 5/5] KVM: PPC: Book 3S: XICS: Don't lock twice when checking
>
> rtg
>
Though #5 and #6 do not necessarily look like bug fixes. At least it sounds like
the end result might be testable...

-Stefan



ACK: Yakkety/Zesty SRU - Ubuntu 16.10 KVM: Big configuration with multiple guests running SR-IOV VFs caused a KVM host hang and brought all KVM guests down.

brad.figg
In reply to this post by Tim Gardner-2

APPLIED: Yakkety/Zesty SRU - Ubuntu 16.10 KVM: Big configuration with multiple guests running SR-IOV VFs caused a KVM host hang and brought all KVM guests down.

Thadeu Lima de Souza Cascardo-3
In reply to this post by Tim Gardner-2
Applied to yakkety master-next branch.

Thanks.
Cascardo.
