
[SRU][Xenial][PATCH 0/4] net/mlx4_core: Fixes for LP:1650058

Joseph Salisbury-3
BugLink: http://bugs.launchpad.net/bugs/1650058

== SRU Justification ==
To provide the correct VF driver for SR-IOV support in Azure, four
Mellanox patches are needed in Xenial.  The four commits were included in
mainline in the following versions:

d585df1c5ccf - v4.10-rc7
7c3945bc2073 - v4.10-rc5
291c566a2891 - v4.10-rc5
6496bbf0ec48 - v4.10-rc3

Only 64-bit support for Ubuntu 16.04's HWE kernel is needed.


== Fixes ==
commit d585df1c5ccf995fcee910705ad7a9cdd11d4152
Author: Jack Morgenstein <[hidden email]>
Date:   Mon Jan 30 15:11:45 2017 +0200

    net/mlx4_core: Avoid command timeouts during VF driver device shutdown

commit 7c3945bc2073554bb2ecf983e073dee686679c53
Author: Jack Morgenstein <[hidden email]>
Date:   Mon Jan 16 18:31:38 2017 +0200

    net/mlx4_core: Fix when to save some qp context flags for dynamic VST to VGT transitions

commit 291c566a28910614ce42d0ffe82196eddd6346f4
Author: Jack Morgenstein <[hidden email]>
Date:   Mon Jan 16 18:31:37 2017 +0200

    net/mlx4_core: Fix racy CQ (Completion Queue) free

commit 6496bbf0ec481966ef9ffe5b6660d8d1b55c60cc
Author: Eugenia Emantayev <[hidden email]>
Date:   Thu Dec 29 18:37:10 2016 +0200

    net/mlx4_en: Fix bad WQE issue





Eugenia Emantayev (1):
  net/mlx4_en: Fix bad WQE issue

Jack Morgenstein (3):
  net/mlx4_core: Fix racy CQ (Completion Queue) free
  net/mlx4_core: Fix when to save some qp context flags for dynamic VST
    to VGT transitions
  net/mlx4_core: Avoid command timeouts during VF driver device shutdown

 drivers/net/ethernet/mellanox/mlx4/catas.c         |  2 +-
 drivers/net/ethernet/mellanox/mlx4/cq.c            | 38 ++++++++++++----------
 drivers/net/ethernet/mellanox/mlx4/en_rx.c         |  8 ++++-
 drivers/net/ethernet/mellanox/mlx4/intf.c          | 12 +++++++
 drivers/net/ethernet/mellanox/mlx4/mlx4.h          |  1 +
 .../net/ethernet/mellanox/mlx4/resource_tracker.c  |  5 +--
 6 files changed, 44 insertions(+), 22 deletions(-)

--
2.7.4


--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team

[SRU][Xenial][PATCH 1/4] net/mlx4_en: Fix bad WQE issue

Joseph Salisbury-3
From: Eugenia Emantayev <[hidden email]>

BugLink: http://bugs.launchpad.net/bugs/1650058

The single send WQE in the RX buffer should be stamped with software
ownership, to prevent the FW from taking the QP into its error flow
once UPDATE_QP is called.

Fixes: 9f519f68cfff ('mlx4_en: Not using Shared Receive Queues')
Signed-off-by: Eugenia Emantayev <[hidden email]>
Signed-off-by: Tariq Toukan <[hidden email]>
Signed-off-by: David S. Miller <[hidden email]>
(cherry picked from commit 6496bbf0ec481966ef9ffe5b6660d8d1b55c60cc)
Signed-off-by: Joseph Salisbury <[hidden email]>
---
 drivers/net/ethernet/mellanox/mlx4/en_rx.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index bbff8ec..3d7c597 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -439,8 +439,14 @@ int mlx4_en_activate_rx_rings(struct mlx4_en_priv *priv)
 		ring->cqn = priv->rx_cq[ring_ind]->mcq.cqn;
 
 		ring->stride = stride;
-		if (ring->stride <= TXBB_SIZE)
+		if (ring->stride <= TXBB_SIZE) {
+			/* Stamp first unused send wqe */
+			__be32 *ptr = (__be32 *)ring->buf;
+			__be32 stamp = cpu_to_be32(1 << STAMP_SHIFT);
+			*ptr = stamp;
+			/* Move pointer to start of rx section */
 			ring->buf += TXBB_SIZE;
+		}
 
 		ring->log_stride = ffs(ring->stride) - 1;
 		ring->buf_size = ring->size * ring->stride;
--
2.7.4
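
The stamping step in this patch can be modeled in user space as below. This is a minimal sketch, not driver code: TXBB_SIZE (64) and STAMP_SHIFT (31) are assumed values, htonl() stands in for cpu_to_be32(), and struct rx_ring_model and activate_ring() are hypothetical names. It shows the buffer layout the patch relies on: when the RX stride fits in one TXBB, the first TXBB holds a single send WQE, whose first big-endian dword is stamped with 1 << STAMP_SHIFT before buf is advanced to the start of the RX section.

```c
#include <arpa/inet.h> /* htonl() stands in for the kernel's cpu_to_be32() */
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed values, for illustration only. */
#define TXBB_SIZE   64
#define STAMP_SHIFT 31

/* Hypothetical, heavily simplified stand-in for the driver's RX ring. */
struct rx_ring_model {
	unsigned char *buf;  /* start of the RX section after activation */
	unsigned int stride; /* bytes per RX descriptor */
};

/* Models the patched branch of mlx4_en_activate_rx_rings(). */
static void activate_ring(struct rx_ring_model *ring)
{
	assert(ring != NULL && ring->buf != NULL);

	if (ring->stride <= TXBB_SIZE) {
		/* Stamp the first (unused) send WQE with the ownership bit,
		 * stored big-endian; memcpy avoids alignment assumptions. */
		uint32_t stamp = htonl(1u << STAMP_SHIFT);

		memcpy(ring->buf, &stamp, sizeof(stamp));
		/* Move the pointer to the start of the RX section. */
		ring->buf += TXBB_SIZE;
	}
}
```

For example, with a 32-byte stride the first four bytes of the buffer become 80 00 00 00 and buf advances by one TXBB; with a 128-byte stride the function changes nothing.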



[SRU][Xenial][PATCH 2/4] net/mlx4_core: Fix racy CQ (Completion Queue) free

Joseph Salisbury-3
In reply to this post by Joseph Salisbury-3
From: Jack Morgenstein <[hidden email]>

BugLink: http://bugs.launchpad.net/bugs/1650058

In functions mlx4_cq_completion() and mlx4_cq_event(), the
radix_tree_lookup() call requires an rcu_read_lock.
This is mandatory: if another core frees the CQ, it could
run the radix_tree_node_rcu_free() call_rcu() callback while
that node is still being used by the radix tree lookup function.

Additionally, in function mlx4_cq_event(), since we are adding
the rcu lock around the radix-tree lookup, we no longer need to take
the spinlock. Also, the synchronize_irq() call for the async event
eliminates the need for incrementing the cq reference count in
mlx4_cq_event().

Other changes:
1. In function mlx4_cq_free(), replace spin_lock_irq with spin_lock:
   we no longer take this spinlock in interrupt context.
   The spinlock here, therefore, simply protects against different
   threads simultaneously invoking mlx4_cq_free() for different CQs.

2. In function mlx4_cq_free(), we move the radix tree delete to before
   the synchronize_irq() calls. This guarantees that we will not
   access this cq during any subsequent interrupts, and therefore can
   safely free the CQ after the synchronize_irq calls. The rcu_read_lock
   in the interrupt handlers only needs to protect against corrupting the
   radix tree; the interrupt handlers may access the cq outside the
   rcu_read_lock due to the synchronize_irq calls which protect against
   premature freeing of the cq.

3. In function mlx4_cq_event(), we change the mlx4_warn message to mlx4_dbg.

4. We leave the cq reference count mechanism in place, because it is
   still needed for the cq completion tasklet mechanism.

Fixes: 6d90aa5cf17b ("net/mlx4_core: Make sure there are no pending async events when freeing CQ")
Fixes: 225c7b1feef1 ("IB/mlx4: Add a driver Mellanox ConnectX InfiniBand adapters")
Signed-off-by: Jack Morgenstein <[hidden email]>
Signed-off-by: Matan Barak <[hidden email]>
Signed-off-by: Tariq Toukan <[hidden email]>
Signed-off-by: David S. Miller <[hidden email]>
(cherry picked from commit 291c566a28910614ce42d0ffe82196eddd6346f4)
Signed-off-by: Joseph Salisbury <[hidden email]>
---
 drivers/net/ethernet/mellanox/mlx4/cq.c | 38 +++++++++++++++++----------------
 1 file changed, 20 insertions(+), 18 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/cq.c b/drivers/net/ethernet/mellanox/mlx4/cq.c
index a849da9..6b86353 100644
--- a/drivers/net/ethernet/mellanox/mlx4/cq.c
+++ b/drivers/net/ethernet/mellanox/mlx4/cq.c
@@ -101,13 +101,19 @@ void mlx4_cq_completion(struct mlx4_dev *dev, u32 cqn)
 {
 	struct mlx4_cq *cq;
 
+	rcu_read_lock();
 	cq = radix_tree_lookup(&mlx4_priv(dev)->cq_table.tree,
 			       cqn & (dev->caps.num_cqs - 1));
+	rcu_read_unlock();
+
 	if (!cq) {
 		mlx4_dbg(dev, "Completion event for bogus CQ %08x\n", cqn);
 		return;
 	}
 
+	/* Acessing the CQ outside of rcu_read_lock is safe, because
+	 * the CQ is freed only after interrupt handling is completed.
+	 */
 	++cq->arm_sn;
 
 	cq->comp(cq);
@@ -118,23 +124,19 @@ void mlx4_cq_event(struct mlx4_dev *dev, u32 cqn, int event_type)
 	struct mlx4_cq_table *cq_table = &mlx4_priv(dev)->cq_table;
 	struct mlx4_cq *cq;
 
-	spin_lock(&cq_table->lock);
-
+	rcu_read_lock();
 	cq = radix_tree_lookup(&cq_table->tree, cqn & (dev->caps.num_cqs - 1));
-	if (cq)
-		atomic_inc(&cq->refcount);
-
-	spin_unlock(&cq_table->lock);
+	rcu_read_unlock();
 
 	if (!cq) {
-		mlx4_warn(dev, "Async event for bogus CQ %08x\n", cqn);
+		mlx4_dbg(dev, "Async event for bogus CQ %08x\n", cqn);
 		return;
 	}
 
+	/* Acessing the CQ outside of rcu_read_lock is safe, because
+	 * the CQ is freed only after interrupt handling is completed.
+	 */
 	cq->event(cq, event_type);
-
-	if (atomic_dec_and_test(&cq->refcount))
-		complete(&cq->free);
 }
 
 static int mlx4_SW2HW_CQ(struct mlx4_dev *dev, struct mlx4_cmd_mailbox *mailbox,
@@ -301,9 +303,9 @@ int mlx4_cq_alloc(struct mlx4_dev *dev, int nent,
 	if (err)
 		return err;
 
-	spin_lock_irq(&cq_table->lock);
+	spin_lock(&cq_table->lock);
 	err = radix_tree_insert(&cq_table->tree, cq->cqn, cq);
-	spin_unlock_irq(&cq_table->lock);
+	spin_unlock(&cq_table->lock);
 	if (err)
 		goto err_icm;
 
@@ -349,9 +351,9 @@ int mlx4_cq_alloc(struct mlx4_dev *dev, int nent,
 	return 0;
 
 err_radix:
-	spin_lock_irq(&cq_table->lock);
+	spin_lock(&cq_table->lock);
 	radix_tree_delete(&cq_table->tree, cq->cqn);
-	spin_unlock_irq(&cq_table->lock);
+	spin_unlock(&cq_table->lock);
 
 err_icm:
 	mlx4_cq_free_icm(dev, cq->cqn);
@@ -370,15 +372,15 @@ void mlx4_cq_free(struct mlx4_dev *dev, struct mlx4_cq *cq)
 	if (err)
 		mlx4_warn(dev, "HW2SW_CQ failed (%d) for CQN %06x\n", err, cq->cqn);
 
+	spin_lock(&cq_table->lock);
+	radix_tree_delete(&cq_table->tree, cq->cqn);
+	spin_unlock(&cq_table->lock);
+
 	synchronize_irq(priv->eq_table.eq[MLX4_CQ_TO_EQ_VECTOR(cq->vector)].irq);
 	if (priv->eq_table.eq[MLX4_CQ_TO_EQ_VECTOR(cq->vector)].irq !=
 	    priv->eq_table.eq[MLX4_EQ_ASYNC].irq)
 		synchronize_irq(priv->eq_table.eq[MLX4_EQ_ASYNC].irq);
 
-	spin_lock_irq(&cq_table->lock);
-	radix_tree_delete(&cq_table->tree, cq->cqn);
-	spin_unlock_irq(&cq_table->lock);
-
 	if (atomic_dec_and_test(&cq->refcount))
 		complete(&cq->free);
 	wait_for_completion(&cq->free);
--
2.7.4



[SRU][Xenial][PATCH 3/4] net/mlx4_core: Fix when to save some qp context flags for dynamic VST to VGT transitions

Joseph Salisbury-3
In reply to this post by Joseph Salisbury-3
From: Jack Morgenstein <[hidden email]>

BugLink: http://bugs.launchpad.net/bugs/1650058

Save the qp context flags byte containing the flag disabling vlan stripping
in the RESET to INIT qp transition, rather than in the INIT to RTR
transition. Per the firmware spec, the flags in this byte are active
in the RESET to INIT transition.

As a result of saving the flags in the incorrect qp transition, when
switching dynamically from VGT to VST and back to VGT, the vlan
remained stripped (as is required for VST) and did not return to
not-stripped (as is required for VGT).

Fixes: f0f829bf42cd ("net/mlx4_core: Add immediate activate for VGT->VST->VGT")
Signed-off-by: Jack Morgenstein <[hidden email]>
Signed-off-by: Tariq Toukan <[hidden email]>
Signed-off-by: David S. Miller <[hidden email]>
(cherry picked from commit 7c3945bc2073554bb2ecf983e073dee686679c53)
Signed-off-by: Joseph Salisbury <[hidden email]>
---
 drivers/net/ethernet/mellanox/mlx4/resource_tracker.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c b/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
index d314d96..d1fc7fa 100644
--- a/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
+++ b/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
@@ -2955,6 +2955,9 @@ int mlx4_RST2INIT_QP_wrapper(struct mlx4_dev *dev, int slave,
 		put_res(dev, slave, srqn, RES_SRQ);
 		qp->srq = srq;
 	}
+
+	/* Save param3 for dynamic changes from VST back to VGT */
+	qp->param3 = qpc->param3;
 	put_res(dev, slave, rcqn, RES_CQ);
 	put_res(dev, slave, mtt_base, RES_MTT);
 	res_end_move(dev, slave, RES_QP, qpn);
@@ -3747,7 +3750,6 @@ int mlx4_INIT2RTR_QP_wrapper(struct mlx4_dev *dev, int slave,
 	int qpn = vhcr->in_modifier & 0x7fffff;
 	struct res_qp *qp;
 	u8 orig_sched_queue;
-	__be32 orig_param3 = qpc->param3;
 	u8 orig_vlan_control = qpc->pri_path.vlan_control;
 	u8 orig_fvl_rx = qpc->pri_path.fvl_rx;
 	u8 orig_pri_path_fl = qpc->pri_path.fl;
@@ -3789,7 +3791,6 @@ out:
 	 */
 	if (!err) {
 		qp->sched_queue = orig_sched_queue;
-		qp->param3 = orig_param3;
 		qp->vlan_control = orig_vlan_control;
 		qp->fvl_rx =  orig_fvl_rx;
 		qp->pri_path_fl = orig_pri_path_fl;
--
2.7.4



[SRU][Xenial][PATCH 4/4] net/mlx4_core: Avoid command timeouts during VF driver device shutdown

Joseph Salisbury-3
In reply to this post by Joseph Salisbury-3
From: Jack Morgenstein <[hidden email]>

BugLink: http://bugs.launchpad.net/bugs/1650058

Some Hypervisors detach VFs from VMs by instantly causing an FLR event
to be generated for a VF.

In the mlx4 case, this will cause that VF's comm channel to be disabled
before the VM has an opportunity to invoke the VF device's "shutdown"
method.

The result is that the VF driver on the VM will experience a command
timeout during the shutdown process when the Hypervisor does not deliver
a command-completion event to the VM.

To avoid FW command timeouts on the VM when the driver's shutdown method
is invoked, we detect the absence of the VF's comm channel at the very
start of the shutdown process. If the comm channel has already been
disabled, we cause all FW commands during the device shutdown process to
immediately return success (and thus avoid all command timeouts).

Signed-off-by: Jack Morgenstein <[hidden email]>
Signed-off-by: Tariq Toukan <[hidden email]>
Signed-off-by: David S. Miller <[hidden email]>
(cherry picked from commit d585df1c5ccf995fcee910705ad7a9cdd11d4152)
Signed-off-by: Joseph Salisbury <[hidden email]>
---
 drivers/net/ethernet/mellanox/mlx4/catas.c |  2 +-
 drivers/net/ethernet/mellanox/mlx4/intf.c  | 12 ++++++++++++
 drivers/net/ethernet/mellanox/mlx4/mlx4.h  |  1 +
 3 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/catas.c b/drivers/net/ethernet/mellanox/mlx4/catas.c
index 715de8a..e203d0c 100644
--- a/drivers/net/ethernet/mellanox/mlx4/catas.c
+++ b/drivers/net/ethernet/mellanox/mlx4/catas.c
@@ -158,7 +158,7 @@ static int mlx4_reset_slave(struct mlx4_dev *dev)
 	return -ETIMEDOUT;
 }
 
-static int mlx4_comm_internal_err(u32 slave_read)
+int mlx4_comm_internal_err(u32 slave_read)
 {
 	return (u32)COMM_CHAN_EVENT_INTERNAL_ERR ==
 		(slave_read & (u32)COMM_CHAN_EVENT_INTERNAL_ERR) ? 1 : 0;
diff --git a/drivers/net/ethernet/mellanox/mlx4/intf.c b/drivers/net/ethernet/mellanox/mlx4/intf.c
index 0472941..1a134e0 100644
--- a/drivers/net/ethernet/mellanox/mlx4/intf.c
+++ b/drivers/net/ethernet/mellanox/mlx4/intf.c
@@ -218,6 +218,18 @@ void mlx4_unregister_device(struct mlx4_dev *dev)
 	struct mlx4_interface *intf;
 
 	mlx4_stop_catas_poll(dev);
+	if (dev->persist->interface_state & MLX4_INTERFACE_STATE_DELETION &&
+	    mlx4_is_slave(dev)) {
+		/* In mlx4_remove_one on a VF */
+		u32 slave_read =
+			swab32(readl(&mlx4_priv(dev)->mfunc.comm->slave_read));
+
+		if (mlx4_comm_internal_err(slave_read)) {
+			mlx4_dbg(dev, "%s: comm channel is down, entering error state.\n",
+				 __func__);
+			mlx4_enter_error_state(dev->persist);
+		}
+	}
 	mutex_lock(&intf_mutex);
 
 	list_for_each_entry(intf, &intf_list, list)
diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4.h b/drivers/net/ethernet/mellanox/mlx4/mlx4.h
index e1cf903..f5fdbd5 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4.h
@@ -1205,6 +1205,7 @@ void mlx4_qp_event(struct mlx4_dev *dev, u32 qpn, int event_type);
 void mlx4_srq_event(struct mlx4_dev *dev, u32 srqn, int event_type);
 
 void mlx4_enter_error_state(struct mlx4_dev_persistent *persist);
+int mlx4_comm_internal_err(u32 slave_read);
 
 int mlx4_SENSE_PORT(struct mlx4_dev *dev, int port,
 		    enum mlx4_port_type *type);
--
2.7.4



ACK: [SRU][Xenial][PATCH 0/4] net/mlx4_core: Fixes for LP:1650058

Tim Gardner-2
In reply to this post by Joseph Salisbury-3
Clean cherry-picks requested by the vendor. Isolated to one driver.

--
Tim Gardner [hidden email]


ACK: [SRU][Xenial][PATCH 0/4] net/mlx4_core: Fixes for LP:1650058

Brad Figg-2
In reply to this post by Joseph Salisbury-3

APPLIED: [SRU][Xenial][PATCH 0/4] net/mlx4_core: Fixes for LP:1650058

Thadeu Lima de Souza Cascardo-3
In reply to this post by Joseph Salisbury-3
Applied to xenial master-next branch.

Thanks.
Cascardo.
