[SRU][FOCAL][PATCH v2 0/1] net/mlx5: Fix a race when moving command interface to polling mode


William Breathitt Gray
Changes in v2:
 - Fix typo in commit "cherry pick" line

[Impact]

During driver unload, the driver destroys the commands EQ (via a FW
command). Once the commands EQ is destroyed, FW will not generate EQEs for
any command that the driver sends afterwards, so the driver must poll for
the status of later commands.

The driver's command mode metadata is updated before the commands EQ is
actually destroyed. This can lead to a command completion being handled
twice by the driver (once by polling and once by interrupt) if a command
is executed and completed by FW after the mode was changed but before the
EQ was destroyed.
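
To make the window concrete, here is a minimal userspace model of the
three states a command can land in. It is only a sketch: the struct and
function names are hypothetical and are not part of the mlx5 driver, and
the real completion paths are far more involved than a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the command interface state; not driver code. */
struct race_model {
	bool polling_mode;  /* driver metadata says: poll for completions */
	bool cmd_eq_alive;  /* FW still generates EQEs for commands */
};

/*
 * Model one command that FW executes and completes in the given state.
 * Returns how many times the driver would handle its completion.
 */
static int race_completions(const struct race_model *m)
{
	int completions = 0;

	assert(m != NULL);
	if (m->cmd_eq_alive)
		completions++;  /* EQE arrives, interrupt path completes it */
	if (m->polling_mode)
		completions++;  /* polling path completes it as well */
	return completions;
}
```

With polling_mode set and cmd_eq_alive still true (the window described
above), the model reports two completions for a single command. Before
the mode change and after the EQ is destroyed it reports exactly one.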

[Fix]

Fix that by using the mlx5_cmd_allowed_opcode mechanism to guarantee
that only the DESTROY_EQ command can be executed during this time period.
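
The idea of an allowed-opcode gate can be sketched in plain C as follows.
This is a single-threaded userspace illustration under stated assumptions:
every sketch_* name and every opcode value is hypothetical, and the
synchronization the real driver needs is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical opcode values, chosen only for this illustration. */
#define SKETCH_OP_QUERY      0x100
#define SKETCH_OP_DESTROY_EQ 0x302
#define SKETCH_ALLOWED_ALL   0      /* no restriction */

struct sketch_cmd {
	uint16_t allowed_opcode;  /* SKETCH_ALLOWED_ALL or a single opcode */
	bool polling;             /* completions are polled, not interrupt driven */
	bool cmd_eq_alive;        /* commands EQ still exists */
};

static void sketch_set_allowed_opcode(struct sketch_cmd *cmd, uint16_t opcode)
{
	cmd->allowed_opcode = opcode;
}

/* Returns 0 if the command is accepted, -1 if the gate rejects it. */
static int sketch_exec(struct sketch_cmd *cmd, uint16_t opcode)
{
	if (cmd->allowed_opcode != SKETCH_ALLOWED_ALL &&
	    cmd->allowed_opcode != opcode)
		return -1;
	if (opcode == SKETCH_OP_DESTROY_EQ)
		cmd->cmd_eq_alive = false;
	return 0;
}

/*
 * Mirrors the ordering used by the patch: restrict, switch to polling,
 * destroy the commands EQ, lift the restriction. racing_opcode stands in
 * for a command that arrives inside the window; its result is returned.
 */
static int sketch_teardown(struct sketch_cmd *cmd, uint16_t racing_opcode)
{
	int racing, rc;

	sketch_set_allowed_opcode(cmd, SKETCH_OP_DESTROY_EQ);
	cmd->polling = true;
	racing = sketch_exec(cmd, racing_opcode);  /* rejected by the gate */
	rc = sketch_exec(cmd, SKETCH_OP_DESTROY_EQ);
	assert(rc == 0);
	sketch_set_allowed_opcode(cmd, SKETCH_ALLOWED_ALL);
	return racing;
}
```

In this model a command other than DESTROY_EQ that arrives between the
mode change and the EQ destruction is rejected, so nothing can complete
through both paths. Once the gate is lifted, commands are accepted again
and are completed by polling only.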

[Where problems could occur]

The scope of the changes in this patch is narrow: only the
destroy_async_eqs() function is touched. If a problem occurs, it will
occur during the small window when the driver unloads. Regression
potential is low, however, because only the DESTROY_EQ command should
execute during this time period.

Eran Ben Elisha (1):
  net/mlx5: Fix a race when moving command interface to polling mode

 drivers/net/ethernet/mellanox/mlx5/core/eq.c | 2 ++
 1 file changed, 2 insertions(+)

--
2.27.0


--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team

[SRU][FOCAL][PATCH v2 1/1] net/mlx5: Fix a race when moving command interface to polling mode

William Breathitt Gray
From: Eran Ben Elisha <[hidden email]>

BugLink: https://bugs.launchpad.net/bugs/1905574

As part of driver unload, it destroys the commands EQ (via FW command).
As the commands EQ is destroyed, FW will not generate EQEs for any command
that driver sends afterwards. Driver should poll for later commands status.

Driver commands mode metadata is updated before the commands EQ is
actually destroyed. This can lead for double completion handle by the
driver (polling and interrupt), if a command is executed and completed by
FW after the mode was changed, but before the EQ was destroyed.

Fix that by using the mlx5_cmd_allowed_opcode mechanism to guarantee
that only DESTROY_EQ command can be executed during this time period.

Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters")
Signed-off-by: Eran Ben Elisha <[hidden email]>
Reviewed-by: Moshe Shemesh <[hidden email]>
Signed-off-by: Saeed Mahameed <[hidden email]>
(cherry picked from commit 432161ea26d6d5e5c3f7306d9407d26ed1e1953e)
Signed-off-by: William Breathitt Gray <[hidden email]>
---
 drivers/net/ethernet/mellanox/mlx5/core/eq.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
index 0a20938b4aad..938c4a46f9de 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
@@ -695,8 +695,10 @@ static void destroy_async_eqs(struct mlx5_core_dev *dev)
 
 	cleanup_async_eq(dev, &table->pages_eq, "pages");
 	cleanup_async_eq(dev, &table->async_eq, "async");
+	mlx5_cmd_allowed_opcode(dev, MLX5_CMD_OP_DESTROY_EQ);
 	mlx5_cmd_use_polling(dev);
 	cleanup_async_eq(dev, &table->cmd_eq, "cmd");
+	mlx5_cmd_allowed_opcode(dev, CMD_ALLOWED_OPCODE_ALL);
 	mlx5_eq_notifier_unregister(dev, &table->cq_err_nb);
 }
 
--
2.27.0



ACK: [SRU][FOCAL][PATCH v2 0/1] net/mlx5: Fix a race when moving command interface to polling mode

Marcelo Henrique Cerri
In reply to this post by William Breathitt Gray
It looks good to me. Bionic is not affected and Groovy already has
the same fix.

Acked-by: Marcelo Henrique Cerri <[hidden email]>

On Tue, Jan 19, 2021 at 11:07:26AM -0500, William Breathitt Gray wrote:

> [...]
--
Regards,
Marcelo



APPLIED/ACK: [SRU][FOCAL][PATCH v2 0/1] net/mlx5: Fix a race when moving command interface to polling mode

Kelsey Skunberg
In reply to this post by William Breathitt Gray

Applied to Focal/master-next. Thank you!

Acked-by: Kelsey Skunberg <[hidden email]>

On 2021-01-19 11:07:26 , William Breathitt Gray wrote:

> [...]
