[SRU][B:linux-azure-4.15][PATCH 00/40] hv_netvsc: Add XDP support


[SRU][B:linux-azure-4.15][PATCH 00/40] hv_netvsc: Add XDP support

William Breathitt Gray
BugLink: https://bugs.launchpad.net/bugs/1877654

[Impact]

Microsoft would like to request that the following three patches be included in all releases supported on Azure:

351e1581395fc ("hv_netvsc: Add XDP support")
12fa74383ed4d ("hv_netvsc: Update document for XDP support")
184367dce4f7 ("hv_netvsc: Fix XDP refcnt for synthetic and VF NICs")

These patches add native-mode XDP support to the hv_netvsc driver and
transparently set the XDP program on the associated VF NIC as well.

[Regression Potential]

The backport to Bionic:linux-azure-4.15 required many context
adjustments, so the regression potential is spread across a number of
files and drivers. Most adjustments were trivial, but the changes in
netvsc_drv.c deserve particular attention: if a regression occurs, it
will likely be in the netvsc_alloc_recv_skb(), netvsc_recv_callback(),
or netvsc_devinfo_get() functions, where the context differs most from
the original patch.

In general, Linux 4.15 was missing key XDP infrastructure. As such,
many preliminary commit backports were required to pull in that
support before XDP could be added to the hv_netvsc driver by the
primary backports.

[Miscellaneous]

The 5.4 version has already been released, so we do not need to worry
about that series.

Backports of the "xdp: new XDP rx-queue info concept" and "XDP
redirect memory return API" patchsets are included to add support for
the XDP API:
https://patchwork.ozlabs.org/project/netdev/cover/151497504273.18176.10177133999720101758.stgit@firesoul/
https://patchwork.ozlabs.org/project/netdev/cover/152397622657.20272.10121948713784224943.stgit@firesoul/

Björn Töpel (5):
  i40e: add support for XDP_REDIRECT
  xsk: add user memory registration support sockopt
  xsk: add Rx queue setup and mmap support
  xsk: add Rx receive functions and poll support
  xdp: add MEM_TYPE_ZERO_COPY

Haiyang Zhang (5):
  hv_netvsc: Add support for LRO/RSC in the vSwitch
  hv_netvsc: Refactor assignments of struct netvsc_device_info
  hv_netvsc: Add XDP support
  hv_netvsc: Update document for XDP support
  hv_netvsc: Fix XDP refcnt for synthetic and VF NICs

Jason Wang (3):
  tun/tap: use ptr_ring instead of skb_array
  tuntap: XDP transmission
  tuntap: XDP_TX can use native XDP

Jesper Dangaard Brouer (24):
  xdp: base API for new XDP rx-queue info concept
  ixgbe: setup xdp_rxq_info
  xdp/qede: setup xdp_rxq_info and intro xdp_rxq_info_is_reg
  tun: setup xdp_rxq_info
  virtio_net: setup xdp_rxq_info
  xdp: generic XDP handling of xdp_rxq_info
  net: avoid including xdp.h in filter.h
  virtio_net: fix ndo_xdp_xmit crash towards dev not ready for XDP
  xdp: introduce xdp_return_frame API and use in cpumap
  ixgbe: use xdp_return_frame API
  xdp: move struct xdp_buff from filter.h to xdp.h
  xdp: introduce a new xdp_frame type
  tun: convert to use generic xdp_frame and xdp_return_frame API
  virtio_net: convert to use generic xdp_frame and xdp_return_frame API
  bpf: cpumap convert to use generic xdp_frame
  i40e: convert to use generic xdp_frame and xdp_return_frame API
  xdp: rhashtable with allocator ID to pointer mapping
  page_pool: refurbish version of page_pool code
  xdp: allow page_pool as an allocator type in xdp_return_frame
  xdp: transition into using xdp_frame for return API
  xdp: transition into using xdp_frame for ndo_xdp_xmit
  bpf: devmap introduce dev_map_enqueue
  bpf: devmap prepare xdp frames for bulking
  xdp: introduce xdp_return_frame_rx_napi

Magnus Karlsson (2):
  xsk: add umem fill queue support and mmap
  xsk: add support for bind for Rx

Stephen Hemminger (1):
  hv_netvsc: pass netvsc_device to receive callback

 Documentation/networking/netvsc.txt           |  21 +
 debian.azure-4.15/config/config.common.ubuntu |   1 +
 drivers/net/ethernet/intel/i40e/i40e_main.c   |   2 +
 drivers/net/ethernet/intel/i40e/i40e_txrx.c   | 103 +++-
 drivers/net/ethernet/intel/i40e/i40e_txrx.h   |   3 +
 drivers/net/ethernet/intel/ixgbe/ixgbe.h      |   5 +-
 .../net/ethernet/intel/ixgbe/ixgbe_ethtool.c  |   4 +
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |  48 +-
 .../net/ethernet/mellanox/mlx5/core/Kconfig   |   1 +
 .../net/ethernet/mellanox/mlx5/core/en_rx.c   |   1 +
 drivers/net/ethernet/qlogic/qede/qede.h       |   2 +
 drivers/net/ethernet/qlogic/qede/qede_fp.c    |   1 +
 drivers/net/ethernet/qlogic/qede/qede_main.c  |  10 +
 drivers/net/hyperv/Makefile                   |   2 +-
 drivers/net/hyperv/hyperv_net.h               |  70 ++-
 drivers/net/hyperv/netvsc.c                   |  48 +-
 drivers/net/hyperv/netvsc_bpf.c               | 218 ++++++++
 drivers/net/hyperv/netvsc_drv.c               | 335 ++++++++----
 drivers/net/hyperv/rndis_filter.c             |  96 +++-
 drivers/net/tap.c                             |  42 +-
 drivers/net/tun.c                             | 291 +++++++++--
 drivers/net/virtio_net.c                      |  89 +++-
 drivers/vhost/net.c                           |  53 +-
 include/linux/bpf.h                           |  16 +-
 include/linux/filter.h                        |  24 +-
 include/linux/if_tap.h                        |   6 +-
 include/linux/if_tun.h                        |  21 +-
 include/linux/netdevice.h                     |   6 +-
 include/net/page_pool.h                       | 144 +++++
 include/net/xdp.h                             | 143 +++++
 include/net/xdp_sock.h                        |  58 ++
 include/trace/events/xdp.h                    |   9 +-
 include/uapi/linux/if_xdp.h                   |  76 +++
 kernel/bpf/cpumap.c                           | 132 ++---
 kernel/bpf/devmap.c                           | 103 +++-
 net/Kconfig                                   |   3 +
 net/Makefile                                  |   1 +
 net/core/Makefile                             |   3 +-
 net/core/dev.c                                |  69 ++-
 net/core/filter.c                             |  14 +-
 net/core/page_pool.c                          | 317 +++++++++++
 net/core/xdp.c                                | 372 +++++++++++++
 net/xdp/Makefile                              |   2 +
 net/xdp/xdp_umem.c                            | 255 +++++++++
 net/xdp/xdp_umem.h                            |  66 +++
 net/xdp/xdp_umem_props.h                      |  23 +
 net/xdp/xsk.c                                 | 494 ++++++++++++++++++
 net/xdp/xsk_queue.c                           |  73 +++
 net/xdp/xsk_queue.h                           | 151 ++++++
 49 files changed, 3610 insertions(+), 417 deletions(-)
 create mode 100644 drivers/net/hyperv/netvsc_bpf.c
 create mode 100644 include/net/page_pool.h
 create mode 100644 include/net/xdp.h
 create mode 100644 include/net/xdp_sock.h
 create mode 100644 include/uapi/linux/if_xdp.h
 create mode 100644 net/core/page_pool.c
 create mode 100644 net/core/xdp.c
 create mode 100644 net/xdp/Makefile
 create mode 100644 net/xdp/xdp_umem.c
 create mode 100644 net/xdp/xdp_umem.h
 create mode 100644 net/xdp/xdp_umem_props.h
 create mode 100644 net/xdp/xsk.c
 create mode 100644 net/xdp/xsk_queue.c
 create mode 100644 net/xdp/xsk_queue.h

--
2.25.1


--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team

[SRU][B:linux-azure-4.15][PATCH 01/40] hv_netvsc: pass netvsc_device to receive callback

William Breathitt Gray
From: Stephen Hemminger <[hidden email]>

BugLink: https://bugs.launchpad.net/bugs/1877654

The netvsc_receive_callback function was using RCU to find the
appropriate underlying netvsc_device. Since the calling function
already had that pointer, this was unnecessary.

Signed-off-by: Stephen Hemminger <[hidden email]>
Signed-off-by: David S. Miller <[hidden email]>
(backported from commit 345ac08990b8365294f9756da806f357c239d758)
[ vilhelmgray: context adjustments ]
Signed-off-by: William Breathitt Gray <[hidden email]>
---
 drivers/net/hyperv/hyperv_net.h   |  1 +
 drivers/net/hyperv/netvsc_drv.c   | 13 ++-----------
 drivers/net/hyperv/rndis_filter.c |  3 ++-
 3 files changed, 5 insertions(+), 12 deletions(-)

diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index 83e040359037..5d1c91500df0 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -202,6 +202,7 @@ int netvsc_send(struct net_device *net,
 void netvsc_linkstatus_callback(struct net_device *net,
  struct rndis_message *resp);
 int netvsc_recv_callback(struct net_device *net,
+ struct netvsc_device *nvdev,
  struct vmbus_channel *channel,
  void  *data, u32 len,
  const struct ndis_tcp_ip_checksum_info *csum_info,
diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index 74e00120850c..3f1b3e6ae4aa 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -831,33 +831,25 @@ static struct sk_buff *netvsc_alloc_recv_skb(struct net_device *net,
  * "wire" on the specified device.
  */
 int netvsc_recv_callback(struct net_device *net,
+ struct netvsc_device *net_device,
  struct vmbus_channel *channel,
  void  *data, u32 len,
  const struct ndis_tcp_ip_checksum_info *csum_info,
  const struct ndis_pkt_8021q_info *vlan)
 {
  struct net_device_context *net_device_ctx = netdev_priv(net);
- struct netvsc_device *net_device;
  u16 q_idx = channel->offermsg.offer.sub_channel_index;
- struct netvsc_channel *nvchan;
+ struct netvsc_channel *nvchan = &net_device->chan_table[q_idx];
  struct sk_buff *skb;
  struct netvsc_stats *rx_stats;
 
  if (net->reg_state != NETREG_REGISTERED)
  return NVSP_STAT_FAIL;
 
- rcu_read_lock();
- net_device = rcu_dereference(net_device_ctx->nvdev);
- if (unlikely(!net_device))
- goto drop;
-
- nvchan = &net_device->chan_table[q_idx];
-
  /* Allocate a skb - TODO direct I/O to pages? */
  skb = netvsc_alloc_recv_skb(net, &nvchan->napi,
     csum_info, vlan, data, len);
  if (unlikely(!skb)) {
-drop:
  ++net->stats.rx_dropped;
  rcu_read_unlock();
  return NVSP_STAT_FAIL;
@@ -882,7 +874,6 @@ int netvsc_recv_callback(struct net_device *net,
  u64_stats_update_end(&rx_stats->syncp);
 
  napi_gro_receive(&nvchan->napi, skb);
- rcu_read_unlock();
 
  return NVSP_STAT_SUCCESS;
 }
diff --git a/drivers/net/hyperv/rndis_filter.c b/drivers/net/hyperv/rndis_filter.c
index 29e8741e1891..4d2eceb3a694 100644
--- a/drivers/net/hyperv/rndis_filter.c
+++ b/drivers/net/hyperv/rndis_filter.c
@@ -397,7 +397,8 @@ static int rndis_filter_receive_data(struct net_device *ndev,
  */
  data = (void *)((unsigned long)data + data_offset);
  csum_info = rndis_get_ppi(rndis_pkt, TCPIP_CHKSUM_PKTINFO);
- return netvsc_recv_callback(ndev, channel,
+
+ return netvsc_recv_callback(ndev, nvdev, channel,
     data, rndis_pkt->data_len,
     csum_info, vlan);
 }
--
2.25.1



[SRU][B:linux-azure-4.15][PATCH 02/40] xdp: base API for new XDP rx-queue info concept

William Breathitt Gray
From: Jesper Dangaard Brouer <[hidden email]>

BugLink: https://bugs.launchpad.net/bugs/1877654

This patch only introduces the core data structures and API functions.
All XDP-enabled drivers must use the API before this info can be used.

There is a need for XDP to know more about the RX-queue a given XDP
frame has arrived on, for both the XDP bpf-prog and the kernel side.

Instead of extending xdp_buff each time new info is needed, the patch
creates a separate read-mostly struct xdp_rxq_info that contains this
info.  We stress that this data/cache-line is for read-only info.  This
is NOT for dynamic per-packet info; use data_meta for such use-cases.

The performance advantage is that this info can be set up at RX-ring
init time, instead of updating N members in xdp_buff.  A possible
(driver-level) micro-optimization is that the xdp_buff->rxq assignment
could be done once per XDP/NAPI loop.  The extra pointer deref only
happens for programs needing access to this info (thus, no slowdown to
existing use-cases).

Signed-off-by: Jesper Dangaard Brouer <[hidden email]>
Signed-off-by: Alexei Starovoitov <[hidden email]>
(backported from commit aecd67b60722dd24353b0bc50e78a55b30707dcd)
[ vilhelmgray: context adjustment ]
Signed-off-by: William Breathitt Gray <[hidden email]>
---
 include/linux/filter.h |  2 ++
 include/net/xdp.h      | 47 +++++++++++++++++++++++++++++
 net/core/Makefile      |  2 +-
 net/core/xdp.c         | 67 ++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 117 insertions(+), 1 deletion(-)
 create mode 100644 include/net/xdp.h
 create mode 100644 net/core/xdp.c

diff --git a/include/linux/filter.h b/include/linux/filter.h
index baec2c269602..158fb795cba7 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -19,6 +19,7 @@
 #include <linux/cryptohash.h>
 #include <linux/set_memory.h>
 
+#include <net/xdp.h>
 #include <net/sch_generic.h>
 
 #include <uapi/linux/filter.h>
@@ -493,6 +494,7 @@ struct xdp_buff {
  void *data_end;
  void *data_meta;
  void *data_hard_start;
+ struct xdp_rxq_info *rxq;
 };
 
 /* Compute the linear packet data range [data, data_end) which
diff --git a/include/net/xdp.h b/include/net/xdp.h
new file mode 100644
index 000000000000..86c41631a908
--- /dev/null
+++ b/include/net/xdp.h
@@ -0,0 +1,47 @@
+/* include/net/xdp.h
+ *
+ * Copyright (c) 2017 Jesper Dangaard Brouer, Red Hat Inc.
+ * Released under terms in GPL version 2.  See COPYING.
+ */
+#ifndef __LINUX_NET_XDP_H__
+#define __LINUX_NET_XDP_H__
+
+/**
+ * DOC: XDP RX-queue information
+ *
+ * The XDP RX-queue info (xdp_rxq_info) is associated with the driver
+ * level RX-ring queues.  It is information that is specific to how
+ * the driver have configured a given RX-ring queue.
+ *
+ * Each xdp_buff frame received in the driver carry a (pointer)
+ * reference to this xdp_rxq_info structure.  This provides the XDP
+ * data-path read-access to RX-info for both kernel and bpf-side
+ * (limited subset).
+ *
+ * For now, direct access is only safe while running in NAPI/softirq
+ * context.  Contents is read-mostly and must not be updated during
+ * driver NAPI/softirq poll.
+ *
+ * The driver usage API is a register and unregister API.
+ *
+ * The struct is not directly tied to the XDP prog.  A new XDP prog
+ * can be attached as long as it doesn't change the underlying
+ * RX-ring.  If the RX-ring does change significantly, the NIC driver
+ * naturally need to stop the RX-ring before purging and reallocating
+ * memory.  In that process the driver MUST call unregistor (which
+ * also apply for driver shutdown and unload).  The register API is
+ * also mandatory during RX-ring setup.
+ */
+
+struct xdp_rxq_info {
+ struct net_device *dev;
+ u32 queue_index;
+ u32 reg_state;
+} ____cacheline_aligned; /* perf critical, avoid false-sharing */
+
+int xdp_rxq_info_reg(struct xdp_rxq_info *xdp_rxq,
+     struct net_device *dev, u32 queue_index);
+void xdp_rxq_info_unreg(struct xdp_rxq_info *xdp_rxq);
+void xdp_rxq_info_unused(struct xdp_rxq_info *xdp_rxq);
+
+#endif /* __LINUX_NET_XDP_H__ */
diff --git a/net/core/Makefile b/net/core/Makefile
index 1fd0a9c88b1b..6dbbba8c57ae 100644
--- a/net/core/Makefile
+++ b/net/core/Makefile
@@ -11,7 +11,7 @@ obj-$(CONFIG_SYSCTL) += sysctl_net_core.o
 obj-y     += dev.o ethtool.o dev_addr_lists.o dst.o netevent.o \
  neighbour.o rtnetlink.o utils.o link_watch.o filter.o \
  sock_diag.o dev_ioctl.o tso.o sock_reuseport.o \
- fib_notifier.o
+ fib_notifier.o xdp.o
 
 obj-y += net-sysfs.o
 obj-$(CONFIG_PROC_FS) += net-procfs.o
diff --git a/net/core/xdp.c b/net/core/xdp.c
new file mode 100644
index 000000000000..229bc5a0ee04
--- /dev/null
+++ b/net/core/xdp.c
@@ -0,0 +1,67 @@
+/* net/core/xdp.c
+ *
+ * Copyright (c) 2017 Jesper Dangaard Brouer, Red Hat Inc.
+ * Released under terms in GPL version 2.  See COPYING.
+ */
+#include <linux/types.h>
+#include <linux/mm.h>
+
+#include <net/xdp.h>
+
+#define REG_STATE_NEW 0x0
+#define REG_STATE_REGISTERED 0x1
+#define REG_STATE_UNREGISTERED 0x2
+#define REG_STATE_UNUSED 0x3
+
+void xdp_rxq_info_unreg(struct xdp_rxq_info *xdp_rxq)
+{
+ /* Simplify driver cleanup code paths, allow unreg "unused" */
+ if (xdp_rxq->reg_state == REG_STATE_UNUSED)
+ return;
+
+ WARN(!(xdp_rxq->reg_state == REG_STATE_REGISTERED), "Driver BUG");
+
+ xdp_rxq->reg_state = REG_STATE_UNREGISTERED;
+ xdp_rxq->dev = NULL;
+}
+EXPORT_SYMBOL_GPL(xdp_rxq_info_unreg);
+
+static void xdp_rxq_info_init(struct xdp_rxq_info *xdp_rxq)
+{
+ memset(xdp_rxq, 0, sizeof(*xdp_rxq));
+}
+
+/* Returns 0 on success, negative on failure */
+int xdp_rxq_info_reg(struct xdp_rxq_info *xdp_rxq,
+     struct net_device *dev, u32 queue_index)
+{
+ if (xdp_rxq->reg_state == REG_STATE_UNUSED) {
+ WARN(1, "Driver promised not to register this");
+ return -EINVAL;
+ }
+
+ if (xdp_rxq->reg_state == REG_STATE_REGISTERED) {
+ WARN(1, "Missing unregister, handled but fix driver");
+ xdp_rxq_info_unreg(xdp_rxq);
+ }
+
+ if (!dev) {
+ WARN(1, "Missing net_device from driver");
+ return -ENODEV;
+ }
+
+ /* State either UNREGISTERED or NEW */
+ xdp_rxq_info_init(xdp_rxq);
+ xdp_rxq->dev = dev;
+ xdp_rxq->queue_index = queue_index;
+
+ xdp_rxq->reg_state = REG_STATE_REGISTERED;
+ return 0;
+}
+EXPORT_SYMBOL_GPL(xdp_rxq_info_reg);
+
+void xdp_rxq_info_unused(struct xdp_rxq_info *xdp_rxq)
+{
+ xdp_rxq->reg_state = REG_STATE_UNUSED;
+}
+EXPORT_SYMBOL_GPL(xdp_rxq_info_unused);
--
2.25.1



[SRU][B:linux-azure-4.15][PATCH 03/40] ixgbe: setup xdp_rxq_info

William Breathitt Gray
From: Jesper Dangaard Brouer <[hidden email]>

BugLink: https://bugs.launchpad.net/bugs/1877654

Driver hook points for xdp_rxq_info:
 * reg  : ixgbe_setup_rx_resources()
 * unreg: ixgbe_free_rx_resources()

Tested on actual hardware.

V2: Fix ixgbe_set_ringparam, clear xdp_rxq_info in temp_ring

Cc: [hidden email]
Cc: Jeff Kirsher <[hidden email]>
Cc: Alexander Duyck <[hidden email]>
Signed-off-by: Jesper Dangaard Brouer <[hidden email]>
Acked-by: John Fastabend <[hidden email]>
Signed-off-by: Alexei Starovoitov <[hidden email]>
(backported from commit 99ffc5ade4e8703c3bc56fa6bb8e25437da09ee9)
[ vilhelmgray: context adjustment ]
Signed-off-by: William Breathitt Gray <[hidden email]>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe.h         |  2 ++
 drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c |  4 ++++
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c    | 10 +++++++++-
 3 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
index 468c3555a629..8611763d6129 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
@@ -53,6 +53,7 @@
 #include <linux/dca.h>
 #endif
 
+#include <net/xdp.h>
 #include <net/busy_poll.h>
 
 /* common prefix used by pr_<> macros */
@@ -371,6 +372,7 @@ struct ixgbe_ring {
  struct ixgbe_tx_queue_stats tx_stats;
  struct ixgbe_rx_queue_stats rx_stats;
  };
+ struct xdp_rxq_info xdp_rxq;
 } ____cacheline_internodealigned_in_smp;
 
 enum ixgbe_ring_f_enum {
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
index 372835dc144c..3decd446524c 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
@@ -1156,6 +1156,10 @@ static int ixgbe_set_ringparam(struct net_device *netdev,
  memcpy(&temp_ring[i], adapter->rx_ring[i],
        sizeof(struct ixgbe_ring));
 
+ /* Clear copied XDP RX-queue info */
+ memset(&temp_ring[i].xdp_rxq, 0,
+       sizeof(temp_ring[i].xdp_rxq));
+
  temp_ring[i].count = new_rx_count;
  err = ixgbe_setup_rx_resources(adapter, &temp_ring[i]);
  if (err) {
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index b2826f7e945c..49028d005eae 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -2330,12 +2330,14 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
 #endif /* IXGBE_FCOE */
  u16 cleaned_count = ixgbe_desc_unused(rx_ring);
  unsigned int xdp_xmit = 0;
+ struct xdp_buff xdp;
+
+ xdp.rxq = &rx_ring->xdp_rxq;
 
  while (likely(total_rx_packets < budget)) {
  union ixgbe_adv_rx_desc *rx_desc;
  struct ixgbe_rx_buffer *rx_buffer;
  struct sk_buff *skb;
- struct xdp_buff xdp;
  unsigned int size;
 
  /* return some buffers to hardware, one at a time is too slow */
@@ -6499,6 +6501,11 @@ int ixgbe_setup_rx_resources(struct ixgbe_adapter *adapter,
  rx_ring->next_to_clean = 0;
  rx_ring->next_to_use = 0;
 
+ /* XDP RX-queue info */
+ if (xdp_rxq_info_reg(&rx_ring->xdp_rxq, adapter->netdev,
+     rx_ring->queue_index) < 0)
+ goto err;
+
  rx_ring->xdp_prog = adapter->xdp_prog;
 
  return 0;
@@ -6596,6 +6603,7 @@ void ixgbe_free_rx_resources(struct ixgbe_ring *rx_ring)
  ixgbe_clean_rx_ring(rx_ring);
 
  rx_ring->xdp_prog = NULL;
+ xdp_rxq_info_unreg(&rx_ring->xdp_rxq);
  vfree(rx_ring->rx_buffer_info);
  rx_ring->rx_buffer_info = NULL;
 
--
2.25.1



[SRU][B:linux-azure-4.15][PATCH 04/40] xdp/qede: setup xdp_rxq_info and intro xdp_rxq_info_is_reg

William Breathitt Gray
From: Jesper Dangaard Brouer <[hidden email]>

BugLink: https://bugs.launchpad.net/bugs/1877654

The driver code qede_free_fp_array() depends on kfree() accepting a
NULL pointer. This stems from the qede_alloc_fp_array() function,
which (kz)allocs memory for either fp->txq or fp->rxq. This also
simplifies the error handling code in case of memory allocation
failures, but xdp_rxq_info_unreg needs to know the difference.

Introduce xdp_rxq_info_is_reg() to handle the case where a memory
allocation fails: the failure path can be detected by seeing that the
xdp_rxq_info was not yet registered, which first happens after
successful allocation in qede_init_fp().

Driver hook points for xdp_rxq_info:
 * reg  : qede_init_fp
 * unreg: qede_free_fp_array

Tested on actual hardware with samples/bpf program.

V2: The driver has no proper error path for a failed XDP RX-queue info
reg, as qede_init_fp() is a void function.

Cc: [hidden email]
Cc: Ariel Elior <[hidden email]>
Signed-off-by: Jesper Dangaard Brouer <[hidden email]>
Signed-off-by: Alexei Starovoitov <[hidden email]>
(cherry picked from commit c0124f327e5cabd844a10d7e1fc5aa2a81e796a9)
Signed-off-by: William Breathitt Gray <[hidden email]>
---
 drivers/net/ethernet/qlogic/qede/qede.h      |  2 ++
 drivers/net/ethernet/qlogic/qede/qede_fp.c   |  1 +
 drivers/net/ethernet/qlogic/qede/qede_main.c | 10 ++++++++++
 include/net/xdp.h                            |  1 +
 net/core/xdp.c                               |  6 ++++++
 5 files changed, 20 insertions(+)

diff --git a/drivers/net/ethernet/qlogic/qede/qede.h b/drivers/net/ethernet/qlogic/qede/qede.h
index 2a8535f1e966..93981490cc84 100644
--- a/drivers/net/ethernet/qlogic/qede/qede.h
+++ b/drivers/net/ethernet/qlogic/qede/qede.h
@@ -40,6 +40,7 @@
 #include <linux/kernel.h>
 #include <linux/mutex.h>
 #include <linux/bpf.h>
+#include <net/xdp.h>
 #include <linux/qed/qede_rdma.h>
 #include <linux/io.h>
 #ifdef CONFIG_RFS_ACCEL
@@ -349,6 +350,7 @@ struct qede_rx_queue {
  u64 xdp_no_pass;
 
  void *handle;
+ struct xdp_rxq_info xdp_rxq;
 };
 
 union db_prod {
diff --git a/drivers/net/ethernet/qlogic/qede/qede_fp.c b/drivers/net/ethernet/qlogic/qede/qede_fp.c
index da80852b5ce0..14941303189d 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_fp.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_fp.c
@@ -1004,6 +1004,7 @@ static bool qede_rx_xdp(struct qede_dev *edev,
  xdp.data = xdp.data_hard_start + *data_offset;
  xdp_set_data_meta_invalid(&xdp);
  xdp.data_end = xdp.data + *len;
+ xdp.rxq = &rxq->xdp_rxq;
 
  /* Queues always have a full reset currently, so for the time
  * being until there's atomic program replace just mark read
diff --git a/drivers/net/ethernet/qlogic/qede/qede_main.c b/drivers/net/ethernet/qlogic/qede/qede_main.c
index 60d59b057b9b..43ed50cb0030 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_main.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_main.c
@@ -770,6 +770,12 @@ static void qede_free_fp_array(struct qede_dev *edev)
  fp = &edev->fp_array[i];
 
  kfree(fp->sb_info);
+ /* Handle mem alloc failure case where qede_init_fp
+ * didn't register xdp_rxq_info yet.
+ * Implicit only (fp->type & QEDE_FASTPATH_RX)
+ */
+ if (fp->rxq && xdp_rxq_info_is_reg(&fp->rxq->xdp_rxq))
+ xdp_rxq_info_unreg(&fp->rxq->xdp_rxq);
  kfree(fp->rxq);
  kfree(fp->xdp_tx);
  kfree(fp->txq);
@@ -1518,6 +1524,10 @@ static void qede_init_fp(struct qede_dev *edev)
  else
  fp->rxq->data_direction = DMA_FROM_DEVICE;
  fp->rxq->dev = &edev->pdev->dev;
+
+ /* Driver have no error path from here */
+ WARN_ON(xdp_rxq_info_reg(&fp->rxq->xdp_rxq, edev->ndev,
+ fp->rxq->rxq_id) < 0);
  }
 
  if (fp->type & QEDE_FASTPATH_TX) {
diff --git a/include/net/xdp.h b/include/net/xdp.h
index 86c41631a908..b2362ddfa694 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -43,5 +43,6 @@ int xdp_rxq_info_reg(struct xdp_rxq_info *xdp_rxq,
      struct net_device *dev, u32 queue_index);
 void xdp_rxq_info_unreg(struct xdp_rxq_info *xdp_rxq);
 void xdp_rxq_info_unused(struct xdp_rxq_info *xdp_rxq);
+bool xdp_rxq_info_is_reg(struct xdp_rxq_info *xdp_rxq);
 
 #endif /* __LINUX_NET_XDP_H__ */
diff --git a/net/core/xdp.c b/net/core/xdp.c
index 229bc5a0ee04..097a0f74e004 100644
--- a/net/core/xdp.c
+++ b/net/core/xdp.c
@@ -65,3 +65,9 @@ void xdp_rxq_info_unused(struct xdp_rxq_info *xdp_rxq)
  xdp_rxq->reg_state = REG_STATE_UNUSED;
 }
 EXPORT_SYMBOL_GPL(xdp_rxq_info_unused);
+
+bool xdp_rxq_info_is_reg(struct xdp_rxq_info *xdp_rxq)
+{
+ return (xdp_rxq->reg_state == REG_STATE_REGISTERED);
+}
+EXPORT_SYMBOL_GPL(xdp_rxq_info_is_reg);
--
2.25.1



[SRU][B:linux-azure-4.15][PATCH 05/40] tun: setup xdp_rxq_info

William Breathitt Gray
From: Jesper Dangaard Brouer <[hidden email]>

BugLink: https://bugs.launchpad.net/bugs/1877654

Driver hook points for xdp_rxq_info:
 * reg  : tun_attach
 * unreg: __tun_detach

I've done some manual testing of this tun driver, but I would
appreciate a good review and someone else running their use-case
tests, as I'm not 100% sure I understand the tfile->detached semantics.

V2: Removed the skb_array_cleanup() call from V1 by request from Jason Wang.

Cc: Jason Wang <[hidden email]>
Cc: "Michael S. Tsirkin" <[hidden email]>
Cc: Willem de Bruijn <[hidden email]>
Signed-off-by: Jesper Dangaard Brouer <[hidden email]>
Reviewed-by: Jason Wang <[hidden email]>
Signed-off-by: Alexei Starovoitov <[hidden email]>
(backported from commit 8bf5c4ee1889308ccd396fdfd40ac94129ee419f)
[ vilhelmgray: context adjustments ]
Signed-off-by: William Breathitt Gray <[hidden email]>
---
 drivers/net/tun.c | 25 ++++++++++++++++++++++++-
 1 file changed, 24 insertions(+), 1 deletion(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index e9c7317618fa..2434b2bd902f 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -181,6 +181,7 @@ struct tun_file {
  struct list_head next;
  struct tun_struct *detached;
  struct skb_array tx_array;
+ struct xdp_rxq_info xdp_rxq;
 };
 
 struct tun_flow_entry {
@@ -658,7 +659,10 @@ static void __tun_detach(struct tun_file *tfile, bool clean)
     tun->dev->reg_state == NETREG_REGISTERED)
  unregister_netdevice(tun->dev);
  }
- skb_array_cleanup(&tfile->tx_array);
+ if (tun) {
+ skb_array_cleanup(&tfile->tx_array);
+ xdp_rxq_info_unreg(&tfile->xdp_rxq);
+ }
  sock_put(&tfile->sk);
  }
 }
@@ -699,11 +703,13 @@ static void tun_detach_all(struct net_device *dev)
  tun_napi_del(tfile);
  /* Drop read queue */
  tun_queue_purge(tfile);
+ xdp_rxq_info_unreg(&tfile->xdp_rxq);
  sock_put(&tfile->sk);
  }
  list_for_each_entry_safe(tfile, tmp, &tun->disabled, next) {
  tun_enable_queue(tfile);
  tun_queue_purge(tfile);
+ xdp_rxq_info_unreg(&tfile->xdp_rxq);
  sock_put(&tfile->sk);
  }
  BUG_ON(tun->numdisabled != 0);
@@ -759,6 +765,22 @@ static int tun_attach(struct tun_struct *tun, struct file *file,
 
  tfile->queue_index = tun->numqueues;
  tfile->socket.sk->sk_shutdown &= ~RCV_SHUTDOWN;
+
+ if (tfile->detached) {
+ /* Re-attach detached tfile, updating XDP queue_index */
+ WARN_ON(!xdp_rxq_info_is_reg(&tfile->xdp_rxq));
+
+ if (tfile->xdp_rxq.queue_index    != tfile->queue_index)
+ tfile->xdp_rxq.queue_index = tfile->queue_index;
+ } else {
+ /* Setup XDP RX-queue info, for new tfile getting attached */
+ err = xdp_rxq_info_reg(&tfile->xdp_rxq,
+       tun->dev, tfile->queue_index);
+ if (err < 0)
+ goto out;
+ err = 0;
+ }
+
  if (tfile->detached) {
  tun_enable_queue(tfile);
  } else {
@@ -1498,6 +1520,7 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun,
  xdp.data = buf + pad;
  xdp_set_data_meta_invalid(&xdp);
  xdp.data_end = xdp.data + len;
+ xdp.rxq = &tfile->xdp_rxq;
  orig_data = xdp.data;
  act = bpf_prog_run_xdp(xdp_prog, &xdp);
 
--
2.25.1



[SRU][B:linux-azure-4.15][PATCH 06/40] virtio_net: setup xdp_rxq_info

William Breathitt Gray
From: Jesper Dangaard Brouer <[hidden email]>

BugLink: https://bugs.launchpad.net/bugs/1877654

The virtio_net driver doesn't dynamically change the RX-ring queue
layout and backing pages; instead it rejects XDP setup if all the
conditions for XDP are not met.  Thus, the xdp_rxq_info also remains
fairly static.  This allows us to simply add the reg/unreg calls to
the net_device open/close functions.

Driver hook points for xdp_rxq_info:
 * reg  : virtnet_open
 * unreg: virtnet_close

V3:
 - bugfix, also setup xdp.rxq in receive_mergeable()
 - Tested bpf-sample prog inside guest on a virtio_net device

Cc: "Michael S. Tsirkin" <[hidden email]>
Cc: Jason Wang <[hidden email]>
Cc: [hidden email]
Signed-off-by: Jesper Dangaard Brouer <[hidden email]>
Reviewed-by: Jason Wang <[hidden email]>
Signed-off-by: Alexei Starovoitov <[hidden email]>
(backported from commit 754b8a21a96d5f11712245aef907149606b323ae)
Signed-off-by: William Breathitt Gray <[hidden email]>
---
 drivers/net/virtio_net.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 2b6916c012d2..76a1eb622a04 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -31,6 +31,7 @@
 #include <linux/average.h>
 #include <linux/filter.h>
 #include <net/route.h>
+#include <net/xdp.h>
 
 static int napi_weight = NAPI_POLL_WEIGHT;
 module_param(napi_weight, int, 0444);
@@ -116,6 +117,8 @@ struct receive_queue {
 
  /* Name of this receive queue: input.$index */
  char name[40];
+
+ struct xdp_rxq_info xdp_rxq;
 };
 
 /* Control VQ buffers: protected by the rtnl lock */
@@ -565,6 +568,7 @@ static struct sk_buff *receive_small(struct net_device *dev,
  xdp.data = xdp.data_hard_start + xdp_headroom;
  xdp_set_data_meta_invalid(&xdp);
  xdp.data_end = xdp.data + len;
+ xdp.rxq = &rq->xdp_rxq;
  orig_data = xdp.data;
  act = bpf_prog_run_xdp(xdp_prog, &xdp);
 
@@ -702,6 +706,8 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
  xdp.data = data + vi->hdr_len;
  xdp_set_data_meta_invalid(&xdp);
  xdp.data_end = xdp.data + (len - vi->hdr_len);
+ xdp.rxq = &rq->xdp_rxq;
+
  act = bpf_prog_run_xdp(xdp_prog, &xdp);
 
  if (act != XDP_PASS)
@@ -1257,13 +1263,18 @@ static int virtnet_poll(struct napi_struct *napi, int budget)
 static int virtnet_open(struct net_device *dev)
 {
  struct virtnet_info *vi = netdev_priv(dev);
- int i;
+ int i, err;
 
  for (i = 0; i < vi->max_queue_pairs; i++) {
  if (i < vi->curr_queue_pairs)
  /* Make sure we have some buffers: if oom use wq. */
  if (!try_fill_recv(vi, &vi->rq[i], GFP_KERNEL))
  schedule_delayed_work(&vi->refill, 0);
+
+ err = xdp_rxq_info_reg(&vi->rq[i].xdp_rxq, dev, i);
+ if (err < 0)
+ return err;
+
  virtnet_napi_enable(vi->rq[i].vq, &vi->rq[i].napi);
  virtnet_napi_tx_enable(vi, vi->sq[i].vq, &vi->sq[i].napi);
  }
@@ -1601,6 +1612,7 @@ static int virtnet_close(struct net_device *dev)
  cancel_delayed_work_sync(&vi->refill);
 
  for (i = 0; i < vi->max_queue_pairs; i++) {
+ xdp_rxq_info_unreg(&vi->rq[i].xdp_rxq);
  napi_disable(&vi->rq[i].napi);
  virtnet_napi_tx_disable(&vi->sq[i].napi);
  }
--
2.25.1


--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team

[SRU][B:linux-azure-4.15][PATCH 07/40] xdp: generic XDP handling of xdp_rxq_info

William Breathitt Gray
In reply to this post by William Breathitt Gray
From: Jesper Dangaard Brouer <[hidden email]>

BugLink: https://bugs.launchpad.net/bugs/1877654

Hook points for xdp_rxq_info:
 * reg  : netif_alloc_rx_queues
 * unreg: netif_free_rx_queues

The net_device structure has some members (num_rx_queues + real_num_rx_queues)
and a data area (dev->_rx with struct netdev_rx_queue entries) that were
primarily used for exporting information about RPS (CONFIG_RPS) queues
to sysfs (CONFIG_SYSFS).

For generic XDP, extend struct netdev_rx_queue with the xdp_rxq_info,
and remove some of the CONFIG_SYSFS ifdefs.

Signed-off-by: Jesper Dangaard Brouer <[hidden email]>
Signed-off-by: Alexei Starovoitov <[hidden email]>
(cherry picked from commit e817f85652c14d78f170b18797e4c477c78949e0)
Signed-off-by: William Breathitt Gray <[hidden email]>
---
 include/linux/netdevice.h |  2 ++
 net/core/dev.c            | 69 +++++++++++++++++++++++++++++++++------
 2 files changed, 61 insertions(+), 10 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index ebe9a5bc648a..0482941b1647 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -44,6 +44,7 @@
 #include <net/dcbnl.h>
 #endif
 #include <net/netprio_cgroup.h>
+#include <net/xdp.h>
 
 #include <linux/netdev_features.h>
 #include <linux/neighbour.h>
@@ -686,6 +687,7 @@ struct netdev_rx_queue {
 #endif
  struct kobject kobj;
  struct net_device *dev;
+ struct xdp_rxq_info xdp_rxq;
 } ____cacheline_aligned_in_smp;
 
 /*
diff --git a/net/core/dev.c b/net/core/dev.c
index 0e37c4da15b5..d84ba4b0587a 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3931,9 +3931,33 @@ static int enqueue_to_backlog(struct sk_buff *skb, int cpu,
  return NET_RX_DROP;
 }
 
+static struct netdev_rx_queue *netif_get_rxqueue(struct sk_buff *skb)
+{
+ struct net_device *dev = skb->dev;
+ struct netdev_rx_queue *rxqueue;
+
+ rxqueue = dev->_rx;
+
+ if (skb_rx_queue_recorded(skb)) {
+ u16 index = skb_get_rx_queue(skb);
+
+ if (unlikely(index >= dev->real_num_rx_queues)) {
+ WARN_ONCE(dev->real_num_rx_queues > 1,
+  "%s received packet on queue %u, but number "
+  "of RX queues is %u\n",
+  dev->name, index, dev->real_num_rx_queues);
+
+ return rxqueue; /* Return first rxqueue */
+ }
+ rxqueue += index;
+ }
+ return rxqueue;
+}
+
 static u32 netif_receive_generic_xdp(struct sk_buff *skb,
      struct bpf_prog *xdp_prog)
 {
+ struct netdev_rx_queue *rxqueue;
  u32 metalen, act = XDP_DROP;
  struct xdp_buff xdp;
  void *orig_data;
@@ -3977,6 +4001,9 @@ static u32 netif_receive_generic_xdp(struct sk_buff *skb,
  xdp.data_hard_start = skb->data - skb_headroom(skb);
  orig_data = xdp.data;
 
+ rxqueue = netif_get_rxqueue(skb);
+ xdp.rxq = &rxqueue->xdp_rxq;
+
  act = bpf_prog_run_xdp(xdp_prog, &xdp);
 
  off = xdp.data - orig_data;
@@ -7752,12 +7779,12 @@ void netif_stacked_transfer_operstate(const struct net_device *rootdev,
 }
 EXPORT_SYMBOL(netif_stacked_transfer_operstate);
 
-#ifdef CONFIG_SYSFS
 static int netif_alloc_rx_queues(struct net_device *dev)
 {
  unsigned int i, count = dev->num_rx_queues;
  struct netdev_rx_queue *rx;
  size_t sz = count * sizeof(*rx);
+ int err = 0;
 
  BUG_ON(count < 1);
 
@@ -7767,11 +7794,39 @@ static int netif_alloc_rx_queues(struct net_device *dev)
 
  dev->_rx = rx;
 
- for (i = 0; i < count; i++)
+ for (i = 0; i < count; i++) {
  rx[i].dev = dev;
+
+ /* XDP RX-queue setup */
+ err = xdp_rxq_info_reg(&rx[i].xdp_rxq, dev, i);
+ if (err < 0)
+ goto err_rxq_info;
+ }
  return 0;
+
+err_rxq_info:
+ /* Rollback successful reg's and free other resources */
+ while (i--)
+ xdp_rxq_info_unreg(&rx[i].xdp_rxq);
+ kfree(dev->_rx);
+ dev->_rx = NULL;
+ return err;
+}
+
+static void netif_free_rx_queues(struct net_device *dev)
+{
+ unsigned int i, count = dev->num_rx_queues;
+ struct netdev_rx_queue *rx;
+
+ /* netif_alloc_rx_queues alloc failed, resources have been unreg'ed */
+ if (!dev->_rx)
+ return;
+
+ rx = dev->_rx;
+
+ for (i = 0; i < count; i++)
+ xdp_rxq_info_unreg(&rx[i].xdp_rxq);
 }
-#endif
 
 static void netdev_init_one_queue(struct net_device *dev,
   struct netdev_queue *queue, void *_unused)
@@ -8363,12 +8418,10 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
  return NULL;
  }
 
-#ifdef CONFIG_SYSFS
  if (rxqs < 1) {
  pr_err("alloc_netdev: Unable to allocate device with zero RX queues\n");
  return NULL;
  }
-#endif
 
  alloc_size = sizeof(struct net_device);
  if (sizeof_priv) {
@@ -8427,12 +8480,10 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
  if (netif_alloc_netdev_queues(dev))
  goto free_all;
 
-#ifdef CONFIG_SYSFS
  dev->num_rx_queues = rxqs;
  dev->real_num_rx_queues = rxqs;
  if (netif_alloc_rx_queues(dev))
  goto free_all;
-#endif
 
  strcpy(dev->name, name);
  dev->name_assign_type = name_assign_type;
@@ -8472,9 +8523,7 @@ void free_netdev(struct net_device *dev)
 
  might_sleep();
  netif_free_tx_queues(dev);
-#ifdef CONFIG_SYSFS
- kvfree(dev->_rx);
-#endif
+ netif_free_rx_queues(dev);
 
  kfree(rcu_dereference_protected(dev->ingress_queue, 1));
 
--
2.25.1



[SRU][B:linux-azure-4.15][PATCH 08/40] tun/tap: use ptr_ring instead of skb_array

William Breathitt Gray
In reply to this post by William Breathitt Gray
From: Jason Wang <[hidden email]>

BugLink: https://bugs.launchpad.net/bugs/1877654

This patch switches from skb_array to ptr_ring. This will be used to
enqueue different types of pointers by encoding the type into the
lower bits.

Signed-off-by: Jason Wang <[hidden email]>
Signed-off-by: David S. Miller <[hidden email]>
(backported from commit 5990a30510ed1c37a769d3a035ad2d030b843528)
[ vilhelmgray: context adjustments ]
[ vilhelmgray: use ptr_ring_resize() in tun_attach() ]
[ vilhelmgray: use ptr_ring_init() in tun_chr_open() ]
Signed-off-by: William Breathitt Gray <[hidden email]>
---
 debian.azure-4.15/config/config.common.ubuntu |  1 +
 drivers/net/tap.c                             | 42 ++++++++++--------
 drivers/net/tun.c                             | 44 ++++++++++---------
 drivers/vhost/net.c                           | 39 ++++++++--------
 include/linux/if_tap.h                        |  6 +--
 include/linux/if_tun.h                        |  4 +-
 6 files changed, 72 insertions(+), 64 deletions(-)

diff --git a/debian.azure-4.15/config/config.common.ubuntu b/debian.azure-4.15/config/config.common.ubuntu
index fbb0a87534ec..5d866c752057 100644
--- a/debian.azure-4.15/config/config.common.ubuntu
+++ b/debian.azure-4.15/config/config.common.ubuntu
@@ -3402,6 +3402,7 @@ CONFIG_PAGE_COUNTER=y
 # CONFIG_PAGE_EXTENSION is not set
 # CONFIG_PAGE_OWNER is not set
 # CONFIG_PAGE_POISONING is not set
+CONFIG_PAGE_POOL=y
 CONFIG_PAGE_TABLE_ISOLATION=y
 # CONFIG_PANASONIC_LAPTOP is not set
 CONFIG_PANEL=m
diff --git a/drivers/net/tap.c b/drivers/net/tap.c
index f8b44d395c2f..658235e0204f 100644
--- a/drivers/net/tap.c
+++ b/drivers/net/tap.c
@@ -330,6 +330,9 @@ rx_handler_result_t tap_handle_frame(struct sk_buff **pskb)
  if (!q)
  return RX_HANDLER_PASS;
 
+ if (__ptr_ring_full(&q->ring))
+ goto drop;
+
  skb_push(skb, ETH_HLEN);
 
  /* Apply the forward feature mask so that we perform segmentation
@@ -345,7 +348,7 @@ rx_handler_result_t tap_handle_frame(struct sk_buff **pskb)
  goto drop;
 
  if (!segs) {
- if (skb_array_produce(&q->skb_array, skb))
+ if (ptr_ring_produce(&q->ring, skb))
  goto drop;
  goto wake_up;
  }
@@ -355,7 +358,7 @@ rx_handler_result_t tap_handle_frame(struct sk_buff **pskb)
  struct sk_buff *nskb = segs->next;
 
  segs->next = NULL;
- if (skb_array_produce(&q->skb_array, segs)) {
+ if (ptr_ring_produce(&q->ring, segs)) {
  kfree_skb(segs);
  kfree_skb_list(nskb);
  break;
@@ -372,7 +375,7 @@ rx_handler_result_t tap_handle_frame(struct sk_buff **pskb)
     !(features & NETIF_F_CSUM_MASK) &&
     skb_checksum_help(skb))
  goto drop;
- if (skb_array_produce(&q->skb_array, skb))
+ if (ptr_ring_produce(&q->ring, skb))
  goto drop;
  }
 
@@ -494,7 +497,7 @@ static void tap_sock_destruct(struct sock *sk)
 {
  struct tap_queue *q = container_of(sk, struct tap_queue, sk);
 
- skb_array_cleanup(&q->skb_array);
+ ptr_ring_cleanup(&q->ring, __skb_array_destroy_skb);
 }
 
 static int tap_open(struct inode *inode, struct file *file)
@@ -514,7 +517,7 @@ static int tap_open(struct inode *inode, struct file *file)
      &tap_proto, 0);
  if (!q)
  goto err;
- if (skb_array_init(&q->skb_array, tap->dev->tx_queue_len, GFP_KERNEL)) {
+ if (ptr_ring_init(&q->ring, tap->dev->tx_queue_len, GFP_KERNEL)) {
  sk_free(&q->sk);
  goto err;
  }
@@ -543,7 +546,7 @@ static int tap_open(struct inode *inode, struct file *file)
 
  err = tap_set_queue(tap, file, q);
  if (err) {
- /* tap_sock_destruct() will take care of freeing skb_array */
+ /* tap_sock_destruct() will take care of freeing ptr_ring */
  goto err_put;
  }
 
@@ -580,7 +583,7 @@ static unsigned int tap_poll(struct file *file, poll_table *wait)
  mask = 0;
  poll_wait(file, &q->wq.wait, wait);
 
- if (!skb_array_empty(&q->skb_array))
+ if (!ptr_ring_empty(&q->ring))
  mask |= POLLIN | POLLRDNORM;
 
  if (sock_writeable(&q->sk) ||
@@ -844,7 +847,7 @@ static ssize_t tap_do_read(struct tap_queue *q,
  TASK_INTERRUPTIBLE);
 
  /* Read frames from the queue */
- skb = skb_array_consume(&q->skb_array);
+ skb = ptr_ring_consume(&q->ring);
  if (skb)
  break;
  if (noblock) {
@@ -1176,7 +1179,7 @@ static int tap_peek_len(struct socket *sock)
 {
  struct tap_queue *q = container_of(sock, struct tap_queue,
        sock);
- return skb_array_peek_len(&q->skb_array);
+ return PTR_RING_PEEK_CALL(&q->ring, __skb_array_len_with_tag);
 }
 
 /* Ops structure to mimic raw sockets with tun */
@@ -1202,7 +1205,7 @@ struct socket *tap_get_socket(struct file *file)
 }
 EXPORT_SYMBOL_GPL(tap_get_socket);
 
-struct skb_array *tap_get_skb_array(struct file *file)
+struct ptr_ring *tap_get_ptr_ring(struct file *file)
 {
  struct tap_queue *q;
 
@@ -1211,29 +1214,30 @@ struct skb_array *tap_get_skb_array(struct file *file)
  q = file->private_data;
  if (!q)
  return ERR_PTR(-EBADFD);
- return &q->skb_array;
+ return &q->ring;
 }
-EXPORT_SYMBOL_GPL(tap_get_skb_array);
+EXPORT_SYMBOL_GPL(tap_get_ptr_ring);
 
 int tap_queue_resize(struct tap_dev *tap)
 {
  struct net_device *dev = tap->dev;
  struct tap_queue *q;
- struct skb_array **arrays;
+ struct ptr_ring **rings;
  int n = tap->numqueues;
  int ret, i = 0;
 
- arrays = kmalloc_array(n, sizeof(*arrays), GFP_KERNEL);
- if (!arrays)
+ rings = kmalloc_array(n, sizeof(*rings), GFP_KERNEL);
+ if (!rings)
  return -ENOMEM;
 
  list_for_each_entry(q, &tap->queue_list, next)
- arrays[i++] = &q->skb_array;
+ rings[i++] = &q->ring;
 
- ret = skb_array_resize_multiple(arrays, n,
- dev->tx_queue_len, GFP_KERNEL);
+ ret = ptr_ring_resize_multiple(rings, n,
+       dev->tx_queue_len, GFP_KERNEL,
+       __skb_array_destroy_skb);
 
- kfree(arrays);
+ kfree(rings);
  return ret;
 }
 EXPORT_SYMBOL_GPL(tap_queue_resize);
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 2434b2bd902f..d1f2f25a748f 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -180,7 +180,7 @@ struct tun_file {
  struct mutex napi_mutex; /* Protects access to the above napi */
  struct list_head next;
  struct tun_struct *detached;
- struct skb_array tx_array;
+ struct ptr_ring tx_ring;
  struct xdp_rxq_info xdp_rxq;
 };
 
@@ -606,7 +606,7 @@ static void tun_queue_purge(struct tun_file *tfile)
 {
  struct sk_buff *skb;
 
- while ((skb = skb_array_consume(&tfile->tx_array)) != NULL)
+ while ((skb = ptr_ring_consume(&tfile->tx_ring)) != NULL)
  kfree_skb(skb);
 
  skb_queue_purge(&tfile->sk.sk_write_queue);
@@ -660,7 +660,8 @@ static void __tun_detach(struct tun_file *tfile, bool clean)
  unregister_netdevice(tun->dev);
  }
  if (tun) {
- skb_array_cleanup(&tfile->tx_array);
+ ptr_ring_cleanup(&tfile->tx_ring,
+ __skb_array_destroy_skb);
  xdp_rxq_info_unreg(&tfile->xdp_rxq);
  }
  sock_put(&tfile->sk);
@@ -758,7 +759,7 @@ static int tun_attach(struct tun_struct *tun, struct file *file,
  }
 
  if (!tfile->detached &&
-    skb_array_resize(&tfile->tx_array, dev->tx_queue_len, GFP_KERNEL)) {
+    ptr_ring_resize(&tfile->tx_ring, dev->tx_queue_len, GFP_KERNEL, __skb_array_destroy_skb)) {
  err = -ENOMEM;
  goto out;
  }
@@ -1012,7 +1013,7 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
 
  nf_reset(skb);
 
- if (skb_array_produce(&tfile->tx_array, skb))
+ if (ptr_ring_produce(&tfile->tx_ring, skb))
  goto drop;
 
  /* Notify and wake up reader process */
@@ -1302,7 +1303,7 @@ static unsigned int tun_chr_poll(struct file *file, poll_table *wait)
 
  poll_wait(file, sk_sleep(sk), wait);
 
- if (!skb_array_empty(&tfile->tx_array))
+ if (!ptr_ring_empty(&tfile->tx_ring))
  mask |= POLLIN | POLLRDNORM;
 
  /* Make sure SOCKWQ_ASYNC_NOSPACE is set if not writable to
@@ -1981,7 +1982,7 @@ static struct sk_buff *tun_ring_recv(struct tun_file *tfile, int noblock,
  struct sk_buff *skb = NULL;
  int error = 0;
 
- skb = skb_array_consume(&tfile->tx_array);
+ skb = ptr_ring_consume(&tfile->tx_ring);
  if (skb)
  goto out;
  if (noblock) {
@@ -1993,7 +1994,7 @@ static struct sk_buff *tun_ring_recv(struct tun_file *tfile, int noblock,
 
  while (1) {
  set_current_state(TASK_INTERRUPTIBLE);
- skb = skb_array_consume(&tfile->tx_array);
+ skb = ptr_ring_consume(&tfile->tx_ring);
  if (skb)
  break;
  if (signal_pending(current)) {
@@ -2191,7 +2192,7 @@ static int tun_peek_len(struct socket *sock)
  if (!tun)
  return 0;
 
- ret = skb_array_peek_len(&tfile->tx_array);
+ ret = PTR_RING_PEEK_CALL(&tfile->tx_ring, __skb_array_len_with_tag);
  tun_put(tun);
 
  return ret;
@@ -2921,7 +2922,7 @@ static int tun_chr_open(struct inode *inode, struct file * file)
     &tun_proto, 0);
  if (!tfile)
  return -ENOMEM;
- if (skb_array_init(&tfile->tx_array, 0, GFP_KERNEL)) {
+ if (ptr_ring_init(&tfile->tx_ring, 0, GFP_KERNEL)) {
  sk_free(&tfile->sk);
  return -ENOMEM;
  }
@@ -3094,25 +3095,26 @@ static int tun_queue_resize(struct tun_struct *tun)
 {
  struct net_device *dev = tun->dev;
  struct tun_file *tfile;
- struct skb_array **arrays;
+ struct ptr_ring **rings;
  int n = tun->numqueues + tun->numdisabled;
  int ret, i;
 
- arrays = kmalloc_array(n, sizeof(*arrays), GFP_KERNEL);
- if (!arrays)
+ rings = kmalloc_array(n, sizeof(*rings), GFP_KERNEL);
+ if (!rings)
  return -ENOMEM;
 
  for (i = 0; i < tun->numqueues; i++) {
  tfile = rtnl_dereference(tun->tfiles[i]);
- arrays[i] = &tfile->tx_array;
+ rings[i] = &tfile->tx_ring;
  }
  list_for_each_entry(tfile, &tun->disabled, next)
- arrays[i++] = &tfile->tx_array;
+ rings[i++] = &tfile->tx_ring;
 
- ret = skb_array_resize_multiple(arrays, n,
- dev->tx_queue_len, GFP_KERNEL);
+ ret = ptr_ring_resize_multiple(rings, n,
+       dev->tx_queue_len, GFP_KERNEL,
+       __skb_array_destroy_skb);
 
- kfree(arrays);
+ kfree(rings);
  return ret;
 }
 
@@ -3207,7 +3209,7 @@ struct socket *tun_get_socket(struct file *file)
 }
 EXPORT_SYMBOL_GPL(tun_get_socket);
 
-struct skb_array *tun_get_skb_array(struct file *file)
+struct ptr_ring *tun_get_tx_ring(struct file *file)
 {
  struct tun_file *tfile;
 
@@ -3216,9 +3218,9 @@ struct skb_array *tun_get_skb_array(struct file *file)
  tfile = file->private_data;
  if (!tfile)
  return ERR_PTR(-EBADFD);
- return &tfile->tx_array;
+ return &tfile->tx_ring;
 }
-EXPORT_SYMBOL_GPL(tun_get_skb_array);
+EXPORT_SYMBOL_GPL(tun_get_tx_ring);
 
 module_init(tun_init);
 module_exit(tun_cleanup);
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 17606d9688c7..0f1e31ee29b3 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -95,7 +95,7 @@ struct vhost_net_ubuf_ref {
 
 #define VHOST_RX_BATCH 64
 struct vhost_net_buf {
- struct sk_buff **queue;
+ void **queue;
  int tail;
  int head;
 };
@@ -114,7 +114,7 @@ struct vhost_net_virtqueue {
  /* Reference counting for outstanding ubufs.
  * Protected by vq mutex. Writers must also take device mutex. */
  struct vhost_net_ubuf_ref *ubufs;
- struct skb_array *rx_array;
+ struct ptr_ring *rx_ring;
  struct vhost_net_buf rxq;
 };
 
@@ -164,7 +164,7 @@ static int vhost_net_buf_produce(struct vhost_net_virtqueue *nvq)
  struct vhost_net_buf *rxq = &nvq->rxq;
 
  rxq->head = 0;
- rxq->tail = skb_array_consume_batched(nvq->rx_array, rxq->queue,
+ rxq->tail = ptr_ring_consume_batched(nvq->rx_ring, rxq->queue,
       VHOST_RX_BATCH);
  return rxq->tail;
 }
@@ -173,9 +173,10 @@ static void vhost_net_buf_unproduce(struct vhost_net_virtqueue *nvq)
 {
  struct vhost_net_buf *rxq = &nvq->rxq;
 
- if (nvq->rx_array && !vhost_net_buf_is_empty(rxq)) {
- skb_array_unconsume(nvq->rx_array, rxq->queue + rxq->head,
-    vhost_net_buf_get_size(rxq));
+ if (nvq->rx_ring && !vhost_net_buf_is_empty(rxq)) {
+ ptr_ring_unconsume(nvq->rx_ring, rxq->queue + rxq->head,
+   vhost_net_buf_get_size(rxq),
+   __skb_array_destroy_skb);
  rxq->head = rxq->tail = 0;
  }
 }
@@ -593,7 +594,7 @@ static int peek_head_len(struct vhost_net_virtqueue *rvq, struct sock *sk)
  int len = 0;
  unsigned long flags;
 
- if (rvq->rx_array)
+ if (rvq->rx_ring)
  return vhost_net_buf_peek(rvq);
 
  spin_lock_irqsave(&sk->sk_receive_queue.lock, flags);
@@ -806,7 +807,7 @@ static void handle_rx(struct vhost_net *net)
  * they refilled. */
  goto out;
  }
- if (nvq->rx_array)
+ if (nvq->rx_ring)
  msg.msg_control = vhost_net_buf_consume(&nvq->rxq);
  /* On overrun, truncate and discard */
  if (unlikely(headcount > UIO_MAXIOV)) {
@@ -910,7 +911,7 @@ static int vhost_net_open(struct inode *inode, struct file *f)
  struct vhost_net *n;
  struct vhost_dev *dev;
  struct vhost_virtqueue **vqs;
- struct sk_buff **queue;
+ void **queue;
  int i;
 
  n = kvmalloc(sizeof *n, GFP_KERNEL | __GFP_RETRY_MAYFAIL);
@@ -922,7 +923,7 @@ static int vhost_net_open(struct inode *inode, struct file *f)
  return -ENOMEM;
  }
 
- queue = kmalloc_array(VHOST_RX_BATCH, sizeof(struct sk_buff *),
+ queue = kmalloc_array(VHOST_RX_BATCH, sizeof(void *),
       GFP_KERNEL);
  if (!queue) {
  kfree(vqs);
@@ -1052,23 +1053,23 @@ static struct socket *get_raw_socket(int fd)
  return ERR_PTR(r);
 }
 
-static struct skb_array *get_tap_skb_array(int fd)
+static struct ptr_ring *get_tap_ptr_ring(int fd)
 {
- struct skb_array *array;
+ struct ptr_ring *ring;
  struct file *file = fget(fd);
 
  if (!file)
  return NULL;
- array = tun_get_skb_array(file);
- if (!IS_ERR(array))
+ ring = tun_get_tx_ring(file);
+ if (!IS_ERR(ring))
  goto out;
- array = tap_get_skb_array(file);
- if (!IS_ERR(array))
+ ring = tap_get_ptr_ring(file);
+ if (!IS_ERR(ring))
  goto out;
- array = NULL;
+ ring = NULL;
 out:
  fput(file);
- return array;
+ return ring;
 }
 
 static struct socket *get_tap_socket(int fd)
@@ -1149,7 +1150,7 @@ static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd)
  vq->private_data = sock;
  vhost_net_buf_unproduce(nvq);
  if (index == VHOST_NET_VQ_RX)
- nvq->rx_array = get_tap_skb_array(fd);
+ nvq->rx_ring = get_tap_ptr_ring(fd);
  r = vhost_vq_init_access(vq);
  if (r)
  goto err_used;
diff --git a/include/linux/if_tap.h b/include/linux/if_tap.h
index 3ecef57c31e3..8e66866c11be 100644
--- a/include/linux/if_tap.h
+++ b/include/linux/if_tap.h
@@ -4,7 +4,7 @@
 
 #if IS_ENABLED(CONFIG_TAP)
 struct socket *tap_get_socket(struct file *);
-struct skb_array *tap_get_skb_array(struct file *file);
+struct ptr_ring *tap_get_ptr_ring(struct file *file);
 #else
 #include <linux/err.h>
 #include <linux/errno.h>
@@ -14,7 +14,7 @@ static inline struct socket *tap_get_socket(struct file *f)
 {
  return ERR_PTR(-EINVAL);
 }
-static inline struct skb_array *tap_get_skb_array(struct file *f)
+static inline struct ptr_ring *tap_get_ptr_ring(struct file *f)
 {
  return ERR_PTR(-EINVAL);
 }
@@ -70,7 +70,7 @@ struct tap_queue {
  u16 queue_index;
  bool enabled;
  struct list_head next;
- struct skb_array skb_array;
+ struct ptr_ring ring;
 };
 
 rx_handler_result_t tap_handle_frame(struct sk_buff **pskb);
diff --git a/include/linux/if_tun.h b/include/linux/if_tun.h
index bf9bdf42d577..bdee9b83baf6 100644
--- a/include/linux/if_tun.h
+++ b/include/linux/if_tun.h
@@ -19,7 +19,7 @@
 
 #if defined(CONFIG_TUN) || defined(CONFIG_TUN_MODULE)
 struct socket *tun_get_socket(struct file *);
-struct skb_array *tun_get_skb_array(struct file *file);
+struct ptr_ring *tun_get_tx_ring(struct file *file);
 #else
 #include <linux/err.h>
 #include <linux/errno.h>
@@ -29,7 +29,7 @@ static inline struct socket *tun_get_socket(struct file *f)
 {
  return ERR_PTR(-EINVAL);
 }
-static inline struct skb_array *tun_get_skb_array(struct file *f)
+static inline struct ptr_ring *tun_get_tx_ring(struct file *f)
 {
  return ERR_PTR(-EINVAL);
 }
--
2.25.1



[SRU][B:linux-azure-4.15][PATCH 09/40] tuntap: XDP transmission

William Breathitt Gray
In reply to this post by William Breathitt Gray
From: Jason Wang <[hidden email]>

BugLink: https://bugs.launchpad.net/bugs/1877654

This patch implements XDP transmission for TAP. Since we can't create
new queues for TAP during XDP setup, the existing ptr_ring is reused for
queuing XDP buffers. To distinguish an xdp_buff from an sk_buff,
TUN_XDP_FLAG (0x1UL) is encoded into the lowest bit of the xdp_buff
pointer during ptr_ring_produce() and decoded during consumption. XDP
metadata is stored in the headroom of the packet, which should work in
most cases since drivers usually reserve enough headroom. Very minor
changes were needed in vhost_net: it just needs to peek at the length
depending on the type of pointer.

Tests were done on two Intel E5-2630 2.40GHz machines connected back
to back through two 82599ES NICs. Traffic was generated/received through
MoonGen/testpmd (rxonly). It reports a ~20% improvement when
xdp_redirect_map redirects from ixgbe to TAP (from 2.50 Mpps
to 3.05 Mpps).

Cc: Jesper Dangaard Brouer <[hidden email]>
Signed-off-by: Jason Wang <[hidden email]>
Signed-off-by: David S. Miller <[hidden email]>
(backported from commit fc72d1d54dd9ffe2552c76b17e9129803ca7b255)
[ vilhelmgray: context adjustments ]
Signed-off-by: William Breathitt Gray <[hidden email]>
---
 drivers/net/tun.c      | 211 ++++++++++++++++++++++++++++++++++-------
 drivers/vhost/net.c    |  13 ++-
 include/linux/if_tun.h |  17 ++++
 3 files changed, 208 insertions(+), 33 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index d1f2f25a748f..6d827c0ef8cc 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -236,6 +236,24 @@ struct tun_struct {
  struct bpf_prog __rcu *xdp_prog;
 };
 
+bool tun_is_xdp_buff(void *ptr)
+{
+ return (unsigned long)ptr & TUN_XDP_FLAG;
+}
+EXPORT_SYMBOL(tun_is_xdp_buff);
+
+void *tun_xdp_to_ptr(void *ptr)
+{
+ return (void *)((unsigned long)ptr | TUN_XDP_FLAG);
+}
+EXPORT_SYMBOL(tun_xdp_to_ptr);
+
+void *tun_ptr_to_xdp(void *ptr)
+{
+ return (void *)((unsigned long)ptr & ~TUN_XDP_FLAG);
+}
+EXPORT_SYMBOL(tun_ptr_to_xdp);
+
 static int tun_napi_receive(struct napi_struct *napi, int budget)
 {
  struct tun_file *tfile = container_of(napi, struct tun_file, napi);
@@ -602,12 +620,25 @@ static struct tun_struct *tun_enable_queue(struct tun_file *tfile)
  return tun;
 }
 
+static void tun_ptr_free(void *ptr)
+{
+ if (!ptr)
+ return;
+ if (tun_is_xdp_buff(ptr)) {
+ struct xdp_buff *xdp = tun_ptr_to_xdp(ptr);
+
+ put_page(virt_to_head_page(xdp->data));
+ } else {
+ __skb_array_destroy_skb(ptr);
+ }
+}
+
 static void tun_queue_purge(struct tun_file *tfile)
 {
- struct sk_buff *skb;
+ void *ptr;
 
- while ((skb = ptr_ring_consume(&tfile->tx_ring)) != NULL)
- kfree_skb(skb);
+ while ((ptr = ptr_ring_consume(&tfile->tx_ring)) != NULL)
+ tun_ptr_free(ptr);
 
  skb_queue_purge(&tfile->sk.sk_write_queue);
  skb_queue_purge(&tfile->sk.sk_error_queue);
@@ -660,8 +691,7 @@ static void __tun_detach(struct tun_file *tfile, bool clean)
  unregister_netdevice(tun->dev);
  }
  if (tun) {
- ptr_ring_cleanup(&tfile->tx_ring,
- __skb_array_destroy_skb);
+ ptr_ring_cleanup(&tfile->tx_ring, tun_ptr_free);
  xdp_rxq_info_unreg(&tfile->xdp_rxq);
  }
  sock_put(&tfile->sk);
@@ -1200,6 +1230,67 @@ static const struct net_device_ops tun_netdev_ops = {
  .ndo_change_carrier = tun_net_change_carrier,
 };
 
+static int tun_xdp_xmit(struct net_device *dev, struct xdp_buff *xdp)
+{
+ struct tun_struct *tun = netdev_priv(dev);
+ struct xdp_buff *buff = xdp->data_hard_start;
+ int headroom = xdp->data - xdp->data_hard_start;
+ struct tun_file *tfile;
+ u32 numqueues;
+ int ret = 0;
+
+ /* Assure headroom is available and buff is properly aligned */
+ if (unlikely(headroom < sizeof(*xdp) || tun_is_xdp_buff(xdp)))
+ return -ENOSPC;
+
+ *buff = *xdp;
+
+ rcu_read_lock();
+
+ numqueues = READ_ONCE(tun->numqueues);
+ if (!numqueues) {
+ ret = -ENOSPC;
+ goto out;
+ }
+
+ tfile = rcu_dereference(tun->tfiles[smp_processor_id() %
+    numqueues]);
+ /* Encode the XDP flag into lowest bit for consumer to differ
+ * XDP buffer from sk_buff.
+ */
+ if (ptr_ring_produce(&tfile->tx_ring, tun_xdp_to_ptr(buff))) {
+ this_cpu_inc(tun->pcpu_stats->tx_dropped);
+ ret = -ENOSPC;
+ }
+
+out:
+ rcu_read_unlock();
+ return ret;
+}
+
+static void tun_xdp_flush(struct net_device *dev)
+{
+ struct tun_struct *tun = netdev_priv(dev);
+ struct tun_file *tfile;
+ u32 numqueues;
+
+ rcu_read_lock();
+
+ numqueues = READ_ONCE(tun->numqueues);
+ if (!numqueues)
+ goto out;
+
+ tfile = rcu_dereference(tun->tfiles[smp_processor_id() %
+    numqueues]);
+ /* Notify and wake up reader process */
+ if (tfile->flags & TUN_FASYNC)
+ kill_fasync(&tfile->fasync, SIGIO, POLL_IN);
+ tfile->socket.sk->sk_data_ready(tfile->socket.sk);
+
+out:
+ rcu_read_unlock();
+}
+
 static const struct net_device_ops tap_netdev_ops = {
  .ndo_uninit = tun_net_uninit,
  .ndo_open = tun_net_open,
@@ -1217,6 +1308,8 @@ static const struct net_device_ops tap_netdev_ops = {
  .ndo_set_rx_headroom = tun_set_headroom,
  .ndo_get_stats64 = tun_net_get_stats64,
  .ndo_bpf = tun_xdp,
+ .ndo_xdp_xmit = tun_xdp_xmit,
+ .ndo_xdp_flush = tun_xdp_flush,
  .ndo_change_carrier = tun_net_change_carrier,
 };
 
@@ -1877,6 +1970,40 @@ static ssize_t tun_chr_write_iter(struct kiocb *iocb, struct iov_iter *from)
  return result;
 }
 
+static ssize_t tun_put_user_xdp(struct tun_struct *tun,
+ struct tun_file *tfile,
+ struct xdp_buff *xdp,
+ struct iov_iter *iter)
+{
+ int vnet_hdr_sz = 0;
+ size_t size = xdp->data_end - xdp->data;
+ struct tun_pcpu_stats *stats;
+ size_t ret;
+
+ if (tun->flags & IFF_VNET_HDR) {
+ struct virtio_net_hdr gso = { 0 };
+
+ vnet_hdr_sz = READ_ONCE(tun->vnet_hdr_sz);
+ if (unlikely(iov_iter_count(iter) < vnet_hdr_sz))
+ return -EINVAL;
+ if (unlikely(copy_to_iter(&gso, sizeof(gso), iter) !=
+     sizeof(gso)))
+ return -EFAULT;
+ iov_iter_advance(iter, vnet_hdr_sz - sizeof(gso));
+ }
+
+ ret = copy_to_iter(xdp->data, size, iter) + vnet_hdr_sz;
+
+ stats = get_cpu_ptr(tun->pcpu_stats);
+ u64_stats_update_begin(&stats->syncp);
+ stats->tx_packets++;
+ stats->tx_bytes += ret;
+ u64_stats_update_end(&stats->syncp);
+ put_cpu_ptr(tun->pcpu_stats);
+
+ return ret;
+}
+
 /* Put packet to the user space buffer */
 static ssize_t tun_put_user(struct tun_struct *tun,
     struct tun_file *tfile,
@@ -1975,15 +2102,14 @@ static ssize_t tun_put_user(struct tun_struct *tun,
  return total;
 }
 
-static struct sk_buff *tun_ring_recv(struct tun_file *tfile, int noblock,
-     int *err)
+static void *tun_ring_recv(struct tun_file *tfile, int noblock, int *err)
 {
  DECLARE_WAITQUEUE(wait, current);
- struct sk_buff *skb = NULL;
+ void *ptr = NULL;
  int error = 0;
 
- skb = ptr_ring_consume(&tfile->tx_ring);
- if (skb)
+ ptr = ptr_ring_consume(&tfile->tx_ring);
+ if (ptr)
  goto out;
  if (noblock) {
  error = -EAGAIN;
@@ -1994,8 +2120,8 @@ static struct sk_buff *tun_ring_recv(struct tun_file *tfile, int noblock,
 
  while (1) {
  set_current_state(TASK_INTERRUPTIBLE);
- skb = ptr_ring_consume(&tfile->tx_ring);
- if (skb)
+ ptr = ptr_ring_consume(&tfile->tx_ring);
+ if (ptr)
  break;
  if (signal_pending(current)) {
  error = -ERESTARTSYS;
@@ -2014,12 +2140,12 @@ static struct sk_buff *tun_ring_recv(struct tun_file *tfile, int noblock,
 
 out:
  *err = error;
- return skb;
+ return ptr;
 }
 
 static ssize_t tun_do_read(struct tun_struct *tun, struct tun_file *tfile,
    struct iov_iter *to,
-   int noblock, struct sk_buff *skb)
+   int noblock, void *ptr)
 {
  ssize_t ret;
  int err;
@@ -2027,23 +2153,31 @@ static ssize_t tun_do_read(struct tun_struct *tun, struct tun_file *tfile,
  tun_debug(KERN_INFO, tun, "tun_do_read\n");
 
  if (!iov_iter_count(to)) {
- if (skb)
- kfree_skb(skb);
+ tun_ptr_free(ptr);
  return 0;
  }
 
- if (!skb) {
+ if (!ptr) {
  /* Read frames from ring */
- skb = tun_ring_recv(tfile, noblock, &err);
- if (!skb)
+ ptr = tun_ring_recv(tfile, noblock, &err);
+ if (!ptr)
  return err;
  }
 
- ret = tun_put_user(tun, tfile, skb, to);
- if (unlikely(ret < 0))
- kfree_skb(skb);
- else
- consume_skb(skb);
+ if (tun_is_xdp_buff(ptr)) {
+ struct xdp_buff *xdp = tun_ptr_to_xdp(ptr);
+
+ ret = tun_put_user_xdp(tun, tfile, xdp, to);
+ put_page(virt_to_head_page(xdp->data));
+ } else {
+ struct sk_buff *skb = ptr;
+
+ ret = tun_put_user(tun, tfile, skb, to);
+ if (unlikely(ret < 0))
+ kfree_skb(skb);
+ else
+ consume_skb(skb);
+ }
 
  return ret;
 }
@@ -2148,12 +2282,12 @@ static int tun_recvmsg(struct socket *sock, struct msghdr *m, size_t total_len,
 {
  struct tun_file *tfile = container_of(sock, struct tun_file, socket);
  struct tun_struct *tun = tun_get(tfile);
- struct sk_buff *skb = m->msg_control;
+ void *ptr = m->msg_control;
  int ret;
 
  if (!tun) {
  ret = -EBADFD;
- goto out_free_skb;
+ goto out_free;
  }
 
  if (flags & ~(MSG_DONTWAIT|MSG_TRUNC|MSG_ERRQUEUE)) {
@@ -2165,7 +2299,7 @@ static int tun_recvmsg(struct socket *sock, struct msghdr *m, size_t total_len,
  SOL_PACKET, TUN_TX_TIMESTAMP);
  goto out;
  }
- ret = tun_do_read(tun, tfile, &m->msg_iter, flags & MSG_DONTWAIT, skb);
+ ret = tun_do_read(tun, tfile, &m->msg_iter, flags & MSG_DONTWAIT, ptr);
  if (ret > (ssize_t)total_len) {
  m->msg_flags |= MSG_TRUNC;
  ret = flags & MSG_TRUNC ? ret : total_len;
@@ -2176,12 +2310,25 @@ static int tun_recvmsg(struct socket *sock, struct msghdr *m, size_t total_len,
 
 out_put_tun:
  tun_put(tun);
-out_free_skb:
- if (skb)
- kfree_skb(skb);
+out_free:
+ tun_ptr_free(ptr);
  return ret;
 }
 
+static int tun_ptr_peek_len(void *ptr)
+{
+ if (likely(ptr)) {
+ if (tun_is_xdp_buff(ptr)) {
+ struct xdp_buff *xdp = tun_ptr_to_xdp(ptr);
+
+ return xdp->data_end - xdp->data;
+ }
+ return __skb_array_len_with_tag(ptr);
+ } else {
+ return 0;
+ }
+}
+
 static int tun_peek_len(struct socket *sock)
 {
  struct tun_file *tfile = container_of(sock, struct tun_file, socket);
@@ -2192,7 +2339,7 @@ static int tun_peek_len(struct socket *sock)
  if (!tun)
  return 0;
 
- ret = PTR_RING_PEEK_CALL(&tfile->tx_ring, __skb_array_len_with_tag);
+ ret = PTR_RING_PEEK_CALL(&tfile->tx_ring, tun_ptr_peek_len);
  tun_put(tun);
 
  return ret;
@@ -3112,7 +3259,7 @@ static int tun_queue_resize(struct tun_struct *tun)
 
  ret = ptr_ring_resize_multiple(rings, n,
        dev->tx_queue_len, GFP_KERNEL,
-       __skb_array_destroy_skb);
+       tun_ptr_free);
 
  kfree(rings);
  return ret;
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 0f1e31ee29b3..2f3d5298d630 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -181,6 +181,17 @@ static void vhost_net_buf_unproduce(struct vhost_net_virtqueue *nvq)
  }
 }
 
+static int vhost_net_buf_peek_len(void *ptr)
+{
+ if (tun_is_xdp_buff(ptr)) {
+ struct xdp_buff *xdp = tun_ptr_to_xdp(ptr);
+
+ return xdp->data_end - xdp->data;
+ }
+
+ return __skb_array_len_with_tag(ptr);
+}
+
 static int vhost_net_buf_peek(struct vhost_net_virtqueue *nvq)
 {
  struct vhost_net_buf *rxq = &nvq->rxq;
@@ -192,7 +203,7 @@ static int vhost_net_buf_peek(struct vhost_net_virtqueue *nvq)
  return 0;
 
 out:
- return __skb_array_len_with_tag(vhost_net_buf_get_ptr(rxq));
+ return vhost_net_buf_peek_len(vhost_net_buf_get_ptr(rxq));
 }
 
 static void vhost_net_buf_init(struct vhost_net_buf *rxq)
diff --git a/include/linux/if_tun.h b/include/linux/if_tun.h
index bdee9b83baf6..08e66827ad8e 100644
--- a/include/linux/if_tun.h
+++ b/include/linux/if_tun.h
@@ -17,9 +17,14 @@
 
 #include <uapi/linux/if_tun.h>
 
+#define TUN_XDP_FLAG 0x1UL
+
 #if defined(CONFIG_TUN) || defined(CONFIG_TUN_MODULE)
 struct socket *tun_get_socket(struct file *);
 struct ptr_ring *tun_get_tx_ring(struct file *file);
+bool tun_is_xdp_buff(void *ptr);
+void *tun_xdp_to_ptr(void *ptr);
+void *tun_ptr_to_xdp(void *ptr);
 #else
 #include <linux/err.h>
 #include <linux/errno.h>
@@ -33,5 +38,17 @@ static inline struct ptr_ring *tun_get_tx_ring(struct file *f)
 {
  return ERR_PTR(-EINVAL);
 }
+static inline bool tun_is_xdp_buff(void *ptr)
+{
+ return false;
+}
+static inline void *tun_xdp_to_ptr(void *ptr)
+{
+ return NULL;
+}
+static inline void *tun_ptr_to_xdp(void *ptr)
+{
+ return NULL;
+}
 #endif /* CONFIG_TUN */
 #endif /* __IF_TUN_H */
--
2.25.1
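The if_tun.h hunk above introduces TUN_XDP_FLAG, which tun.c uses to "encode the XDP flag into lowest bit" of pointers queued on tx_ring, so tun_do_read() can tell XDP buffers apart from sk_buffs. A minimal user-space sketch of the same tagging scheme (uintptr_t stands in for the kernel's unsigned long casts; this is an illustration, not the kernel code):

```c
#include <assert.h>
#include <stdint.h>

/* XDP buffers are distinguished from sk_buffs by setting the pointer's
 * lowest bit (TUN_XDP_FLAG), which is always clear for aligned allocations. */
#define TUN_XDP_FLAG 0x1UL

static int tun_is_xdp_buff(void *ptr)
{
	return (uintptr_t)ptr & TUN_XDP_FLAG;
}

static void *tun_xdp_to_ptr(void *xdp)
{
	/* Tag an xdp buffer pointer before producing it onto the ring. */
	return (void *)((uintptr_t)xdp | TUN_XDP_FLAG);
}

static void *tun_ptr_to_xdp(void *ptr)
{
	/* Strip the tag to recover the original pointer on the consumer side. */
	return (void *)((uintptr_t)ptr & ~TUN_XDP_FLAG);
}
```

The trick relies on word-aligned allocations never having the low bit set, so untagged sk_buff pointers test false.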


--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team

[SRU][B:linux-azure-4.15][PATCH 10/40] net: avoid including xdp.h in filter.h

William Breathitt Gray
From: Jesper Dangaard Brouer <[hidden email]>

BugLink: https://bugs.launchpad.net/bugs/1877654

It is sufficient to have a forward declaration of struct xdp_rxq_info in
linux/filter.h, which avoids including net/xdp.h.  This was originally
suggested by John Fastabend during the review phase, but wasn't
included in the final patchset revision.  Thus, this followup.

Suggested-by: John Fastabend <[hidden email]>
Signed-off-by: Jesper Dangaard Brouer <[hidden email]>
Signed-off-by: Alexei Starovoitov <[hidden email]>
(backported from commit 297dd12cb104151797fd649433a2157b585f1718)
[ vilhelmgray: context adjustment ]
Signed-off-by: William Breathitt Gray <[hidden email]>
---
 include/linux/filter.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/filter.h b/include/linux/filter.h
index 158fb795cba7..449091818f43 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -19,7 +19,6 @@
 #include <linux/cryptohash.h>
 #include <linux/set_memory.h>
 
-#include <net/xdp.h>
 #include <net/sch_generic.h>
 
 #include <uapi/linux/filter.h>
@@ -29,6 +28,7 @@ struct sk_buff;
 struct sock;
 struct seccomp_data;
 struct bpf_prog_aux;
+struct xdp_rxq_info;
 
 /* ArgX, context and stack frame pointer register positions. Note,
  * Arg1, Arg2, Arg3, etc are used as argument mappings of function
--
2.25.1



[SRU][B:linux-azure-4.15][PATCH 11/40] virtio_net: fix ndo_xdp_xmit crash towards dev not ready for XDP

William Breathitt Gray
From: Jesper Dangaard Brouer <[hidden email]>

BugLink: https://bugs.launchpad.net/bugs/1877654

When a driver implements the ndo_xdp_xmit() function, there is
(currently) no generic way to determine whether it is safe to call.

It is e.g. unsafe to call a driver's ndo_xdp_xmit if it has not
allocated the needed XDP TX queues yet.  This is the case for
virtio_net, which first allocates the XDP TX queues once an XDP/bpf
prog is attached (in virtnet_xdp_set()).

Thus, a crash will occur for virtio_net when redirecting to another
virtio_net device's ndo_xdp_xmit, which has not attached an XDP prog.
The sample xdp_redirect_map tries to attach a dummy XDP prog to take
this into account, but it can also easily fail if the virtio_net (or
actually the underlying vhost driver) has not allocated enough extra
queues for the device.

Allocating more queues is currently a manual configuration.
Hint for libvirt XML add:

  <driver name='vhost' queues='16'>
    <host mrg_rxbuf='off'/>
    <guest tso4='off' tso6='off' ecn='off' ufo='off'/>
  </driver>

The solution in this patch is to check that the device has loaded an
XDP/bpf prog before proceeding.  This is similar to the check
performed in the ixgbe driver.

Signed-off-by: Jesper Dangaard Brouer <[hidden email]>
Acked-by: John Fastabend <[hidden email]>
Signed-off-by: David S. Miller <[hidden email]>
(cherry picked from commit 8dcc5b0ab0ec9a2efb3362d380272546b8b2ee26)
Signed-off-by: William Breathitt Gray <[hidden email]>
---
 drivers/net/virtio_net.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 76a1eb622a04..80aad47c8c97 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -435,8 +435,18 @@ static bool __virtnet_xdp_xmit(struct virtnet_info *vi,
 static int virtnet_xdp_xmit(struct net_device *dev, struct xdp_buff *xdp)
 {
  struct virtnet_info *vi = netdev_priv(dev);
- bool sent = __virtnet_xdp_xmit(vi, xdp);
+ struct receive_queue *rq = vi->rq;
+ struct bpf_prog *xdp_prog;
+ bool sent;
+
+ /* Only allow ndo_xdp_xmit if XDP is loaded on dev, as this
+ * indicate XDP resources have been successfully allocated.
+ */
+ xdp_prog = rcu_dereference(rq->xdp_prog);
+ if (!xdp_prog)
+ return -ENXIO;
 
+ sent = __virtnet_xdp_xmit(vi, xdp);
  if (!sent)
  return -ENOSPC;
  return 0;
--
2.25.1



[SRU][B:linux-azure-4.15][PATCH 12/40] tuntap: XDP_TX can use native XDP

William Breathitt Gray
From: Jason Wang <[hidden email]>

BugLink: https://bugs.launchpad.net/bugs/1877654

Now we have ndo_xdp_xmit, switch to use it instead of the slow generic
XDP TX routine. XDP_TX on TAP gets ~20% improvements from ~1.5Mpps to
~1.8Mpps on 2.60GHz Core(TM) i7-5600U.

Signed-off-by: Jason Wang <[hidden email]>
Acked-by: Michael S. Tsirkin <[hidden email]>
Signed-off-by: David S. Miller <[hidden email]>
(backported from commit 59655a5b6c837e392e873629591069c898585592)
[ vilhelmgray: context adjustments ]
[ vilhelmgray: use local_bh_enable() instead of preempt_enable() ]
Signed-off-by: William Breathitt Gray <[hidden email]>
---
 drivers/net/tun.c | 19 ++++++++-----------
 1 file changed, 8 insertions(+), 11 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 6d827c0ef8cc..1d7f953d83e4 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1572,7 +1572,6 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun,
  unsigned int delta = 0;
  char *buf;
  size_t copied;
- bool xdp_xmit = false;
  int err, pad = TUN_RX_PAD;
 
  rcu_read_lock();
@@ -1630,8 +1629,14 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun,
  local_bh_enable();
  return NULL;
  case XDP_TX:
- xdp_xmit = true;
- /* fall through */
+ get_page(alloc_frag->page);
+ alloc_frag->offset += buflen;
+ if (tun_xdp_xmit(tun->dev, &xdp))
+ goto err_redirect;
+ tun_xdp_flush(tun->dev);
+ rcu_read_unlock();
+ local_bh_enable();
+ return NULL;
  case XDP_PASS:
  delta = orig_data - xdp.data;
  break;
@@ -1659,14 +1664,6 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun,
  get_page(alloc_frag->page);
  alloc_frag->offset += buflen;
 
- if (xdp_xmit) {
- skb->dev = tun->dev;
- generic_xdp_tx(skb, xdp_prog);
- rcu_read_unlock();
- local_bh_enable();
- return NULL;
- }
-
  rcu_read_unlock();
  local_bh_enable();
 
--
2.25.1



[SRU][B:linux-azure-4.15][PATCH 13/40] i40e: add support for XDP_REDIRECT

William Breathitt Gray
From: Björn Töpel <[hidden email]>

BugLink: https://bugs.launchpad.net/bugs/1877654

The driver now acts upon the XDP_REDIRECT return action. Two new ndos
are implemented, ndo_xdp_xmit and ndo_xdp_flush.

The XDP_REDIRECT action enables an XDP program to redirect frames to
other netdevs.

Signed-off-by: Björn Töpel <[hidden email]>
Tested-by: Andrew Bowers <[hidden email]>
Signed-off-by: Jeff Kirsher <[hidden email]>
(backported from commit d9314c474d4fc1985e836b92fba4c40dd84885a7)
[ vilhelmgray: context adjustment ]
Signed-off-by: William Breathitt Gray <[hidden email]>
---
 drivers/net/ethernet/intel/i40e/i40e_main.c |  2 +
 drivers/net/ethernet/intel/i40e/i40e_txrx.c | 74 ++++++++++++++++++---
 drivers/net/ethernet/intel/i40e/i40e_txrx.h |  2 +
 3 files changed, 68 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 65799dcabcc9..de222069516c 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -11772,6 +11772,8 @@ static const struct net_device_ops i40e_netdev_ops = {
  .ndo_bridge_getlink = i40e_ndo_bridge_getlink,
  .ndo_bridge_setlink = i40e_ndo_bridge_setlink,
  .ndo_bpf = i40e_xdp,
+ .ndo_xdp_xmit = i40e_xdp_xmit,
+ .ndo_xdp_flush = i40e_xdp_flush,
 };
 
 /**
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index 5bc2748ac468..9bc0edac43b5 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -1996,7 +1996,7 @@ static int i40e_xmit_xdp_ring(struct xdp_buff *xdp,
 static struct sk_buff *i40e_run_xdp(struct i40e_ring *rx_ring,
     struct xdp_buff *xdp)
 {
- int result = I40E_XDP_PASS;
+ int err, result = I40E_XDP_PASS;
  struct i40e_ring *xdp_ring;
  struct bpf_prog *xdp_prog;
  u32 act;
@@ -2015,6 +2015,10 @@ static struct sk_buff *i40e_run_xdp(struct i40e_ring *rx_ring,
  xdp_ring = rx_ring->vsi->xdp_rings[rx_ring->queue_index];
  result = i40e_xmit_xdp_ring(xdp, xdp_ring);
  break;
+ case XDP_REDIRECT:
+ err = xdp_do_redirect(rx_ring->netdev, xdp, xdp_prog);
+ result = !err ? I40E_XDP_TX : I40E_XDP_CONSUMED;
+ break;
  default:
  bpf_warn_invalid_xdp_action(act);
  case XDP_ABORTED:
@@ -2050,6 +2054,15 @@ static void i40e_rx_buffer_flip(struct i40e_ring *rx_ring,
 #endif
 }
 
+static inline void i40e_xdp_ring_update_tail(struct i40e_ring *xdp_ring)
+{
+ /* Force memory writes to complete before letting h/w
+ * know there are new descriptors to fetch.
+ */
+ wmb();
+ writel_relaxed(xdp_ring->next_to_use, xdp_ring->tail);
+}
+
 /**
  * i40e_clean_rx_irq - Clean completed descriptors from Rx ring - bounce buf
  * @rx_ring: rx descriptor ring to transact packets on
@@ -2182,16 +2195,11 @@ static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget)
  }
 
  if (xdp_xmit) {
- struct i40e_ring *xdp_ring;
-
- xdp_ring = rx_ring->vsi->xdp_rings[rx_ring->queue_index];
-
- /* Force memory writes to complete before letting h/w
- * know there are new descriptors to fetch.
- */
- wmb();
+ struct i40e_ring *xdp_ring =
+ rx_ring->vsi->xdp_rings[rx_ring->queue_index];
 
- writel(xdp_ring->next_to_use, xdp_ring->tail);
+ i40e_xdp_ring_update_tail(xdp_ring);
+ xdp_do_flush_map();
  }
 
  rx_ring->skb = skb;
@@ -3445,3 +3453,49 @@ netdev_tx_t i40e_lan_xmit_frame(struct sk_buff *skb, struct net_device *netdev)
 
  return i40e_xmit_frame_ring(skb, tx_ring);
 }
+
+/**
+ * i40e_xdp_xmit - Implements ndo_xdp_xmit
+ * @dev: netdev
+ * @xdp: XDP buffer
+ *
+ * Returns Zero if sent, else an error code
+ **/
+int i40e_xdp_xmit(struct net_device *dev, struct xdp_buff *xdp)
+{
+ struct i40e_netdev_priv *np = netdev_priv(dev);
+ unsigned int queue_index = smp_processor_id();
+ struct i40e_vsi *vsi = np->vsi;
+ int err;
+
+ if (test_bit(__I40E_VSI_DOWN, vsi->state))
+ return -ENETDOWN;
+
+ if (!i40e_enabled_xdp_vsi(vsi) || queue_index >= vsi->num_queue_pairs)
+ return -ENXIO;
+
+ err = i40e_xmit_xdp_ring(xdp, vsi->xdp_rings[queue_index]);
+ if (err != I40E_XDP_TX)
+ return -ENOSPC;
+
+ return 0;
+}
+
+/**
+ * i40e_xdp_flush - Implements ndo_xdp_flush
+ * @dev: netdev
+ **/
+void i40e_xdp_flush(struct net_device *dev)
+{
+ struct i40e_netdev_priv *np = netdev_priv(dev);
+ unsigned int queue_index = smp_processor_id();
+ struct i40e_vsi *vsi = np->vsi;
+
+ if (test_bit(__I40E_VSI_DOWN, vsi->state))
+ return;
+
+ if (!i40e_enabled_xdp_vsi(vsi) || queue_index >= vsi->num_queue_pairs)
+ return;
+
+ i40e_xdp_ring_update_tail(vsi->xdp_rings[queue_index]);
+}
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.h b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
index fbae1182e2ea..c0fe46404a0c 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
@@ -500,6 +500,8 @@ void i40e_force_wb(struct i40e_vsi *vsi, struct i40e_q_vector *q_vector);
 u32 i40e_get_tx_pending(struct i40e_ring *ring);
 int __i40e_maybe_stop_tx(struct i40e_ring *tx_ring, int size);
 bool __i40e_chk_linearize(struct sk_buff *skb);
+int i40e_xdp_xmit(struct net_device *dev, struct xdp_buff *xdp);
+void i40e_xdp_flush(struct net_device *dev);
 
 /**
  * i40e_get_head - Retrieve head from head writeback
--
2.25.1
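The i40e_clean_rx_irq() hunk above replaces the inline wmb()/writel() with i40e_xdp_ring_update_tail(), so the expensive memory barrier and MMIO tail write happen at most once per NAPI poll, after all XDP_TX/XDP_REDIRECT frames have been queued. A stand-alone sketch of that batching shape (counters stand in for descriptor writes and the MMIO tail bump; the function names here are illustrative, not the driver's):

```c
#include <assert.h>

/* Counters stand in for descriptor-ring writes and the MMIO tail bump. */
static int descs_queued;
static int tail_writes;

static void xmit_xdp_ring(void)
{
	descs_queued++; /* stands in for i40e_xmit_xdp_ring() */
}

static void xdp_ring_update_tail(void)
{
	/* In the driver: wmb(), then writel_relaxed(next_to_use, tail). */
	tail_writes++;
}

/* Shape of i40e_clean_rx_irq(): queue any number of XDP frames during
 * the poll, then bump the tail register once at the end. */
static void clean_rx_irq(int xdp_tx_frames)
{
	int xdp_xmit = 0;
	int i;

	for (i = 0; i < xdp_tx_frames; i++) {
		xmit_xdp_ring();
		xdp_xmit = 1;
	}
	if (xdp_xmit)
		xdp_ring_update_tail();
}
```

Batching the tail write is what makes per-frame ndo_xdp_xmit cheap while still needing a separate ndo_xdp_flush to kick the hardware.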



[SRU][B:linux-azure-4.15][PATCH 14/40] xdp: introduce xdp_return_frame API and use in cpumap

William Breathitt Gray
From: Jesper Dangaard Brouer <[hidden email]>

BugLink: https://bugs.launchpad.net/bugs/1877654

Introduce an xdp_return_frame API, and convert over cpumap as
the first user, given it has a queued XDP frame structure to leverage.

V3: Cleanup and remove C99 style comments, pointed out by Alex Duyck.
V6: Remove comment that id will be added later (Req by Alex Duyck)
V8: Rename enum mem_type to xdp_mem_type (found by kbuild test robot)

Signed-off-by: Jesper Dangaard Brouer <[hidden email]>
Signed-off-by: David S. Miller <[hidden email]>
(backported from commit 5ab073ffd326480a6185d096e9703f62ef92b86c)
[ vilhelmgray: context adjustment ]
Signed-off-by: William Breathitt Gray <[hidden email]>
---
 include/net/xdp.h   | 27 ++++++++++++++++++++
 kernel/bpf/cpumap.c | 60 +++++++++++++++++++++++++++------------------
 net/core/xdp.c      | 18 ++++++++++++++
 3 files changed, 81 insertions(+), 24 deletions(-)

diff --git a/include/net/xdp.h b/include/net/xdp.h
index b2362ddfa694..e4207699c410 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -33,16 +33,43 @@
  * also mandatory during RX-ring setup.
  */
 
+enum xdp_mem_type {
+ MEM_TYPE_PAGE_SHARED = 0, /* Split-page refcnt based model */
+ MEM_TYPE_PAGE_ORDER0,     /* Orig XDP full page model */
+ MEM_TYPE_MAX,
+};
+
+struct xdp_mem_info {
+ u32 type; /* enum xdp_mem_type, but known size type */
+};
+
 struct xdp_rxq_info {
  struct net_device *dev;
  u32 queue_index;
  u32 reg_state;
+ struct xdp_mem_info mem;
 } ____cacheline_aligned; /* perf critical, avoid false-sharing */
 
+
+static inline
+void xdp_return_frame(void *data, struct xdp_mem_info *mem)
+{
+ if (mem->type == MEM_TYPE_PAGE_SHARED)
+ page_frag_free(data);
+
+ if (mem->type == MEM_TYPE_PAGE_ORDER0) {
+ struct page *page = virt_to_page(data); /* Assumes order0 page*/
+
+ put_page(page);
+ }
+}
+
 int xdp_rxq_info_reg(struct xdp_rxq_info *xdp_rxq,
      struct net_device *dev, u32 queue_index);
 void xdp_rxq_info_unreg(struct xdp_rxq_info *xdp_rxq);
 void xdp_rxq_info_unused(struct xdp_rxq_info *xdp_rxq);
 bool xdp_rxq_info_is_reg(struct xdp_rxq_info *xdp_rxq);
+int xdp_rxq_info_reg_mem_model(struct xdp_rxq_info *xdp_rxq,
+       enum xdp_mem_type type, void *allocator);
 
 #endif /* __LINUX_NET_XDP_H__ */
diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c
index e4ef747915a0..1507e305ecfc 100644
--- a/kernel/bpf/cpumap.c
+++ b/kernel/bpf/cpumap.c
@@ -19,6 +19,7 @@
 #include <linux/bpf.h>
 #include <linux/filter.h>
 #include <linux/ptr_ring.h>
+#include <net/xdp.h>
 
 #include <linux/sched.h>
 #include <linux/workqueue.h>
@@ -143,27 +144,6 @@ static struct bpf_map *cpu_map_alloc(union bpf_attr *attr)
  return ERR_PTR(err);
 }
 
-void __cpu_map_queue_destructor(void *ptr)
-{
- /* The tear-down procedure should have made sure that queue is
- * empty.  See __cpu_map_entry_replace() and work-queue
- * invoked cpu_map_kthread_stop(). Catch any broken behaviour
- * gracefully and warn once.
- */
- if (WARN_ON_ONCE(ptr))
- page_frag_free(ptr);
-}
-
-static void put_cpu_map_entry(struct bpf_cpu_map_entry *rcpu)
-{
- if (atomic_dec_and_test(&rcpu->refcnt)) {
- /* The queue should be empty at this point */
- ptr_ring_cleanup(rcpu->queue, __cpu_map_queue_destructor);
- kfree(rcpu->queue);
- kfree(rcpu);
- }
-}
-
 static void get_cpu_map_entry(struct bpf_cpu_map_entry *rcpu)
 {
  atomic_inc(&rcpu->refcnt);
@@ -194,6 +174,10 @@ struct xdp_pkt {
  u16 len;
  u16 headroom;
  u16 metasize;
+ /* Lifetime of xdp_rxq_info is limited to NAPI/enqueue time,
+ * while mem info is valid on remote CPU.
+ */
+ struct xdp_mem_info mem;
  struct net_device *dev_rx;
 };
 
@@ -219,6 +203,9 @@ static struct xdp_pkt *convert_to_xdp_pkt(struct xdp_buff *xdp)
  xdp_pkt->headroom = headroom - sizeof(*xdp_pkt);
  xdp_pkt->metasize = metasize;
 
+ /* rxq only valid until napi_schedule ends, convert to xdp_mem_info */
+ xdp_pkt->mem = xdp->rxq->mem;
+
  return xdp_pkt;
 }
 
@@ -271,6 +258,31 @@ struct sk_buff *cpu_map_build_skb(struct bpf_cpu_map_entry *rcpu,
  return skb;
 }
 
+static void __cpu_map_ring_cleanup(struct ptr_ring *ring)
+{
+ /* The tear-down procedure should have made sure that queue is
+ * empty.  See __cpu_map_entry_replace() and work-queue
+ * invoked cpu_map_kthread_stop(). Catch any broken behaviour
+ * gracefully and warn once.
+ */
+ struct xdp_pkt *xdp_pkt;
+
+ while ((xdp_pkt = ptr_ring_consume(ring)))
+ if (WARN_ON_ONCE(xdp_pkt))
+ xdp_return_frame(xdp_pkt, &xdp_pkt->mem);
+}
+
+static void put_cpu_map_entry(struct bpf_cpu_map_entry *rcpu)
+{
+ if (atomic_dec_and_test(&rcpu->refcnt)) {
+ /* The queue should be empty at this point */
+ __cpu_map_ring_cleanup(rcpu->queue);
+ ptr_ring_cleanup(rcpu->queue, NULL);
+ kfree(rcpu->queue);
+ kfree(rcpu);
+ }
+}
+
 static int cpu_map_kthread_run(void *data)
 {
  struct bpf_cpu_map_entry *rcpu = data;
@@ -313,7 +325,7 @@ static int cpu_map_kthread_run(void *data)
 
  skb = cpu_map_build_skb(rcpu, xdp_pkt);
  if (!skb) {
- page_frag_free(xdp_pkt);
+ xdp_return_frame(xdp_pkt, &xdp_pkt->mem);
  continue;
  }
 
@@ -609,13 +621,13 @@ static int bq_flush_to_queue(struct bpf_cpu_map_entry *rcpu,
  spin_lock(&q->producer_lock);
 
  for (i = 0; i < bq->count; i++) {
- void *xdp_pkt = bq->q[i];
+ struct xdp_pkt *xdp_pkt = bq->q[i];
  int err;
 
  err = __ptr_ring_produce(q, xdp_pkt);
  if (err) {
  drops++;
- page_frag_free(xdp_pkt); /* Free xdp_pkt */
+ xdp_return_frame(xdp_pkt->data, &xdp_pkt->mem);
  }
  processed++;
  }
diff --git a/net/core/xdp.c b/net/core/xdp.c
index 097a0f74e004..7e6b3545277d 100644
--- a/net/core/xdp.c
+++ b/net/core/xdp.c
@@ -71,3 +71,21 @@ bool xdp_rxq_info_is_reg(struct xdp_rxq_info *xdp_rxq)
  return (xdp_rxq->reg_state == REG_STATE_REGISTERED);
 }
 EXPORT_SYMBOL_GPL(xdp_rxq_info_is_reg);
+
+int xdp_rxq_info_reg_mem_model(struct xdp_rxq_info *xdp_rxq,
+       enum xdp_mem_type type, void *allocator)
+{
+ if (type >= MEM_TYPE_MAX)
+ return -EINVAL;
+
+ xdp_rxq->mem.type = type;
+
+ if (allocator)
+ return -EOPNOTSUPP;
+
+ /* TODO: Allocate an ID that maps to allocator pointer
+ * See: https://www.kernel.org/doc/html/latest/core-api/idr.html
+ */
+ return 0;
+}
+EXPORT_SYMBOL_GPL(xdp_rxq_info_reg_mem_model);
--
2.25.1
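The xdp.h hunk above makes frame return a dispatch on the xdp_mem_info recorded at RX time: MEM_TYPE_PAGE_SHARED frames go through page_frag_free(), MEM_TYPE_PAGE_ORDER0 frames through put_page(). A sketch of that dispatch with the two page-release primitives stubbed out as counters (stub names are illustrative):

```c
#include <assert.h>
#include <stddef.h>

enum xdp_mem_type {
	MEM_TYPE_PAGE_SHARED = 0, /* split-page refcnt based model */
	MEM_TYPE_PAGE_ORDER0,     /* orig XDP full page model */
	MEM_TYPE_MAX,
};

struct xdp_mem_info {
	unsigned int type; /* enum xdp_mem_type, but known size type */
};

/* Counters stand in for page_frag_free() and put_page(). */
static int frag_frees;
static int page_puts;

static void page_frag_free_stub(void *data) { (void)data; frag_frees++; }
static void put_page_stub(void *data)       { (void)data; page_puts++; }

/* Same shape as xdp_return_frame(): the mem info stored with the frame
 * selects how its backing page is released, long after the originating
 * xdp_rxq_info has gone away. */
static void xdp_return_frame_sketch(void *data, const struct xdp_mem_info *mem)
{
	if (mem->type == MEM_TYPE_PAGE_SHARED)
		page_frag_free_stub(data);
	else if (mem->type == MEM_TYPE_PAGE_ORDER0)
		put_page_stub(data);
}
```

This is why convert_to_xdp_pkt() copies xdp->rxq->mem into the queued packet: the rxq is only valid during the NAPI poll, but the mem info must survive until the remote CPU frees the frame.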



[SRU][B:linux-azure-4.15][PATCH 15/40] ixgbe: use xdp_return_frame API

William Breathitt Gray
From: Jesper Dangaard Brouer <[hidden email]>

BugLink: https://bugs.launchpad.net/bugs/1877654

Extend struct ixgbe_tx_buffer to store the xdp_mem_info.

Notice that this could be optimized further by putting this into
a union in the struct ixgbe_tx_buffer, but this patchset
works towards removing this again.  Thus, this is not done.

Signed-off-by: Jesper Dangaard Brouer <[hidden email]>
Signed-off-by: David S. Miller <[hidden email]>
(cherry picked from commit 189ead81a83eba5f5c5ce56c45620e51abcb5cb8)
Signed-off-by: William Breathitt Gray <[hidden email]>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe.h      | 1 +
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 6 ++++--
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
index 8611763d6129..94a77ef2bffd 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
@@ -247,6 +247,7 @@ struct ixgbe_tx_buffer {
  DEFINE_DMA_UNMAP_ADDR(dma);
  DEFINE_DMA_UNMAP_LEN(len);
  u32 tx_flags;
+ struct xdp_mem_info xdp_mem;
 };
 
 struct ixgbe_rx_buffer {
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 49028d005eae..25ebb81ee1bb 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -1207,7 +1207,7 @@ static bool ixgbe_clean_tx_irq(struct ixgbe_q_vector *q_vector,
 
  /* free the skb */
  if (ring_is_xdp(tx_ring))
- page_frag_free(tx_buffer->data);
+ xdp_return_frame(tx_buffer->data, &tx_buffer->xdp_mem);
  else
  napi_consume_skb(tx_buffer->skb, napi_budget);
 
@@ -5880,7 +5880,7 @@ static void ixgbe_clean_tx_ring(struct ixgbe_ring *tx_ring)
 
  /* Free all the Tx ring sk_buffs */
  if (ring_is_xdp(tx_ring))
- page_frag_free(tx_buffer->data);
+ xdp_return_frame(tx_buffer->data, &tx_buffer->xdp_mem);
  else
  dev_kfree_skb_any(tx_buffer->skb);
 
@@ -8476,6 +8476,8 @@ static int ixgbe_xmit_xdp_ring(struct ixgbe_adapter *adapter,
  dma_unmap_len_set(tx_buffer, len, len);
  dma_unmap_addr_set(tx_buffer, dma, dma);
  tx_buffer->data = xdp->data;
+ tx_buffer->xdp_mem = xdp->rxq->mem;
+
  tx_desc->read.buffer_addr = cpu_to_le64(dma);
 
  /* put descriptor type bits */
--
2.25.1



[SRU][B:linux-azure-4.15][PATCH 16/40] xdp: move struct xdp_buff from filter.h to xdp.h

William Breathitt Gray
From: Jesper Dangaard Brouer <[hidden email]>

BugLink: https://bugs.launchpad.net/bugs/1877654

This is done to prepare for the next patch, and it is also
nice to move this XDP-related struct out of filter.h.

Signed-off-by: Jesper Dangaard Brouer <[hidden email]>
Signed-off-by: David S. Miller <[hidden email]>
(backported from commit 106ca27f2922e8de820d1bd3d79b1cbdf2d78eea)
[ vilhelmgray: context adjustment ]
Signed-off-by: William Breathitt Gray <[hidden email]>
---
 include/linux/filter.h | 24 +-----------------------
 include/net/xdp.h      | 22 ++++++++++++++++++++++
 2 files changed, 23 insertions(+), 23 deletions(-)

diff --git a/include/linux/filter.h b/include/linux/filter.h
index 449091818f43..ef3df706b417 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -29,6 +29,7 @@ struct sock;
 struct seccomp_data;
 struct bpf_prog_aux;
 struct xdp_rxq_info;
+struct xdp_buff;
 
 /* ArgX, context and stack frame pointer register positions. Note,
  * Arg1, Arg2, Arg3, etc are used as argument mappings of function
@@ -489,14 +490,6 @@ struct bpf_skb_data_end {
  void *data_end;
 };
 
-struct xdp_buff {
- void *data;
- void *data_end;
- void *data_meta;
- void *data_hard_start;
- struct xdp_rxq_info *rxq;
-};
-
 /* Compute the linear packet data range [data, data_end) which
  * will be accessed by various program types (cls_bpf, act_bpf,
  * lwt, ...). Subsystems allowing direct data access must (!)
@@ -731,21 +724,6 @@ int xdp_do_redirect(struct net_device *dev,
     struct bpf_prog *prog);
 void xdp_do_flush_map(void);
 
-/* Drivers not supporting XDP metadata can use this helper, which
- * rejects any room expansion for metadata as a result.
- */
-static __always_inline void
-xdp_set_data_meta_invalid(struct xdp_buff *xdp)
-{
- xdp->data_meta = xdp->data + 1;
-}
-
-static __always_inline bool
-xdp_data_meta_unsupported(const struct xdp_buff *xdp)
-{
- return unlikely(xdp->data_meta > xdp->data);
-}
-
 void bpf_warn_invalid_xdp_action(u32 act);
 
 struct sock *do_sk_redirect_map(struct sk_buff *skb);
diff --git a/include/net/xdp.h b/include/net/xdp.h
index e4207699c410..15f8ade008b5 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -50,6 +50,13 @@ struct xdp_rxq_info {
  struct xdp_mem_info mem;
 } ____cacheline_aligned; /* perf critical, avoid false-sharing */
 
+struct xdp_buff {
+ void *data;
+ void *data_end;
+ void *data_meta;
+ void *data_hard_start;
+ struct xdp_rxq_info *rxq;
+};
 
 static inline
 void xdp_return_frame(void *data, struct xdp_mem_info *mem)
@@ -72,4 +79,19 @@ bool xdp_rxq_info_is_reg(struct xdp_rxq_info *xdp_rxq);
 int xdp_rxq_info_reg_mem_model(struct xdp_rxq_info *xdp_rxq,
        enum xdp_mem_type type, void *allocator);
 
+/* Drivers not supporting XDP metadata can use this helper, which
+ * rejects any room expansion for metadata as a result.
+ */
+static __always_inline void
+xdp_set_data_meta_invalid(struct xdp_buff *xdp)
+{
+ xdp->data_meta = xdp->data + 1;
+}
+
+static __always_inline bool
+xdp_data_meta_unsupported(const struct xdp_buff *xdp)
+{
+ return unlikely(xdp->data_meta > xdp->data);
+}
+
 #endif /* __LINUX_NET_XDP_H__ */
--
2.25.1



[SRU][B:linux-azure-4.15][PATCH 17/40] xdp: introduce a new xdp_frame type

William Breathitt Gray
From: Jesper Dangaard Brouer <[hidden email]>

BugLink: https://bugs.launchpad.net/bugs/1877654

This is needed to convert the tuntap and virtio_net drivers.

This is a generalization of what is done inside cpumap, which will be
converted later.

Signed-off-by: Jesper Dangaard Brouer <[hidden email]>
Signed-off-by: David S. Miller <[hidden email]>
(cherry picked from commit c0048cff8abb69c956ce1277d17a3f7a14e41522)
Signed-off-by: William Breathitt Gray <[hidden email]>
---
 include/net/xdp.h | 40 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 40 insertions(+)

diff --git a/include/net/xdp.h b/include/net/xdp.h
index 15f8ade008b5..756c42811e78 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -58,6 +58,46 @@ struct xdp_buff {
  struct xdp_rxq_info *rxq;
 };
 
+struct xdp_frame {
+ void *data;
+ u16 len;
+ u16 headroom;
+ u16 metasize;
+ /* Lifetime of xdp_rxq_info is limited to NAPI/enqueue time,
+ * while mem info is valid on remote CPU.
+ */
+ struct xdp_mem_info mem;
+};
+
+/* Convert xdp_buff to xdp_frame */
+static inline
+struct xdp_frame *convert_to_xdp_frame(struct xdp_buff *xdp)
+{
+ struct xdp_frame *xdp_frame;
+ int metasize;
+ int headroom;
+
+ /* Assure headroom is available for storing info */
+ headroom = xdp->data - xdp->data_hard_start;
+ metasize = xdp->data - xdp->data_meta;
+ metasize = metasize > 0 ? metasize : 0;
+ if (unlikely((headroom - metasize) < sizeof(*xdp_frame)))
+ return NULL;
+
+ /* Store info in top of packet */
+ xdp_frame = xdp->data_hard_start;
+
+ xdp_frame->data = xdp->data;
+ xdp_frame->len  = xdp->data_end - xdp->data;
+ xdp_frame->headroom = headroom - sizeof(*xdp_frame);
+ xdp_frame->metasize = metasize;
+
+ /* rxq only valid until napi_schedule ends, convert to xdp_mem_info */
+ xdp_frame->mem = xdp->rxq->mem;
+
+ return xdp_frame;
+}
+
 static inline
 void xdp_return_frame(void *data, struct xdp_mem_info *mem)
 {
--
2.25.1
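convert_to_xdp_frame() above stores the frame header in the packet's own headroom, so it must fail when headroom minus metasize cannot hold struct xdp_frame. A sketch of just that size check (the struct layout mirrors the patch, but `xdp_frame_sketch` and `can_convert` are illustrative names, and exact sizes depend on the ABI):

```c
#include <assert.h>
#include <stddef.h>

/* Mirror of struct xdp_frame's fields from the patch; an unsigned int
 * stands in for struct xdp_mem_info. */
struct xdp_frame_sketch {
	void *data;
	unsigned short len;
	unsigned short headroom;
	unsigned short metasize;
	unsigned int mem_type;
};

/* Same shape as the check in convert_to_xdp_frame(): the header is
 * written at data_hard_start, so conversion succeeds only when the
 * headroom left after metadata can hold the frame header. */
static int can_convert(long headroom, long metasize)
{
	if (metasize < 0)
		metasize = 0;
	return headroom - metasize >= (long)sizeof(struct xdp_frame_sketch);
}
```

This is also why tun_xdp_xmit() in patch 18 returns -EOVERFLOW on a NULL conversion: a frame built with too little headroom simply cannot be queued in this representation.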



[SRU][B:linux-azure-4.15][PATCH 18/40] tun: convert to use generic xdp_frame and xdp_return_frame API

William Breathitt Gray
From: Jesper Dangaard Brouer <[hidden email]>

BugLink: https://bugs.launchpad.net/bugs/1877654

The tuntap driver invented its own driver-specific way of queuing
XDP packets, by storing the xdp_buff information in the top of
the XDP frame data.

Convert it over to use the more generic xdp_frame structure.  The
main problem with the in-driver method is that the xdp_rxq_info pointer
cannot be trusted/used when dequeuing the frame.

V3: Remove check based on feedback from Jason

Signed-off-by: Jesper Dangaard Brouer <[hidden email]>
Signed-off-by: David S. Miller <[hidden email]>
(backported from commit 1ffcbc8537d0bc32aaca7000cb9c904ec4b6300f)
[ vilhelmgray: context adjustments ]
Signed-off-by: William Breathitt Gray <[hidden email]>
---
 drivers/net/tun.c      | 43 ++++++++++++++++++++----------------------
 drivers/vhost/net.c    |  7 ++++---
 include/linux/if_tun.h |  4 ++--
 3 files changed, 26 insertions(+), 28 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 1d7f953d83e4..25962a060574 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -236,11 +236,11 @@ struct tun_struct {
  struct bpf_prog __rcu *xdp_prog;
 };
 
-bool tun_is_xdp_buff(void *ptr)
+bool tun_is_xdp_frame(void *ptr)
 {
  return (unsigned long)ptr & TUN_XDP_FLAG;
 }
-EXPORT_SYMBOL(tun_is_xdp_buff);
+EXPORT_SYMBOL(tun_is_xdp_frame);
 
 void *tun_xdp_to_ptr(void *ptr)
 {
@@ -624,10 +624,10 @@ static void tun_ptr_free(void *ptr)
 {
  if (!ptr)
  return;
- if (tun_is_xdp_buff(ptr)) {
- struct xdp_buff *xdp = tun_ptr_to_xdp(ptr);
+ if (tun_is_xdp_frame(ptr)) {
+ struct xdp_frame *xdpf = tun_ptr_to_xdp(ptr);
 
- put_page(virt_to_head_page(xdp->data));
+ xdp_return_frame(xdpf->data, &xdpf->mem);
  } else {
  __skb_array_destroy_skb(ptr);
  }
@@ -1233,17 +1233,14 @@ static const struct net_device_ops tun_netdev_ops = {
 static int tun_xdp_xmit(struct net_device *dev, struct xdp_buff *xdp)
 {
  struct tun_struct *tun = netdev_priv(dev);
- struct xdp_buff *buff = xdp->data_hard_start;
- int headroom = xdp->data - xdp->data_hard_start;
+ struct xdp_frame *frame;
  struct tun_file *tfile;
  u32 numqueues;
  int ret = 0;
 
- /* Assure headroom is available and buff is properly aligned */
- if (unlikely(headroom < sizeof(*xdp) || tun_is_xdp_buff(xdp)))
- return -ENOSPC;
-
- *buff = *xdp;
+ frame = convert_to_xdp_frame(xdp);
+ if (unlikely(!frame))
+ return -EOVERFLOW;
 
  rcu_read_lock();
 
@@ -1258,7 +1255,7 @@ static int tun_xdp_xmit(struct net_device *dev, struct xdp_buff *xdp)
  /* Encode the XDP flag into lowest bit for consumer to differ
  * XDP buffer from sk_buff.
  */
- if (ptr_ring_produce(&tfile->tx_ring, tun_xdp_to_ptr(buff))) {
+ if (ptr_ring_produce(&tfile->tx_ring, tun_xdp_to_ptr(frame))) {
  this_cpu_inc(tun->pcpu_stats->tx_dropped);
  ret = -ENOSPC;
  }
@@ -1969,11 +1966,11 @@ static ssize_t tun_chr_write_iter(struct kiocb *iocb, struct iov_iter *from)
 
 static ssize_t tun_put_user_xdp(struct tun_struct *tun,
  struct tun_file *tfile,
- struct xdp_buff *xdp,
+ struct xdp_frame *xdp_frame,
  struct iov_iter *iter)
 {
  int vnet_hdr_sz = 0;
- size_t size = xdp->data_end - xdp->data;
+ size_t size = xdp_frame->len;
  struct tun_pcpu_stats *stats;
  size_t ret;
 
@@ -1989,7 +1986,7 @@ static ssize_t tun_put_user_xdp(struct tun_struct *tun,
  iov_iter_advance(iter, vnet_hdr_sz - sizeof(gso));
  }
 
- ret = copy_to_iter(xdp->data, size, iter) + vnet_hdr_sz;
+ ret = copy_to_iter(xdp_frame->data, size, iter) + vnet_hdr_sz;
 
  stats = get_cpu_ptr(tun->pcpu_stats);
  u64_stats_update_begin(&stats->syncp);
@@ -2161,11 +2158,11 @@ static ssize_t tun_do_read(struct tun_struct *tun, struct tun_file *tfile,
  return err;
  }
 
- if (tun_is_xdp_buff(ptr)) {
- struct xdp_buff *xdp = tun_ptr_to_xdp(ptr);
+ if (tun_is_xdp_frame(ptr)) {
+ struct xdp_frame *xdpf = tun_ptr_to_xdp(ptr);
 
- ret = tun_put_user_xdp(tun, tfile, xdp, to);
- put_page(virt_to_head_page(xdp->data));
+ ret = tun_put_user_xdp(tun, tfile, xdpf, to);
+ xdp_return_frame(xdpf->data, &xdpf->mem);
  } else {
  struct sk_buff *skb = ptr;
 
@@ -2315,10 +2312,10 @@ static int tun_recvmsg(struct socket *sock, struct msghdr *m, size_t total_len,
 static int tun_ptr_peek_len(void *ptr)
 {
  if (likely(ptr)) {
- if (tun_is_xdp_buff(ptr)) {
- struct xdp_buff *xdp = tun_ptr_to_xdp(ptr);
+ if (tun_is_xdp_frame(ptr)) {
+ struct xdp_frame *xdpf = tun_ptr_to_xdp(ptr);
 
- return xdp->data_end - xdp->data;
+ return xdpf->len;
  }
  return __skb_array_len_with_tag(ptr);
  } else {
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 2f3d5298d630..07ca1c0e6fc4 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -32,6 +32,7 @@
 #include <linux/skbuff.h>
 
 #include <net/sock.h>
+#include <net/xdp.h>
 
 #include "vhost.h"
 
@@ -183,10 +184,10 @@ static void vhost_net_buf_unproduce(struct vhost_net_virtqueue *nvq)
 
 static int vhost_net_buf_peek_len(void *ptr)
 {
- if (tun_is_xdp_buff(ptr)) {
- struct xdp_buff *xdp = tun_ptr_to_xdp(ptr);
+ if (tun_is_xdp_frame(ptr)) {
+ struct xdp_frame *xdpf = tun_ptr_to_xdp(ptr);
 
- return xdp->data_end - xdp->data;
+ return xdpf->len;
  }
 
  return __skb_array_len_with_tag(ptr);
diff --git a/include/linux/if_tun.h b/include/linux/if_tun.h
index 08e66827ad8e..0ab590e41913 100644
--- a/include/linux/if_tun.h
+++ b/include/linux/if_tun.h
@@ -22,7 +22,7 @@
 #if defined(CONFIG_TUN) || defined(CONFIG_TUN_MODULE)
 struct socket *tun_get_socket(struct file *);
 struct ptr_ring *tun_get_tx_ring(struct file *file);
-bool tun_is_xdp_buff(void *ptr);
+bool tun_is_xdp_frame(void *ptr);
 void *tun_xdp_to_ptr(void *ptr);
 void *tun_ptr_to_xdp(void *ptr);
 #else
@@ -38,7 +38,7 @@ static inline struct ptr_ring *tun_get_tx_ring(struct file *f)
 {
  return ERR_PTR(-EINVAL);
 }
-static inline bool tun_is_xdp_buff(void *ptr)
+static inline bool tun_is_xdp_frame(void *ptr)
 {
  return false;
 }
--
2.25.1


--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team

[SRU][B:linux-azure-4.15][PATCH 19/40] virtio_net: convert to use generic xdp_frame and xdp_return_frame API

William Breathitt Gray
In reply to this post by William Breathitt Gray
From: Jesper Dangaard Brouer <[hidden email]>

BugLink: https://bugs.launchpad.net/bugs/1877654

The virtio_net driver assumes XDP frames are always released based on
page refcnt (via put_page).  Thus, it only queues the XDP data pointer
address and uses virt_to_head_page() to retrieve the struct page.

Use the XDP return API to get away from such assumptions. Instead
queue an xdp_frame, which allows us to use the xdp_return_frame API
when releasing the frame.

V8: Avoid endianness issues (found by kbuild test robot)
V9: Change __virtnet_xdp_xmit from bool to int return value (found by Dan Carpenter)

Signed-off-by: Jesper Dangaard Brouer <[hidden email]>
Signed-off-by: David S. Miller <[hidden email]>
(backported from commit cac320c850efb25480cd0f71383b84ec61c0e138)
[ vilhelmgray: context adjustment ]
Signed-off-by: William Breathitt Gray <[hidden email]>
---
 drivers/net/virtio_net.c | 54 +++++++++++++++++++++-------------------
 1 file changed, 29 insertions(+), 25 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 80aad47c8c97..280bab31a2d3 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -398,38 +398,48 @@ static void virtnet_xdp_flush(struct net_device *dev)
  virtqueue_kick(sq->vq);
 }
 
-static bool __virtnet_xdp_xmit(struct virtnet_info *vi,
-       struct xdp_buff *xdp)
+static int __virtnet_xdp_xmit(struct virtnet_info *vi,
+      struct xdp_buff *xdp)
 {
  struct virtio_net_hdr_mrg_rxbuf *hdr;
- unsigned int len;
+ struct xdp_frame *xdpf, *xdpf_sent;
  struct send_queue *sq;
+ unsigned int len;
  unsigned int qp;
- void *xdp_sent;
  int err;
 
  qp = vi->curr_queue_pairs - vi->xdp_queue_pairs + smp_processor_id();
  sq = &vi->sq[qp];
 
  /* Free up any pending old buffers before queueing new ones. */
- while ((xdp_sent = virtqueue_get_buf(sq->vq, &len)) != NULL) {
- struct page *sent_page = virt_to_head_page(xdp_sent);
+ while ((xdpf_sent = virtqueue_get_buf(sq->vq, &len)) != NULL)
+ xdp_return_frame(xdpf_sent->data, &xdpf_sent->mem);
 
- put_page(sent_page);
- }
+ xdpf = convert_to_xdp_frame(xdp);
+ if (unlikely(!xdpf))
+ return -EOVERFLOW;
+
+ /* virtqueue want to use data area in-front of packet */
+ if (unlikely(xdpf->metasize > 0))
+ return -EOPNOTSUPP;
 
- xdp->data -= vi->hdr_len;
+ if (unlikely(xdpf->headroom < vi->hdr_len))
+ return -EOVERFLOW;
+
+ /* Make room for virtqueue hdr (also change xdpf->headroom?) */
+ xdpf->data -= vi->hdr_len;
  /* Zero header and leave csum up to XDP layers */
- hdr = xdp->data;
+ hdr = xdpf->data;
  memset(hdr, 0, vi->hdr_len);
+ xdpf->len   += vi->hdr_len;
 
- sg_init_one(sq->sg, xdp->data, xdp->data_end - xdp->data);
+ sg_init_one(sq->sg, xdpf->data, xdpf->len);
 
- err = virtqueue_add_outbuf(sq->vq, sq->sg, 1, xdp->data, GFP_ATOMIC);
+ err = virtqueue_add_outbuf(sq->vq, sq->sg, 1, xdpf, GFP_ATOMIC);
  if (unlikely(err))
- return false; /* Caller handle free/refcnt */
+ return -ENOSPC; /* Caller handle free/refcnt */
 
- return true;
+ return 0;
 }
 
 static int virtnet_xdp_xmit(struct net_device *dev, struct xdp_buff *xdp)
@@ -437,7 +447,6 @@ static int virtnet_xdp_xmit(struct net_device *dev, struct xdp_buff *xdp)
  struct virtnet_info *vi = netdev_priv(dev);
  struct receive_queue *rq = vi->rq;
  struct bpf_prog *xdp_prog;
- bool sent;
 
  /* Only allow ndo_xdp_xmit if XDP is loaded on dev, as this
  * indicate XDP resources have been successfully allocated.
@@ -446,10 +455,7 @@ static int virtnet_xdp_xmit(struct net_device *dev, struct xdp_buff *xdp)
  if (!xdp_prog)
  return -ENXIO;
 
- sent = __virtnet_xdp_xmit(vi, xdp);
- if (!sent)
- return -ENOSPC;
- return 0;
+ return __virtnet_xdp_xmit(vi, xdp);
 }
 
 static unsigned int virtnet_get_headroom(struct virtnet_info *vi)
@@ -537,7 +543,6 @@ static struct sk_buff *receive_small(struct net_device *dev,
  struct page *page = virt_to_head_page(buf);
  unsigned int delta = 0;
  struct page *xdp_page;
- bool sent;
  int err;
 
  len -= vi->hdr_len;
@@ -588,8 +593,8 @@ static struct sk_buff *receive_small(struct net_device *dev,
  delta = orig_data - xdp.data;
  break;
  case XDP_TX:
- sent = __virtnet_xdp_xmit(vi, &xdp);
- if (unlikely(!sent)) {
+ err = __virtnet_xdp_xmit(vi, &xdp);
+ if (unlikely(err)) {
  trace_xdp_exception(vi->dev, xdp_prog, act);
  goto err_xdp;
  }
@@ -674,7 +679,6 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
  unsigned int truesize;
  unsigned int headroom = mergeable_ctx_to_headroom(ctx);
  int err;
- bool sent;
 
  head_skb = NULL;
 
@@ -743,8 +747,8 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
  }
  break;
  case XDP_TX:
- sent = __virtnet_xdp_xmit(vi, &xdp);
- if (unlikely(!sent)) {
+ err = __virtnet_xdp_xmit(vi, &xdp);
+ if (unlikely(err)) {
  trace_xdp_exception(vi->dev, xdp_prog, act);
  if (unlikely(xdp_page != page))
  put_page(xdp_page);
--
2.25.1

