---
title: DPDK in an Azure Linux VM | Microsoft Docs
description: Learn the benefits of the Data Plane Development Kit (DPDK) and how to set up the DPDK on a Linux virtual machine.
services: virtual-network
documentationcenter: na
author: laxmanrb
manager: gedegrac
ms.service: virtual-network
ms.topic: how-to
ms.tgt_pltfrm: na
ms.workload: infrastructure-services
ms.date: 05/12/2020
ms.author: labattul
---

Set up DPDK in a Linux virtual machine

Data Plane Development Kit (DPDK) on Azure offers a faster user-space packet processing framework for performance-intensive applications. This framework bypasses the virtual machine’s kernel network stack.

In typical packet processing that uses the kernel network stack, the process is interrupt-driven. When the network interface receives incoming packets, there is a kernel interrupt to process the packet and a context switch from the kernel space to the user space. DPDK eliminates context switching and the interrupt-driven method in favor of a user-space implementation that uses poll mode drivers for fast packet processing.

DPDK consists of sets of user-space libraries that provide access to lower-level resources. These resources can include hardware, logical cores, memory management, and poll mode drivers for network interface cards.

DPDK can run on Azure virtual machines across multiple supported operating system distributions. DPDK provides key performance differentiation in driving network function virtualization implementations. These implementations can take the form of network virtual appliances (NVAs), such as virtual routers, firewalls, VPNs, load balancers, evolved packet cores, and distributed denial-of-service (DDoS) mitigation applications.

Benefit

Higher packets per second (PPS): Bypassing the kernel and taking control of packets in the user space reduces the cycle count by eliminating context switches. It also improves the rate of packets that are processed per second in Azure Linux virtual machines.

Supported operating systems: minimum versions

The following distributions from the Azure Marketplace are supported:

| Linux OS | Kernel version |
|----------|----------------|
| Ubuntu 18.04 | 4.15.0-1014-azure+ |
| SLES 15 SP1 | 4.12.14-8.19-azure+ |
| RHEL 7.5 | 3.10.0-862.11.6.el7.x86_64+ |
| CentOS 7.5 | 3.10.0-862.11.6.el7.x86_64+ |
| Debian 10 | 4.19.0-1-cloud+ |

The noted versions are the minimum requirements. Newer versions are supported too.
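
To compare the kernel running on your VM against these minimums, you can check it directly:

# Print the running kernel version and compare it with the table above.
uname -r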

Custom kernel support

For any Linux kernel version that's not listed, see Patches for building an Azure-tuned Linux kernel. For more information, you can also contact aznetdpdk@microsoft.com.

Region support

All Azure regions support DPDK.

Prerequisites

Accelerated networking must be enabled on the Linux virtual machine. The virtual machine should have at least two network interfaces, with one interface dedicated to management. Enabling accelerated networking on the management interface is not recommended. Learn how to create a Linux virtual machine with accelerated networking enabled.

On virtual machines that use InfiniBand, ensure that the appropriate mlx4_ib or mlx5_ib drivers are loaded. For more information, see Enable InfiniBand.
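
For reference, the following is a minimal Azure CLI sketch of creating a VM that satisfies these prerequisites. The resource group, virtual network, names, image, and size are placeholder values, the virtual network and subnet are assumed to already exist, and you should pick a VM size that supports accelerated networking.

# Sketch only: resource names, image, and size below are placeholders.
# Management NIC (accelerated networking is not recommended here).
az network nic create --resource-group MyDpdkRg --name mgmt-nic \
  --vnet-name MyVnet --subnet MySubnet

# Data NIC with accelerated networking enabled for DPDK traffic.
az network nic create --resource-group MyDpdkRg --name data-nic \
  --vnet-name MyVnet --subnet MySubnet --accelerated-networking true

# VM with both NICs; the first NIC listed becomes the primary (management) interface.
az vm create --resource-group MyDpdkRg --name MyDpdkVm \
  --image Canonical:UbuntuServer:18.04-LTS:latest --size Standard_DS4_v2 \
  --admin-username azureuser --generate-ssh-keys \
  --nics mgmt-nic data-nic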

Install DPDK via system package (recommended)

Ubuntu 18.04

sudo add-apt-repository ppa:canonical-server/server-backports -y
sudo apt-get update
sudo apt-get install -y dpdk

Ubuntu 20.04 and newer

sudo apt-get install -y dpdk

Debian 10 and newer

sudo apt-get install -y dpdk
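
To confirm that the package installed, you can query the package manager. Note that the name of the testpmd binary shipped by the distribution package varies by DPDK release (for example, testpmd or dpdk-testpmd).

# Confirm the installed DPDK package version (Debian/Ubuntu).
dpkg -s dpdk | grep -i '^version'
# List the DPDK tools that the package installed.
ls /usr/bin | grep -i -e dpdk -e testpmd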

Install DPDK manually (not recommended)

Install build dependencies

Ubuntu 18.04

sudo add-apt-repository ppa:canonical-server/server-backports -y
sudo apt-get update
sudo apt-get install -y build-essential librdmacm-dev libnuma-dev libmnl-dev meson

Ubuntu 20.04 and newer

sudo apt-get install -y build-essential librdmacm-dev libnuma-dev libmnl-dev meson

Debian 10 and newer

sudo apt-get install -y build-essential librdmacm-dev libnuma-dev libmnl-dev meson

RHEL7.5/CentOS 7.5

sudo yum -y groupinstall "Infiniband Support"
sudo dracut --add-drivers "mlx4_en mlx4_ib mlx5_ib" -f
sudo yum install -y gcc kernel-devel-`uname -r` numactl-devel.x86_64 librdmacm-devel libmnl-devel meson

SLES 15 SP1

Azure kernel

zypper  \
  --no-gpg-checks \
  --non-interactive \
  --gpg-auto-import-keys install kernel-azure kernel-devel-azure gcc make libnuma-devel numactl librdmacm1 rdma-core-devel meson

Default kernel

zypper \
  --no-gpg-checks \
  --non-interactive \
  --gpg-auto-import-keys install kernel-default-devel gcc make libnuma-devel numactl librdmacm1 rdma-core-devel meson

Compile and install DPDK manually

  1. Download the latest DPDK. Version 19.11 LTS or newer is required for Azure. (A consolidated command sketch follows this list.)
  2. Build the default config with meson builddir.
  3. Compile with ninja -C builddir.
  4. Install with DESTDIR=<output folder> ninja -C builddir install.
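
As an illustration, the steps above might look like the following, assuming DPDK 19.11 downloaded from dpdk.org and a system-wide install; adjust the version, paths, and install method for your environment.

# Sketch: download, build, and install DPDK with meson and ninja.
wget https://fast.dpdk.org/rel/dpdk-19.11.tar.xz
tar -xf dpdk-19.11.tar.xz
cd dpdk-19.11

meson builddir                  # step 2: generate the default build configuration
ninja -C builddir               # step 3: compile
sudo ninja -C builddir install  # step 4: install (or use DESTDIR=<output folder> for a staged install)
sudo ldconfig                   # refresh the shared-library cache after installing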

Configure the runtime environment

After the virtual machine restarts, run the following commands once (a consolidated script sketch follows this list):

  1. Hugepages

    • Configure hugepages by running the following command, once for each NUMA node:

      echo 1024 | sudo tee /sys/devices/system/node/node*/hugepages/hugepages-2048kB/nr_hugepages
    • Create a directory for mounting with mkdir /mnt/huge.

    • Mount hugepages with mount -t hugetlbfs nodev /mnt/huge.

    • Check that hugepages are reserved with grep Huge /proc/meminfo.

      [NOTE] You can modify the grub file so that hugepages are reserved at boot by following the DPDK instructions, which are at the bottom of the linked page. When you're using an Azure Linux virtual machine, modify files under /etc/config/grub.d instead, to reserve hugepages across reboots.

  2. MAC & IP addresses: Use ifconfig -a to view the MAC and IP addresses of the network interfaces. The VF network interface and the NETVSC network interface have the same MAC address, but only the NETVSC network interface has an IP address. VF interfaces run as subordinate interfaces of NETVSC interfaces.

  3. PCI addresses

    • Use ethtool -i <vf interface name> to find out which PCI address to use for the VF.
    • If eth0 has accelerated networking enabled, make sure that testpmd doesn't accidentally take over the VF PCI device for eth0. If the DPDK application accidentally takes over the management network interface and causes you to lose your SSH connection, use the serial console to stop the DPDK application. You can also use the serial console to stop or start the virtual machine.
  4. Load ib_uverbs on each reboot with modprobe -a ib_uverbs. For SLES 15 only, also load mlx4_ib with modprobe -a mlx4_ib.
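
For convenience, these runtime steps can be collected into a short script to run after each reboot. This is a sketch only: the VF interface name below is a placeholder for the one found on your VM, and the mlx4_ib module applies to SLES 15 only.

# Sketch: configure the runtime environment after a reboot.

# 1. Reserve 2-MB hugepages on every NUMA node and mount hugetlbfs.
echo 1024 | sudo tee /sys/devices/system/node/node*/hugepages/hugepages-2048kB/nr_hugepages
sudo mkdir -p /mnt/huge
sudo mount -t hugetlbfs nodev /mnt/huge
grep Huge /proc/meminfo

# 2. The VF and NETVSC interfaces share a MAC address; only NETVSC has an IP address.
ifconfig -a

# 3. Find the PCI address to pass to testpmd with -w ('enP1p0s2' is a placeholder VF name).
ethtool -i enP1p0s2 | grep bus-info

# 4. Load the user-space verbs module (on SLES 15, also run: sudo modprobe -a mlx4_ib).
sudo modprobe -a ib_uverbs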

Failsafe PMD

DPDK applications must run over the failsafe PMD that is exposed in Azure. If the application runs directly over the VF PMD, it doesn't receive all packets that are destined to the VM, since some packets show up over the synthetic interface.

If you run a DPDK application over the failsafe PMD, it guarantees that the application receives all packets that are destined to it. It also makes sure that the application keeps running in DPDK mode, even if the VF is revoked when the host is being serviced. For more information about failsafe PMD, see Fail-safe poll mode driver library.

Run testpmd

To run testpmd in root mode, use sudo before the testpmd command.

Basic: Sanity check, failsafe adapter initialization

  1. Run the following commands to start a single port testpmd application:

    testpmd -w <pci address from previous step> \
      --vdev="net_vdev_netvsc0,iface=eth1" \
      -- -i \
      --port-topology=chained
  2. Run the following commands to start a dual port testpmd application:

    testpmd -w <pci address nic1> \
    -w <pci address nic2> \
    --vdev="net_vdev_netvsc0,iface=eth1" \
    --vdev="net_vdev_netvsc1,iface=eth2" \
    -- -i

    If you're running testpmd with more than two NICs, the --vdev argument follows this pattern: net_vdev_netvsc<id>,iface=<vf’s pairing eth>.

  3. After it's started, run show port info all to check port information. You should see one or two DPDK ports that are net_failsafe (not net_mlx4).

  4. Use start <port> / stop <port> to start and stop traffic.

The previous commands start testpmd in interactive mode, which is recommended for trying out testpmd commands.
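
As a concrete illustration, a single-port run might look like the following, where 0002:00:02.0 and eth1 are placeholders for the VF PCI address and its paired NETVSC interface found earlier:

# Example values only: substitute your own PCI address and interface name.
sudo testpmd -w 0002:00:02.0 \
  --vdev="net_vdev_netvsc0,iface=eth1" \
  -- -i \
  --port-topology=chained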

Basic: Single sender/single receiver

The following commands periodically print the packets per second statistics:

  1. On the TX side, run the following command:

    testpmd \
      -l <core-list> \
      -n <num of mem channels> \
      -w <pci address of the device you plan to use> \
      --vdev="net_vdev_netvsc<id>,iface=<the iface to attach to>" \
      -- --port-topology=chained \
      --nb-cores <number of cores to use for test pmd> \
      --forward-mode=txonly \
      --eth-peer=<port id>,<receiver peer MAC address> \
      --stats-period <display interval in seconds>
  2. On the RX side, run the following command:

    testpmd \
      -l <core-list> \
      -n <num of mem channels> \
      -w <pci address of the device you plan to use> \
      --vdev="net_vdev_netvsc<id>,iface=<the iface to attach to>" \
      -- --port-topology=chained \
      --nb-cores <number of cores to use for test pmd> \
      --forward-mode=rxonly \
      --eth-peer=<port id>,<sender peer MAC address> \
      --stats-period <display interval in seconds>

When you're running the previous commands on a virtual machine, change IP_SRC_ADDR and IP_DST_ADDR in app/test-pmd/txonly.c to match the actual IP addresses of the virtual machines before you compile. Otherwise, the packets are dropped before they reach the receiver.
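
For example, you can locate the two definitions in the DPDK source tree like this, then edit them and rebuild (builddir is the build directory used earlier):

# Find the hard-coded test addresses, then edit the two #define lines so they
# encode the sender and receiver VM IP addresses before recompiling testpmd.
grep -n 'IP_SRC_ADDR\|IP_DST_ADDR' app/test-pmd/txonly.c
ninja -C builddir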

Advanced: Single sender/single forwarder

The following commands periodically print the packets per second statistics:

  1. On the TX side, run the following command:

    testpmd \
      -l <core-list> \
      -n <num of mem channels> \
      -w <pci address of the device you plan to use> \
      --vdev="net_vdev_netvsc<id>,iface=<the iface to attach to>" \
      -- --port-topology=chained \
      --nb-cores <number of cores to use for test pmd> \
      --forward-mode=txonly \
      --eth-peer=<port id>,<receiver peer MAC address> \
      --stats-period <display interval in seconds>
  2. On the FWD side, run the following command:

    testpmd \
      -l <core-list> \
      -n <num of mem channels> \
      -w <pci address NIC1> \
      -w <pci address NIC2> \
      --vdev="net_vdev_netvsc<id>,iface=<the iface to attach to>" \
      --vdev="net_vdev_netvsc<2nd id>,iface=<2nd iface to attach to>" (you need as many --vdev arguments as the number of devices used by testpmd, in this case) \
      -- --nb-cores <number of cores to use for test pmd> \
      --forward-mode=io \
      --eth-peer=<recv port id>,<sender peer MAC address> \
      --stats-period <display interval in seconds>

When you're running the previous commands on a virtual machine, change IP_SRC_ADDR and IP_DST_ADDR in app/test-pmd/txonly.c to match the actual IP addresses of the virtual machines before you compile. Otherwise, the packets are dropped before they reach the forwarder. Unless you make some code changes, you won't be able to have a third machine receive the forwarded traffic, because the testpmd forwarder doesn't modify the layer-3 addresses.

References