In my previous blog post, I talked about the support libraries and the core structure of Snabb's NIC drivers. In this post, I'll talk about some of the driver improvements we made at Igalia over the last few months.
(As with my previous post, this was joint work with Nicola Larosa.)
Background
Modern NICs are designed to take advantage of the increasing parallelism in CPUs in order to scale to larger workloads.
In particular, to scale to 100G workloads it becomes necessary to process packets in parallel, since a single off-the-shelf core cannot keep up. Even with 10G hardware, parallel packet processing makes it easier for software to operate at line rate, because the per-packet time budget is quite tight.
To get an idea of what the time budget is like, see these calculations; the tl;dr is 67.2 ns per packet, or about 201 cycles.
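For reference, here's roughly how those numbers fall out: a minimum-sized Ethernet frame occupies 84 bytes on the wire (64-byte frame plus preamble and inter-frame gap), and the cycle count assumes a core clocked around 3 GHz. A quick back-of-the-envelope sketch:

```lua
-- Back-of-the-envelope line-rate budget for 10G Ethernet with
-- minimum-sized frames: 64-byte frame + 8-byte preamble + 12-byte
-- inter-frame gap = 84 bytes on the wire per packet.
local bits_per_packet = 84 * 8               -- 672 bits
local line_rate       = 10e9                 -- 10 Gbit/s
local ns_per_packet   = bits_per_packet / line_rate * 1e9
print(ns_per_packet)                         -- 67.2 ns
print(ns_per_packet * 3.0)                   -- ~201 cycles at ~3 GHz
```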
To scale to multiple CPUs, NICs have a feature called receive-side scaling or RSS which distributes incoming packets to multiple receive queues. These queues can be serviced by separate cores.
RSS and related features for Intel NICs are detailed further in an overview whitepaper.
RSS works by computing a hash in hardware over the packet to determine the flow it belongs to (this is similar to the hashing used in IPFIX, which I described in a previous blog post).
The diagram above tries to illustrate this. When a packet arrives at the NIC, the hash is computed. Packets with the same hash (i.e., in the same flow) are directed to a particular receive queue. Receive queues live in RAM as ring buffers (shown as blue rings in the diagram), and the NIC places packets into them via DMA, coordinating with the driver through registers on the NIC.
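To make the selection step a bit more concrete, here's a simplified sketch of the logic (illustrative only; on the 82599 the hardware computes a Toeplitz hash over the IP addresses and ports and uses the low bits of it to index a 128-entry redirection table of queue numbers):

```lua
-- Simplified model of RSS queue selection; the real hash is a Toeplitz
-- hash computed by the NIC hardware over the packet's flow fields.
local function select_queue (hash, redirection_table)
   local index = hash % 128          -- low 7 bits index the table
   return redirection_table[index]   -- table entries name receive queues
end

-- Example: spread flows round-robin across 4 receive queues.
local reta = {}
for i = 0, 127 do reta[i] = i % 4 end
print(select_queue(0x1234abcd, reta))  -- some queue in 0..3
```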
All this means that network functions that depend on tracking flow-related state can usually still work in this parallel setup.
As a side note, you might wonder (I did anyway!) what happens to fragmented packets whose flow membership may not be identifiable from a fragment. It turns out that on Intel NICs, the hash function will ignore the layer 3 flow information when a packet is fragmented. This means that on occasion a fragmented packet may end up on a different queue than a non-fragmented packet in the same flow. More on this problem here.
Snabb's two Intel drivers
The existing driver used in most Snabb programs (apps.intel.intel_app) was mature and worked well, but was missing support for RSS.
An alternate driver (apps.intel_mp.intel_mp) written by Peter Bristow supported RSS, but wasn't entirely compatible with the features provided by the main Intel driver. We worked on extending intel_mp to work as a more-or-less drop-in replacement for intel_app.
The incompatibility between the two drivers was caused mainly by the lack of support for VMDq (Virtual Machine Device Queues) in intel_mp. VMDq is another multi-queue feature of Intel NICs: it lets the NIC present itself as multiple virtualized sets of queues. It's often used to host VMs in a virtualized environment, but can also be used (as in Snabb) to serve logically separate apps.
The basic idea is that queues are grouped into separate pools, each assigned to a VM or app with its own MAC address. A host can use this to run logically separate network functions sharing a single NIC. As with RSS, services running on separate cores can service the queues in parallel.
As the diagram above shows, adding VMDq changes queue selection slightly from the plain RSS case. An appropriate pool is selected first, based on criteria such as the MAC address (or VLAN tag, and so on), and then RSS may be applied within that pool.
BTW, VMDq is not the only virtualization feature on these NICs. There is also SR-IOV or "Single Root I/O Virtualization", which is designed to provide each VM with a virtualized NIC that uses the NIC's hardware resources directly. My understanding is that Snabb doesn't use it for now because we can implement more switching flexibility in software.
The intel_app driver supports VMDq but not RSS, and the opposite is true for intel_mp. It turns out that both features can be used simultaneously, in which case packets are first sorted by MAC address and then by flow hashing for RSS. Basically, each VMDq pool has its own set of RSS queues.
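Conceptually, the hardware queue a packet lands on is then a function of both its pool and its RSS hash, with each pool owning a block of queues. A loose sketch of the numbering (the exact layout depends on the mode; the 82599 can be configured, for example, as 32 pools with 4 RSS queues each):

```lua
-- Illustrative queue numbering when VMDq and RSS are combined:
-- each VMDq pool owns a contiguous block of RSS queues.
local QUEUES_PER_POOL = 4   -- e.g. 32 pools x 4 queues

local function absolute_queue (pool, rss_index)
   return pool * QUEUES_PER_POOL + (rss_index % QUEUES_PER_POOL)
end

print(absolute_queue(2, 3))  -- pool 2, RSS queue 3 -> hardware queue 11
```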
We implemented this support in the intel_mp driver and made the driver interface mostly compatible with intel_app, so that only minimal modifications are necessary to switch over. In the process, we made bug fixes and performance fixes in the driver to try to ensure that performance and reliability are comparable to using intel_app.
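To give a feel for what this looks like from the app side, here is a rough sketch of a Snabb configuration using intel_mp with VMDq and RSS. The shape is standard Snabb (config.app, config.link, engine), but the class name and parameter names (Intel, vmdq, poolnum, macaddr, rxq, txq) are from my recollection of the driver's configuration and may differ in detail from the current code:

```lua
-- Sketch: one Snabb process attaching to its own pool and queue on a
-- shared NIC. Names of driver parameters are assumptions (see above).
local config     = require("core.config")
local engine     = require("core.app")
local intel_mp   = require("apps.intel_mp.intel_mp")
local basic_apps = require("apps.basic.basic_apps")

local c = config.new()
config.app(c, "nic", intel_mp.Intel, {
   pciaddr = "0000:02:00.0",
   vmdq    = true,                 -- enable pool-based (VMDq) filtering
   macaddr = "02:00:00:00:00:01",  -- traffic to this MAC selects our pool
   poolnum = 0,                    -- VMDq pool number
   rxq     = 0,                    -- RSS queue within the pool
   txq     = 0,
})
config.app(c, "sink", basic_apps.Sink)
config.link(c, "nic.output -> sink.input")

engine.configure(c)
engine.main({duration = 10})
```

A second process could attach to the same PCI device with a different poolnum and macaddr (or the same pool but a different rxq) to receive its own slice of the traffic on another core.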
The development process was made a lot easier by the existence of the intel_app code, which we could copy and follow in many cases.
The tricky part was making sure that the NIC state was set correctly when multiple processes were using the NIC. In particular, intel_app can rely on tracking VMDq state inside a single Lua process.
For intel_mp, it is necessary to use locking and IPC (via shared memory) to coordinate between the different Lua processes that are setting driver state. In particular, the driver needs to keep track of which resources (VMDq pool numbers, MAC address registers, etc.) are still available for use.
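As a rough illustration of the kind of coordination involved (this is not the driver's actual code; I'm assuming the core.shm API for shared memory and a compare-and-swap primitive along the lines of core.sync, and claim_pool is a hypothetical helper), a process might claim a free VMDq pool number like this:

```lua
-- Illustrative only: claim a free VMDq pool slot in shared memory so
-- that other Snabb processes on the same NIC don't reuse it.
local shm  = require("core.shm")
local sync = require("core.sync")   -- assumed to provide a cas() primitive
local ffi  = require("ffi")

-- One shared flag per pool, keyed by the NIC's PCI address.
local pools = shm.create("intel_mp_pools/0000:02:00.0", "uint32_t[64]")
local slots = ffi.cast("uint32_t *", pools)

-- Hypothetical helper: find and atomically claim the first free pool.
local function claim_pool ()
   for pool = 0, 63 do
      if sync.cas(slots + pool, 0, 1) then   -- flip free (0) -> taken (1)
         return pool
      end
   end
   error("no free VMDq pool")
end

print("claimed pool " .. claim_pool())
```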
Current status
The driver improvements are now merged upstream, and intel_mp is the default driver as of the Snabb 2017.11 "Endive" release.
It's still possible to opt out and use the old driver in case there are any problems with intel_mp. And of course we appreciate any bug reports or feedback.