Writing network drivers in a high-level language

November 22, 2017 · 7 min read

Blog Author

Another day, another post about Snabb. Today, I'll start to explain some work I've been doing at Igalia for Deutsche Telekom on driver development. All the DT driver work I'll be talking about was joint work with Nicola Larosa.

When writing a networking program with Snabb, the program has to get some packets to crunch on from somewhere. Like everything else in Snabb, these packets come from an app.

These source apps might be a synthetic packet generator or, for anything running on real hardware, a network driver that talks to a NIC (a network card). That network driver part is the subject of this blog post.

These network drivers are written in LuaJIT like the rest of the system. This is maybe not that surprising if you know Snabb does kernel-bypass networking (like DPDK or other similar approaches), but it's still quite remarkable! The vast majority of drivers that people are familiar with (graphics drivers, wifi drivers, or that obscure CueCat driver) are written in C.

For the Igalia project, we worked on extending the existing Snabb drivers for Intel NICs with some extra features. I'll talk more about the new work that we did specifically in a second blog post. For this post, I'll introduce how we can even write a driver in Lua.

(And to be clear, the existing Snabb drivers aren't my work; they're the work of some excellent Snabb hackers like Luke Gorrie and others.)

For the nitty-gritty details about how Snabb bypasses the kernel to let a LuaJIT program operate on the NIC, I recommend reading Luke Gorrie's neat blog post about it. In this post, I'll talk about what happens once user-space has a hold on the network card.

Driver infrastructure

When a driver starts up, it of course needs to initialize the hardware. The datasheet for the Intel 82599 NIC for example dedicates an entire chapter to this. A lot of the initialization process consists of poking at the appropriate configuration registers on the device, waiting for things to power up and tell you they're ready, and so on.

To actually poke at these registers, the driver uses memory-mapped I/O to the PCI device. As far as LuaJIT and we are concerned, the MMIO memory is just a pointer to a big chunk of memory, handed to us by the lib.hardware.pci library via the FFI.

It's up to us to interpret this returned uint32_t pointer in a useful way. Specifically, we know certain offsets into this memory are mapped to registers as specified in the datasheet.
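To make the pointer arithmetic concrete, here's a minimal sketch of reading and writing a register by its offset with the LuaJIT FFI. It uses a plain zeroed array as a stand-in for the MMIO region so that it runs without a NIC, and the helper names read_reg and write_reg are made up for illustration. Only the 0x5C80 offset comes from the register description shown later in this post.

```lua
local ffi = require("ffi")

-- Stand-in for the MMIO region: ordinary zeroed memory, NOT a real
-- device mapping. A real driver gets this pointer from the PCI library.
local fake_mmio = ffi.new("uint32_t[?]", 0x8000 / 4)
local base_ptr = ffi.cast("uint32_t *", fake_mmio)

-- Registers are 32 bits wide, so a byte offset from the datasheet
-- becomes an index into the uint32_t pointer by dividing by 4.
local function read_reg (base, offset)
   return base[offset / 4]
end

local function write_reg (base, offset, value)
   base[offset / 4] = value
end

-- Byte offset of the first "RSS Random Key" register.
local RSSRK0 = 0x5C80

write_reg(base_ptr, RSSRK0, 0xdeadbeef)
local value = read_reg(base_ptr, RSSRK0)
```

Writing this by hand for every register gets old quickly, which is the motivation for hiding it behind a nicer interface.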

Since we're living in a high-level language, we want to hide away the pointer arithmetic needed to access these registers. So Snabb has a little DSL in the lib.hardware.register library that takes text descriptions of registers like this:

-- Name        Address / layout        Read-write status/description
array_registers = [[
   RSSRK       0x5C80 +0x04*0..9       RW RSS Random Key
]]

and then lets you map them into a register table:

my_regs = {}

pci      = require 'lib.hardware.pci'
register = require 'lib.hardware.register'

-- get a pointer to MMIO for some PCI address
base_ptr = pci.map_pci_memory_unlocked("02:00.1", 0)

-- defines (an array of) registers in my_regs
register.define_array(array_registers, my_regs, base_ptr)
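I won't go through the real lib.hardware.register code here, but the rough idea can be sketched in a few lines. This is only a sketch under some assumptions: it handles just the one array-register line format shown above, it models a register as an object that is written by calling it with a value and read by calling it with no arguments, and it uses ordinary memory in place of a real MMIO mapping. The names Register and define_array_line are hypothetical.

```lua
local ffi = require("ffi")

-- A register object wraps a pointer into the (fake) MMIO region.
-- Calling it with a value writes; calling it with no argument reads.
local Register = {}
Register.__index = Register
Register.__call = function (reg, value)
   if value == nil then return reg.ptr[0] end
   reg.ptr[0] = value
end

-- Parse one line of the form:
--   NAME  ADDRESS +STRIDE*FIRST..LAST  MODE  DESCRIPTION
-- and store an array of register objects in tbl[NAME].
local function define_array_line (line, tbl, base_ptr)
   local name, addr, stride, first, last, mode, desc = line:match(
      "^%s*(%S+)%s+(%w+)%s+%+(%w+)%*(%d+)%.%.(%d+)%s+(%S+)%s+(.-)%s*$")
   assert(name, "could not parse register line: " .. line)
   local regs = {}
   for i = tonumber(first), tonumber(last) do
      local offset = tonumber(addr) + tonumber(stride) * i
      regs[i] = setmetatable({
         name = name .. "[" .. i .. "]",
         mode = mode,
         desc = desc,
         ptr  = base_ptr + offset / 4,   -- byte offset -> uint32_t index
      }, Register)
   end
   tbl[name] = regs
end

-- Ordinary zeroed memory standing in for the device's MMIO region.
local fake_mmio = ffi.new("uint32_t[?]", 0x8000 / 4)
local sketch_regs = {}
define_array_line("   RSSRK       0x5C80 +0x04*0..9       RW RSS Random Key",
                  sketch_regs, ffi.cast("uint32_t *", fake_mmio))

sketch_regs.RSSRK[3](42)                 -- write
local readback = sketch_regs.RSSRK[3]()  -- read
```

The metatable's __call entry is what lets a register look like a function while still carrying its name and description around as data.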

After defining these registers, you can use the my_regs table to access the registers like any other Lua data. For example, the "RSS Random Key" array of registers can be initialized with some random data like this:

for i=0, 9 do
   my_regs.RSSRK[i](math.random(0, 2^32 - 1))
end

This code looks like straightforward Lua code, but it's poking at the NIC's configuration registers. These registers are also often manipulated at the bit level, and there is some library support for that in lib.bits.

For example, here are some prose instructions to initialize a certain part of the NIC from the datasheet:

Disable TC arbitrations while enabling the packet buffer free space monitor:

  — Tx Descriptor Plane Control and Status (RTTDCS), bits:
  TDPAC=0b, VMPAC=1b, TDRM=0b, BDPM=1b, BPBFSM=0b

This is basically instructing the implementor to set some bits and clear some bits in the RTTDCS register, which can be translated into some code that looks like this:

bits = require "lib.bits"

-- clear these bits
my_regs.RTTDCS:clr(bits { TDPAC=0, TDRM=4, BPBFSM=23 })

-- set these bits
my_regs.RTTDCS:set(bits { VMPAC=1, BDPM=22 })

The bits function just takes a table of bit offsets to set (the table key strings only matter for documentation's sake) and turns it into a number to use for setting a register. It's possible to write these bit manipulations with just arithmetic operations as well, but it's usually more verbose that way.
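To show what's going on underneath, here's a small sketch of a bits-like helper written with LuaJIT's bit library, applied to a plain number instead of a device register. The names bits_sketch and apply_masks are stand-ins for illustration, not Snabb's own functions. The bit offsets are the ones from the RTTDCS example above.

```lua
local bit = require("bit")

-- Turn a table of bit offsets into a mask with those bits set.
-- The keys are ignored; they only document which bit is which.
local function bits_sketch (offsets)
   local mask = 0
   for _, n in pairs(offsets) do
      mask = bit.bor(mask, bit.lshift(1, n))
   end
   return mask
end

-- Set the bits in set_mask, then clear the bits in clr_mask.
local function apply_masks (value, set_mask, clr_mask)
   return bit.band(bit.bor(value, set_mask), bit.bnot(clr_mask))
end

local set_mask = bits_sketch { VMPAC=1, BDPM=22 }
local clr_mask = bits_sketch { TDPAC=0, TDRM=4, BPBFSM=23 }

-- Start from a value with all of the "clear" bits set, to show
-- that they end up cleared and the "set" bits end up set.
local result = apply_masks(clr_mask, set_mask, clr_mask)
```

The set and clr methods on a register do the read-modify-write for you, so the driver code only has to name the bits.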

Getting packets into the driver

To build the actual driver, we use the handy infrastructure above to do the device initialization and configuration and then drive a main loop that accepts packets from the NIC and feeds them into the Snabb program (we will just consider the receive path in this post). The core structure of this main loop is simpler than you might expect.

On a NIC like the Intel 82599, packets are transferred into the host system's memory via DMA, using a receive descriptor ring. This is a circular buffer in which each entry holds a pointer to packet data plus some metadata.

A typical descriptor entry looks like this:

----------------------------------------------------------------------
|             Address (to memory allocated by driver)                |
----------------------------------------------------------------------
|    VLAN tag    | Errors | Status |    Checksum    |    Length      |
----------------------------------------------------------------------

The driver allocates some DMA-friendly memory (via memory.dma_alloc from core.memory) for the descriptor ring and then sets the NIC registers (RDBAL & RDBAH) so that the NIC knows the physical address of the ring. There are some neat tricks in core.memory which make the virtual to physical address translation easy.

In addition to this ring, a packet buffer is allocated for each entry in the ring and its (physical) address is stored in the first field of the entry (see diagram above).

The NIC will then DMA packets into the buffer as they are received and filtered by the hardware.
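As a sketch of how this setup might look with the FFI, here is a descriptor declared as a C struct, with one buffer attached per ring entry. Several things here are assumptions for illustration: the struct name and field names are made up, the field order assumes the second row of the diagram is laid out in memory from right to left (Length first), the memory comes from ffi.new rather than a DMA allocator, and the stored addresses are virtual addresses, where a real driver needs physical ones.

```lua
local ffi = require("ffi")

ffi.cdef[[
struct rx_desc_sketch {
   uint64_t address;    /* address of this entry's packet buffer */
   uint16_t length;     /* filled in by the NIC on receive */
   uint16_t checksum;
   uint8_t  status;
   uint8_t  errors;
   uint16_t vlan;
};
]]

local ring_size, buffer_size = 8, 2048

-- The ring itself: ordinary memory here, DMA memory in a real driver.
local ring = ffi.new("struct rx_desc_sketch[?]", ring_size)

-- Keep references to the buffers so the GC doesn't free them.
local buffers = {}

for i = 0, ring_size - 1 do
   buffers[i] = ffi.new("uint8_t[?]", buffer_size)
   -- A real driver stores the buffer's *physical* address here;
   -- this sketch just stores the virtual address.
   ring[i].address = ffi.cast("uintptr_t", ffi.cast("void *", buffers[i]))
end

local desc_size = ffi.sizeof("struct rx_desc_sketch")
```

Each entry is 16 bytes in this layout: 8 for the address and 8 for the metadata row.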

The descriptor ring has head and tail pointers (like a typical circular buffer). The NIC advances the head as it writes newly arrived packets into the ring, and the driver advances the tail to tell the NIC how far it has processed, which hands those descriptors back to the hardware for reuse.
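Here's a toy model of that bookkeeping, with a Lua function playing the role of the hardware. It is not how the Snabb driver is written; the field and function names (head, next_to_read, hardware_receives and so on) are hypothetical, and it only tracks indices, not packets.

```lua
local bit = require("bit")

-- Ring size must be a power of two for the masking trick below.
local ring_size = 8

-- Advance an index by one slot, wrapping around at the end of the ring.
local function ring_next (index)
   return bit.band(index + 1, ring_size - 1)
end

local ring = {
   head = 0,          -- where the "hardware" will write next
   next_to_read = 0,  -- where the driver will read next
}

-- Pretend the NIC has DMA'd n packets into the ring.
local function hardware_receives (n)
   for _ = 1, n do ring.head = ring_next(ring.head) end
end

-- There is something to read whenever the driver is behind the head.
local function can_receive ()
   return ring.next_to_read ~= ring.head
end

-- Consume one slot and return its index.
local function receive ()
   local slot = ring.next_to_read
   ring.next_to_read = ring_next(slot)
   return slot
end

-- Drain everything that is currently available.
local function drain ()
   local slots = {}
   while can_receive() do slots[#slots + 1] = receive() end
   return slots
end

hardware_receives(3)
local first_batch = drain()    -- slots 0, 1, 2
hardware_receives(7)
local second_batch = drain()   -- wraps around the end of the ring
```

Note that this toy lets the fake hardware run freely. On a real NIC, the hardware only writes into descriptors the driver has made available, which is why the driver has to keep the tail pointer up to date.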

A Snabb app can introduce new packets into a program by implementing the pull method. A driver's pull method might have the following shape (based on the intel_app driver):

local link = require "core.link"

-- pull method definition on Driver class
function Driver:pull ()
   -- make sure the output link exists
   local l = self.output.tx
   if l == nil then return end

   -- sync the driver and HW on descriptor ring head/tail pointers
   self:sync_receive()

   -- pull a standard number of packets for a link
   for i = 1, engine.pull_npackets do
      -- check head/tail pointers to make sure packets are available
      if not self:can_receive() then break end

      -- take packet from descriptor ring, put into the Snabb output link
      link.transmit(l, self:receive())
   end

   -- allocate new packet buffers for all the descriptors we processed
   -- we can't reuse the buffers since they are now owned by the next app
   self:add_receive_buffers()
end

Of course, the real work is done in the helper methods like sync_receive and receive. I won't go over the implementation of those, but they mostly deal with manipulating the head and tail pointers of the descriptor ring appropriately while doing any allocation that is necessary to keep the ring set up.
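If you want to watch the shape of the pull loop in action without any hardware, here's a sketch that runs it against stand-ins. Everything except the structure of pull is faked: link and engine here are tiny local tables rather than Snabb's core.link and engine, FakeDriver is a hypothetical class, and "packets" are just strings sitting in a Lua table.

```lua
-- Stand-ins (assumptions): NOT Snabb's core.link or engine.
local link = {}
function link.transmit (l, p) l[#l + 1] = p end
local engine = { pull_npackets = 4 }

local FakeDriver = {}
FakeDriver.__index = FakeDriver

-- pending: a list of fake packets "already received by the NIC"
function FakeDriver.new (pending)
   return setmetatable({ pending = pending, next = 1,
                         output = { tx = {} } }, FakeDriver)
end

-- No hardware to sync with and no buffers to allocate in this sketch.
function FakeDriver:sync_receive () end
function FakeDriver:add_receive_buffers () end

function FakeDriver:can_receive ()
   return self.next <= #self.pending
end

function FakeDriver:receive ()
   local p = self.pending[self.next]
   self.next = self.next + 1
   return p
end

-- Same structure as the pull method above.
function FakeDriver:pull ()
   local l = self.output.tx
   if l == nil then return end
   self:sync_receive()
   for i = 1, engine.pull_npackets do
      if not self:can_receive() then break end
      link.transmit(l, self:receive())
   end
   self:add_receive_buffers()
end

local driver = FakeDriver.new({ "p1", "p2", "p3", "p4", "p5", "p6" })
driver:pull()
local after_first = #driver.output.tx
driver:pull()
local after_second = #driver.output.tx
```

The first call stops at the per-call packet limit and the second stops when the fake ring runs dry, which are the two ways the real loop can end as well.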

The takeaway I wanted to communicate from this skeleton is that using Lua makes for very clear and pleasant code that doesn't get too bogged down in low-level details. That's partly because Snabb's core abstracts the complexity of using device registers and allocating DMA memory and things like that. That kind of abstraction is made a lot easier by LuaJIT and its FFI, so that the surface code looks like it's just manipulating tables and making function calls.

In the next blog post, I'll talk about some specific improvements we made to Snabb's drivers to make them better suited to multi-process Snabb apps.
