Waveforms using Pi DMA

ServoBlaster, pigpio, pi-blaster and `yapidh` all operate on the same principle. The Raspberry Pi lacks a generic timer-based trigger for the DMA engine, but it does have peripherals which can request data at a continuous rate (the "feed me" signal!). We use these peripherals to trigger DMA transactions at a constant known rate, in order to provide accurate timing. By making the DMA engine write to the GPIO set/clear registers, we can use this "trick" to generate GPIO output (or input) very consistently, and very precisely.

The DMA engine reads lists of "control blocks" (CBs), with each CB describing a DMA "job". The control blocks are linked together, and the DMA engine will fetch the next in the list once it's finished processing the current job.
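
To make the "linked list" idea concrete, here's a rough sketch of packing a chain of CBs. The eight-word layout follows the BCM2835 DMA control block (transfer info, source, destination, length, stride, next CB, two reserved words); `pack_cb`, `build_chain` and the addresses are made up for illustration, and real code would also need proper bus addresses and 32-byte-aligned memory.

```python
import struct

# Eight 32-bit words per control block:
# TI, SOURCE_AD, DEST_AD, TXFR_LEN, STRIDE, NEXTCONBK, reserved, reserved
CB_FORMAT = "<8I"
CB_SIZE = struct.calcsize(CB_FORMAT)  # 32 bytes

def pack_cb(ti, src, dst, length, next_cb):
    """Pack one control block; stride and the reserved words are left at zero."""
    return struct.pack(CB_FORMAT, ti, src, dst, length, 0, next_cb, 0, 0)

def build_chain(base_addr, jobs, loop=False):
    """Lay out jobs back to back from base_addr (a placeholder address),
    pointing each CB at the one that follows it.
    jobs: list of (ti, src, dst, length) tuples."""
    blob = b""
    for i, (ti, src, dst, length) in enumerate(jobs):
        if i < len(jobs) - 1:
            next_cb = base_addr + (i + 1) * CB_SIZE
        else:
            # Last CB: either wrap around to form a loop, or 0 to end the chain
            next_cb = base_addr if loop else 0
        blob += pack_cb(ti, src, dst, length, next_cb)
    return blob
```

With `loop=True` the last CB points back at the first, which is the shape the ServoBlaster-style approach below relies on.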

So, there are two parts to this trick:

  1. Set up a peripheral (PWM or PCM) so that it asks the DMA engine for data every N microseconds
  2. Set up a list of DMA control blocks, with "useful work" (GPIO set/clear) interleaved with control blocks which just feed the PWM/PCM

For the purposes of the explanations below, I'm going to assume that the PWM peripheral is used, and is configured to request a new sample every 10us.

ServoBlaster/pi-blaster

ServoBlaster and pi-blaster take a straightforward approach. They build a long list of DMA CBs, which form a loop. Each "sample" consists of two control blocks - one which feeds the PWM/PCM, and the other which writes to GPIO.

When the PWM/PCM sends a "feed me" request, the DMA fetches the next CB, which writes to the GPIO set/clear register to make the appropriate output.  However, because this CB doesn't "feed" the PWM, the "feed me" request is still asserted, and the DMA immediately fetches the next CB. This CB writes some data to the PWM, which satisfies its hunger. The PWM spends the next 10us outputting a sample - and 10us later, it asserts the "feed me" signal again, and the process continues.
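
The sequence above can be modelled in a few lines. This is only a toy simulation of the ordering and timing, not ServoBlaster's or pi-blaster's actual code; the tuple representation and the function names are my own invention, and it assumes the 10us tick from earlier.

```python
TICK_US = 10  # assumed PWM sample period, as in the text

def build_sample_loop(levels):
    """levels: one pin level (0 or 1) per sample. Returns two CBs per sample."""
    cbs = []
    for level in levels:
        cbs.append(("gpio", "set" if level else "clear"))  # does not feed the PWM
        cbs.append(("pwm", None))                           # satisfies "feed me"
    return cbs

def run_once(cbs):
    """Walk the loop once, returning (time_us, action) for each GPIO write."""
    t = 0
    events = []
    for kind, action in cbs:
        if kind == "gpio":
            # "feed me" is still asserted, so the DMA moves straight on
            events.append((t, action))
        else:
            # The PWM is now busy outputting a sample for one tick
            t += TICK_US
    return events
```

For example, `run_once(build_sample_loop([1, 1, 0]))` gives GPIO writes at 0us, 10us and 20us, using six CBs for three samples.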

This approach works well.

The implications are that:

  1. Every single "sample" means two DMA CBs have to be fetched from memory (32 bytes each - so at 10us/sample that's only 6.4 MB/s, which is realistically pretty insignificant)
  2. The number of CBs you need is directly related to the length (not the complexity) of your wave segment
  3. Any update of the waveform has to be done while the DMA is "live" and currently reading the CBs. This means it can be hard to guarantee your changes get in at the right time.
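
The numbers in the list above are easy to check. A quick back-of-the-envelope helper (the name `per_sample_cost` and the 20ms example are just for illustration):

```python
CB_BYTES = 32  # size of one DMA control block

def per_sample_cost(wave_len_us, tick_us=10, cbs_per_sample=2):
    """Return (number of CBs, bytes of CB memory, fetch rate in MB/s)."""
    samples = wave_len_us // tick_us
    n_cbs = samples * cbs_per_sample
    mem_bytes = n_cbs * CB_BYTES
    # Bytes per microsecond is numerically the same as MB/s (10^6 bytes/s)
    fetch_mb_per_s = cbs_per_sample * CB_BYTES / tick_us
    return n_cbs, mem_bytes, fetch_mb_per_s
```

A 20ms wave at 10us/sample comes out at 4000 CBs (128000 bytes) whether it contains one edge or hundreds, and the fetch rate stays at 6.4 MB/s.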

yapidh

`yapidh` (and I think pigpio's waveform generators, but I didn't really understand the code) takes a slightly different approach: instead of having a single loop with two CBs for every single sample, `yapidh` uses one CB for each delay, rising edge and falling edge respectively. This means that if you're generating a single square wave, you only need 4 CBs (actually a couple more for "fences" discussed below), regardless of how long the square wave's period is.

To achieve this, each "delay" CB is set up to transfer multiple samples. For a PWM tick-rate of 10us, a 100us delay would be represented as a CB with length = 10. That means that the DMA engine will keep using the same CB 10 times, once every time the "feed me" signal is raised, and it will take a total of 100us for the DMA to process that CB.
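
Here's a small sketch of that idea, to show that the CB count doesn't depend on the period. It isn't `yapidh`'s real data structure: the `(kind, count)` tuples are a stand-in, and the count is in samples as in the paragraph above (a real CB's transfer length would be in bytes).

```python
TICK_US = 10  # assumed PWM tick, as in the text

def square_wave_cbs(high_us, low_us):
    """One CB per edge and one per delay: four in total, whatever the period."""
    return [
        ("set", 1),                     # rising edge: a single GPIO write
        ("delay", high_us // TICK_US),  # one CB, reused once per "feed me"
        ("clear", 1),                   # falling edge
        ("delay", low_us // TICK_US),
    ]

def duration_us(cbs):
    """Total time the DMA spends on the chain; only delay CBs take time."""
    return sum(n * TICK_US for kind, n in cbs if kind == "delay")
```

`square_wave_cbs(100, 100)` and `square_wave_cbs(100000, 100000)` both return four entries; only the counts inside the delay CBs change.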

On-the-fly CB generation

The other difference with yapidh is that it generates the CB chains 'on the fly'. For the sake of argument, let's say we're generating 16ms chunks of waveform at a time (but any length is possible).

yapidh uses the idea of "event sources" to figure out how to generate the CBs. It can have an arbitrary number of sources, and each source only needs to do two things:

  1. Tell yapidh how long it is until its next rising or falling edge
  2. Tell yapidh which pins should be set/cleared at that time

yapidh then builds "chunks" of CBs, by generating delay/rising edge/falling edge CBs until it reaches the desired chunk length....
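
As a rough illustration of the event-source idea (not `yapidh`'s actual interface: `SquareSource`, `fire` and `build_chunk` are hypothetical names, and times are counted in PWM ticks), a chunk builder could look something like this:

```python
class SquareSource:
    """Hypothetical event source: toggles one pin every half_period ticks.
    Assumes half_period >= 1."""
    def __init__(self, pin, half_period):
        self.mask = 1 << pin
        self.half_period = half_period
        self.remaining = half_period  # ticks until the next edge
        self.high = False

    def fire(self):
        """Return (set_mask, clear_mask) for this edge and schedule the next."""
        self.high = not self.high
        self.remaining = self.half_period
        return (self.mask, 0) if self.high else (0, self.mask)

def build_chunk(sources, chunk_ticks):
    """Emit delay/set/clear entries until chunk_ticks of waveform are covered."""
    cbs = []
    elapsed = 0
    while elapsed < chunk_ticks:
        # Wait until the nearest edge, or the end of the chunk if that is sooner
        step = min(min(s.remaining for s in sources), chunk_ticks - elapsed)
        if step > 0:
            cbs.append(("delay", step))
        elapsed += step
        set_mask = clear_mask = 0
        for s in sources:
            s.remaining -= step
            if s.remaining == 0:
                sm, cm = s.fire()
                set_mask |= sm
                clear_mask |= cm
        # Edges that land on the same tick share one set and one clear entry
        if set_mask:
            cbs.append(("set", set_mask))
        if clear_mask:
            cbs.append(("clear", clear_mask))
    return cbs
```

Because each source remembers how long is left until its next edge, calling `build_chunk` again carries on where the previous chunk stopped. With a 10us tick, a 16ms chunk would be `chunk_ticks=1600`.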
