Bitbanging a protocol that has no hardware support

Ken Yap wrote 02/25/2021 at 08:34 • 14 min read

Everybody loves a video, so here's one to start with before we get to the boring, er, fascinating details.

There are many protocols for communicating between computers, or between computers and peripherals. Some of these are common enough to be supported by dedicated hardware, e.g. I2C, SPI and UART. New ones pop up all the time as designers invent ways to communicate with a minimum of hardware and lines; an example is the Neopixel protocol. An example of an uncommon protocol is TMP, the serial protocol used by the Titan Micro family of display drivers, which resembles I2C but is a cut-down lookalike (that incidentally didn't require Titan Micro to get an I2C address allocation). What do you do when the protocol you want to implement on your MCU doesn't have hardware support?

There are various ways to tackle the problem. One is to select a different MCU that has the support. Another is to add a peripheral chip that implements the protocol. This is the traditional path: all those UART chips for MPUs were implementations from before the functionality was absorbed into MCUs. The same goes for USB, which is beginning to be integrated into some MCUs. Yet another way is to delegate the task to a slave MCU.

The Raspberry Pi Pico and other boards built on the RP2040 take an interesting approach to supporting uncommon protocols. The RP2040 SoC contains state engines (PIO) that can be allocated and programmed to handle the protocol, relieving the main processor of this work. This is explained in the Pico SDK documentation, and blog posts on how to use the feature are appearing, including this Hackaday #RP2040 : PIO - case study. Expect to see more MCUs implement such state engines.

But what if the Pico doesn't suit your other requirements, or you are too cheap to allocate more hardware to the problem? Surely the MCU has power to spare and you can drive GPIO pins in software? Thus begins your foray into bitbanging.

Drive it fast but not too fast

Unfortunately many of these protocols put you in a bind. Ideally you would like the data transfer to go as fast as possible, but the slave chip may have limits; for example, TMP above goes up to 250 kHz. So delays have to be inserted in the code to respect the minimum timing. In this case the code waits 5 µs at each state change. On older microprocessors an inserted NOP, or perhaps a call and return, would suffice. But on fast MCUs this busy wait prevents the MCU from doing other work, and the waste gets worse as MCUs get faster. Also, unless you use a timer, the wait is model dependent: move to a faster member of the family and the waits have to be rejigged.
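To put numbers on that, here is a minimal C sketch of the arithmetic. It assumes, as in the writevalue() functions of the case study below, three waits per data bit (after CLK goes low, after DATA is set, and after CLK goes high), and it ignores the time the GPIO calls themselves take, so a real clock will be somewhat slower. The function names are illustrative only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Clock period in microseconds when every data bit costs `waits_per_bit`
   busy-waits of `wait_us` each; GPIO call overhead is ignored. */
static uint32_t bit_period_us(uint32_t waits_per_bit, uint32_t wait_us)
{
    return waits_per_bit * wait_us;
}

/* Approximate clock frequency in Hz, rounded down. A zero period means
   "no waits at all" and is reported as UINT32_MAX. */
static uint32_t clock_hz(uint32_t waits_per_bit, uint32_t wait_us)
{
    uint32_t period = bit_period_us(waits_per_bit, wait_us);
    return period ? 1000000u / period : UINT32_MAX;
}

/* True if the bit-banged clock stays at or below the chip's limit. */
static bool within_limit(uint32_t waits_per_bit, uint32_t wait_us, uint32_t max_hz)
{
    return clock_hz(waits_per_bit, wait_us) <= max_hz;
}
```

With three 5 µs waits per bit the clock comes out at roughly 66.7 kHz, about a quarter of the 250 kHz ceiling, so the waits are conservative rather than tight.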

A proper solution takes advantage of the fact that we usually don't need top speed. In fact running it at a lower speed makes the wiring less critical. We only have to run the protocol fast enough. In the case of the TM display chips, it suffices that the transfer time is negligible compared to the update period. A few milliseconds updating the display of a time-of-day clock won't be noticed.

Solution 1: interrupt handlers

This method puts the protocol engine in the interrupt handler. Essentially you have to maintain the state of the connection in the handler and handle every situation that can occur. When interfacing with the main line you have to deal with resource contention, e.g. shared variables and buffers. This method has a high overhead compared with dedicated hardware or a dedicated state engine, because you have to execute at least tens of instructions for each state change, but it is acceptable if the data volume is low. Another problem is that the interrupt handler always enters at the top, so you have to keep track of where you left off in the state machine and resume from there. Still, interrupt handlers are the best option if you don't have dedicated hardware to handle the bits. If you think about it, device drivers in operating systems are sophisticated modules that use interrupts to update their state, though they usually handle larger blocks of data.
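A minimal C sketch of such a handler, under several assumptions: set_clk() and set_dio() are hypothetical stand-ins for GPIO register writes (here they only record the level so the logic can be checked on a host), tx_tick() is meant to be called from a periodic timer interrupt whose setup is MCU specific and not shown, and the ACK clock and start/stop conditions are left out to keep it short. The phase variable is what lets each entry at the top of the handler pick up where the previous one left off.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical pin drivers: on a real MCU these would write GPIO
   registers; here they only record the level for host testing. */
static uint8_t clk_level = 1, dio_level = 1;
static void set_clk(uint8_t level) { clk_level = level; }
static void set_dio(uint8_t level) { dio_level = level; }

/* Connection state shared between the main line and the handler.
   volatile so the compiler neither caches it nor reorders accesses
   to it (sufficient on a single-core MCU). */
enum tx_phase { PH_CLK_LOW, PH_SET_DATA, PH_CLK_HIGH };
static volatile enum tx_phase phase;
static volatile uint8_t tx_byte;
static volatile uint8_t mask;
static volatile bool tx_busy;

/* Main line: start sending one byte; refuses if a transfer is in progress. */
static bool tx_start(uint8_t value)
{
    if (tx_busy)
        return false;
    tx_byte = value;
    mask = 0x01;            /* LSB first, as in writevalue() */
    phase = PH_CLK_LOW;
    tx_busy = true;         /* written last: the handler ignores the rest until then */
    return true;
}

/* Handler body: call once per timer tick. Each call makes exactly one
   state change and returns; `phase` says where to resume next time. */
static void tx_tick(void)
{
    if (!tx_busy)
        return;
    switch (phase) {
    case PH_CLK_LOW:
        set_clk(0);
        phase = PH_SET_DATA;
        break;
    case PH_SET_DATA:
        set_dio((tx_byte & mask) ? 1 : 0);
        phase = PH_CLK_HIGH;
        break;
    case PH_CLK_HIGH:
        set_clk(1);
        mask = (uint8_t)(mask << 1);   /* becomes 0 after bit 7 */
        if (mask == 0)
            tx_busy = false;           /* byte done; ACK clock not shown */
        else
            phase = PH_CLK_LOW;
        break;
    }
}
```

The main line calls tx_start() and then either polls tx_busy or gets on with other work. Anything more elaborate than one byte, such as a queue of bytes, brings in the resource-contention issues mentioned above.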

Solution 2: polled state machines

Instead of interrupts, your main line may have a polling loop you can use, say one that scans multiplexed displays and runs at anything from hundreds of Hz to tens of kHz. You can implement a state engine in a function, using persistent variables (e.g. static variables in C). The advantage is that there is no resource contention, as the multitasking is co-operative. Protothreads are another way to do this in C.
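A minimal C sketch of a polled state engine along these lines, assuming hypothetical set_clk()/set_dio() pin drivers. Rather than an explicit state enum, it uses the switch/__LINE__ local-continuation trick that Protothreads is built on, with home-made macros rather than the Protothreads API, so the code reads like the delay-based version with each wait replaced by a return to the polling loop. Only static variables survive between calls, which is why the loop counter is static.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical pin drivers; a real port would write GPIO registers. */
static uint8_t clk_pin = 1, dio_pin = 1;
static void set_clk(uint8_t level) { clk_pin = level; }
static void set_dio(uint8_t level) { dio_pin = level; }

/* Minimal local-continuation macros using the switch/__LINE__ trick
   (illustrative, not the Protothreads API). `lc` records the line to
   resume at and the switch jumps back there on the next call.
   Use at most one TASK_YIELD per source line. */
#define TASK_BEGIN(lc) switch (lc) { case 0:
#define TASK_YIELD(lc) do { (lc) = __LINE__; return true; case __LINE__:; } while (0)
#define TASK_END(lc)   } (lc) = 0; return false

/* Start condition, one byte LSB first, stop condition; ACK clock omitted.
   Call once per pass of the polling loop, with the same `value`, until it
   returns false. Each true return stands in for one wait. */
static bool send_byte_poll(uint8_t value)
{
    static int lc;          /* resume point, persists between calls */
    static uint8_t mask;    /* loop variable must persist too */

    TASK_BEGIN(lc);
    set_clk(1); set_dio(1); TASK_YIELD(lc);
    set_dio(0); set_clk(0); TASK_YIELD(lc);      /* start: DIO falls while CLK high */
    for (mask = 0x01; mask != 0; mask = (uint8_t)(mask << 1)) {
        set_clk(0); TASK_YIELD(lc);
        set_dio((value & mask) ? 1 : 0); TASK_YIELD(lc);
        set_clk(1); TASK_YIELD(lc);
    }
    set_clk(0); set_dio(0); TASK_YIELD(lc);
    set_clk(1); set_dio(1);                      /* stop: DIO rises while CLK high */
    TASK_END(lc);
}
```

A polling loop would call send_byte_poll(value) once per pass while it returns true. Here that is 27 true returns per byte, so a 10 kHz loop gets the byte out in under 3 ms.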

Solution 3: co-routines

Your implementation language may have co-routines or equivalent, like Lua does. Again this can be coupled with a polling loop.

A case study

Let's start with a test program that scrolls through the 10 digits horizontally on a TM1637 display, which is what the initial video featured. Here is the Arduino version using delay calls.

// Module connection pins (Digital Pins)
#define CLK 2
#define DATA 3

static uint8_t startnum = 1;
static uint8_t display[4];

static const uint8_t font[] = { 0x3f, 0x06, 0x5b, 0x4f, 0x66, 0x6d, 0x7d, 0x07, 0x7f, 0x6f };

void start(void)
{
  digitalWrite(CLK, HIGH); //send start signal to TM1637
  digitalWrite(DATA, HIGH);
  delayMicroseconds(5);
  digitalWrite(DATA, LOW);
  digitalWrite(CLK, LOW);
  delayMicroseconds(5);
}

void stop(void)
{
  digitalWrite(CLK, LOW);
  digitalWrite(DATA, LOW);
  delayMicroseconds(5);
  digitalWrite(CLK, HIGH);
  digitalWrite(DATA, HIGH);
  delayMicroseconds(5);
}

bool writevalue(uint8_t value)
{
  for (unsigned int mask = 0x1; mask < 0x100; mask <<= 1)
  {
    digitalWrite(CLK, LOW);
    delayMicroseconds(5);
    digitalWrite(DATA, (value & mask) ? HIGH : LOW);
    delayMicroseconds(5);
    digitalWrite(CLK, HIGH);
    delayMicroseconds(5);
  }
  // wait for ACK
  digitalWrite(CLK, LOW);
  delayMicroseconds(5);
  pinMode(DATA, INPUT);
  digitalWrite(CLK, HIGH);
  delayMicroseconds(5);
  bool ack = digitalRead(DATA) == 0;
  pinMode(DATA, OUTPUT);
  return ack;
}

void writedigits(uint8_t *values)
{
  start();
  (void)writevalue(0x40);
  stop();
  start();
  (void)writevalue(0xc0);
  for (uint8_t i = 0; i < 4; i++)
    (void)writevalue(*values++);
  stop();
}

void setup()
{
  pinMode(LED_BUILTIN, OUTPUT);
  pinMode(CLK, OUTPUT);
  pinMode(DATA, OUTPUT);
  start();
  (void)writevalue(0x8f);   // for changing the brightness (0x88-DIM     0x8f-Bright)
  stop();
}

void loop()
{
  uint8_t first = startnum;
  for (uint8_t i = 0; i < 4; i++) {
    display[i] = font[first];
    first++;
    if (first >= 10)
      first = 0;
  }
  writedigits(display);
  delay(500);
  startnum++;
  if (startnum >= 10)
    startnum = 0;
  digitalWrite(LED_BUILTIN, (startnum & 0x1) ? HIGH : LOW);  // flash at 0.5 Hz to debug
}
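The constants 0x40, 0xc0 and 0x8f in the code above are TM1637 commands. Going by the TM1637 datasheet (worth double-checking against your own copy), 0x40 is the data command (write to the display registers with an auto-incrementing address), 0xC0 sets the starting address to the first digit, and 0x88 to 0x8F turn the display on at brightness 0 to 7. A small C sketch of helpers that build these bytes; the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Data command: write to the display registers. Bit 2 set selects a fixed
   address; clear selects auto-increment (the 0x40 used in writedigits()). */
static uint8_t tm_data_cmd(bool fixed_address)
{
    return (uint8_t)(0x40 | (fixed_address ? 0x04 : 0x00));
}

/* Address command: first display register to write. The TM1637 has six
   (0..5); 0xC0 selects the leftmost digit. */
static uint8_t tm_addr_cmd(uint8_t addr)
{
    return (uint8_t)(0xC0 | (addr & 0x07));
}

/* Display control: bit 3 switches the display on, bits 0..2 set the
   brightness. 0x88 is dimmest, 0x8F brightest, 0x80 is off. */
static uint8_t tm_display_cmd(bool on, uint8_t brightness)
{
    return (uint8_t)(0x80 | (on ? 0x08 : 0x00) | (brightness & 0x07));
}
```

With these, setup() could send writevalue(tm_display_cmd(true, 7)) instead of the bare 0x8f, which says the same thing more readably.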

Now let's look at the Lua version of this program. Incidentally, I went down a small rabbit hole getting this to work. It turns out that native bitwise operators only arrived in Lua 5.3, so I had to rebuild the NodeMCU firmware from the 5.3 branch. I could have used the bit module, but that would have required rewriting mask & value as bit.band(mask, value). I wanted to be able to test a large part of the program on my host machine, so I didn't want to change the syntax. That, by the way, accounts for the commented-out loadfile("extra.lua")() line.

-- Module connection pins (Digital Pins)

local CLK = 1
local DATA = 2
local LED_BUILTIN = 4

local startnum = 1
local display = { 0x06, 0x5b, 0x4f, 0x66 }
local font = { 0x3f, 0x06, 0x5b, 0x4f, 0x66, 0x6d, 0x7d, 0x07, 0x7f, 0x6f }

--loadfile("extra.lua")()

function start()
        gpio.write(CLK, gpio.HIGH)
        gpio.write(DATA, gpio.HIGH)
        tmr.delay(5)
        gpio.write(DATA, gpio.LOW)
        gpio.write(CLK, gpio.LOW)
        tmr.delay(5)
end

function stop()
        gpio.write(CLK, gpio.LOW)
        gpio.write(DATA, gpio.LOW)
        tmr.delay(5)
        gpio.write(CLK, gpio.HIGH)
        gpio.write(DATA, gpio.HIGH)
        tmr.delay(5)
end

function writevalue(value)
        local mask = 1
        while mask < 0x100 do
                gpio.write(CLK, gpio.LOW)
                tmr.delay(5)
                if (value & mask) == 0 then
                        gpio.write(DATA, gpio.LOW)
                else
                        gpio.write(DATA, gpio.HIGH)
                end
                tmr.delay(5)
                gpio.write(CLK, gpio.HIGH)
                tmr.delay(5)
                mask = mask << 1
        end
        -- wait for ACK
        gpio.write(CLK, gpio.LOW)
        tmr.delay(5)
        gpio.mode(DATA, gpio.INPUT)
        gpio.write(CLK, gpio.HIGH)
        tmr.delay(5)
        local ack = (gpio.read(DATA) == 0)
        gpio.mode(DATA, gpio.OUTPUT)
        return ack
end

function writedigits(values)
        start()
        writevalue(0x40)
        stop()
        start()
        writevalue(0xc0)
        for i = 1, 4, 1 do
                writevalue(values[i])
        end
        stop()
end

gpio.mode(LED_BUILTIN, gpio.OUTPUT)
gpio.mode(CLK, gpio.OUTPUT)
gpio.mode(DATA, gpio.OUTPUT)
start()
-- for changing the brightness (0x88-dim 0x8f-bright)
writevalue(0x8f)
stop()
while true do
        local first = startnum
        for i = 1, 4, 1 do
                display[i] = font[first + 1]
                first = first + 1
                if first >= 10 then
                        first = 0
                end
        end
        writedigits(display)
        tmr.delay(500000)
        startnum = startnum + 1
        if startnum >= 10 then
                startnum = 0
        end
        -- flash at 0.5 Hz to debug
        if (startnum & 1) == 0 then
                gpio.write(LED_BUILTIN, gpio.LOW)
        else
                gpio.write(LED_BUILTIN, gpio.HIGH)
        end
end

I miss many conveniences from C, like the ternary ?: operator, but as an embedded language, Lua's not too bad.

Now here's the Lua coroutine version. Notice that all the waits have been turned into coroutine yields, on the assumption that enough time elapses before the coroutine is resumed. You can also see that the protocol handler has been moved into a coroutine that yields often but never exits.

-- Module connection pins (Digital Pins)

local CLK = 1
local DATA = 2
local LED_BUILTIN = 4

local startnum = 1
local display = { 0x06, 0x5b, 0x4f, 0x66 }
local font = { 0x3f, 0x06, 0x5b, 0x4f, 0x66, 0x6d, 0x7d, 0x07, 0x7f, 0x6f }

--loadfile("extra-c.lua")()

function start()
        gpio.write(CLK, gpio.HIGH)
        gpio.write(DATA, gpio.HIGH)
        coroutine.yield(true)
        gpio.write(DATA, gpio.LOW)
        gpio.write(CLK, gpio.LOW)
        coroutine.yield(true)
end

function stop()
        gpio.write(CLK, gpio.LOW)
        gpio.write(DATA, gpio.LOW)
        coroutine.yield(true)
        gpio.write(CLK, gpio.HIGH)
        gpio.write(DATA, gpio.HIGH)
        coroutine.yield(true)
end

function writevalue(value)
        local mask = 1
        while mask < 0x100 do
                gpio.write(CLK, gpio.LOW)
                coroutine.yield(true)
                if (value & mask) == 0 then
                        gpio.write(DATA, gpio.LOW)
                else
                        gpio.write(DATA, gpio.HIGH)
                end
                coroutine.yield(true)
                gpio.write(CLK, gpio.HIGH)
                coroutine.yield(true)
                mask = mask << 1
        end
        -- wait for ACK
        gpio.write(CLK, gpio.LOW)
        coroutine.yield(true)
        gpio.mode(DATA, gpio.INPUT)
        gpio.write(CLK, gpio.HIGH)
        coroutine.yield(true)
        local ack = (gpio.read(DATA) == 0)
        gpio.mode(DATA, gpio.OUTPUT)
        return ack
end

function writedigits(values)
        start()
        writevalue(0x40)
        stop()
        start()
        writevalue(0xc0)
        for i = 1, 4, 1 do
                writevalue(values[i])
        end
        stop()
end

co = coroutine.create(function()
        coroutine.yield(true)
        gpio.mode(LED_BUILTIN, gpio.OUTPUT)
        gpio.mode(CLK, gpio.OUTPUT)
        gpio.mode(DATA, gpio.OUTPUT)
        start()
        -- for changing the brightness (0x88-dim 0x8f-bright)
        writevalue(0x8f)
        stop()
        while true do
                local first = startnum
                for i = 1, 4, 1 do
                        display[i] = font[first + 1]
                        first = first + 1
                        if first >= 10 then
                                first = 0
                        end
                end
                writedigits(display)
                startnum = startnum + 1
                if startnum >= 10 then
                        startnum = 0
                end
                coroutine.yield(false)
        end
end)

ledon = false
while true do
        cont = true
        -- tick rate = 1kHz
        for i = 1, 1000, 1 do
                if cont then
                        status, cont = coroutine.resume(co)
                end
                -- in real life do other work and wait for next tick
                tmr.delay(1000)
        end
        -- flash at 0.5 Hz to debug
        if ledon then
                gpio.write(LED_BUILTIN, gpio.LOW)
        else
                gpio.write(LED_BUILTIN, gpio.HIGH)
        end
        ledon = not ledon
end

When run on the ESP8266 this works almost like the previous version. At a tick rate of 1 kHz it takes about 165 yields, one per tick, to complete a transfer, so you can see the update move across the digits. Increasing the tick rate will reduce this effect, but it's not unpleasant. Notice that the last yield returns false as a flag that the bitbanging is complete. Another side effect is that, because of the extra time the coroutine takes to handle the update, the flash rate is less than 0.5 Hz. In practice one would not delay in the main loop, but would do other work and then wait until the second is up by watching the timer. Also, in a real program the protocol handler would not compute what to display; it would leave this to the main line and just transmit what it is given. Changing this would have made it harder to compare with the previous version.
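The yield count can be checked with a little arithmetic, sketched in C below from the Lua code above: two yields each for start() and stop(), and 26 for each writevalue() (three per bit plus two for the ACK clock). The block also includes a hypothetical time_reached() helper for the "watch the timer" approach, assuming a free-running 32-bit millisecond counter such as the value returned by Arduino's millis() on AVR boards; neither function comes from the original code or any SDK.

```c
#include <stdbool.h>
#include <stdint.h>

/* Yields per primitive, counted from the Lua coroutine code. */
#define YIELDS_START 2u             /* start() */
#define YIELDS_STOP  2u             /* stop() */
#define YIELDS_BYTE  (8u * 3u + 2u) /* writevalue(): 3 per bit + 2 for the ACK clock */

/* Yields for one writedigits() call: a frame carrying the 0x40 data
   command, then a frame with the 0xc0 address command and `digits`
   segment bytes. */
static uint32_t yields_per_update(uint32_t digits)
{
    uint32_t cmd_frame  = YIELDS_START + YIELDS_BYTE + YIELDS_STOP;
    uint32_t data_frame = YIELDS_START + YIELDS_BYTE * (1u + digits) + YIELDS_STOP;
    return cmd_frame + data_frame;
}

/* Hypothetical helper: true once a free-running 32-bit millisecond
   counter has reached `deadline_ms`. Unsigned subtraction keeps this
   correct across counter wraparound, provided the two times are less
   than 2^31 ms apart. */
static bool time_reached(uint32_t now_ms, uint32_t deadline_ms)
{
    return (uint32_t)(now_ms - deadline_ms) < 0x80000000u;
}
```

For four digits this gives 164 yields; adding the final coroutine.yield(false) makes 165 resumes per update, so at one resume per 1 ms tick an update takes at least 165 ms. For the timer approach, the main loop would set deadline = now + 1000 after each LED toggle, do its other work on every tick, and toggle again only once time_reached(now, deadline) is true, instead of counting 1000 fixed delays.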
