As mentioned in the previous log, the WS2812 is driven by an asynchronous serial square-wave signal at 800 kHz, and the duty cycle of each 1.25µs period determines whether the transmitted bit is a '1' or a '0':
- if the signal is high for 0.8µs, and then goes low for 0.45µs, it's transmitting a '1' (duty cycle 64%)
- if the signal is high for 0.4µs, and then goes low for 0.85µs, it's transmitting a '0' (duty cycle 32%)
Now this timing can be a bit of a pain to achieve by "bitbanging". By "bitbang" I mean setting the level high, executing a couple of NOPs or dummy instructions, then setting it low, and so on for every bit of every pixel. Any interrupt or other asynchronous event will ruin the timing.
I saw some projects use an SPI peripheral, setting its clock to a convenient rate and building the duty cycle out of the data bits. It's a clever way to do it, with several advantages, including a much lower CPU load. But it also makes the required memory grow significantly, as each pixel bit takes several bits of memory.
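To make the SPI trick concrete, here is a minimal sketch of one common encoding. It is not taken from any of those projects: it assumes the SPI clock runs at 2.4 MHz, so that 3 SPI bits span one 1.25µs WS2812 bit, and the names `ws_spi_encode_byte`, `WS_SPI_ONE` and `WS_SPI_ZERO` are made up for this example.

```c
#include <stdint.h>

/* Assumption: SPI clock = 2.4 MHz, so each SPI bit lasts ~0.417 us
 * and 3 SPI bits cover one 1.25 us WS2812 bit. */
#define WS_SPI_ONE  0x6u  /* 0b110: high for ~0.83 us, low for ~0.42 us */
#define WS_SPI_ZERO 0x4u  /* 0b100: high for ~0.42 us, low for ~0.83 us */

/* Expand one colour byte (MSB first) into the 3 SPI bytes that encode it. */
void ws_spi_encode_byte(uint8_t value, uint8_t out[3])
{
    uint32_t bits = 0;

    for (int i = 7; i >= 0; i--) {
        uint32_t symbol = ((value >> i) & 1u) ? WS_SPI_ONE : WS_SPI_ZERO;
        bits = (bits << 3) | symbol;  /* append 3 SPI bits per data bit */
    }

    out[0] = (uint8_t)(bits >> 16);
    out[1] = (uint8_t)(bits >> 8);
    out[2] = (uint8_t)bits;
}
```

With this encoding every colour byte costs 3 bytes of buffer, so a 24-bit pixel costs 9 bytes instead of 3.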
So I went for a more basic/traditional/academic approach. The STM32F103C8 timers can be configured to generate PWM signals on a pin, and a DMA channel can load the next compare value for each period. That way, if the LED strip is connected to the right pin, it can be refreshed with accurate timing and minimal CPU usage.
As both peripherals are quite architecture/vendor specific, they are rarely used in frameworks like Arduino or Mbed, so I used the STM32CubeMX tool to generate the skeleton of the application. The timer channel is configured to generate a PWM signal with a precise 1.25µs period, and the compare value loaded into the channel sets the duty cycle to 32% for a '0' or 64% for a '1'.
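As a rough illustration of the numbers involved, the sketch below works out the timer period and the two compare values. It assumes the timer is clocked at 72 MHz with no prescaler, which is a guess on my side and depends on the clock tree configuration; the names `WS_TIMER_CLOCK_HZ`, `ws_period_ticks` and `ws_compare_ticks` are hypothetical.

```c
#include <stdint.h>

#define WS_TIMER_CLOCK_HZ 72000000u  /* assumption: 72 MHz timer clock, no prescaler */
#define WS_BIT_RATE_HZ      800000u  /* WS2812 bit rate: one bit every 1.25 us */

/* Number of timer ticks in one bit period. */
uint32_t ws_period_ticks(void)
{
    return WS_TIMER_CLOCK_HZ / WS_BIT_RATE_HZ;
}

/* Compare value for a duty cycle given in percent, rounded to the nearest tick. */
uint32_t ws_compare_ticks(uint32_t duty_percent)
{
    return (ws_period_ticks() * duty_percent + 50u) / 100u;
}
```

Under that assumption the period is 90 ticks (so the auto-reload register would hold 89), a '0' is 29 ticks high (about 0.40µs) and a '1' is 58 ticks high (about 0.81µs).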
I wrote the first couple of functions of the GFX_DRV module, and quickly realized the idea was not as efficient as I envisioned. Instead of 3 or 4 bits per pixel bit, I was now using 16 bits (that's the size of the timer compare register) per pixel bit. The code was simple to understand and debug, but not worth the memory budget.
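The memory cost is easy to put in numbers. The helper below is only an illustration (the name `ws_buffer_bytes` is made up); it computes the buffer size for a whole strip, given how many bits of memory are spent on each pixel bit.

```c
#include <stdint.h>

#define WS_BITS_PER_PIXEL 24u  /* 8 bits each for green, red and blue */

/* Buffer size in bytes for `pixels` LEDs when every pixel bit
 * takes `mem_bits_per_bit` bits of memory. */
uint32_t ws_buffer_bytes(uint32_t pixels, uint32_t mem_bits_per_bit)
{
    return (pixels * WS_BITS_PER_PIXEL * mem_bits_per_bit) / 8u;
}
```

Per pixel that gives 9 bytes with 3 SPI bits per pixel bit, 12 bytes with 4, and 48 bytes with one 16-bit compare value per pixel bit. For a hypothetical 100-pixel strip, the PWM/DMA buffer alone would take 4800 bytes.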
Yet I decided to keep it, with one extra change. The refresh procedure was split into two levels: the Timer/DMA sends one full pixel at a time, and the CPU reloads the buffer with the next pixel's values when the DMA cycle completes.
The memory overhead is now fixed at 48 bytes (24 pixel bits × 2 bytes per compare value), regardless of the screen size. Whenever the screen needs an update, the CPU has to reload the timer buffer every 30µs (24 bits × 1.25µs), which is not ideal. But as it's not doing anything else, it turned out to be acceptable in the end.
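The per-pixel reload step could look roughly like the sketch below. This is not the actual GFX_DRV code: the function name `ws_fill_pixel` and the compare values 29 and 58 are assumptions (they correspond to 32% and 64% of a 90-tick period, i.e. a 72 MHz timer clock). The WS2812 expects each pixel as green, red, blue, most significant bit first.

```c
#include <stdint.h>

#define WS_BITS_PER_PIXEL 24u
#define WS_CCR_ZERO       29u  /* assumed compare value for a '0' (~32% duty) */
#define WS_CCR_ONE        58u  /* assumed compare value for a '1' (~64% duty) */

/* Expand one pixel into the 24 compare values the DMA feeds to the timer.
 * The buffer is 24 x 16 bits = 48 bytes, whatever the screen size. */
void ws_fill_pixel(uint8_t r, uint8_t g, uint8_t b,
                   uint16_t buf[WS_BITS_PER_PIXEL])
{
    /* WS2812 wire order: green first, then red, then blue. */
    uint32_t grb = ((uint32_t)g << 16) | ((uint32_t)r << 8) | (uint32_t)b;

    for (uint32_t i = 0; i < WS_BITS_PER_PIXEL; i++) {
        uint32_t bit = (grb >> (WS_BITS_PER_PIXEL - 1u - i)) & 1u;  /* MSB first */
        buf[i] = bit ? WS_CCR_ONE : WS_CCR_ZERO;
    }
}
```

In a scheme like the one described, the CPU would call something like this once per pixel from the DMA-complete handler, and it has to finish well within the 30µs window.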