A native NeoPixel interface for the NEORV32

A project log for The NEORV32 RISC-V Processor

An easy-to-use, customizable, lightweight and open-source 32-bit RISC-V microcontroller/CPU written in platform-independent VHDL.

Stephan, 02/25/2021 at 21:14

The cool thing about having a soft-core processor is that you can add exactly the features you wish for. A few days ago I got a hint about the WS2812 LEDs, which are used in Adafruit's awesome NeoPixels. Their interface has quite tight timing constraints, and I found out that some platforms struggle with it.

WS2812 Protocol

In summary, the WS2812 interface uses a single signal line carrying an asynchronous data protocol. It runs at a fixed 800 kHz bit rate and varies the duty cycle to encode the '0' and '1' bits.
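To make the duty-cycle encoding concrete, here is a minimal sketch in C. The 800 kHz bit rate comes from the text; the high-times of 400 ns and 800 ns are assumed nominal values (check the datasheet of your LEDs), and all names are hypothetical.

```c
#include <stdint.h>

#define WS2812_BIT_RATE_HZ 800000u /* fixed bit rate from the text */
#define WS2812_T0H_NS      400u    /* ASSUMED nominal high-time of a '0' bit */
#define WS2812_T1H_NS      800u    /* ASSUMED nominal high-time of a '1' bit */

/* Length of one bit period in nanoseconds: 1 / 800 kHz = 1250 ns. */
uint32_t ws2812_period_ns(void) {
  return 1000000000u / WS2812_BIT_RATE_HZ;
}

/* Time the data line stays high for the given bit value. */
uint32_t ws2812_high_ns(int bit) {
  return bit ? WS2812_T1H_NS : WS2812_T0H_NS;
}

/* Time the data line stays low: the rest of the bit period. */
uint32_t ws2812_low_ns(int bit) {
  return ws2812_period_ns() - ws2812_high_ns(bit);
}
```

Every bit takes the same 1.25µs; only the split between high and low time changes.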

[Figure: WS2812 timing diagram, including the RESET (data "strobe") condition]

Timings and diagram were taken from the Adafruit NeoPixel Überguide.

Small microcontrollers like AVRs can handle the tight timing (for example using the FastLED library), but might need some inline assembly to keep up with it. Also, the image is better off being stored entirely in memory (consuming precious RAM), since rendering it in real time might disturb the time-critical bit-banging. Things get even more complicated when the hard timing constraints meet setups that use interrupts...

More powerful platforms like the Raspberry Pi do have the processing power to bit-bang the interface, but they can also run into real-time problems when OS interrupts kick in.

In summary, purely software-based approaches come with more than just a few issues. So, why not create a dedicated hardware interface that takes all the timing-critical interface work off our hands?


The NEORV32 Custom Functions Subsystem

To implement a native WS2812 interface I am using the Custom Functions Subsystem (CFS) of the NEORV32 RISC-V Processor. This subsystem provides a blank template for creating application-specific memory-mapped accelerators and interfaces. Besides the actual processor interface, the CFS features "empty" IO conduits allowing easy integration of external signals.

The CFS also provides 8 different "clocks" derived from the system's main clock:

f_main/2, f_main/4, f_main/8, f_main/64, f_main/128, f_main/1024, f_main/2048, f_main/4096
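As a quick way to see which of these clocks is fast enough for a given job, the following sketch computes the tick period of each one. The divider list is taken from the line above; the names cfs_clk_div and cfs_tick_ns are hypothetical and not part of the NEORV32 software library.

```c
#include <stdint.h>

/* Dividers of the eight CFS clocks listed above. */
const uint32_t cfs_clk_div[8] = {2, 4, 8, 64, 128, 1024, 2048, 4096};

/* Tick period in nanoseconds of clock 'sel' (0..7) for a main clock of
 * f_main_hz. Returns 0 for an invalid selection or a zero frequency. */
uint64_t cfs_tick_ns(uint32_t f_main_hz, unsigned sel) {
  if (sel > 7u || f_main_hz == 0u) {
    return 0u;
  }
  return ((uint64_t)cfs_clk_div[sel] * 1000000000ull) / f_main_hz;
}
```

With a 100 MHz main clock the fastest option (f_main/2) ticks every 20 ns, so one 1.25µs bit period is roughly 62 to 63 ticks.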

The WS2812 Hardware Interface

My version of the interface uses two memory-mapped registers: the control register and the data register. The control register holds the complete configuration. Writing to the data register triggers a new transmission to the LED strips.

For the WS2812 interface core I have implemented a programmable clock divider (selecting one of the clocks mentioned above), a shift register for serializing LED data and a programmable counter with two programmable comparators. Everything is orchestrated by a simple state machine.

The counter counts the ticks of the selected clock and defines the base bit rate. The first programmable comparator sets the length of the whole period ("T_total" = 1.25µs) for sending a single bit. The second comparator selects one of two programmed times for driving the LED data line high, so it defines the high-time for sending a '0' or a '1' bit according to the current bit of the data shift register. The serial output can be multiplexed to 4 different channels, which allows driving up to 4 independent NeoPixel strips in parallel - or sending "broadcast data" to all of them at once.
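The counter/comparator scheme can be modelled in a few lines of C. This is only a software sketch of the idea, not the actual VHDL: the type and function names are hypothetical, the high-times of 400 ns and 800 ns are assumed nominal values, and rounding to the nearest tick is my own choice.

```c
#include <stdint.h>

/* Comparator settings, all in ticks of the selected clock. */
typedef struct {
  uint32_t t_total; /* first comparator: ticks per bit period */
  uint32_t t0_high; /* second comparator: high ticks for a '0' bit */
  uint32_t t1_high; /* second comparator: high ticks for a '1' bit */
} ws2812_timing_t;

/* Convert a time in ns to ticks, rounded to the nearest tick. */
uint32_t ws2812_ns_to_ticks(uint32_t ns, uint32_t tick_ns) {
  return (ns + tick_ns / 2u) / tick_ns;
}

/* Derive the comparator values; 400/800 ns are ASSUMED nominal values. */
ws2812_timing_t ws2812_make_timing(uint32_t tick_ns) {
  ws2812_timing_t t;
  t.t_total = ws2812_ns_to_ticks(1250u, tick_ns);
  t.t0_high = ws2812_ns_to_ticks(400u, tick_ns);
  t.t1_high = ws2812_ns_to_ticks(800u, tick_ns);
  return t;
}

/* Level of the data line for a counter value within one bit period. */
int ws2812_line_level(ws2812_timing_t t, uint32_t counter, int bit) {
  uint32_t high = bit ? t.t1_high : t.t0_high;
  return counter < high;
}

/* Run the counter over one bit period and count the high ticks. */
uint32_t ws2812_count_high(ws2812_timing_t t, int bit) {
  uint32_t n = 0;
  for (uint32_t c = 0; c < t.t_total; c++) {
    n += (uint32_t)ws2812_line_level(t, c, bit);
  }
  return n;
}
```

For 20 ns ticks (f_main/2 at 100 MHz) this gives 63 ticks per bit, with the line high for 20 ticks when sending a '0' and for 40 ticks when sending a '1'.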

The shift register can be configured for 24-bit data (for the "normal" RGB LEDs) and also for 32-bit data (for the RGBW LEDs that provide an additional white LED chip). The whole configuration is programmable, so software can change it at any time. This makes it possible to use RGB and RGBW LEDs at the same time. The interface can also support WS2811 LEDs by adapting the timing configuration (but I have not tested that yet).
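A small sketch of how color values could be packed into a single data word for the two modes. WS2812 LEDs expect their color bytes in green-red-blue order. Everything else here is an assumption for illustration: the helper names are hypothetical, the RGBW byte order (green, red, blue, white) depends on the LED type, and I assume the hardware takes the 24-bit word from the lower bits of the data register and sends the most significant bit first.

```c
#include <stdint.h>

/* Pack an RGB color into a 24-bit word in green-red-blue order.
 * ASSUMPTION: the word sits in the lower 24 bits of the data register. */
uint32_t ws2812_pack_grb(uint8_t r, uint8_t g, uint8_t b) {
  return ((uint32_t)g << 16) | ((uint32_t)r << 8) | (uint32_t)b;
}

/* Pack an RGBW color into a 32-bit word.
 * ASSUMPTION: green-red-blue-white order, which depends on the LED type. */
uint32_t ws2812_pack_grbw(uint8_t r, uint8_t g, uint8_t b, uint8_t w) {
  return ((uint32_t)g << 24) | ((uint32_t)r << 16) |
         ((uint32_t)b << 8) | (uint32_t)w;
}
```

A pure red pixel would then be ws2812_pack_grb(0xFF, 0x00, 0x00), which yields 0x00FF00.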

The whole interface module takes up only 140 LUTs and 100 FFs on an Altera Cyclone IV FPGA and has no problems integrating into a 100MHz system.

The Result

I set up the NEORV32 + WS2812 CFS on a Terasic DE0-Nano FPGA board and connected two NeoPixel arrays: an Adafruit 12-LED RGB ring to channel 0 and an Adafruit 8-LED RGBW stick to channel 1. Since the NeoPixels are powered by a 5V supply, I am using a random 74HC04 hex inverter (two inverters in a row, of course) as a simple level shifter (placed on the second breadboard) to connect to the 3.3V FPGA IOs.

The animations are quite simple, but right now this is more like a proof-of-concept. Oh, and please note that this "video" is just a chopped GIF ;)

LED data is sent to the strips by the send_data function, which configures the interface for the actual mode (24-bit or 32-bit) and enables the selected channel:

void send_data(uint32_t channel, uint32_t mode, uint32_t data) {

  uint32_t channel_int = channel & 3; // new channel select
  uint32_t mode_int = mode & 1; // RGB (24-bit) or RGBW (32-bit) mode

  while(WS2812_CONTROL & (1 << WS2812_CT_BUSY)); // polling (FIXME!): wait for busy flag to clear

  uint32_t ctrl = WS2812_CONTROL;
  ctrl &= ~(0b1111 << WS2812_CT_CHMASK); // clear current channel selection
  ctrl &= ~(0b1 << WS2812_CT_MODE); // clear current mode
  ctrl |= (0b1 << (channel_int + WS2812_CT_CHMASK)); // set new channel enable
  ctrl |= (mode_int << WS2812_CT_MODE); // set new mode
  WS2812_CONTROL = ctrl;

  WS2812_DATA = data; // send new LED data
}

As already mentioned, the whole setup runs at 100 MHz. Sending data to an RGB LED (24-bit) takes (1.25µs * 24) / 10ns = 3000 clock cycles. Sending data to an RGBW LED (32-bit) takes (1.25µs * 32) / 10ns = 4000 clock cycles. Since the transmission is handled entirely by the hardware, the CPU has enough time (up to 1500 or 2000 instructions, respectively) to take care of other things - like computing the next image frame.
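The same arithmetic as a small C sketch, in case you want to estimate the cycle budget for other LED counts. The function names are hypothetical. The frame estimate covers the pure transmission time only and ignores polling overhead and the reset time between frames.

```c
#include <stdint.h>

/* Clock cycles needed to transmit 'bits' bits, given the bit time and the
 * clock period in nanoseconds. */
uint32_t ws2812_tx_cycles(uint32_t bits, uint32_t bit_time_ns,
                          uint32_t clk_period_ns) {
  return (bits * bit_time_ns) / clk_period_ns;
}

/* Transmission cycles for a whole strip at 1.25 us per bit and a 100 MHz
 * clock (10 ns period), as in the setup described above. */
uint32_t ws2812_frame_cycles(uint32_t num_leds, uint32_t bits_per_led) {
  return num_leds * ws2812_tx_cycles(bits_per_led, 1250u, 10u);
}
```

For the 12-LED RGB ring this gives 12 * 3000 = 36000 cycles, or 360µs of pure transmission time per frame.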

Source Code

The VHDL and C source files are available on GitHub as Gists:

The sources are also available via a new discussion in the NEORV32 GitHub repository.


In the next step I would like to add interrupt support to avoid nasty polling and maybe some kind of data buffer (maybe a FIFO or maybe just some simple double-buffering).

I am also thinking about adding the WS2812 hardware interface as a standard (but still optional) module to the NEORV32 SoC. Admittedly, this is quite a niche application, but somehow NeoPixels really put me under their spell - they are such a great thing to play with. And hey, they are called NEOPixels - so how could I resist? :)