BeagleBone PRU Firmware

A project log for Open Source Ultrasonic Phased Array

Documentation of the 3rd generation of my ultrasonic phased array project. Levitating stuff, directional speaker, haptic feedback and more!

Niklas Fauth, 07/14/2018 at 13:00

I spent quite some time thinking about how to control the HV583 shift registers.
The most obvious option would be to use an FPGA, as in V2 of my phased array: the shift registers need 8x data, 1x clock and 1x latch, all shifted out in parallel at 40 MHz. With an FPGA this isn't a problem, because the driver generating these control signals can be implemented in VHDL or Verilog. Unfortunately, unless you have an expensive SoC like the Zynq, which also contains a hard ARM core, all of the control logic has to be implemented in VHDL as well. I have done quite a few FPGA projects already, but when I started reading about implementing TCP/IP stacks on FPGAs I felt kind of lost. There are not many open IP cores for networking available, and those that are free to use require so many resources that you would need an FPGA in the 20€ price range to make them fit.

For microcontrollers like the STM32 or Texas Instruments Tiva-C, developing network applications isn't a problem. Lightweight TCP/IP stacks like lwIP run on ARM cores as small as ST's STM32F4 series, and there is plenty of application code and tutorials available. Unfortunately, generating the required shift register waveforms is the harder part on such instruction-driven processors. On a typical microcontroller you need at least five assembler instructions to set the port (8 data bits) and toggle the clock pin: LOAD, SET PORT, SET CLK, INC ADDR, CLEAR CLK. So, at one instruction per cycle, you need a CPU clock of at least 200 MHz to shift out the data at 40 MHz. On top of that, the CPU cannot execute any other task in the meantime, such as loading new waveform patterns via Ethernet.
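As a quick sanity check of that estimate, here is a minimal Python sketch. It assumes one instruction per clock cycle, which is optimistic on most microcontrollers, and the function name is made up for illustration only.

```python
def required_cpu_clock_hz(shift_rate_hz, instructions_per_byte, cycles_per_instruction=1):
    """Minimum CPU clock needed to bit-bang one output byte per shift period."""
    return shift_rate_hz * instructions_per_byte * cycles_per_instruction


# The five instructions needed per byte on a typical microcontroller.
INSTRUCTIONS = ["LOAD", "SET PORT", "SET CLK", "INC ADDR", "CLEAR CLK"]

min_clock = required_cpu_clock_hz(40_000_000, len(INSTRUCTIONS))
print(f"{min_clock / 1e6:.0f} MHz")  # prints "200 MHz"
```

Any instruction that takes more than one cycle pushes the requirement above 200 MHz, and that still leaves no cycles for networking.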

When you search for popular projects using the BeagleBone SBC, you will sooner or later stumble upon the BeagleLogic project. BeagleLogic is a logic analyzer capable of sampling at 100 MHz in real time without any additional logic or hardware. It does that by using the BeagleBone's PRUs (programmable real-time units): a PRU is a fast (200 MHz, 32-bit) processor with single-cycle I/O access to a number of the pins and full access to the internal memory and peripherals of the AM3358 processor on BeagleBones. This basically means that you can have Linux running on the main ARM core while still having two 200 MHz coprocessors available to do stuff with the GPIOs of the BeagleBone. BeagleLogic does exactly this: it uses the PRUs to sample up to 16 GPIOs while Linux controls the sampling, analyzes the data and so on. In fact, the BeagleLogic project was exactly what I needed, just the other way around: instead of reading GPIOs and writing their values to DDR memory, I want to write DDR memory to eight GPIOs.

There are quite a few things to learn before you can start programming the PRUs. Luckily, there is lots of very useful documentation out there on this topic. One of the most helpful resources for understanding how the BeagleLogic assembler code works was the awesome blog of the BeagleLogic developer himself, Kumar Abhishek. He explains almost every single line of the PRU firmware in detail, and it shouldn't be a problem to understand my modifications to his idea after you have read his blog post.

Here are a few lines of the pru1.p assembler code used for shifting out the waveforms that form a 40 kHz square wave at the output of the shift registers:

  XIN 10, &r21, 32 ; r21-r28

  MOV r30, r21.b0 ; set data pins
  SET r30, r30, CLOCKPIN ; set clock pin
  CLR r30, r30, CLOCKPIN ; clear clock pin

  MOV r30, r21.b1 ;
  SET r30, r30, CLOCKPIN ;
  CLR r30, r30, CLOCKPIN ;


  ; ... (remaining bytes r21.b2 to r28.b2 are handled the same way) ...

  MOV r30, r28.b3 ; last of the 32 bytes
  SET r30, r30, CLOCKPIN ;
  CLR r30, r30, CLOCKPIN ;

  SET r30, r30, LATCHPIN ; set latch pin
  CLR r30, r30, LATCHPIN ; clear latch pin


When you compare this to the BeagleLogic code, basically all I did was swap the XOUT for an XIN, so that PRU1 loads data from PRU0 instead of storing data to it. In fact, I even had to add some NOP lines because otherwise the signals would be too fast for the HV583. I don't want to go into too much detail about how everything works, because there really is a lot of documentation available on the PRUs and the various ways to talk to them, such as kernel modules or the TI pruss driver.

To cut a long story short, I have a userspace application that enables the PRUs and loads the corresponding firmware onto both of them. It also reserves some space in DDR memory and fills it with the waveform pattern that is about to be transmitted by the ultrasonic transducers. This memory holds 30x32 bytes: the 40 kHz square waveform for each transducer consists of 30 timeslots, and for each timeslot 32x 8 bits are shifted out to the shift registers. It is 8 bits because we have eight data pins, and 32 of them because each shift register takes 32 bits serially. So each bit in the memory represents the state of one transducer in one of the 30 timeslots. To output the same 40 kHz 50:50 square wave on all transducers, you simply set the upper 15x32 bytes to 0xFF (all bits high) and the other 15x32 bytes to 0.

This memory region with the prepared waveform pattern is mapped to PRU0, which takes the data for one timeslot at a time and moves it to PRU1. PRU1 then shifts out the 32x 8 bits to 8 pins of the BeagleBone, toggling the clock pin for each serial bit (one byte across the eight data pins) and the latch pin for each timeslot.
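The buffer layout described above can be sketched in plain Python. This is only an illustration, not the code from arraycontrol.c. The function name, the idea of a per-transducer delay table, the mapping of bit n of each byte to data pin n, and the choice to put the high half of the period first in memory are all my assumptions here.

```python
TIMESLOTS = 30        # timeslots per 40 kHz period
BYTES_PER_SLOT = 32   # one byte per serial bit of a 32-bit shift register
DATA_PINS = 8         # bit n of every byte drives data pin n (assumed ordering)
CARRIER_HZ = 40_000


def pack_pattern(delay_slots):
    """Build the 30 x 32 byte buffer.

    delay_slots[pin][bit] is the delay, in timeslots, of the transducer fed by
    serial position `bit` of the shift register on data pin `pin`.
    """
    buf = bytearray(TIMESLOTS * BYTES_PER_SLOT)
    half = TIMESLOTS // 2
    for slot in range(TIMESLOTS):
        for bit in range(BYTES_PER_SLOT):
            value = 0
            for pin in range(DATA_PINS):
                # 50:50 square wave: high for the first 15 slots after the delay.
                if (slot - delay_slots[pin][bit]) % TIMESLOTS < half:
                    value |= 1 << pin
            buf[slot * BYTES_PER_SLOT + bit] = value
    return bytes(buf)


# No delay anywhere: 15 x 32 bytes of 0xFF followed by 15 x 32 bytes of 0x00.
no_delay = [[0] * BYTES_PER_SLOT for _ in range(DATA_PINS)]
uniform = pack_pattern(no_delay)

# Rate at which bytes have to leave the pins to sustain the carrier.
byte_rate_hz = CARRIER_HZ * TIMESLOTS * BYTES_PER_SLOT
print(len(uniform), byte_rate_hz)  # prints "960 38400000"
```

With 30 timeslots per period, one timeslot corresponds to 12 degrees of phase, so a delay of 15 slots inverts a transducer. The resulting byte rate of 38.4 MHz is in line with the roughly 40 MHz shift clock mentioned earlier.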

The file arraycontrol.c is a userspace app that writes simple test patterns to the HV583. In the future I will extend it with a Python wrapper to shift out actual waveform patterns calculated by my existing NumPy simulation. Here is all the code I wrote so far for the PRUs:

Even if it doesn't look like too much, it was a lot of work and debugging to get everything working the way I wanted.

Here are some excellent sources that really helped me a lot. Huge thanks to these guys / code:

Kumar Abhishek, the developer of BeagleLogic, with his great blog

This (unfinished?) microphone array on GitHub:

The famous LEDscape project:

By now, I really think a BeagleBone is the best way to build such a phased array. It has everything I liked about my V2, such as Ethernet and Linux. It is also super easy to use (2.54mm pitch header), easily available from almost all distributors, well known and established within the community, and very cheap. You can even use a BeagleBone Green, which only costs 25€.