
Finally! PRUs in the picture

A project log for Lepton 3.5 Thermal Imaging Camera

Documenting my experiments with the FLIR Lepton 3.5 thermal imaging camera.

Dan Julio, 12/07/2018 at 04:54

My post-holiday obsession has continued until I am - finally - reading data from the Lepton using both PRUs in a Beaglebone Black.  Given how - relatively - easy it is to use the PRUs, I have to confess I pored over a lot of web postings before figuring it out.  I'm not even the first person to document using the PRUs to read a Lepton camera.  That honor, as near as I can tell, goes to Mikhail Zemlyanukha, who used a PRU and a custom kernel driver to get image data on a 4.9 kernel system.  Unfortunately, as I found out, programming PRUs is an evolving paradigm and what worked in the past, including Mikhail's code and old methods such as the UIO interface, no longer works on current kernels.  Finally I found Andrew Wright's tutorials, started reading TI's remote processor and rpmsg framework source, and started to get code running on the PRUs.

Although it was a slog, I have been converted.  I think the real-time possibilities offered by the PRUs in close cooperation with the Linux system are amazing.  The PRU is my current favorite embedded micro-controller because it has easy access to an entire Linux system without the baggage of any OS - and on something as small and cheap as the pocketbeagle.

In case the following looks TL;DR, the code is in the github repository.

Failed First Attempt

I was daunted trying to get Mikhail's kernel driver running on my 4.14 system but understood his use of one PRU to capture packets and send them upward to user-land.  I also successfully built and ran the rpmsg "hello" demos.  The rpmsg system is built on top of the kernel's virtio framework to allow user-land and kernel processes to talk to remote devices (e.g. embedded cores or co-processors).  TI has adopted it as the "official" mechanism for their OMAP processors to talk to the on-board co-processors (including the PRUs and the power-management ARM core).  It is probably used in every smart phone as I found Qualcomm's contributions to the source.  The kernel's rpmsg driver makes a co-processor available as a simple character device file that user-land processes can read or write just like any other character device.

So I put together a simple program that got non-discard packets from the PRU and sent them to the kernel using rpmsg.  The PRU bit-banged a simulated SPI interface at about 16 MHz, buffered one packet's worth of data (164 bytes) and then copied it to kernel space via rpmsg.  My idea was to essentially replace the calls to the SPI driver in earlier programs with a call to the rpmsg driver to get SPI packets through it via the PRU.  I did have the sense to try to filter out discard packets but still, BOOM.  My BBB was brought to its knees by a message from the PRU about every 95 uSec (basic packet rate at ~16 MHz SPI + a very quick buffer copy - the PRUs are excellent at pushing data to main system memory).  The system was 100% pegged and unresponsive, and my application seemed to be getting about 1 out of 100 packets.
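As a rough sanity check on those numbers, here is a small Python sketch (not part of the project code) that works out how long one packet takes on the wire and how many rpmsg messages per second that first design was throwing at the kernel.  The constants come straight from the paragraph above; the function names are just for illustration.

```python
PACKET_BYTES = 164          # one VoSPI packet, as stated above
SPI_CLOCK_HZ = 16_000_000   # approximate bit-banged SPI clock


def packet_transfer_us(packet_bytes=PACKET_BYTES, clock_hz=SPI_CLOCK_HZ):
    """Time to clock one packet over SPI, in microseconds."""
    return packet_bytes * 8 / clock_hz * 1e6


def messages_per_second(period_us):
    """Message rate for one message every period_us microseconds."""
    return 1e6 / period_us


if __name__ == "__main__":
    print("wire time per packet: %.1f uSec" % packet_transfer_us())
    print("messages per second at 95 uSec: %.0f" % messages_per_second(95))
```

The wire time comes out at about 82 uSec, which is consistent with one message roughly every 95 uSec once the buffer copy is included.  That is more than ten thousand messages per second, one per packet.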

I didn't know it at the time but I was way overrunning the capability of the rpmsg facility and the kernel was bogging down trying to write an error message for each rpmsg from my PRU (that was overflowing its virtio queues) into several log files.  I saw later the hundred megabytes of log files that had accumulated.  The poor micro-SD card.

Taming rpmsg

Clearly I had to reduce the frequency at which messages were sent to the kernel driver to deal with - and also increase the amount of data sent at a time.  Quickly I found that the maximum rpmsg message size is 512 bytes, of which 16 bytes are used for message overhead.  It took a long time - this stuff doesn't seem to be documented anywhere - to understand that the kernel could manage a maximum of 32 entries in a queue for messages for one rpmsg "channel".  At least I had some parameters to work with.  The Lepton is fussy about making sure that the VoSPI interface keeps up with its true 27 fps rate (even though, because of US government restrictions, it only outputs 9 fps of actual image data).  Losing sync causes it to be unable to output good data at all.  The easy way to manage this is to dedicate one PRU to reading the Lepton and the other PRU can manage pushing data to the kernel.

The system I came up with has PRU0 reading packets, discarding any it can, and writing packets it thinks are good, or potentially might be good (another quirk of the Lepton VoSPI is that we don't know if the first segment's worth of packets is real until the 20th packet), to a circular buffer held in the 12K shared memory buffer between the PRUs.  PRU1 is responsible for accumulating more than one packet's worth of data from the circular buffer and sending it upward through the kernel to the user process consuming frames.  It does this slowly enough not to overwhelm the kernel, but fast enough to keep up with the realtime data requirements.  Additionally, to reduce the data size, I enable the AGC mode on the Lepton so that it outputs 8-bit data words (1/2 of each 16-bit word in a packet), reducing the size of a frame to 19200 bytes.  This is still bigger than the shared buffer, so the timing of the PRUs has to be set so that PRU1 doesn't read so fast that it gets ahead of where PRU0 is writing, and PRU0 doesn't write so fast that it overruns where PRU1 is reading.  The deterministic nature of the PRUs is useful here.  Instruction execution timing is fixed and data memory accesses have only a small variability based on the activity of other devices (the other PRU and the main ARM CPU).
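To get a feel for why the circular buffer never overruns or underruns, here is a simplified Python model of one frame (a sketch, not the actual PRU firmware).  It uses the timings described in the next two paragraphs and makes three assumptions of my own for illustration: PRU1 sends its first message at the moment the 20th packet has been written, packets within a frame arrive back to back, and the whole 12K of shared memory is available to the buffer.

```python
PACKET_BYTES = 80            # bytes PRU0 writes per packet (8-bit AGC data)
PACKETS_PER_FRAME = 240      # 19200-byte frame / 80 bytes
WRITE_PERIOD_US = 128        # PRU0 writes one packet per period
MSG_BYTES = 480              # bytes PRU1 drains per rpmsg message
MSG_PERIOD_US = 1024         # PRU1 sends one message per period
TRIGGER_PACKET = 20          # assumed: PRU1 starts once this many packets exist
SHARED_RAM_BYTES = 12 * 1024


def simulate_backlog():
    """Return (max_backlog_bytes, underrun) for one frame under the model."""
    start_us = TRIGGER_PACKET * WRITE_PERIOD_US
    messages = PACKETS_PER_FRAME * PACKET_BYTES // MSG_BYTES
    max_backlog = 0
    underrun = False
    for i in range(messages):
        now_us = start_us + i * MSG_PERIOD_US
        packets_written = min(PACKETS_PER_FRAME, now_us // WRITE_PERIOD_US)
        written = packets_written * PACKET_BYTES
        already_read = i * MSG_BYTES
        backlog = written - already_read   # unread bytes just before this read
        max_backlog = max(max_backlog, backlog)
        if backlog < MSG_BYTES:            # PRU1 would read ahead of PRU0
            underrun = True
    return max_backlog, underrun


if __name__ == "__main__":
    peak, underrun = simulate_backlog()
    print("peak backlog:", peak, "bytes; underrun:", underrun)
    print("fits in shared RAM:", peak <= SHARED_RAM_BYTES)
```

In this model PRU0 fills the buffer faster than PRU1 drains it, so the backlog grows until PRU0 finishes the frame, but it stays well under 12K and PRU1 always finds a full message's worth of data waiting.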

PRU0 reads one 164-byte packet every 128 uSec.  It requires about 90 uSec to read the data and write it to the shared memory.  It writes 80 bytes (only the low 8-bits of every data word).  It restarts its acquisition process whenever it sees packets out-of-sequence and it notifies PRU1 when it has seen the first 20 packets of segment 1 (the first opportunity to know we are probably in a good frame).
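The per-packet decisions PRU0 makes can be sketched in Python as follows.  This is an illustration of the VoSPI packet layout as I understand it from FLIR's documentation (2-byte ID, 2-byte CRC, 160 payload bytes as big-endian 16-bit words), not a translation of the PRU code; the helper names are hypothetical.

```python
PACKET_BYTES = 164
HEADER_BYTES = 4   # 2-byte ID field followed by a 2-byte CRC


def is_discard(packet):
    """Discard packets carry 0xF in the low nibble of the first ID byte."""
    return (packet[0] & 0x0F) == 0x0F


def packet_number(packet):
    """12-bit packet number within the segment (0..59)."""
    return ((packet[0] & 0x0F) << 8) | packet[1]


def segment_number(packet):
    """Segment number; only meaningful on packet 20, 0 means invalid."""
    return (packet[0] >> 4) & 0x07


def low_bytes(packet):
    """Keep the low byte of each 16-bit word: 160 payload bytes -> 80."""
    return bytes(packet[HEADER_BYTES + 1::2])
```

For example, a packet whose first two bytes are 0x10 0x14 is packet 20 of segment 1, which is the first point at which the reader can tell it is probably in a good frame.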

PRU1 accumulates six packets' worth of data (480 bytes) and adds a sequence number to create a single rpmsg message every 1024 uSec once it's been triggered by PRU0.  It takes 40 messages for a complete frame over 41 mSec.  A new frame's worth of data is generated about every 112 mSec and both the kernel and a user-land process are able to easily keep up with the full 9 fps data flow for only a couple of percent of main processor time.
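The arithmetic behind those numbers, written out as a small Python sketch using only the figures quoted in this post:

```python
RPMSG_BUF_BYTES = 512        # maximum rpmsg message size
RPMSG_HEADER_BYTES = 16      # rpmsg overhead per message
BYTES_PER_PACKET = 80        # 8-bit AGC data kept from each VoSPI packet
PACKETS_PER_MESSAGE = 6
FRAME_BYTES = 160 * 120      # one 8-bit frame
MESSAGE_PERIOD_US = 1024

payload_limit = RPMSG_BUF_BYTES - RPMSG_HEADER_BYTES       # usable bytes
data_bytes = BYTES_PER_PACKET * PACKETS_PER_MESSAGE        # image data per message
spare_bytes = payload_limit - data_bytes                   # room for a sequence number
messages_per_frame = FRAME_BYTES // data_bytes
frame_transfer_ms = messages_per_frame * MESSAGE_PERIOD_US / 1000.0

if __name__ == "__main__":
    print("payload limit:", payload_limit, "bytes")
    print("data per message:", data_bytes, "bytes, spare:", spare_bytes)
    print("messages per frame:", messages_per_frame)
    print("frame transfer time: %.2f mSec" % frame_transfer_ms)
```

Six packets is the most that fits: a seventh would need 560 bytes, which is over the 496-byte payload limit.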

PRU0 can tell PRU1 to abort a transfer if it sees a bad packet sequence and then PRU1 informs the user-land process that the frame it is reading is bad with an illegal sequence number.  The user-land process uses the rpmsg facility to enable the VoSPI interface by sending a '1' to PRU1 and disable the interface by sending a '0' to PRU1.  
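On the user-land side, reassembling a frame from these messages might look like the Python sketch below.  The message layout is an assumption made for illustration (a 2-byte little-endian sequence number followed by 480 data bytes, with any sequence number of 40 or more treated as "illegal"); the github readme documents the real format, and the class name is hypothetical.

```python
import struct

SEQ_BYTES = 2                 # assumed size of the sequence number
MSG_DATA_BYTES = 480
MSGS_PER_FRAME = 40
FRAME_BYTES = MSG_DATA_BYTES * MSGS_PER_FRAME


class FrameAssembler:
    """Collect 40 in-order messages into one 19200-byte frame."""

    def __init__(self):
        self._buf = bytearray(FRAME_BYTES)
        self._expected = 0

    def feed(self, message):
        """Consume one message; return a complete frame as bytes, or None."""
        if len(message) != SEQ_BYTES + MSG_DATA_BYTES:
            self._expected = 0            # malformed: drop the partial frame
            return None
        (seq,) = struct.unpack_from("<H", message)
        if seq != self._expected:
            # Illegal or out-of-order sequence number: drop the partial frame.
            self._expected = 0
            if seq != 0:
                return None               # wait for the start of the next frame
        offset = seq * MSG_DATA_BYTES
        self._buf[offset:offset + MSG_DATA_BYTES] = message[SEQ_BYTES:]
        self._expected = seq + 1
        if self._expected == MSGS_PER_FRAME:
            self._expected = 0
            return bytes(self._buf)
        return None
```

A frame is only handed to the consumer after all 40 messages arrive in order, so a frame aborted by PRU0 part way through is simply never returned.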

The PRU shows up as /dev/rpmsg_pru31 to the user-land process and normal read and write operations are used to receive and send data.  The remoteproc framework is used to load firmware into the PRUs and start and stop them.  The current BBB Debian distributions include the PRU C-compiler and support libraries, including rpmsg, so it isn't a chore to set up a compile chain for the PRUs anymore.  The I2C interface is used to configure the Lepton.  prudebug is invaluable (and shows how to just directly map and manipulate PRU memory from user-land).
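Because the PRU is just a character device, talking to it needs nothing beyond ordinary file I/O.  Here is a minimal Python sketch of the enable / read / disable cycle described above.  It assumes the '1' and '0' commands are single ASCII characters and that each read returns one whole message; the function names are made up for this example, and the real demo applications are written in C.

```python
import os

RPMSG_DEVICE = "/dev/rpmsg_pru31"
MAX_MESSAGE_BYTES = 496       # 512-byte rpmsg buffer minus 16 bytes of overhead


def set_streaming(fd, enabled):
    """Ask PRU1 to enable ('1') or disable ('0') the VoSPI interface."""
    os.write(fd, b"1" if enabled else b"0")


def read_messages(fd, count):
    """Read `count` messages, one per read() call."""
    return [os.read(fd, MAX_MESSAGE_BYTES) for _ in range(count)]


def capture(count, device=RPMSG_DEVICE):
    """Open the rpmsg device, stream `count` messages, then stop."""
    fd = os.open(device, os.O_RDWR)
    try:
        set_streaming(fd, True)
        return read_messages(fd, count)
    finally:
        set_streaming(fd, False)
        os.close(fd)
```

Calling capture(40) on a running system would return roughly one frame's worth of messages, although the first messages read will not necessarily line up with the start of a frame.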

The github readme has more detailed technical information.

I ended up writing two separate demo applications.  The "pru_rpmsg_fb" application reads frames from the Lepton in one thread and pushes them into a circular queue of frames; another thread reads them out and passes them through a colormap into RGB565 data written directly to the LCD framebuffer.  The fb access is fast(!) and the two threads are probably unnecessary.  The "pru_leptonic" application is yet another take on Damien Walsh's leptonic program and uses the PRU pipeline to feed his zmq-based socket server.  You can connect his web server as a client, or the "zmq_fb" program which reads from a socket and updates the frame buffer, or even my PRU-Leptonic demo on the Mac which reads across the network connection.
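The colormap step is simple enough to show in a few lines.  This Python sketch maps 8-bit AGC pixels through a 256-entry palette into RGB565 values ready to be written to a 16-bit framebuffer.  The grayscale palette and the little-endian pixel order are assumptions for illustration; the demo applications are in C and use their own colormaps.

```python
import struct


def rgb888_to_rgb565(r, g, b):
    """Pack 8-bit R, G, B into one 16-bit RGB565 value (5-6-5 bits)."""
    return ((r & 0xF8) << 8) | ((g & 0xFC) << 3) | (b >> 3)


# Hypothetical palette: plain grayscale, one RGB565 entry per 8-bit pixel value.
PALETTE_565 = [rgb888_to_rgb565(v, v, v) for v in range(256)]


def frame_to_rgb565(frame):
    """Convert 8-bit pixels to little-endian RGB565 bytes via the palette."""
    return struct.pack("<%dH" % len(frame), *(PALETTE_565[p] for p in frame))
```

A 19200-byte Lepton frame becomes 38400 bytes of RGB565 data.  Since the LCD is larger than the Lepton's 160x120 image, a real display path also has to scale or position the image before writing it to the framebuffer.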

Performance

Here's the output from "top" while running both the "pru_leptonic" network server, the "zmq_fb" network client program to display images on the local LCD and a connection from the PRU-Leptonic program running on my Mac also displaying the images.  Both displays running at the full 9 fps.

top - 04:17:48 up  4:32,  3 users,  load average: 0.92, 0.76, 0.69
Tasks:  97 total,   1 running,  67 sleeping,   0 stopped,   0 zombie
%Cpu(s)  4.4 us,  4.8 sy,  0.0 ni, 90.1 id,  0.0 wa,  0.0 hi,  0.7 si,  0.0 st
KiB Mem :   495024 total,   288804 free,    70552 used,   135668 buff/cache
KiB Swap:        0 total,        0 free,        0 used.   408160 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND                   
 1158 debian    20   0   39672   3560   2880 S  2.6  0.7   2:37.42 pru_leptonic              
 1184 debian    20   0   22164   3388   3000 S  2.3  0.7   0:03.02 zmq_fb                    
  945 root     -51   0       0      0      0 S  2.0  0.0   3:05.48 irq/74-remotepr           
 1201 debian    20   0    7052   2748   2232 R  1.3  0.6   0:01.01 top            


Next steps

I bought a pocketbeagle and have been playing around with it, especially to see if its built-in PMIC can be used to manage power in a battery-powered device.  It doesn't look hopeful but I am still experimenting.  I want to make the final camera based around it and a WiFi dongle, the Lepton 3.5, a short-range IR (NOIR) camera and bright IR LED for night vision, and an Adafruit capacitive touch 2.8" ILI9341-based 320x240 pixel LCD, the whole thing powered by a 2000 mAh LiPo battery.

On the software side I downloaded LittlevGL as a possible light-weight GUI that can ride directly on top of the Linux frame buffer.  And I will contemplate writing my own Linux kernel driver to work around some of the limitations of rpmsg and perhaps have a driver that can support all of the Lepton's video modes.  But that may have to wait too...

Correction to my understanding of the Lepton with Radiometry

In the past I thought one had to disable the Radiometry function to enable AGC.  However it seems that one only has to disable the TLinear function (which returns 16-bit absolute temperature values for each pixel), while leaving Radiometry operating.  This allows using the spot meter function with AGC.  I will update my Teensy 3 lep_test10 sketch to include this new understanding.  It only took the millionth reading of FLIR's documentation for me to intuit this was possible.
