
Ode to HAL

A project log for TapTDOA

Turn almost any flat surface into a sensor using cheap piezos and time difference of arrival.

Ben Hencke, 08/05/2018 at 16:45
Oh HAL,
We're not a match.
I tried to be your pal
but you make my head scratch.
Your code is a puzzle
and the interface is jagged.
While CPU cycles you guzzle
my interrupt is run ragged!
But the hardware is elegant,
you time wasting contraption.
I've discovered registers most relevant
now I'm done with this abstraction

Hello ARM World

ST has made getting into ARM pretty easy. I had some false starts with NXP, but these STM32 NUCLEO dev boards with ST-Link debuggers and free, high-quality tools are really nice.

These STM32F303RE are pretty complex chips with a lot going on, a lot of registers and clocks and all kinds of rules. Here's the clock tool in STM32CubeMX, a chip configurator and code generator:

(it doesn't all fit on my screen)

Using the tool definitely saved a ton of time. It has a lot of sanity checks, and generates code in either HAL (Hardware Abstraction Layer) or LL (Low Level). HAL code has a ton of extra sanity checks and promises portability across different chips should the need arise. LL on the other hand is pretty bare bones and gives light wrapper functions around poking registers.

This can be paired with your favorite IDE, or if you want something that is free and takes less time to set up, you can use SW4STM32, which is based on Eclipse. I used Eclipse quite extensively in a previous job, so I'm comfortable enough with this setup. It's all backed by the all-powerful GCC, and integrates well with the ST-Link debugger via OpenOCD. The only trick is that you have to fiddle with the Cube settings and import the generated project in just the right way.

Getting Some Data

So the basic capture architecture is this:

            trigger   +------+ dma   +---------------+
            +-------> | adc1 +------>+  circ buffer  |
            |         +------+       +---------------+
            |         +------+ dma   +---------------+
            +-------> | adc2 +------>+  circ buffer  |
  +------+  |         +------+       +---------------+
  | tim3 +--+         +------+ dma   +---------------+
  +------+  +-------> | adc3 +------>+  circ buffer  |
            |         +------+       +---------------+
            |         +------+ dma   +---------------+
            +-------> | adc4 +------>+  circ buffer  |
                      +--+---+       +---------------+
                         |
                         |adc 1-4 watchdogs
                         +------------------->  captureEvent()

The timer tim3 triggers all 4 ADCs simultaneously, which then use DMA to write into a circular buffer. Meanwhile ADC watchdogs keep an eye on measured ADC values and interrupt when a value is outside of a predefined range. 

That all happens without any CPU involvement; it's all just wiring peripherals and DMA together until the watchdog triggers.

I want to capture some leading data because I won't know when exactly the signal starts. It is likely going to precede the trigger as a weaker signal. Put another way, I want to stop capturing data into the circular buffers after (BUFFER_SIZE - X) more samples are taken, where X is the amount of leading data to capture.
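Here is a minimal host-side sketch of that pre-trigger idea in plain C, with software standing in for the DMA and watchdog hardware. The `capture_t` type, the function names, and the tiny `BUFFER_SIZE` and `PRE_TRIGGER` values are all hypothetical and chosen only for illustration; `PRE_TRIGGER` plays the role of X.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BUFFER_SIZE 8u  /* hypothetical; a real capture buffer would be much larger */
#define PRE_TRIGGER 3u  /* "X": samples to keep from at or before the trigger */

typedef struct {
    uint16_t data[BUFFER_SIZE];
    size_t   head;       /* next slot to write, as circular DMA would */
    size_t   remaining;  /* samples still to take once triggered */
    int      triggered;
    int      done;
} capture_t;

static void capture_init(capture_t *c)
{
    *c = (capture_t){0};
}

/* Stand-in for one DMA transfer: store a sample and advance the write index. */
static void capture_push(capture_t *c, uint16_t sample)
{
    if (c->done) {
        return;  /* capture frozen, keep the buffer intact */
    }
    c->data[c->head] = sample;
    c->head = (c->head + 1u) % BUFFER_SIZE;
    if (c->triggered && --c->remaining == 0u) {
        c->done = 1;
    }
}

/* Stand-in for the watchdog interrupt: arm the countdown once. */
static void capture_trigger(capture_t *c)
{
    if (!c->triggered) {
        c->triggered = 1;
        c->remaining = BUFFER_SIZE - PRE_TRIGGER;
    }
}

/* Copy the frozen buffer out oldest-first. Once stopped, the oldest
 * sample sits at the write index. */
static void capture_unroll(const capture_t *c, uint16_t *out)
{
    assert(c->done);
    for (size_t i = 0; i < BUFFER_SIZE; i++) {
        out[i] = c->data[(c->head + i) % BUFFER_SIZE];
    }
}
```

After unrolling, the first `PRE_TRIGGER` entries are the samples leading up to the trigger and the rest follow it. On the real hardware, "stop" would mean halting the ADC trigger or the DMA rather than ignoring samples in software.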

So I thought, I'll just add an interrupt handler to tim3 and decrement a counter. Tim3 is already triggering for each sample. The CPU runs at 72MHz, and the ADCs can run up to about 5Msps. Initially I'm going for 1Msps. That only leaves about 72 cycles between samples, not enough to really do much, but surely enough to decrement a counter, right?
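To put numbers on that budget, here is the arithmetic as a small C sketch. The function name is made up for illustration; the 72MHz clock and the 1Msps and 5Msps rates are the ones mentioned above.

```c
#include <assert.h>
#include <stdint.h>

/* CPU cycles available between two consecutive samples (rounded down). */
static uint32_t cycles_per_sample(uint32_t cpu_hz, uint32_t sample_hz)
{
    assert(sample_hz != 0u);
    return cpu_hz / sample_hz;
}
```

`cycles_per_sample(72000000u, 1000000u)` gives 72, and at 5Msps the result drops to 14. For scale, interrupt entry on a Cortex-M4 is documented at 12 cycles in the best case, before a single instruction of the handler has run.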

Enter HAL. HAL provides an abstraction layer, but also tons of checks. But it's not written in C++ templates or even a bunch of #ifdefs; it's mostly just readable C code. And abstraction. So the interrupt fires, and the interrupt vector looks like this:

The TIM3_IRQHandler is of course provided by the generated HAL code. This calls HAL_TIM_IRQHandler, passing in a pointer to the tim3 HAL data structure. But of course HAL_TIM_IRQHandler is generic for any timer, not just tim3, and not even just your flavor of tim3. So it runs through every possible reason any timer could ever interrupt. To be fair, it's only about 140 lines of code, but here is where my 'user' function gets called from:

It's buried deep enough that the timer has rolled over a few times before it even executes. Forget decrementing a counter in 72 cycles. If I could implement TIM3_IRQHandler directly, it would probably be fine, but the Cube wants HAL to own it. I could inject a bit of 'user' code into its TIM3_IRQHandler and return before HAL, but that's kind of playing dirty, and at that point what is HAL buying me?
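The difference between a generic dispatcher and a direct handler can be sketched with a host-side model. This is not ST's source: `model_timer_t`, the handler names, the `flag_tests` counter, and the order in which sources are examined are assumptions made for illustration. The number of checks is only a rough proxy for the real cost in cycles.

```c
#include <assert.h>
#include <stdint.h>

/* Status and enable bits of a model timer (same positions in both registers). */
#define SR_UIF   (1u << 0)  /* update (counter overflow) */
#define SR_CC1IF (1u << 1)  /* capture/compare 1..4 */
#define SR_CC2IF (1u << 2)
#define SR_CC3IF (1u << 3)
#define SR_CC4IF (1u << 4)
#define SR_COMIF (1u << 5)  /* commutation */
#define SR_TIF   (1u << 6)  /* trigger */
#define SR_BIF   (1u << 7)  /* break */

typedef struct {
    uint32_t sr;    /* pending flags */
    uint32_t dier;  /* enabled interrupt sources */
} model_timer_t;

static uint32_t samples_left;  /* the counter the 'user' code wants to decrement */
static unsigned flag_tests;    /* instrumentation: sources examined so far */

static void user_update_callback(void)
{
    if (samples_left > 0u) {
        samples_left--;
    }
}

/* Check one source; clear it and report 1 if it is both pending and enabled. */
static int take_if_pending(model_timer_t *t, uint32_t flag)
{
    flag_tests++;
    if ((t->sr & flag) && (t->dier & flag)) {
        t->sr &= ~flag;
        return 1;
    }
    return 0;
}

/* Generic style: walk every possible source, whether or not it is in use. */
static void generic_irq_handler(model_timer_t *t)
{
    static const uint32_t order[] = {
        SR_CC1IF, SR_CC2IF, SR_CC3IF, SR_CC4IF,
        SR_UIF, SR_BIF, SR_TIF, SR_COMIF
    };
    assert(t != 0);
    for (unsigned i = 0; i < sizeof order / sizeof order[0]; i++) {
        if (take_if_pending(t, order[i]) && order[i] == SR_UIF) {
            user_update_callback();
        }
    }
}

/* Direct style: we know only the update interrupt is enabled. */
static void direct_irq_handler(model_timer_t *t)
{
    assert(t != 0);
    t->sr &= ~SR_UIF;
    user_update_callback();
}
```

Both handlers end up decrementing the counter once per update event. The generic one examines eight sources to get there, while the direct one touches a single flag.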

So HAL might not be the best fit for every application, and certainly won't work for this. I could switch to LL and probably be OK for 1Msps, but that is probably not a good approach if the sampling rate is increased toward 5Msps.

The good news is that I have heaps of timers available on this chip, so it's easy enough to fire up another timer and unburden the CPU from handling anything related to tim3.
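This log doesn't say how the second timer is set up, so the following is just one possible arrangement: a one-shot timer started when the watchdog fires, expiring after (BUFFER_SIZE - X) sample periods. The function names and the buffer sizes in the usage note are hypothetical, and the sketch assumes the second timer counts at the same 72MHz as the CPU.

```c
#include <assert.h>
#include <stdint.h>

/* Timer ticks to wait after the trigger before stopping the capture. */
static uint64_t post_trigger_ticks(uint32_t timer_hz, uint32_t sample_hz,
                                   uint32_t buffer_size, uint32_t pre_trigger)
{
    assert(sample_hz != 0u && pre_trigger <= buffer_size);
    uint64_t ticks_per_sample = timer_hz / sample_hz;
    return (uint64_t)(buffer_size - pre_trigger) * ticks_per_sample;
}

/* A 16-bit counter can time at most 65536 ticks without a prescaler. */
static int fits_16bit_counter(uint64_t ticks)
{
    return ticks >= 1u && ticks <= 65536u;
}
```

With a 1024-sample buffer and 256 samples of leading data at 1Msps, the delay is 55296 ticks and fits a 16-bit counter. With 4096 and 512 it is 258048 ticks, which would need a prescaler or a wider counter. Clocking the second timer from tim3's trigger output instead would let it count samples directly, so the period would simply be BUFFER_SIZE - X.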
