RF Stack Fails

A project log for AND!XOR DEFCON 24 Badge

Building our own electronic badge. ARM Cortex-M3 and Arduino-based

Zapp, 01/29/2016 at 06:43

As promised earlier, it's time to document the fails with the RF stack. In the process, I learned a lot about timers, priorities, and interrupts that I never knew before.

During early development of the software, I learned that interrupts are a good way to handle user input from buttons. After reading the documentation and some trial and error, we were off and running. We even added a software debounce in the interrupts. At 72MHz the interrupts go unnoticed from the user's perspective and appear to be handled in the background. This is exactly the behavior I wanted for the RF stack.
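For anyone who hasn't written one, here is a minimal sketch of a lockout-style software debounce of the kind described above. It is not the badge's actual code: the 50ms window, the button_isr() name, and passing the timestamp in as a parameter (so the logic can be tested off-target) are all assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed debounce window; the badge's real value isn't given in the log. */
#define DEBOUNCE_MS 50u

/* Shared between the ISR and the main loop, hence volatile. */
static volatile uint32_t last_edge_ms = 0;
static volatile bool     have_edge    = false;
static volatile uint32_t press_count  = 0;

/* Hypothetical button ISR body (lockout-style debounce): after an accepted
 * edge, ignore further edges for DEBOUNCE_MS. The timestamp is passed in
 * so the logic can run off-target; on the badge it would come from millis().
 * Returns true if this edge counted as a press. */
bool button_isr(uint32_t now_ms)
{
    /* Unsigned subtraction stays correct across a millis() wraparound. */
    if (have_edge && (uint32_t)(now_ms - last_edge_ms) < DEBOUNCE_MS) {
        return false;  /* contact bounce: ignore this edge */
    }
    have_edge    = true;
    last_edge_ms = now_ms;
    press_count++;
    return true;
}
```

On hardware, button_isr(millis()) would be called from a handler registered with Arduino's attachInterrupt(). Reading millis() inside an ISR is fine for timestamping an edge; what doesn't work is waiting inside an ISR for it to change.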

One of the major functions of the badge is to perform node discovery and host a chat room. Both of these functions need to run behind the scenes as much as possible to make them seamless for the user. What better way to implement this than interrupts? Not only did this mean I didn't have to update any of the animation or user input code, but it also meant the RF logic was kept in a single place. Win-win.

To make this work, I fired up Timer2, set it to interrupt every 100ms, and had the interrupt read any data waiting in the RFM69 buffer. The data is then processed and routed to the appropriate buffer inside the MCU for later use. And it worked.
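The Timer2 arrangement can be sketched roughly as follows. Everything here is an assumption for illustration: the packet-type byte (PKT_BEACON, PKT_CHAT), the queue sizes, and the fake radio_rx queue standing in for what the RFM69 has received. On the badge that side would come from the RFM69 library (for example receiveDone() and its DATA/DATALEN fields) rather than a queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PKT   16  /* assumed max payload length for this sketch */
#define QUEUE_LEN 8   /* assumed depth of each internal buffer */

/* Hypothetical packet types; the badge's real protocol isn't shown. */
enum { PKT_BEACON = 1, PKT_CHAT = 2 };

typedef struct {
    uint8_t len;
    uint8_t data[MAX_PKT];
} packet_t;

/* Fixed-size FIFO used for both the fake radio and the MCU-side buffers. */
typedef struct {
    packet_t slot[QUEUE_LEN];
    size_t head, count;
} queue_t;

static bool queue_push(queue_t *q, const packet_t *p)
{
    if (q->count == QUEUE_LEN) return false;  /* full: caller drops it */
    q->slot[(q->head + q->count) % QUEUE_LEN] = *p;
    q->count++;
    return true;
}

static bool queue_pop(queue_t *q, packet_t *out)
{
    if (q->count == 0) return false;
    *out = q->slot[q->head];
    q->head = (q->head + 1) % QUEUE_LEN;
    q->count--;
    return true;
}

queue_t radio_rx;     /* hypothetical stand-in for packets waiting in the RFM69 */
queue_t discovery_q;  /* node-discovery beacons */
queue_t chat_q;       /* chat-room messages */
unsigned dropped;     /* unknown type, empty packet, or full destination */

/* Body of the periodic (every 100ms) RF handler: drain everything the
 * radio has received and route each packet by its first byte. */
void rf_handler(void)
{
    packet_t p;
    while (queue_pop(&radio_rx, &p)) {
        queue_t *dest = NULL;
        if (p.len > 0 && p.data[0] == PKT_BEACON) dest = &discovery_q;
        else if (p.len > 0 && p.data[0] == PKT_CHAT) dest = &chat_q;
        if (dest == NULL || !queue_push(dest, &p)) dropped++;
    }
}
```

In a truly interrupt-driven version, the main loop popping from chat_q would race the ISR pushing into it, so those pops would need interrupts briefly disabled or a proper single-producer/single-consumer queue.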

Until my Moteino crashed...

I'm using a Moteino with a basic implementation of our RF protocol to test sending and receiving various packets. However, my code seems to have a memory problem and was crashing hard about every 10 minutes. When this happens, sometimes the Moteino restarts and other times it hangs. Sometimes it hangs while the RFM69 is transmitting and never powers down the radio, essentially saturating the channel at 13dBm. Every time this happened, the badge also hung. Here's why:

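  // CSMA: listen until the channel is clear or RF69_CSMA_LIMIT_MS elapses
  uint32_t now = millis();
  while (!canSend() && millis() - now < RF69_CSMA_LIMIT_MS) {
    receiveDone();  // keep polling the radio while waiting
  }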
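Do you see it? Probably not. canSend() keeps returning false because the hung Moteino is saturating the channel, and the CSMA limit is set to 1000msec, so the loop should give up after one second. Felix was nice enough to include a timeout.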

After a lot of Serial.print(..) calls, I determined that millis() was not updating, so the code never exited the while loop above. ::facepalm::
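To make the failure concrete, here is a small host-side simulation of the same CSMA loop shape. fake_millis() and fake_can_send() are hypothetical stand-ins for millis() and canSend(), and max_spins exists only so the simulation can report that the loop would hang instead of actually hanging. With a clock that advances, a busy channel times out after 1000ms; with a frozen clock, it never does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CSMA_LIMIT_MS 1000u  /* the 1000msec limit described above */

typedef enum { CSMA_CLEAR, CSMA_TIMEOUT, CSMA_STUCK } csma_result_t;

/* Simulated clock: each read advances time by ms_per_read.
 * ms_per_read == 0 models millis() frozen because SysTick can't run. */
static uint32_t fake_ms;
static uint32_t ms_per_read;
static uint32_t fake_millis(void)
{
    uint32_t t = fake_ms;
    fake_ms += ms_per_read;
    return t;
}

/* Simulated channel: busy models a transmitter stuck on our frequency. */
static bool channel_busy;
static bool fake_can_send(void) { return !channel_busy; }

/* Same logic as while (!canSend() && millis() - now < limit) receiveDone();
 * max_spins is a simulation-only guard that reports "would hang". */
csma_result_t csma_wait(uint32_t max_spins)
{
    uint32_t now = fake_millis();
    for (uint32_t spins = 0; spins < max_spins; spins++) {
        if (fake_can_send()) return CSMA_CLEAR;
        if ((uint32_t)(fake_millis() - now) >= CSMA_LIMIT_MS) return CSMA_TIMEOUT;
        /* receiveDone() would be polled here */
    }
    return CSMA_STUCK;
}
```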

It turns out that while the Timer2 interrupt is running, the SysTick interrupt that increments millis() never gets to run, so millis() stops. The loop above is called from my Timer2 interrupt, so it blocks waiting for millis() to change, which it never does. This is because SysTick on ARM has an extremely low priority and won't preempt anything.
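The priority rule behind this fits in a few lines. This is a simplified model of Cortex-M preemption that ignores sub-priority grouping and interrupt masking: lower numbers are more urgent, and a pending exception only preempts if it is strictly more urgent than whatever is running. The numbers are assumptions: 4 implemented priority bits (levels 0 to 15), SysTick at 15 as CMSIS SysTick_Config() sets it, and a hypothetical Timer2 priority of 2. Whether a particular Arduino core leaves SysTick at the lowest level is also an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Plain main-loop (thread mode) code sits below every exception. */
#define THREAD_MODE 256

/* Simplified Cortex-M rule: preempt only if strictly more urgent.
 * Equal priority waits, so an exception can't preempt itself. */
bool can_preempt(int pending_prio, int running_prio)
{
    return pending_prio < running_prio;
}

/* Assumed example levels: SysTick at the lowest of 16 levels (what CMSIS
 * SysTick_Config() programs); the Timer2 value is hypothetical. */
enum { PRIO_TIMER2 = 2, PRIO_SYSTICK = 15 };
```

Because nothing is less urgent than level 15, SysTick can't preempt the Timer2 handler no matter which Timer2 level is chosen; on hardware those levels would be set with CMSIS NVIC_SetPriority().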

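So I moved the RF handler to run every 100msec from SysTick, thinking this would allow millis() to update. Also not true: the handler was now running inside the SysTick interrupt itself, and an interrupt can't preempt itself, so millis() still couldn't advance while the handler was blocked.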

At this point I could either remove the timeout or move my code onto the user thread, forcing modifications throughout the badge to call the RF handler function. If I remove the timeout, another node (malicious or poorly coded) could DoS our badges simply by transmitting near or on top of our 433MHz channel.

I opted to push the handler call into any process that blocks (user input, animations, etc.). It turned out not to be so bad, and running the handler there allows millis() to update, permitting graceful degradation in the event of a strong interferer.
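The cooperative version can be sketched like this: anything that would otherwise sit in delay() calls a service function that runs the RF handler at most every 100ms, from the main loop rather than an ISR, so SysTick keeps ticking. badge_delay(), rf_service(), and the stubbed rf_handler() are hypothetical names, and sim_millis() is a host stand-in for millis() that advances 1ms per read.

```c
#include <assert.h>
#include <stdint.h>

#define RF_PERIOD_MS 100u  /* same 100ms cadence the Timer2 version used */

/* Host stand-in for millis(): each read advances 1ms, modelling time that
 * keeps moving because we're in thread mode and SysTick can run. */
static uint32_t sim_ms;
static uint32_t sim_millis(void) { return sim_ms++; }

/* Hypothetical stub for the real RF handler; just counts invocations. */
static unsigned rf_calls;
static void rf_handler(void) { rf_calls++; }

static uint32_t last_rf_ms;

/* Hypothetical hook for any waiting loop: runs the RF handler if at least
 * RF_PERIOD_MS have passed since its last run. */
void rf_service(void)
{
    uint32_t now = sim_millis();
    if ((uint32_t)(now - last_rf_ms) >= RF_PERIOD_MS) {
        last_rf_ms = now;
        rf_handler();
    }
}

/* Hypothetical drop-in for delay() in animation and input code. */
void badge_delay(uint32_t ms)
{
    uint32_t start = sim_millis();
    while ((uint32_t)(sim_millis() - start) < ms) {
        rf_service();
    }
}
```

Because rf_handler() now runs in thread mode, the CSMA wait inside it sees millis() advance and gives up after RF69_CSMA_LIMIT_MS, even with an interferer sitting on the channel.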
