If you follow us at all on Twitter, you've probably heard us both complaining about and celebrating successes with the RFM69 radio we're using on the badge. After many frustrating hours of tedious debugging of the radio's registers and a lot of trial and error, we believe we've narrowed the problem down to an issue with the production of the radios themselves.
Issue: Occasionally we would notice the badges unable to receive data. It turns out the radios were getting stuck in frequency synthesizer mode, waiting for the PLL to lock. The RFM69 has an automatic sequencer (which we're using) that walks the radio through standby, synthesizer, receive, etc. modes as our application switches between them. Frequency synthesizer is an intermediate mode between standby and receive. According to the data sheet, the radio should spend about 60 µs in this mode before jumping to receive. In our case it was getting stuck.
Hours and hours of debugging led us to a way to recreate the issue (cycle the badge several times by removing the battery) and to detect it (opmode = synthesizer and irqflags ready = 0). Every register we tried to manipulate to get the radio into receive mode failed. Even re-initializing the radio failed. As a workaround, the badge now detects the issue on startup and prompts the user to cycle the badge by removing the battery. Not ideal, but it works.
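For anyone who wants to run the same check, here is a rough sketch of the detection condition in C. It is not our badge firmware: the function name `pll_stuck` is made up for illustration, and the register addresses and bit positions (RegOpMode at 0x01 with the mode in bits 4:2, RegIrqFlags1 at 0x27 with ModeReady in bit 7) are how we read the RFM69 data sheet, so verify them against your own copy.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed from the RFM69 data sheet; verify for your module. */
#define REG_OPMODE        0x01u  /* operating mode register */
#define REG_IRQFLAGS1     0x27u  /* first IRQ flags register */
#define OPMODE_MODE_MASK  0x1Cu  /* Mode field, bits 4:2 */
#define OPMODE_MODE_FS    0x08u  /* 010 = frequency synthesizer mode */
#define IRQ1_MODEREADY    0x80u  /* bit 7, set once the requested mode is reached */

/*
 * Hypothetical helper: given the raw values read from RegOpMode and
 * RegIrqFlags1, report whether the radio looks stuck in synthesizer
 * mode (opmode = synthesizer and ModeReady = 0).
 */
bool pll_stuck(uint8_t opmode, uint8_t irqflags1)
{
    bool in_fs = (opmode & OPMODE_MODE_MASK) == OPMODE_MODE_FS;
    bool ready = (irqflags1 & IRQ1_MODEREADY) != 0;
    return in_fs && !ready;
}
```

On startup you would read both registers over SPI with whatever driver you have, give the radio far longer than the roughly 60 µs it should need, and then pass the two values to the check. If it still returns true, prompt the user to pull the battery.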
Eventual Solution: Over the past few months my four-year-old has become interested in what I've been working on. He likes to watch the animations and flashing lights. This past weekend I let him play with the badge, only to have him drop it on the floor. After I picked it up, the radio failed every time the badge was restarted. I inspected the radio and found this:
Notice the capacitor is dislodged from one of its pads.
After fixing the capacitor, the PLL lock failure went away. I wasn't able to reproduce it anymore either, not even once after many battery removals. This leads me to believe the dislodged capacitor is a load capacitor for the 32 MHz crystal (also pictured). Without it, the radio doesn't have a good reference clock and can't lock.
We've started producing a few of the badges now for DEFCON. Part of the production process includes a functional test of each component. The third badge produced came up with the PLL Lock error when first booted. Upon inspection, its capacitor was also dislodged. It turns out the next four radios we inspected also had this issue to varying degrees. Most are good enough. There is clearly a manufacturing QA issue here.
Fortunately for us, this is a simple fix. The capacitor is extremely small, but it doesn't take much work with a soldering iron to straighten it out.