
Dealing with (and fixing) an unexpected glitch

A project log for Tinymovr Motor Controller

Affordable, precise, integrated motion control for robotics

Yannis, 05/31/2020 at 12:23

In this post I wanted to focus on Tinymovr firmware improvements and plan out the next steps. However, an unexpected glitch took up all my development time, so the rest of the updates have been pushed back a bit.

Last week I put together a “torture test” for Tinymovr, with the aim of testing several aspects such as position tracking, torque generation, commutation at higher angular velocities and communication. The test consisted of reading encoder estimates and commanding random position increments ranging from 0 to 2π (0 to 8192 encoder ticks) at random intervals ranging from 5ms to 100ms.
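For illustration, here is a minimal sketch of what such a test loop might look like. It is not the actual test script: `send_position` is a hypothetical callback standing in for whatever sends a position setpoint over CAN, and I am assuming the random increments accumulate onto the previous target.

```python
import random
import time

TICKS_PER_REV = 8192  # encoder ticks per 2π, as stated above


def torture_schedule(n_commands, seed=None):
    """Yield (delay_s, target_ticks) pairs for the test.

    Each step adds a random increment of 0..8192 ticks to the previous
    target and waits a random 5..100 ms before the next command.
    """
    rng = random.Random(seed)
    target = 0
    for _ in range(n_commands):
        delay_s = rng.uniform(0.005, 0.100)
        target += rng.randint(0, TICKS_PER_REV)
        yield delay_s, target


def run_torture_test(send_position, n_commands=1000, seed=None, sleep=time.sleep):
    """Drive a controller through the schedule.

    send_position is a hypothetical callback (assumption, not a Tinymovr API).
    """
    for delay_s, target in torture_schedule(n_commands, seed):
        send_position(target)
        sleep(delay_s)
```

Passing a fixed `seed` makes a failing run repeatable, which helps when chasing an error that shows up only once every few thousand messages.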

During the test I noticed that in some cases the motor jumped around at the velocity limit, towards positions far beyond the commanded ones. Monitoring the CAN traffic, I found that some messages sent by Tinymovr contained garbage position values. They were not frequent (on average one in 5000), but they are certainly deal-breakers for high-frequency communication. Moreover, the issue only affected messages sent by Tinymovr; the ones it received were always pristine.
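One simple way to spot such values in a captured log is to flag any position estimate that jumps further from the last plausible sample than the motor could physically travel between two messages. The sketch below assumes the positions have already been decoded into a list of tick counts; `find_glitches` and `max_step_ticks` are names I made up for the example, and the threshold has to be chosen from the velocity limit and message rate of the actual setup.

```python
def find_glitches(positions, max_step_ticks):
    """Return indices of samples that jump implausibly far.

    A sample is flagged when it differs from the last accepted sample by
    more than max_step_ticks. Flagged samples are not used as a reference,
    so a single garbage value does not hide the good ones after it.
    Note: the first sample is always trusted (a simplifying assumption).
    """
    glitches = []
    last_good = None
    for i, pos in enumerate(positions):
        if last_good is not None and abs(pos - last_good) > max_step_ticks:
            glitches.append(i)
            continue
        last_good = pos
    return glitches


# Example with made-up data: one corrupted sample in an otherwise smooth trace.
log = [0, 10, 20, 900000, 30, 40]
print(find_glitches(log, max_step_ticks=1000))  # prints [3]
```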

Long story short, after discovering this I engaged in a week-long investigation to find the root cause. Since CAN has strong error detection, it is unlikely that this error originates on the bus itself. I thus focused on the parts before and after the bus. I quickly dismissed the receiver side (consisting of an Arduino and an MCP2515 breakout, connected through short, shielded wiring) after successfully testing it with another CAN device. In addition, I knew that with the exact same setup and the previous Tinymovr R2 boards there were no communication errors at all. Therefore I focused on Tinymovr itself.

One thing I did notice was that the PAC5527 controller itself was relatively warm to the touch. Not so much that it would burn your hands, but enough to become uncomfortable after a few seconds. Disabling all peripherals and setting the MCU to sleep mode revealed a 30mA current draw at 12V. The PAC5527 on Tinymovr powers two external devices and an LED through its VSYS LDO: the LED draws 5mA, the magnetic encoder is supposed to draw 12mA at 3.3V, and the CAN transceiver is supposed to draw 10mA at 5V in the recessive state, but also a whopping 40mA when driving the line to the dominant state! This is of course reasonable if one remembers that CAN has 120Ω termination resistors and a 5V potential difference between high and low when the bus is dominant, which gives a current draw of roughly 5V / 120Ω ≈ 0.04A, but it was something that admittedly I did not consider in the design.

Taking a look at the datasheet, the VSYS LDO lists a max external load of 50mA at 5V. That covers the combined 27mA (5 + 12 + 10) when the bus is recessive, but not the combined 57mA (5 + 12 + 40) when it is dominant! It is therefore possible that the excessive current required during CAN bus transmission introduces voltage drops that in turn cause memory or I/O related issues, as well as excessive heat due to the stress on the LDOs. At last, we have a plausible scenario! :)
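The budget above can be written out as a quick check. This is just the arithmetic from the post in code form, using the figures quoted here; like the text, it simply adds the currents together even though the encoder runs at 3.3V and the rest at 5V.

```python
# Figures quoted in the post, in mA.
VSYS_MAX_EXTERNAL_MA = 50

FIXED_LOADS_MA = {"led": 5, "encoder": 12}
TRANSCEIVER_MA = {"recessive": 10, "dominant": 40}


def total_load_ma(bus_state):
    """Total external load on the VSYS LDO for a given CAN bus state."""
    return sum(FIXED_LOADS_MA.values()) + TRANSCEIVER_MA[bus_state]


for state in TRANSCEIVER_MA:
    total = total_load_ma(state)
    margin = VSYS_MAX_EXTERNAL_MA - total
    print(f"{state}: {total} mA, margin {margin:+d} mA")

# Output:
# recessive: 27 mA, margin +23 mA
# dominant: 57 mA, margin -7 mA
```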

This is also consistent with the observation that the previous R2 boards ran without errors: according to the PAC5523 datasheet, the PAC5523 LDO can supply up to 330mA total, and excluding the 60mA max per internal LDO, there are 80mA left for external loads.

To mitigate this, I took a few steps to minimize current draw as much as I could given the current design:

  1. Increase the encoder SPI frequency as much as possible without introducing errors, to minimize the time the processor spends waiting for encoder data. Currently it is set to 12.5MHz.
  2. Increase the CAN bus frequency to 1MHz from 250kHz, to minimize the time the transceiver has to hold the line in the dominant state (and thus draw more current). Admittedly this seems to be one of the few cases where *increasing* the baud rate results in fewer errors… :D
  3. Use large heatsinks where possible
  4. Remove the LED! Currently the LED consumes just 5mA, but that is still a significant fraction of the 50mA total load of the VSYS LDO.
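To get a feel for step 2, the sketch below estimates how much of the time the bus is occupied by frames at each bit rate. It assumes standard (11-bit ID) data frames with 8 data bytes and ignores stuff bits and interframe space; the 1000 frames per second is a made-up figure for the example. Not every bit in a frame is dominant, so this is an upper bound on the time the transceiver spends in the high-current state, but the ratio between the two bit rates is what matters.

```python
def frame_bits(data_bytes=8):
    """Bits in a standard CAN data frame, excluding stuff bits.

    SOF (1) + arbitration (12) + control (6) + CRC (16) + ACK (2) + EOF (7)
    gives 44 bits of overhead, plus 8 bits per data byte.
    """
    return 44 + 8 * data_bytes


def bus_busy_fraction(frames_per_s, bitrate, data_bytes=8):
    """Fraction of each second spent transmitting frames."""
    return frames_per_s * frame_bits(data_bytes) / bitrate


FRAMES_PER_S = 1000  # hypothetical message rate, for illustration only

for bitrate in (250_000, 1_000_000):
    busy = bus_busy_fraction(FRAMES_PER_S, bitrate)
    print(f"{bitrate // 1000} kbit/s: bus busy {busy:.1%} of the time")
```

For the same message rate, going from 250kHz to 1MHz cuts the time spent transmitting by a factor of four, and with it the time the LDO has to supply the dominant-state current.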

After these changes, I powered up the board and let it run for a while. BOOM! Squeaky-clean comms! I let the whole setup run for around two hours, reaching 100000 messages, and not a single error appeared. That was certainly the root of the problem. 

To deal with this issue properly, a board revision is required. In particular, the CAN transceiver needs to be powered from an external source and not the internal MCU LDOs. This means that I either need to introduce a buck converter, or find a transceiver IC that can be powered directly from an external voltage of up to 30V. I looked around for the second option and found the UJA1162ATK, a CAN transceiver from NXP that can be powered by a source of up to 28V. This is just about enough for Tinymovr, which is rated at 24V, so I took the time yesterday and integrated it into the design. I also took a few steps to improve the thermal design; in particular, I increased the copper area beneath the MCU and left exposed pads on ground for a heatsink under the chip.

All this means that there will be some additional delay in making boards available. Apologies, but I would not want to release boards that have thermal or power deficiencies, even if they seem to be functioning ok on the surface. I'll try to validate the board design and send a small batch to fab as soon as possible.

Stay tuned for the next update!
