
Tiny Inference Engines for MCU deployment

A project log for Generative AI on a Microcontroller

The Electronic Die of the Future

Tim, 11/19/2023 at 15:36

The big question now is how to implement our trained model on a microcontroller. Ideally the solution works with PyTorch (since I trained the models in it) and minimizes SRAM and flash footprint even on very small devices (no point in having a 6k-parameter model if the inference code alone takes 30k).
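To make the footprint concern concrete, here is a quick back-of-envelope calculation in plain Python. The numbers are purely illustrative: the 6k parameter count is from this project, while the 30k runtime size is an assumed figure for a generic engine, not a measurement of any tool listed below.

    PARAMS = 6_000                 # model size from this project

    weights_int8 = PARAMS * 1      # ~6 kB of flash if weights are stored as int8
    weights_fp32 = PARAMS * 4      # ~24 kB if they stay float32
    runtime_code = 30_000          # assumed size of a generic inference runtime

    print(f"int8 weights: {weights_int8 / 1024:.1f} kB")
    print(f"fp32 weights: {weights_fp32 / 1024:.1f} kB")
    print(f"runtime code: {runtime_code / 1024:.1f} kB")

    # With a 30 kB runtime, the engine rather than the model dominates the
    # flash budget - hence the requirement for a very small inference engine.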

I spent quite some time searching and reviewing various options. A short summary of my findings:

TensorFlow-based

TensorFlow Lite
TinyEngine from MCUNet – looks great, targeting ARM CM4.
CMSIS-NN – ARM-centric. Comes with examples, including one for a PyTorch-to-TFLite conversion via ONNX (a sketch of that path follows after this list).
TinyMaix – very minimalistic, can also be used on RISC-V.
NNoM – relatively active project. Small footprint and portable.
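Since I have not tried that conversion path yet, the following is only a sketch of how PyTorch → ONNX → TFLite could look. It assumes the onnx and onnx-tf packages for the middle step (the actual CMSIS-NN example may use a different tool), and the tiny two-layer model and file names are placeholders, not the model from this project.

    import torch
    import onnx
    import tensorflow as tf
    from onnx_tf.backend import prepare

    # Placeholder model - stands in for the actual trained network
    model = torch.nn.Sequential(
        torch.nn.Linear(16, 8),
        torch.nn.ReLU(),
        torch.nn.Linear(8, 4),
    ).eval()

    # 1. Export the PyTorch model to ONNX with a fixed input shape
    dummy = torch.zeros(1, 16)
    torch.onnx.export(model, dummy, "model.onnx", opset_version=13)

    # 2. Convert the ONNX graph to a TensorFlow SavedModel
    prepare(onnx.load("model.onnx")).export_graph("model_tf")

    # 3. Convert the SavedModel to a TFLite flatbuffer with default
    #    (mostly weight) quantization enabled
    converter = tf.lite.TFLiteConverter.from_saved_model("model_tf")
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    with open("model.tflite", "wb") as f:
        f.write(converter.convert())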

PyTorch-based

PyTorch Edge / ExecuTorch – PyTorch's answer to TensorFlow Lite. Seems to target intermediate systems; the runtime alone is around 50 kB...
microTVM – targeting CM4, but claims to be platform-agnostic.

MAX7800X Toolchain and Documentation – a proprietary toolchain to deploy models to the MAX78000 edge-NN devices.

Meta Glow – a machine-learning compiler that seems to target medium to large platforms.

ONNX-based

DeepC – open-source version of DeepSea. Very little activity, looks abandoned.
onnx2c – ONNX-to-C source code converter. Looks interesting, but also not very active (a minimal export sketch for this kind of tool follows after this list).
cONNXr – framework with a C99 inference engine. Also interesting and not very active.
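For the ONNX-based converters, everything starts from a plain ONNX export. Below is a minimal sketch of what I expect that to look like; the two-layer model is a stand-in, and the assumption that these tools want a static-shaped, checked graph at a moderate opset is mine, not taken from their documentation.

    import torch
    import onnx

    # Stand-in model, not the actual network from this project
    model = torch.nn.Sequential(
        torch.nn.Linear(16, 8),
        torch.nn.ReLU(),
        torch.nn.Linear(8, 4),
    ).eval()

    # Export with a fixed batch size - small C engines typically have no
    # support for dynamic axes
    dummy = torch.zeros(1, 16)
    torch.onnx.export(model, dummy, "model.onnx", opset_version=13)

    # Sanity-check the exported graph before feeding it to onnx2c / cONNXr
    onnx.checker.check_model(onnx.load("model.onnx"))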

The Dilemma

The diversity of solutions, and the number of seemingly abandoned approaches, shows that deployment on tiny MCUs is far from having a one-size-fits-all solution.

Most of the early solutions for edge inference / TinyML are based on TensorFlow Lite; PyTorch only seems to be catching up more recently. There are also some solutions that convert models from the ONNX interchange format to C code.

The issue is clearly that it is very hard to combine easy deployment, flexibility and a very small footprint in one solution. To get the most out of small models, they need to be trained with the limitations of the inference code in mind (quantization-aware training). This means that training and inference cannot be implemented completely independently of each other; a rough sketch of what that looks like in PyTorch is shown below.
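As a reference for what that entanglement means in practice, here is a minimal sketch of PyTorch's eager-mode quantization-aware training flow. It is a generic example with made-up layer sizes, not the workflow required by any particular engine listed above; toolchains like the MAX78000 one ship their own QAT flow.

    import torch
    import torch.nn as nn
    import torch.ao.quantization as tq

    class TinyNet(nn.Module):
        def __init__(self):
            super().__init__()
            self.quant = tq.QuantStub()     # marks where activations become int8
            self.fc1 = nn.Linear(16, 8)
            self.relu = nn.ReLU()
            self.fc2 = nn.Linear(8, 4)
            self.dequant = tq.DeQuantStub()

        def forward(self, x):
            x = self.quant(x)
            x = self.relu(self.fc1(x))
            x = self.fc2(x)
            return self.dequant(x)

    model = TinyNet()
    model.qconfig = tq.get_default_qat_qconfig("qnnpack")  # int8 scheme for ARM-like targets
    tq.prepare_qat(model, inplace=True)     # inserts fake-quantization observers

    # ... the normal training loop runs here; the forward pass now simulates
    # int8 rounding, so the weights learn to cope with the reduced precision ...

    model.eval()
    quantized = tq.convert(model)           # swaps in real int8 modules after training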

An attractive workaround that seems to be gaining traction is to use compilers that generate model-specific inference code, such as microTVM and Glow. Both of these seem to be aimed at slightly more powerful microcontrollers (CM4 and up).

In terms of very small inference engines, onnx2c, TinyMaix and NNoM seem to be quite interesting. The last two are unfortunately based on TensorFlow Lite. TinyEngine seems to be the most optimized MCU inference engine, but it is highly reliant on the ARM architecture.

Plan/Next steps
