Generative AI on a Microcontroller

The Electronic Die of the Future


After many retro projects, it is time to look into the future! It's the year 2023 and the biggest technology trend (hype) is generative AI. Typically this is associated with high-performance GPUs. But maybe we can make use of it on a lowly microcontroller? And maybe we can use it to implement just another incarnation of an electronic die?

Table of contents

  1. Goal and Plan
  2. Training Dataset Generation and Evaluation Model
  3. Options for Generative AI Models
  4. CVAE: Conditional Variational Autoencoder
  5. CVAE: Minimizing the VAE Memory Footprint
  6. CVAE: Improving Image Quality (More Layers!)
  7. Tiny Inference Engines for MCU deployment
  8. Building my own Inference Engine


  • Building my own Inference Engine

    Tim · 04/22/2024 at 17:58 · 2 comments

    Nothing has happened for a while in this project. The reason is somewhat obvious from the previous entry: There are many solutions for edge inference, but none that really fits my purpose.

    To be honest, this is not an easy problem to solve, because a lot of flexibility (aka complexity) is required to address all the types of NN models people come up with. In addition, there is a tendency to try to hide all that complexity, and this adds even more overhead.

    When it comes to really low end edge devices, it seems to be simpler to build your own inference solution and hardwire it to your NN-architecture.

    ... and that is what I did. You can find the project here:

    Detailed write up

    I learned a lot about squeezing every last bit out of the weights during training and making the inference as lean as possible. I used the well-known MNIST dataset and a small CH32V003 microcontroller as a test vehicle and achieved >99% test accuracy. This is not a world record, but it beats most other MCU-based applications I have seen, especially on an MCU with only 16 kB of flash and 2 kB of SRAM (less than an Arduino UNO).

    So far, I got away with only implementing FC layers, normalization and ReLU. But to address "GenAI" I will eventually have to implement other operators as well. We'll see...
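
To make the idea of a hardwired inference path more concrete, here is a minimal sketch of a fully connected layer with integer weights followed by ReLU. It is not the code from the project: the function names, the power-of-two `shift` (used here as a stand-in for the normalization step) and the 8-bit clamp are assumptions chosen for illustration.

```python
def fc_relu_int(x, weights, bias, shift):
    """Fully connected layer with integer weights, followed by ReLU.

    x       : list of non-negative integer activations
    weights : one row of integer weights per output neuron
    bias    : one integer bias per output neuron
    shift   : right shift that rescales the accumulator (hypothetical
              stand-in for a normalization step)
    """
    out = []
    for row, b in zip(weights, bias):
        acc = b
        for xi, wi in zip(x, row):
            acc += xi * wi                 # multiply-accumulate in a wide integer
        acc >>= shift                      # cheap power-of-two rescaling
        out.append(min(max(acc, 0), 255))  # ReLU, then clamp to 8 bit
    return out


def argmax(values):
    """Index of the largest value, e.g. the predicted class."""
    return max(range(len(values)), key=values.__getitem__)
```

On a real MCU the same loops would be written in C with fixed array sizes, so that no generic graph interpreter is needed.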

  • Tiny Inference Engines for MCU deployment

    Tim · 11/19/2023 at 15:36 · 0 comments

    The big question now is how to implement our trained model on a microcontroller. Ideally, the solution should work with PyTorch (since I trained the models in it) and minimize the SRAM and flash footprint, even on very small devices (there is no point in having a 6k-parameter model if your inference code is 30k).

    I spent quite some time searching and reviewing various options. A short summary of my findings:


  • Improving Quality (More Layers!)

    Tim · 11/12/2023 at 11:52 · 0 comments

    So far, I have avoided introducing convolutional layers at the full image resolution of 32x32, because this would drive up the SRAM footprint significantly. However, since no convolution takes place at 32x32, the image quality is limited.

    Depth-first/tiled inference of the CNN may help to reduce the memory footprint, so we should not immediately discard adding more layers.


  • Minimizing the VAE memory footprint

    Tim · 11/09/2023 at 23:18 · 0 comments

    To implement the VAE on a microcontroller with a small SRAM and flash footprint, it is necessary to minimize the size of the network weights themselves and also to consider the SRAM footprint required for evaluation.

    1. The size of the model in flash is defined by the total number of parameters. I assume that it will be possible to quantize the model to 8 bit, so the model must be reduced to a few thousand parameters to fit into the flash of a small MCU.
    2. The SRAM consumption is defined by the intermediate values (activations) that need to be stored temporarily during inference. Assuming that we evaluate the net layer by layer, we should limit the maximum memory footprint of one layer. There may also be some optimization possible by evaluating the CNN part of the decoder tile by tile (depth first).
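
The two points above can be turned into a rough estimate. The following sketch counts parameters (flash, assuming 8-bit weights) and the largest input-plus-output activation buffer of any single layer (SRAM, assuming 8-bit activations and layer-by-layer evaluation). The decoder layout in `HYPOTHETICAL_DECODER` is made up for illustration and is not the model from this project.

```python
def conv_params(c_in, c_out, k=3):
    """Weights plus biases of a k x k convolution."""
    return c_in * c_out * k * k + c_out


def fc_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


# (name, parameter count, input values, output values) per layer.
# Hypothetical decoder: 16 inputs -> 8x4x4 -> 8x8x8 -> 4x16x16 -> 1x32x32,
# with upsampling assumed to happen before each convolution.
HYPOTHETICAL_DECODER = [
    ("fc",    fc_params(16, 8 * 4 * 4), 16,          8 * 4 * 4),
    ("conv1", conv_params(8, 8),        8 * 8 * 8,   8 * 8 * 8),
    ("conv2", conv_params(8, 4),        8 * 16 * 16, 4 * 16 * 16),
    ("conv3", conv_params(4, 1),        4 * 32 * 32, 1 * 32 * 32),
]


def flash_bytes(layers, bytes_per_weight=1):
    """Total weight storage, assuming 8-bit quantized parameters."""
    return sum(params for _, params, _, _ in layers) * bytes_per_weight


def peak_sram_bytes(layers, bytes_per_value=1):
    """Largest input + output buffer of any single layer."""
    return max(n_in + n_out for _, _, n_in, n_out in layers) * bytes_per_value


print(flash_bytes(HYPOTHETICAL_DECODER))      # 3089
print(peak_sram_bytes(HYPOTHETICAL_DECODER))  # 5120
```

In this made-up example the weights are small, but the last convolution alone needs about 5 kB of activation memory, which illustrates why full-resolution convolutions are costly on a device with only a few kB of SRAM.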

  • Conditional Variational Autoencoder (CVAE)

    Tim · 11/09/2023 at 20:27 · 0 comments

    After dabbling a bit with both diffusion models and VAEs, I decided to focus on CVAEs first. It seems the main problem is not training the network, but finding a smooth way to implement it on an MCU. So I'd rather start with a simple architecture to tackle the MCU implementation.

    VAEs were originally introduced in 2013 in this paper. There is a very good explanation of VAEs here.

    A VAE consists of an encoder and a decoder part. The encoder is a multilayer artificial neural network (usually a CNN) that reduces the input data to a latent representation with fewer parameters. The decoder does the opposite and expands the latent representation to a high resolution picture. The network is trained to reproduce the input image on the output as closely as possible. In addition, a clever trick (the "reparameterization trick") keeps the random sampling of the latent representation trainable, and a regularization term in the loss (the KL divergence) ensures that the latent space is organized so that similar images are grouped together. After the network is trained, we can use only the decoder part and feed in random numbers to generate new images.

    Since we also want to control the number of pips on the die, we also need to label the data that is fed in - that is where the conditional part in the CVAE comes from.
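
As a minimal sketch of the two ideas above, the snippet below shows the reparameterization step and how a decoder input could be assembled at generation time. It uses only the Python standard library instead of PyTorch; the function names, the assumption that the variance head predicts the log-variance, and the concatenation of a one-hot label to the latent vector are illustrative assumptions, not the project's code.

```python
import math
import random


def reparameterize(mu, log_var, rng=random):
    """z = mu + sigma * eps, with eps ~ N(0, 1) and sigma = exp(0.5 * log_var)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def one_hot(label, num_classes):
    """One-hot encoding of a class index in 0..num_classes-1."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def decoder_input(pips, latent_dim, num_classes=6, rng=random):
    """Random latent vector followed by the one-hot pip count (1..6)."""
    z = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
    return z + one_hot(pips - 1, num_classes)
```

During training, `reparameterize` would be applied to the encoder outputs. On the microcontroller only something like `decoder_input` is needed: a few random numbers plus the requested number of pips.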

    The Model: Encoder

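            # Encoder: 32x32 input, one image channel plus one channel per class label
            self.encoder = nn.Sequential(
                nn.Conv2d(1 + num_classes, dim1, kernel_size=3, stride=2, padding=1),  # 32x32 -> 16x16
                nn.Conv2d(dim1, dim2, kernel_size=3, stride=1, padding=1),             # 16x16 -> 16x16
                nn.Conv2d(dim2, dim3, kernel_size=3, stride=2, padding=1),             # 16x16 -> 8x8
                nn.Conv2d(dim3, dim3, kernel_size=3, stride=2, padding=1),             # 8x8 -> 4x4
            )
            # Mean and variance heads: flattened 4x4 feature map plus the class label
            self.fc_mu = nn.Linear(dim3*4*4 + num_classes, VAE_choke_dim)
            self.fc_var = nn.Linear(dim3*4*4 + num_classes, VAE_choke_dim)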
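
The `dim3*4*4` input size of the two linear layers follows from the strides of the convolutions. A small check, assuming a 32x32 input image and using the standard output-size formula for a convolution without dilation (the helper name `conv_out` is made up for this sketch):

```python
def conv_out(size, kernel=3, stride=1, padding=1):
    """Output edge length of a square convolution (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1


size = 32
sizes = [size]
for stride in (2, 1, 2, 2):  # strides of the four encoder convolutions
    size = conv_out(size, stride=stride)
    sizes.append(size)

print(sizes)  # [32, 16, 16, 8, 4]
```

The final feature map is 4x4 with `dim3` channels, which matches the `dim3*4*4` term.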

  • Options for Generative AI Models

    Tim · 11/05/2023 at 21:28 · 0 comments

    Since our goal is to generate images, we need to select a suitable artificial neural network architecture that is able to generate images based on specific input.

    Typically, three architectures are discussed in this context as of today (2023):

    1. Generative Adversarial Networks (GAN)
    2. Variational Autoencoders (VAE)
    3. Diffusion Models

    Diffusion models are the newest of the bunch and are at the core of the AI image generators that are creating a lot of hype currently. Latent diffusion, the architecture at the core of Stable Diffusion, is actually a combination of a diffusion model and a VAE.

    Obviously, a diffusion model may be the most interesting to implement. Therefore I will start with that. There may be a risk that it turns out too heavyweight for a microcontroller though, even when the problem is as simplified as we made it already.

    Variational autoencoders may be a good alternative: the architecture is simpler and more likely to fit on a small device. Therefore this is my second priority, at least as a backup.

    Generative Adversarial Networks were the most lauded approach before diffusion models stole the show. Since they basically train a decoder that could be used in a very similar way as a VAE, they may also be an interesting option to create a lightweight model. Compared to VAEs, they may be better suited to create novel images. But that is something to find out. Unfortunately, it appears that training GANs is harder than training the other two options. Therefore I will park this for now, maybe to be revisited later.

    Generally, it has to be assumed that the problem of generating images requires more processing power and larger neural networks than a model that only does image recognition (a discriminator). There are plenty of examples of running MNIST inference on an Arduino. Does it work for generative NNs as well? That remains to be seen...

    Next Steps

    1) Investigate diffusion models

    2) Look into variational autoencoders

  • Training Dataset Generation and Evaluation Model

    Tim · 11/05/2023 at 10:08 · 0 comments

    Training dataset

    Since the capabilities of the target platform are somewhat limited, I elected to create a simplified synthetic dataset for training. I chose 1x32x32 greyscale as the target resolution for the images, as this fits into a 1 kB footprint (at 8 bits per pixel). The resolution can be increased later, and we can obviously also use a fancier looking die image at a later time.

    I need labeled images showing die-rolls with 1-6 pips. There should also be some variation, because otherwise using generative AI is quite pointless.

    I used GPT4 to generate a Python program to generate images of dice and later refined it iteratively with Copilot in VS Code. While all the attention is on GPT, Copilot Chat has become impressively useful in the meantime. It's much easier to interact with specific parts of your code, while this is a hassle in GPT4.

    Images are created in 128x128 and then downscaled to 32x32 to introduce some antialiasing. The die images are rotated by an arbitrary angle to introduce variation. It should be noted that rotating the dice requires them to be scaled down, so they are not clipped. This also introduces a variation in scaling to the dataset.
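
The procedure described above can be sketched in a few lines of pure Python. This is not the GPT4-generated program from the project: the pip layout, the white face on a black background, the size constants and all function names are assumptions for illustration. The sketch renders at 128x128, shrinks the die so that the rotated square still fits, and averages 4x4 pixel blocks to get an antialiased 32x32 image.

```python
import math
import random

# Pip positions on a 3x3 grid (assumed layout), in grid units.
PIPS = {
    1: [(0, 0)],
    2: [(-1, -1), (1, 1)],
    3: [(-1, -1), (0, 0), (1, 1)],
    4: [(-1, -1), (1, -1), (-1, 1), (1, 1)],
    5: [(-1, -1), (1, -1), (0, 0), (-1, 1), (1, 1)],
    6: [(-1, -1), (1, -1), (-1, 0), (1, 0), (-1, 1), (1, 1)],
}


def render_die(pips, angle, hi=128, lo=32):
    """Render a rotated die at hi x hi and box-filter it down to lo x lo."""
    c, s = math.cos(angle), math.sin(angle)
    # A square rotated by `angle` needs (|cos| + |sin|) times more room,
    # so it is shrunk by that factor to keep its corners inside the image.
    half = 0.9 * (hi / 2) / (abs(c) + abs(s))
    pip_r, pitch = 0.18 * half, 0.55 * half
    big = [[0] * hi for _ in range(hi)]
    for y in range(hi):
        for x in range(hi):
            # Rotate the pixel centre back into the die's own frame.
            dx, dy = x + 0.5 - hi / 2, y + 0.5 - hi / 2
            u, v = c * dx + s * dy, -s * dx + c * dy
            if abs(u) > half or abs(v) > half:
                continue  # background stays black
            on_pip = any((u - px * pitch) ** 2 + (v - py * pitch) ** 2
                         <= pip_r ** 2 for px, py in PIPS[pips])
            big[y][x] = 0 if on_pip else 255  # white face, black pips
    f = hi // lo  # downscale factor (4 for 128 -> 32)
    return [[sum(big[y * f + j][x * f + i]
                 for j in range(f) for i in range(f)) // (f * f)
             for x in range(lo)] for y in range(lo)]


def make_sample(rng=random):
    """One labeled training sample: (32x32 image, number of pips)."""
    pips = rng.randint(1, 6)
    angle = rng.uniform(0.0, 2.0 * math.pi)
    return render_die(pips, angle), pips
```

Because the scale factor depends on the rotation angle, the die appears largest at multiples of 90 degrees and smallest at 45 degrees, which is the variation in scaling mentioned above.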

    Example outputs are shown below.


  • Goal and Plan

    Tim · 11/04/2023 at 23:28 · 0 comments


    The goal of this project is to build an electronic die using generative AI on a microcontroller. And of course it is a nice opportunity for me to play a bit with ML.

    Pushing a button shall initiate the roll of a die, and a random result is shown on a display. Instead of using 7 LEDs and a logic circuit, as in a traditional circuit project, we shall use a small display (e.g. SSD1306) and a microcontroller (TBD, not sure how low we can go).

    • The display shall show a picture of the die as rolled. 
    • The number of pips should be clearly indicated. 
    • The graphics shall be generated in real time by a generative AI algorithm.
    • Everything should be light-weight enough to run on a MCU.




allexoK wrote 04/22/2024 at 22:40

Hello Tim, cool project!

Recently I made a library for both learning and inference on MCUs. It's not an NN though, just a decision tree. Check it out if you are interested: it's available for both Arduino and PlatformIO (Version 2 is significantly better than Version 1).

Maybe you can make a physical die with an accelerometer inside and teach it to predict the result of the roll from the acceleration data before the die has settled. That would be kind of cool.

Also, it would save fellow board game enjoyers some milliseconds of their lives (since the result would be available a little before the die has settled) and can therefore be treated as a life-saving humanitarian project.


Tim wrote 04/23/2024 at 08:11


Thanks! Cool project on the decision tree classifier. I went the other route, spending a maximum amount of FLOPs outside the MCU to reduce the FLOPs on the MCU. No chance to do on-device learning that way :)


allexoK wrote 04/23/2024 at 11:09

Thanks, that makes sense, good luck!

