
Part: The SBC

A project log for Poetroid - Poetry Capturing Camera

Morph the mundane memory-catcher into a mosaic maker of metaphors, marshaling mementos not in megapixels but in mellifluous meter.

stopsendingmejunk 01/30/2024 at 21:47

Ideally, Poetroid runs the language models on board. Currently, that's easier said than done.

I initially wanted a low-powered board that could run the language model itself. It is possible to run the models on many SBCs, but according to the information I could find and my own testing on the few boards I own, processors are not yet powerful enough and language models are not yet small enough to do this well, quickly, and easily.

Some of the options I considered included Rockchip's latest, the ZimaBoards, the Intel N100 (tested), the Pocket AI from ADLINK, AMD mobile GPUs (tested), and many other boards advertised with AI or NPU capabilities.

There are various issues with the current generation of hardware. AI-accelerated embeddable devices seem to be optimized for vision-related tasks, which need a different kind of compute than the matrix multiplication that dominates language-model inference.

AMD's mobile 7000 or 8000 series (I have tested on the Win Mini and the Steam Deck) could possibly be fast and efficient enough, but running language models on AMD mobile GPUs was, as of last week, not straightforward. Projects like MLC exist to run LLMs with acceleration on AMD hardware, but they are currently limited in multimodal capabilities.

Surprisingly, the best performance-per-watt board I found is the M1 Mac Mini: 5-7 watts on standby, 39 watts at full load, and it can run multimodal models with Ollama at acceptable speeds. And don't think I didn't take the possibility seriously. The M1 Mini is an impressively small SBC:

Ultimately, however, none of these boards offers the right matrix of features for this project. I believe AI-accelerated boards focused on matrix multiplication are in the works, and we will quickly see drastic improvements in this area.

So for now, I decided not to invest in any of the current solutions while I wait for a better SBC/LLM combo that can meet the needs of the project. To that end, the server software is written separately from the client software, so it can be installed on another machine on the network (ideally one with CUDA cores). While you can technically run it right on your camera's SBC, it will be slow for now.
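With the model served over the network, the camera-side client only needs to make an HTTP request. A minimal sketch using Ollama's `/api/generate` endpoint, with the hostname `poetry-server.local` and the model name `llava` as placeholder assumptions for whatever server and multimodal model you actually run:

```python
import base64
import json
import urllib.request

def build_poem_request(image_path, host="poetry-server.local", model="llava"):
    """Build an Ollama /api/generate request asking for a poem about a photo."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("ascii")
    payload = {
        "model": model,
        "prompt": "Write a short poem inspired by this photo.",
        "images": [image_b64],  # Ollama accepts base64-encoded images
        "stream": False,        # wait for the whole poem, not a token stream
    }
    url = f"http://{host}:11434/api/generate"
    return url, json.dumps(payload).encode("utf-8")

def request_poem(image_path, **kwargs):
    """Send the request and return the generated poem text."""
    url, body = build_poem_request(image_path, **kwargs)
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Because the payload format is the only coupling between camera and server, you can point the same client at an RTX 3090 box today and at an on-board model later just by changing `host`.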

The Poetroid is currently powered by an Orange Pi Zero 3 to run the client software. It has an add-on board with two USB ports and an audio jack.

The USB ports are a necessity, and I was particularly interested in the audio jack. Even offloading the processing to an RTX 3090, there is a decent wait. Instant-camera magic feels instant even though the image develops after the shot; here the wait comes first, so it's more like a photo booth. A small speaker is useful not just for the click of the "shutter", but to play other random sounds, like 90s dial-up modem noises, during the 20-30 second wait.
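The wait-sound idea is easy to prototype: pick a random clip from a folder and start it in the background while the poem request is in flight. A minimal sketch, assuming a hypothetical `sounds/` directory of WAV files and ALSA's `aplay` on the Orange Pi (both are my assumptions, not part of the documented build):

```python
import random
import subprocess
from pathlib import Path

def pick_wait_sound(sound_dir="sounds"):
    """Choose one random clip (e.g. dial-up noise) to cover the 20-30 s wait."""
    clips = sorted(Path(sound_dir).glob("*.wav"))
    if not clips:
        return None
    return random.choice(clips)

def play_wait_sound(sound_dir="sounds"):
    """Start background playback via aplay; return the process handle."""
    clip = pick_wait_sound(sound_dir)
    if clip is None:
        return None
    # Caller keeps the handle and calls proc.terminate() once the poem arrives.
    return subprocess.Popen(["aplay", str(clip)])
```

Keeping playback in a separate process means the client loop stays free to poll the server and cut the sound off the moment the poem comes back.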
