Introducing a face

A project log for Automatic Speech Recognition AI Assistant

Turning a Raspberry Pi 4B into a satellite for a self-hosted language model, all with a sprinkle of ASR and NordVPN Meshnet

Adam - Meshnet, 12/21/2023 at 14:48

In my first project log, I mentioned that the examples provided with the Vosk library were very easy to work with and worked right out of the box. Using them for this project was a no-brainer. After all, if it's not broken, don't fix it.

To the code provided by the Vosk devs, I've added simple POST request handling along with a PyGame-based display. The "frontend" shows the assistant "thinking", and once the language model responds to the POST request, it displays the response.
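The POST-handling side can be sketched with nothing but the standard library. Note that the endpoint address, the JSON payload shape, and the `response` key below are assumptions for illustration; the real values depend on how the self-hosted model server is set up.

```python
import json
import urllib.request

# Hypothetical endpoint of the self-hosted language model, reached over
# the Meshnet IP. The real address and payload shape depend on your server.
LLM_ENDPOINT = "http://10.5.0.2:5000/api/ask"


def build_payload(transcript: str) -> bytes:
    """Wrap the ASR transcript in the JSON body the model server expects."""
    return json.dumps({"prompt": transcript}).encode("utf-8")


def ask_model(transcript: str, timeout: float = 60.0) -> str:
    """POST the transcript and return the model's text response."""
    request = urllib.request.Request(
        LLM_ENDPOINT,
        data=build_payload(transcript),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]
```

In the assistant loop, `ask_model()` would be called with whatever text Vosk recognized, and the display switches faces while waiting on the reply.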

For the time being, as a proof of concept, the assistant has only two states: "thinkingFace" and "responseFace". The "thinkingFace" involves eyes moving side to side, which, at least in my mind, mimics someone trying to figure things out, while the "responseFace" displays the text the language model responded with.
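The state logic behind the two faces can be sketched separately from the PyGame rendering. The state names come from the post; the sine-based eye sweep and its amplitude/period parameters are illustrative guesses, with the actual drawing code left out.

```python
import math

THINKING, RESPONSE = "thinkingFace", "responseFace"


def eye_offset(t: float, amplitude: float = 40.0, period: float = 2.0) -> float:
    """Horizontal pupil offset in pixels at time t: a smooth side-to-side sweep."""
    return amplitude * math.sin(2 * math.pi * t / period)


class Face:
    """Minimal two-state model: thinking until a response arrives."""

    def __init__(self) -> None:
        self.state = THINKING
        self.text = ""

    def set_response(self, text: str) -> None:
        """Switch to the response face and store the model's reply."""
        self.state = RESPONSE
        self.text = text

    def reset(self) -> None:
        """Go back to the thinking animation for the next question."""
        self.state = THINKING
        self.text = ""
```

A PyGame draw loop would then query `face.state` each frame, using `eye_offset(elapsed_seconds)` to position the pupils while thinking, and blitting `face.text` once a response is in.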

One more kink that needs to be ironed out is flipping the display orientation by 180 degrees. As it turns out, it is not so simple: none of the guides I found on the internet worked with my display.
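For reference, these are the usual starting points on a Raspberry Pi, though which one applies depends on the display driver in use (legacy firmware vs. KMS, X11 vs. console), and as noted above, none of them is guaranteed to work with every panel:

```shell
# 1) Legacy (non-KMS) firmware driver -- options in /boot/config.txt:
#    lcd_rotate=2            # official DSI touchscreen
#    display_hdmi_rotate=2   # HDMI displays

# 2) X11 session -- rotate at runtime (output name varies; check `xrandr`):
xrandr --output HDMI-1 --rotate inverted

# 3) Console framebuffer -- kernel parameter appended to /boot/cmdline.txt:
#    fbcon=rotate:2
```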

Project files are available in the files section if you want to give it a try.