Proof of Concept

A project log for Automatic Speech Recognition AI Assistant

Turning a Raspberry Pi 4B into a satellite for a self-hosted language model, all with a sprinkle of ASR and NordVPN Meshnet

Adam - Meshnet, 11/15/2023 at 15:52

I set out to find open-source projects that had already achieved the same thing, or at least something similar.

The first thing I found was Rhasspy, which is "an open source, fully offline set of voice assistant services for many human languages...". While it's nearly everything I need for my project, it's also a bit too much, as many of the available features are of no use to me.

What's really great is that you can explore all the libraries that Rhasspy (or, more likely, Rhasspy v3) takes advantage of. There, I discovered Vosk, a speech recognition toolkit that can run on a Raspberry Pi 4B.

Before ordering a Raspberry Pi for the project, I decided to test the examples provided in Vosk's GitHub repository. I chose to go with the Python API, even though I’m more comfortable with JavaScript. Setting up the Python project was just much easier overall.

To absolutely nobody's surprise, the example worked beautifully. It started right up and inferred text from speech with quite high accuracy, even with the smaller model.
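For context, this is roughly what the Vosk microphone example boils down to. It's a trimmed sketch rather than the exact script from the repository; the model directory and the 16 kHz sample rate are assumptions that depend on which model you download:

```python
# Minimal microphone transcription sketch based on Vosk's Python example.
# "model" is the directory of the downloaded Vosk model (path is an assumption);
# 16 kHz matches the small English model.
import json
import queue

import sounddevice as sd
from vosk import Model, KaldiRecognizer

audio_queue = queue.Queue()

def callback(indata, frames, time, status):
    # Push raw audio buffers from the microphone into a queue.
    audio_queue.put(bytes(indata))

model = Model("model")
recognizer = KaldiRecognizer(model, 16000)

with sd.RawInputStream(samplerate=16000, blocksize=8000, dtype="int16",
                       channels=1, callback=callback):
    while True:
        data = audio_queue.get()
        if recognizer.AcceptWaveform(data):
            # A full utterance was recognized; print the transcribed text.
            print(json.loads(recognizer.Result())["text"])
```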

Now that I had automatic speech recognition set up, I wanted someone or something to talk to. Hence, I started looking for language models I could interface with. Long story short, I found LocalAI. According to its website, "LocalAI act as a drop-in replacement REST API that's compatible with OpenAI API specifications for local inferencing. It allows you to run LLMs, generate images, audio (and not only) locally or on-prem with consumer grade hardware, supporting multiple model families that are compatible with the ggml format. Does not require GPU."

There are two really important points in this description:

  1. OpenAI API compatibility - after all, there is extensive and easy-to-understand documentation for the OpenAI API at platform.openai.com/docs/api-reference.
  2. Consumer-grade hardware? Does not require GPU? - Yes, please! Even though running an LLM on a CPU won't give you ChatGPT-level speeds, it means I could just throw the language model onto my homelab server, give it a few threads, and call it a day.

And throw it on my homelab server I did! Very helpfully, LocalAI provides a great step-by-step guide on how to set up the model with a Docker container. I'm not sure there's currently an easier way to do it (great job!).

Now that I had a language model up and running, I needed to send the inferred text to the LocalAI API as a request and, once the model replied, display the response.

With a bit of fiddling in Postman, some POST-request handling, and a lot of ChatGPT assistance (owing to my less-than-adequate programming skills), I managed to make it work. So far, everything runs on my homelab server - hence the long cable running from the cupboard to the microphone.
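For reference, here's a minimal sketch of that request in Python, using the OpenAI-compatible chat completions endpoint that LocalAI exposes. The URL and model name are placeholders - they depend on how the container is configured:

```python
# Hypothetical sketch: send recognized text to LocalAI and print the reply.
# The host, port, and model name below are assumptions, not my exact setup.
import requests

LOCALAI_URL = "http://homelab.local:8080/v1/chat/completions"

def ask_assistant(text: str) -> str:
    payload = {
        "model": "ggml-gpt4all-j",  # whichever model the container was set up with
        "messages": [{"role": "user", "content": text}],
        "temperature": 0.7,
    }
    response = requests.post(LOCALAI_URL, json=payload, timeout=120)
    response.raise_for_status()
    # Same response shape as the OpenAI API:
    # the reply text lives in choices[0].message.content.
    return response.json()["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(ask_assistant("Hello from the Raspberry Pi satellite!"))
```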