Recently, more and more AI voice assistants have entered the public eye, with everyone praising the benefits AI can bring. Amid the enthusiasm, it is easy to overlook that these hardware products are, at their core, recording devices that upload data in real time!
The privacy of personal recordings is paramount, and it's hard not to be concerned about where the data goes. Are private conversations really safe? How do they store the original recordings? How do they analyze and use such private data?
Moreover, most of these products require manual triggering to start recording. If you tend to forget important information, you might also forget to trigger the recording. What I want is an assistant that can automatically record all information without any extra operations.
Therefore, I started this project with the hope of creating a device with the following features:
- A hardware device capable of 24-hour continuous recording, with data transmission to a smartphone via BLE.
- Possible features include:
  - 24-hour continuous recording
  - Locally extracting the useful parts of the recording (90% of the time we are not in conversation)
  - Uploading the useful data to a smartphone via BLE
  - Firmware upgrades via BLE
  - A body as small and light as possible, to attach to clothing
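"Extracting useful recording content" is essentially voice activity detection: drop the long stretches where nothing is being said. As a rough illustration of the idea (not the device firmware; the frame length and threshold here are made-up values that would need tuning per microphone), an energy-based sketch in Python:

```python
FRAME_MS = 30          # analysis window length in milliseconds (illustrative)
THRESHOLD = 500.0      # RMS amplitude treated as "speech"; tune per microphone

def rms(frame):
    """Root-mean-square amplitude of a list of 16-bit PCM samples."""
    return (sum(s * s for s in frame) / len(frame)) ** 0.5

def active_frames(samples, sample_rate=16000):
    """Yield (start_index, frame) for frames whose energy exceeds THRESHOLD."""
    frame_len = sample_rate * FRAME_MS // 1000
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        if rms(frame) > THRESHOLD:
            yield start, frame

# Example: 1 second of silence followed by 1 second of a loud square wave --
# only the frames touching the loud part survive.
silence = [0] * 16000
burst = [4000, -4000] * 8000
kept = list(active_frames(silence + burst))
```

A real device would use something sturdier (a proper VAD handles quiet speech and loud noise), but even this crude filter shows how the "90% silence" can be discarded before anything leaves the microcontroller.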
We also need to develop a mobile app to receive data from the device and allow us to send the recording data to any custom model interface we choose. Of course, it will include common public large model interfaces, but the key is that we decide where everything goes!
- The app may include local speech-to-text and voiceprint recognition capabilities to reduce data volume (lower priority).
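For the "send to any interface we choose" part, all the app really needs is a user-configurable endpoint and an optional API key. A minimal sketch of that idea (the URL, path, and header scheme below are illustrative assumptions, not a fixed API):

```python
import urllib.request

def build_upload_request(endpoint, wav_bytes, api_key=None):
    """Build (but do not send) an HTTP POST carrying one recording.

    `endpoint` is whatever the user configured: a self-hosted server,
    or any public model API that accepts raw audio.
    """
    headers = {"Content-Type": "audio/wav"}
    if api_key:
        # Bearer auth is a common convention; a given API may differ.
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(endpoint, data=wav_bytes,
                                  headers=headers, method="POST")

# Hypothetical endpoint for illustration only.
req = build_upload_request("https://example.com/transcribe",
                           b"RIFF...", api_key="secret")
```

Because the endpoint is just a setting, "we decide where everything goes" falls out naturally: point it at a home server, a VPS, or a public API, and nothing else in the app changes.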
Additionally, we need a server program running on our own hardware, with a local model to process the data. (I suspect summarization doesn't require a very powerful model, though I could be wrong.)
- Possible features include:
  - Converting speech to text
  - Voiceprint recognition
  - Summarizing text content with a local model
  - Deployment via Docker on most servers
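For the summarization step, one possible choice of local model server is Ollama; the sketch below follows its documented `/api/generate` request shape, but this is just one option, and the model name and prompt wording are my own placeholders:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"   # assumes a local Ollama instance

def build_summary_payload(transcript, model="llama3"):
    """JSON body for Ollama's /api/generate endpoint."""
    prompt = ("Summarize the following conversation transcript "
              "in a few bullet points:\n\n" + transcript)
    return {"model": model, "prompt": prompt, "stream": False}

def summarize(transcript, model="llama3"):
    """Send the transcript to the local model and return its summary text."""
    body = json.dumps(build_summary_payload(transcript, model)).encode()
    req = urllib.request.Request(OLLAMA_URL, data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Since a small instruction-tuned model handles summarization reasonably well, this piece should run on modest hardware, which fits the "Docker on most servers" goal.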
This is my vision for the device. I will implement these features step by step. The current plan includes:
- Validating the functionality using ESP32 (though its power consumption is high)
- Developing the accompanying app to forward recording data to the model and receive the model's feedback
- Developing the server software to support simple speech-to-text and text summarization
- Replacing the ESP32 with a lower-power BLE MCU
- Designing the final PCB and manufacturing the enclosure
That is the plan for now. The project will be fully open-source, from hardware to software. I hope that through this project, we can provide everyone with a convenient and secure personal voice assistant that protects our privacy.
Hi,
I run all my home automation stuff in house *except* for a few Google Homes scattered about the place.
It's annoying, as nothing connects to "the cloud" except for the Google assistants. The reason is that I like having voice control. The commands are as simple as "turn on the kitchen lights", which then comes back into Node-RED and gets processed into an MQTT request and sent out.
I'd love to be able to replace these with something processing voice on my local server, with the ability to figure out what the commands are saying and do the correct action.
I know you're more focused on LLM stuff at the moment, which is understandable, but have you given any thought to simpler actions?
Although saying that, teaching a local LLM to process the commands correctly might be the easiest option...
J