The goal of this project was to build a device that could record me saying something and play it back in a different person's voice.
To create the Talkboy Ultra, I started with:
- Raspberry Pi Zero W
- A USB Audio Card + Lavalier Mic
- A USB Speaker
This allowed me to record and play back audio with the Raspberry Pi and some custom Python code.
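As a rough idea of what that first record-and-play-back test could look like, here is a minimal sketch that shells out to the standard ALSA tools `arecord` and `aplay`. This is an assumption about tooling, not necessarily what my code does, and the device names `plughw:1,0` and `plughw:2,0` are placeholders (check `arecord -l` and `aplay -l` for the real ones).

```python
import subprocess
from typing import List

# Assumed ALSA device names; these depend on how the USB devices enumerate.
MIC_DEVICE = "plughw:1,0"      # USB audio card with the lavalier mic
SPEAKER_DEVICE = "plughw:2,0"  # USB speaker


def record_cmd(path: str, seconds: int, device: str = MIC_DEVICE) -> List[str]:
    """Build an arecord command: 16-bit mono WAV at 44.1 kHz for a fixed duration."""
    return [
        "arecord", "-D", device,
        "-f", "S16_LE", "-r", "44100", "-c", "1",
        "-d", str(seconds), path,
    ]


def play_cmd(path: str, device: str = SPEAKER_DEVICE) -> List[str]:
    """Build an aplay command that plays a WAV file on the given device."""
    return ["aplay", "-D", device, path]


def run(cmd: List[str]) -> None:
    """Run a command and raise CalledProcessError if it exits non-zero."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(record_cmd("test.wav", seconds=5))
    run(play_cmd("test.wav"))
```

Building the command as a list first keeps the hardware-specific bits (device names, sample rate) in one place, which makes them easy to tweak while experimenting.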
I wanted this to be a standalone device, so I needed a few more components. To control the recording and playback, I used some push buttons.
The device had to be portable so I also added:
- A 1000 mAh lithium-ion battery
- A TP4056 Charging Module with Micro-USB input
- A MT3608 DC-DC Step Up Boost Power Converter set to 5 V for the Raspberry Pi
This let me run the device without it being plugged directly into a USB power source.
I also added a 1602 LCD Display so I could show which output voice was currently selected.
Finally, I added an on/off switch to be able to turn the thing off. I soldered the components together and amazingly they worked the first time!
I had to write software for the device to record my voice, save it to a file, and then play back the changed voice. I looked into ways of changing the voice on the device itself, but the Raspberry Pi (and probably any device that would fit into the case) would be too slow and too low on memory to run the inference.
I ended up going with a client-server model where I recorded the audio on the device, then sent it to a local server (my MacBook) running the inference. I wrote a small web server on the MacBook in Flask, and had it run the voice changing software (so-vits-svc-fork), then return a response with the updated audio file. The Talkboy could then play the file back through the USB speaker.
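To make the device side of that round trip concrete, here is a minimal sketch using only the Python standard library. The server URL, the `/convert` path, and the `X-Voice` header are all made up for illustration; the real Flask endpoint may take the file and voice name in a different shape.

```python
import urllib.request

# Hypothetical address of the MacBook on the local network, and a hypothetical route.
SERVER_URL = "http://192.168.1.50:5000/convert"


def build_request(wav_bytes: bytes, voice: str, url: str = SERVER_URL) -> urllib.request.Request:
    """Wrap the recorded WAV in a POST request; the voice name rides in a custom header."""
    return urllib.request.Request(
        url,
        data=wav_bytes,
        method="POST",
        headers={"Content-Type": "audio/wav", "X-Voice": voice},
    )


def convert(in_path: str, out_path: str, voice: str,
            url: str = SERVER_URL, timeout: float = 120.0) -> None:
    """Send a recording to the server and save the converted audio it returns."""
    with open(in_path, "rb") as f:
        req = build_request(f.read(), voice, url)
    # Inference can take a while, so the timeout is generous (assumed value).
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        converted = resp.read()
    with open(out_path, "wb") as f:
        f.write(converted)
```

On the device this would be called right after recording stops, for example `convert("recording.wav", "converted.wav", "patrick_stewart")`, and the resulting file is what gets played back.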
I set up the green button to play back audio on a single click, and change voices with a double click. Holding down the red button will record, and releasing it will stop recording and send the recorded audio to the web server. The LCD will change to a “processing” state and change back when the voice is ready to play back.
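The single-click versus double-click logic for the green button can be sketched as a small timing state machine. This is a hardware-free illustration rather than my actual code: the class name, the action strings, and the 0.4 second window are all assumptions, and in practice `press` would be called from a GPIO callback and `tick` from the main loop.

```python
from dataclasses import dataclass, field
from typing import Optional

DOUBLE_CLICK_WINDOW = 0.4  # seconds between clicks to count as a double click (assumed)


@dataclass
class GreenButton:
    """Turns press timestamps into 'play' (single click) or 'next_voice' (double click)."""
    window: float = DOUBLE_CLICK_WINDOW
    # Time of a click that is still waiting to see whether a second click follows.
    pending: Optional[float] = field(default=None, init=False)

    def press(self, now: float) -> Optional[str]:
        """Call on every button press. Returns an action or None."""
        if self.pending is None:
            self.pending = now  # first click: wait before deciding
            return None
        if now - self.pending <= self.window:
            self.pending = None  # second click arrived in time
            return "next_voice"
        # The earlier click expired without tick() noticing: report it as a
        # single click and treat this press as the start of a new one.
        self.pending = now
        return "play"

    def tick(self, now: float) -> Optional[str]:
        """Call regularly. Fires 'play' once a lone click has outlived the window."""
        if self.pending is not None and now - self.pending > self.window:
            self.pending = None
            return "play"
        return None
```

The trade-off with this approach is that a single click can only be confirmed after the window has passed, so playback starts with a short delay. The red button needs no such logic, since press and release map directly to starting and stopping the recording.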
I also 3D printed a case for the Talkboy, using Tinkercad and a couple of iterations of designing and supergluing. I used the original Talkboy as inspiration, and I think it turned out pretty well.
Just as an aside, of the above models, I trained the Arnold Schwarzenegger and Patrick Stewart voices myself (both from audiobook clips that I found). The Simpsons ones that I used in the video were downloaded. I didn’t train mine for a super large number of steps, so that might explain why the Simpsons ones sound very close to the real deal, whereas the other ones are just okay.
I had a lot of fun with this project. It really ended up combining a bunch of my interests, including hardware hacking, software (both on an embedded device and on a server), and some state-of-the-art AI models too! I also got to 3D print a case that’s much more complicated than anything else I’ve made myself.
It was fun coming up with this challenge and then solving all of the problems that came with it, including fitting everything in the case, and lining up all of the components like the screen and buttons.
While I think it would be really interesting to see this device hit the mainstream market, I don’t think it will happen any time soon. For one thing, the licensing of real voices would probably be an issue. Plus the hardware to run the inference on a device doesn’t exist as far as I know. I think you might be able to do it on an iPhone if you shrank the model, but that wouldn’t be nearly as fun as using a dedicated device.
I’m hoping that I’m wrong though, and that a 30th anniversary Talkboy makes its way to the toy stores, complete with the latest in AI tech!