long range, low power, modular
Audio from a YouTube video run through codec2
x-wav - 1.83 MB - 09/09/2020 at 19:41
plain - 1.15 kB - 08/28/2020 at 18:54
original voice sample
x-wav - 1.72 MB - 08/09/2020 at 14:56
voice sample passed through codec2 voice codec
x-wav - 1.72 MB - 08/09/2020 at 14:56
GitHub: implements codec2 on an Adafruit Feather nRF52 Bluefruit LE.
Codec2 has been modified so that it can be built using Arduino framework.
The SM1000 contains the analog circuitry we need.
Using a speech codec, the data rate can be brought down to about 1200 bps. But how can we reduce the data even further? Let's take a 1 min 52 s mono 8 kHz speech sample as an example.
Using codec2, we get a 106:1 compression ratio; using text, a 1527:1 compression ratio. That's almost fifteen times better!
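These ratios follow directly from the sample's parameters. A quick sanity check (the transcript size of roughly 1.15 kB is taken from the attached plain-text file, so treat it as approximate):

```python
# Back-of-the-envelope compression ratios for a 1 min 52 s,
# 8 kHz, 16-bit mono speech sample.
DURATION_S = 112                 # 1 min 52 s
SAMPLE_RATE = 8000               # Hz
BITS_PER_SAMPLE = 16

raw_bits = DURATION_S * SAMPLE_RATE * BITS_PER_SAMPLE  # uncompressed size in bits
codec2_bits = DURATION_S * 1200                        # codec2 at 1200 bps

codec2_ratio = raw_bits / codec2_bits                  # 128000 bps / 1200 bps
print(round(codec2_ratio, 1))                          # -> 106.7

text_bytes = 1150                # ~1.15 kB transcript (approximate)
text_ratio = raw_bits / (text_bytes * 8)               # roughly 1500:1
```

The exact text ratio depends on the transcript length, but it lands in the 1500:1 range, about fourteen to fifteen times better than codec2 alone.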
If we use speech recognition on the transmit side, send the transcription over, and use speech synthesis on the receiving side, this might work.
Even though speech recognition engines are very good these days, they're still not as good as human beings. The transcript text could be shown to the speaker during transmission. In case of errors, the speaker could repeat the incorrect word or spell out its characters.
A speech engine also requires a language preset. That shouldn't be too much of a hurdle, because a conversation is usually held in a single language.
Is there good speech recognition software that runs offline?
Is speech recognition software not too power hungry?
Using Linux command line tools
I'm not planning to turn this into an actual application. It's only a proof of concept. If you want a VoIP solution that you could really use in your application, have a look at Mumble. It's used in the RigPi.
If you're using Ubuntu, a version newer than 19 is needed.
sudo apt install python3-pip libcodec2-dev
It might be better not to use "sudo" to avoid messing up the libraries that come with your Linux distribution. Alternatively, you can use PyCharm and use a virtual environment where all of these libraries get installed.
sudo pip3 install Cython
sudo pip3 install numpy
sudo pip3 install pycodec2
The codec2 examples you can find on the internet are presumably specially chosen to go well with the algorithm. Let's grab a video from YouTube using a YouTube video downloader. This will give you an mp4 file. Strip the audio from that video, convert it to 8 kHz mono, and cut it down to the first two minutes using only a single command:
ffmpeg -i Who\ Invented\ the\ Food\ Pyramid\ and\ Why\ You\'d\ Be\ Crazy\ to\ Follow\ It.mp4 -acodec pcm_s16le -ac 1 -ar 8000 -t 00:02:00 out.wav
Then using codec2 on that file is as simple as:
c2enc 1200 ve9qrp.wav ve9qrp.bit
c2dec 1200 ve9qrp.bit ve9qrp_decoded.raw
ffmpeg -f s16le -ar 8k -ac 1 -i ve9qrp_decoded.raw ve9qrp_decoded.wav
The sample from the YouTube video, after running it through codec2, sounds like this. No, it doesn't sound great, but keep in mind that the original video has a 44.1 kHz stereo signal. Converting that to 8 kHz mono already has an audible impact. Passing it through a 1200 bps codec2 tunnel is responsible for the other artifacts.
Then run it using:
python3 example.py ve9qrp.wav
To simultaneously encode and decode this 1 min 52 s audio fragment, the Wandboard (i.MX6Q) needs 8.21 s, about 7% of the fragment's duration, so a real-time implementation should be possible.
example.py will create output.raw, which contains the wav file encoded and then decoded by codec2. You can listen to it with:
aplay -D hw:CARD=imx6wandboardsg,DEV=0 -f S16_LE output.raw
The -D option sends the audio to line-out on a Wandboard. You can remove it if you're using other hardware.
You can encode and decode a file using Python or using the CLI tools and compare the results; they will be identical. I simplified example.py by using np.frombuffer() and np.asarray(), so the "import struct" is no longer necessary.
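For reference, here is a minimal sketch of what an example.py built on pycodec2 can look like. The API names used (Codec2, samples_per_frame, encode, decode) are how I understand pycodec2's wrapper; verify them against the pycodec2 README before relying on this:

```python
import sys
import numpy as np

def wav_to_frames(raw_bytes, samples_per_frame):
    """Turn raw 16-bit little-endian PCM bytes into fixed-size
    int16 frames, dropping the incomplete tail frame."""
    samples = np.frombuffer(raw_bytes, dtype=np.int16)
    n_frames = len(samples) // samples_per_frame
    return samples[:n_frames * samples_per_frame].reshape(n_frames, samples_per_frame)

def main(in_wav, out_raw):
    import wave
    import pycodec2  # sudo pip3 install pycodec2

    c2 = pycodec2.Codec2(1200)          # 1200 bps mode
    spf = c2.samples_per_frame()        # samples per codec2 frame

    with wave.open(in_wav, "rb") as w:  # expects 8 kHz mono 16-bit PCM
        raw = w.readframes(w.getnframes())

    with open(out_raw, "wb") as out:
        for frame in wav_to_frames(raw, spf):
            bits = c2.encode(frame)               # a few bytes per frame
            out.write(c2.decode(bits).tobytes())  # decoded int16 samples

if __name__ == "__main__":
    main(sys.argv[1], "output.raw")
```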
play_sound_samplerate_codec2 : works fine on a Core i7 8th Gen and on the Wandboard.
record_play_codec2 : gets input from line-in, encodes it to codec2, decodes it and outputs it to line-out. Works both on the Wandboard and on the laptop. On the Wandboard, the samplerate filter had to be reduced from "sinc_best" to "sinc_medium", otherwise there was no sound.
Requirements for pyaudio:
sudo apt install libportaudio2 libportaudiocpp0 python3-pip portaudio19-dev
sudo pip3 install pyaudio
sudo pip3 install sounddevice
Some simple test applications:
Codec2 works with a fixed sample rate of 8kHz. The line-in and line-out work at 48kHz. A sample rate conversion is needed. Several options are considered:
No libraries need to be installed. The implemented filters are simple first-order filters, and it certainly sounds like that.
This works both on the Core i7 8thGen and the Wandboard.
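A no-library upsampler along these lines can be as simple as linear interpolation. This is only a sketch of the approach; the actual filter code in the repository may differ:

```python
import numpy as np

def upsample_linear(samples, ratio=6):
    """Naive 8 kHz -> 48 kHz upsampling by linear interpolation.
    Cheap and dependency-free, but the interpolation acts as a
    crude low-pass filter, so it audibly colours the sound."""
    x = np.asarray(samples, dtype=np.float64)
    n_out = len(x) * ratio
    t_out = np.arange(n_out) / ratio   # output positions in input-sample units
    t_in = np.arange(len(x))
    y = np.interp(t_out, t_in, x)      # piecewise-linear interpolation
    return np.clip(y, -32768, 32767).astype(np.int16)
```

For a real filter you would follow each interpolated block with proper anti-imaging filtering, which is exactly what the libraries below do for you.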
sudo pip3 install samplerate
This library runs fine on a Core i7 8th Gen and on the Wandboard. Be sure to convert the output of this library to int16 before feeding it back to pyaudio. Python doesn't complain about wrong types the way a C++ program would, but the result sounds terrible.
This "sync-best"-filter is audibly the best performing filter.
Works fine on a Core i7 8th Gen. Takes more than two hours to build on a Wandboard; this package has more than 300 MB of dependencies.
sudo apt install gfortran llvm-dev libblas3 liblapack3 liblapack-dev libblas-dev
sudo pip3 install resampy
play_sound_resampy : opens an 8 kHz wave file, up-samples it to 48 kHz and plays it. Works fine on a Core i7 8th Gen. On the Wandboard, it takes much longer to start and then hangs; no sound is ever generated.
This depends on the chipset and modulation used. Typically a preamble is inserted here. LoRaWAN uses an 8 symbol preamble.
FreeDV uses Ethernet frames: 2 × 6-byte address, 2-byte length, payload (46 to 1500 bytes), FCS (= CRC32).
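That frame layout can be packed in a few lines of Python. This is a sketch of the layout only, not FreeDV's actual implementation; the CRC coverage and byte order here are assumptions:

```python
import struct
import zlib

def pack_frame(dst, src, payload):
    """Pack an Ethernet-style frame: 6-byte destination, 6-byte
    source, 2-byte length, payload padded to the 46-byte minimum,
    then a 4-byte CRC32 FCS over everything before it."""
    assert len(dst) == 6 and len(src) == 6
    assert len(payload) <= 1500
    padded = payload.ljust(46, b"\x00")   # pad short payloads to the minimum
    body = dst + src + struct.pack(">H", len(payload)) + padded
    return body + struct.pack("<I", zlib.crc32(body))
```

A minimum-size frame (payload up to 46 bytes) comes out at 6 + 6 + 2 + 46 + 4 = 64 bytes.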
As the walkie-talkie will use digital voice transmission, we need a way to digitize speech. Several open-source speech codecs are available. We will focus on low-bitrate codecs because we want long range, so Opus and Speex won't do. There's one codec that excels: codec2.
sudo apt install codec2
After installation of codec2, the raw audio file test samples are available in /usr/share/codec2/raw
I tried playing with the 700bps bitrate, but that never yielded results that were easily understandable. If you had a conversation with someone using this bitrate, you would frequently have to ask to repeat sentences.
1200bps seems to me the minimum practically achievable bitrate.
christoph@christoph-ThinkPad-L580:/usr/share/codec2/raw$ c2enc 1200 ve9qrp.raw ~/ve9qrp.bit --natural && c2dec 1200 ~/ve9qrp.bit ~/ve9qrp_codec2_1200.raw --natural && aplay -f S16_LE ~/ve9qrp_codec2_1200.raw
max_amp: 80 m_pitch: 320 p_min: 20 p_max: 160 Wo_min: 0.039270 Wo_max: 0.314159 nw: 279 tw: 40
max_amp: 80 m_pitch: 320 p_min: 20 p_max: 160 Wo_min: 0.039270 Wo_max: 0.314159 nw: 279 tw: 40
Playing raw data '/home/christoph/vk5qi_codec2_1200.raw' : Signed 16 bit Little Endian, Rate 8000 Hz, Mono
christoph@christoph-ThinkPad-L580:/usr/share/codec2/raw$
For your audio playback convenience, these raw files have been converted to WAV-file format using:
sox -e signed-integer -b 16 -r 8000 -c 1 ve9qrp.raw ~/ve9qrp.wav
So we see that codec2 achieves a 128k/1.2k = 106.7/1 compression ratio. That's truly impressive.
Of course, this compression ratio comes at a price: computational complexity. There's no way you could pull this off in real time on an AVR microcontroller; you need at least an MCU with an FPU, such as the STM32F4. An ESP32 doesn't seem suited either. A Raspberry Pi could be used, at the cost of higher current consumption.