
digital-walkie-talkie

long range, low power, modular

The final goal is a low-cost radio with long range, low power consumption, low bandwidth and decent audio quality. Before attempting that build, some experimentation will be done around the main building blocks of this project:
* voice codec
* digital data transmission : reliability
* audio quality: needed bitrate
We'll start off using existing hardware: a laptop and a Wandboard. Both are equipped with audio inputs and outputs and a network connection, which is the minimal setup for the application. First experiments will be done in Python. It has the advantage that it's simple, popular and widely documented. The same Python code runs on the desktop and on the embedded platform.

Progress

Voice codec

☑ Selection of voice codec : Codec2
☑ Implementing voice codec on embedded platform : esp32-codec2
☐ Making unit test for voice codec
☐ Turning Codec2 into a standalone Arduino library, which will allow for easier integration by third parties.

Audio streaming

☑ Audio playback : Sine output by I²S on ESP32's internal DAC : esp32-dds (direct digital synthesis)
☑ Decoded Codec2 audio output on ESP32's internal DAC : esp32-codec2-DAC
Audio capture (through I²S)
☐ Audio feed-through : not possible without external I2S peripherals. Internal DAC and ADC only work on I2S0. I2S0 can be set as output or input, not both at the same time.

Issues skipped

Issues listed here are of minor priority.  They do not block further development of the project, and a working prototype is considered more important.  Spending too much time on them risks getting bogged down and never building something that works.

  1. Find out why codec2 on the ESP32 doesn't yield the same codec2 packets as the codec2 on the PC, although they're based on the same code base.  ESP32-codec2 packets can successfully be decoded by c2dec on the PC, so the two implementations are still compatible.

youtube_codec2_1200.wav

Audio from a Youtube video run through codec2

x-wav - 1.83 MB - 09/09/2020 at 19:41


ve9qrp.txt

audio transcription

plain - 1.15 kB - 08/28/2020 at 18:54


ve9qrp.wav

original voice sample

x-wav - 1.72 MB - 08/09/2020 at 14:56


ve9qrp_codec2_1200.wav

voice sample passed through codec2 voice codec

x-wav - 1.72 MB - 08/09/2020 at 14:56


  • References

    Christoph Tack • 09/16/2020 at 19:43 • 0 comments

    Prior art

    nRF24 based

    1. Long Range Arduino Based Walkie Talkie using nRF24L01 : many similar projects, all using the RF24Audio library.

    RFM12 based

    1. Walkie Talkie Duino using RFM12B "open source" (only to the Kickstarter backers).

    Analog FM based

    1. DRA818V analog FM module (lots of harmonics, will need a license to operate)
    2. HamShield (by Casey Halverson), also available in Mini version.
      1. HamShield on Tindie
      2. Kickstarter
      3. Hackaday.io
      4. Instructables
      5. Github
      6. InductiveTwig

    Comparable projects

    LoRa

    HamShield LoRa (by Casey Halverson)

    STM32

  • Hardware choices

    Christoph Tack • 09/15/2020 at 18:09 • 0 comments

    Platform

    Flash size requirements

    The codec2 library needs about 87KB, while the RadioLib needs about 10KB.  Then, there's also the base Arduino libraries.  And we still need to add our own code.  To be on the safe side, a device with at least 256KB of flash will be needed.

    ESP32

    Test application built, based on Arduino codec2 library, but it crashed.  This has been solved with esp32-codec2.

    Some presumably also got it to work before I did, but they are unwilling to share their source code:


    STM32

    STM32F4Discovery

    Rowetel/Dragino Tech SM1000

    NUCLEO-L432KC

    Runs at only 80MHz, but might be an option to shrink the size.

    More info on STM32 development

    nRF52

    64MHz

    github : Implements codec2 on a Adafruit Feather nRF52 Bluefruit LE.

    Codec2 has been modified so that it can be built using Arduino framework.  I doubt this implementation is working correctly.

    Audio IO

    I²S

    On ESP32, using I²S is definitely advantageous because it can use DMA, which off-loads the reading and writing audio data from the processor.

    As we're only processing low quality 8kHz speech here, a high-end audio codec like the SGTL5000 is not necessary, but it might be a good choice after all:

    1. open source support (pjrc)
    2. I²S sink & source in a single device.
    3. High quality audio might be useful for other projects and designs.
    4. Extra features:
      1. Input: Programmable MIC gain, Auto input volume control
      2. Output: 98dB SNR output, digital volume
    5. Development board price is acceptable.

    PWM-DAC & ADC

    The SM1000 and NucleoTNC contain the analog circuitry we need.

    Adafruit Voice Changer also features some form of audio pass-through

    LoRa module

    Low power, low cost modules

    High power modules

    NiceRF

    • NiceRF LORA1268F30-433
      • 33dBm
      • works on 3V3 (max. 28dBm), needs 6VDC for 33dBm
      • min. TX-pwr = 10dBm on 3V3.
      • no coax-connector
      • 38x20mm
      • 5mA RX
      • 2µA sleep
      • LoRa RX-sensitivity (BW=62.5kHz, SF=12, CR=4/5) = -139dBm
    • NiceRF LORA1278F30-433
      • 30dBm
      • 13mA RX
      • 10µA sleep
      • LoRa RX-sensitivity (BW=125kHz, SF=12, CR=4/5) = -139dBm

    EByte

    • E19-433M30S
      • 3V3 to 5V
      • 25x37mm
      • 20mA RX
      • 3µA sleep
      • has u.fl-connector
      • max. TX-power 29.5dBm
      • LoRa RX-sensitivity figures unrealistic
    • E22-400M30S
      • 2V5 to 5V5
      • 24x38.5mm
      • 14mA RX
      • 3µA sleep
      • has u.fl-connector
      • max. TX-power 21.5dBm
      • LoRa RX-sensitivity figures unrealistic

    HopeRF

    • RFM98PW
      • 5V to 6V
      • 18x35mm
      • 15mA RX
      • 5µA sleep
      • no coax connector
      • max. TX-pwr=30dBm
      • LoRa RX-sensitivity (SF12, BW=125kHz) = -136dBm
    • Seeedstudio RFM98
      • 2mm pitch, not breadboard friendly

    Microchip

    RN2483, includes an MCU so there's no direct access to the transceiver. 

  • Codecs

    Christoph Tack • 08/28/2020 at 19:16 • 0 comments

    Codec2

    • open source, royalty free replacement for AMBE.  At 2400bps, AMBE+ still performs better than Codec2.
    • down to 700bps
    • already implemented on embedded platforms : STM32, nRF52
    • used in FreeDV, QRadioLink

    Opus

    • open source, royalty free
    • replacement for Speex
    • down to 6kbps
    • used in VoIP-applications (e.g. WhatsApp)

    MELPe

    • NATO standard
    • licensed & copyrighted

    Speech-to-text-to-speech

    Using a speech codec, the data rate can be brought down to about 1200bps.  But how can we reduce it even further?  Let's take a 1min52s mono 8kHz speech sample as an example:

    1. ve9qrp.wav : 1.799.212 bytes : 128000bps
    2. ve9qrp.bin : codec2 1200bps encoded : 16.866 bytes : 1200bps
    3. ve9qrp.txt : audio transcription of ve9qrp.wav : 1.178 bytes : 85bps

    Using codec2, we get a 106/1 compression ratio, using text 1527/1 compression ratio.  That's almost fifteen times better!
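    The ratios above follow directly from the three file sizes. A small sketch that recomputes them (the sizes are copied from the list; the dictionary labels and the function name are only illustrative):

```python
# File sizes in bytes, copied from the list above.
SIZES = {
    "ve9qrp.wav (16-bit PCM)": 1_799_212,
    "ve9qrp.bin (codec2 1200)": 16_866,
    "ve9qrp.txt (transcript)": 1_178,
}


def compression_ratio(original_bytes: int, compressed_bytes: int) -> float:
    """Return how many times smaller the compressed file is."""
    return original_bytes / compressed_bytes


wav, c2, txt = SIZES.values()
print(f"codec2 vs. PCM : {compression_ratio(wav, c2):7.1f} : 1")
print(f"text vs. PCM   : {compression_ratio(wav, txt):7.1f} : 1")
print(f"text vs. codec2: {compression_ratio(c2, txt):7.1f} : 1")
```

    The last line is the "almost fifteen times" gain of plain text over codec2.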

    If we use speech recognition on the transmit side, then send the transcription over and use speech synthesis on the receiving side, this might work.

    Speech recognition

    https://pypi.org/project/SpeechRecognition/

    Even though speech recognition engines are very good these days, they're still not as good as human beings.  The transcript text could be shown to the speaker during transmission.  In case of errors, the speaker could repeat the incorrect word or spell out its characters.

    A speech engine also requires a language preset.  That shouldn't be too much of a hurdle because we all speak the same language most of the time.

    Is there good speech recognition software that runs offline?

    Is speech recognition software not too power hungry?

    Speech synthesis

    Using Linux command line tools

  • Wireless communication

    Christoph Tack • 08/26/2020 at 19:25 • 0 comments

    WiFi

    I'm not planning to turn this into an actual application.  It's only a proof of concept.  If you want a VoIP solution that you could really use in your application, have a look at Mumble.  It's used in the RigPi.

    TCP

    • wireless_mic_TCP : a server is connected to an audio source.  Clients can connect to this server and output the received sound to an audio sink.  One way transmission only.  In contrast to the example it was based on, this implementation is fully non-blocking.
    • wireless_2way_audio_TCP : simultaneous 2 way audio system.  Voice-intercom over TCP-connection.  A very simple VoIP application in some way.
    • wireless_2way_audio_TCP_codec2 : the same application as above, but transferred data is first encoded with codec2.
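    One detail matters once codec2 frames travel over TCP: TCP delivers a byte stream without message boundaries, so the receiver has to cut the stream back into fixed-size frames itself. The sketch below is not taken from the applications listed above. It is a minimal illustration that assumes 6-byte frames (codec2 at 1200bps) and, for brevity, uses blocking sockets rather than the non-blocking approach mentioned above. `recv_exact` and `demo` are hypothetical helper names.

```python
import socket

FRAME_BYTES = 6  # assumed: one codec2 1200 bps frame (48 bits)


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes from a stream socket, or raise if the peer closes."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("peer closed the connection mid-frame")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def demo() -> list:
    """Send three frames through a local socket pair and read them back."""
    tx, rx = socket.socketpair()
    try:
        frames = [bytes([i] * FRAME_BYTES) for i in range(3)]
        tx.sendall(b"".join(frames))  # arrives as one undivided byte stream
        return [recv_exact(rx, FRAME_BYTES) for _ in frames]
    finally:
        tx.close()
        rx.close()


print(demo())
```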

    UDP

    LoRa

    • 433MHz
    • maximum effective bitrate : 37.5kbps
    • 256 byte FIFO, shared for RX & TX
    • Critical parameters:
      • spreading factor,
      • modulation bandwidth
      • error coding rate
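    The three critical parameters fix the nominal LoRa bit rate: each symbol carries SF bits, the symbol rate is BW / 2^SF, and the coding rate scales the result. A minimal sketch of that relation follows (the function name is illustrative, and preamble and header overhead are ignored, so real throughput is lower):

```python
def lora_bitrate(sf: int, bw_hz: float, cr_denominator: int = 5) -> float:
    """Nominal LoRa bit rate in bit/s.

    sf             : spreading factor (6..12 on the SX127x)
    bw_hz          : modulation bandwidth in Hz
    cr_denominator : 5..8 for coding rates 4/5 .. 4/8
    """
    symbol_rate = bw_hz / 2 ** sf  # symbols per second
    return sf * symbol_rate * 4 / cr_denominator


for sf, bw in [(6, 500e3), (7, 125e3), (9, 125e3), (12, 125e3)]:
    rate = lora_bitrate(sf, bw)
    print(f"SF{sf:<2} BW={bw / 1e3:5.0f} kHz CR=4/5 -> {rate:8.0f} bit/s")
```

    SF6 at 500kHz reproduces the 37.5kbps maximum quoted above, while SF12 at 125kHz yields under 300bit/s, which is too slow for 1200bps codec2 audio.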

    References

  • Codec2 evaluation

    Christoph Tack • 08/15/2020 at 12:01 • 0 comments

    Evaluation of Codec2 will be done using command-line tools and python tools.

    Command line tools

    sudo apt install codec2

    Some experiments

    After installation of codec2, the raw audio file test samples are available in /usr/share/codec2/raw

    I tried playing with the 700bps bitrate, but that never yielded results that were easily understandable.  If you had a conversation with someone using this bitrate, you would frequently have to ask to repeat sentences.

    1200bps seems to me the minimum practically achievable bitrate.

    christoph@christoph-ThinkPad-L580:/usr/share/codec2/raw$ c2enc 1200 ve9qrp.raw ~/ve9qrp.bit --natural && c2dec 1200 ~/ve9qrp.bit ~/ve9qrp_codec2_1200.raw --natural && aplay -f S16_LE ~/ve9qrp_codec2_1200.raw 
    max_amp: 80 m_pitch: 320
    p_min: 20 p_max: 160
    Wo_min: 0.039270 Wo_max: 0.314159
    nw: 279 tw: 40
    max_amp: 80 m_pitch: 320
    p_min: 20 p_max: 160
    Wo_min: 0.039270 Wo_max: 0.314159
    nw: 279 tw: 40
    Playing raw data '/home/christoph/vk5qi_codec2_1200.raw' : Signed 16 bit Little Endian, Rate 8000 Hz, Mono
    christoph@christoph-ThinkPad-L580:/usr/share/codec2/raw$ 

    For your audio playback convenience, these raw files have been converted to WAV-file format using:

    sox -e signed-integer -b 16 -r 8000 -c 1 ve9qrp.raw ~/ve9qrp.wav

    File sizes:

    • ve9qrp.raw (original file) : 16bit samples, 8kHz sampling = 128kbps -> 1799168 bytes (WAV-file version)
    • ve9qrp.bit (codec2 1200 encoded) : 1.2kbps : 16866 bytes
    • ve9qrp_codec2_1200.raw (decoded) : 1799040 bytes (WAV-file version)

    So we see that codec2 achieves a 128k/1.2k = 106.7/1 compression ratio.  That's truly impressive.
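    The exact file sizes can be explained by frame arithmetic. The sketch below assumes that 1799168 bytes is the headerless PCM size, that the 1200bps mode works on 40ms frames of 6 bytes each, and that the encoder drops an incomplete last frame. The last assumption is inferred from the numbers rather than checked in the codec2 source.

```python
SAMPLE_RATE = 8000    # Hz, codec2 input rate
BYTES_PER_SAMPLE = 2  # 16-bit signed PCM
FRAME_MS = 40         # assumed frame length of the 1200 bit/s mode
FRAME_BYTES = 6       # 48 bits per encoded frame at 1200 bit/s


def codec2_1200_sizes(raw_bytes: int) -> tuple:
    """Return (frames, encoded_bytes, decoded_bytes) for a raw PCM file."""
    samples_per_frame = SAMPLE_RATE * FRAME_MS // 1000      # 320 samples
    frame_pcm_bytes = samples_per_frame * BYTES_PER_SAMPLE  # 640 bytes
    frames = raw_bytes // frame_pcm_bytes  # incomplete tail frame is dropped
    return frames, frames * FRAME_BYTES, frames * frame_pcm_bytes


print(codec2_1200_sizes(1_799_168))
```

    Under these assumptions the result matches the 16866-byte encoded file and the 1799040-byte decoded file listed above.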

    Of course, this compression ratio comes at a price : computational complexity.  There's no way you could pull this off in real time with an AVR-microcontroller.  You need at least an MCU with an FPU, such as the STM32F4.  An ESP32 doesn't seem suited either.  A Raspberry Pi could be used at the cost of higher current consumption and a lot longer startup time.  Would you want a walkie-talkie that you switch on and have to wait a minute before you can use it?  I wouldn't.

    TEST0: offline encoding & decoding using cli-tools

    The codec2 examples you can find on the internet are presumably specially chosen to go well with the algorithm.  Let's grab a video from Youtube using a Youtube video downloader.  This will give you an mp4-file.  Extract the audio from that video, convert it to 8kHz mono and trim it to the first two minutes, all in a single command:

    ffmpeg -i Who\ Invented\ the\ Food\ Pyramid\ and\ Why\ You\'d\ Be\ Crazy\ to\ Follow\ It.mp4 -acodec pcm_s16le -ac 1 -ar 8000 -t 00:02:00 out.wav

     Then using codec2 on that file is as simple as:

    c2enc 1200 out.wav out.bit
    c2dec 1200 out.bit out_decoded.raw
    ffmpeg -f s16le -ar 8k -ac 1 -i out_decoded.raw out_decoded.wav

    The sample from the Youtube video, after running it through codec2 sounds like this.  No, it doesn't sound great, but keep in mind that the original video has a 44.1kHz stereo signal.  Converting that to 8kHz mono already has an audible impact.  Passing it through a 1200bps codec2 tunnel is responsible for the other artifacts.
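    One caveat with the commands above: as far as I can tell, c2enc treats its input as headerless 16-bit samples, so the header of a WAV file is encoded as if it were audio. That is harmless for a listening test, but it can be avoided by stripping the header first. A minimal sketch using only the Python standard library (`wav_to_raw` is a hypothetical helper name):

```python
import wave


def wav_to_raw(wav_path: str, raw_path: str) -> int:
    """Copy the samples of an 8 kHz mono 16-bit WAV file to a headerless file.

    Returns the number of samples written.
    """
    with wave.open(wav_path, "rb") as wav:
        fmt = (wav.getnchannels(), wav.getsampwidth(), wav.getframerate())
        if fmt != (1, 2, 8000):
            raise ValueError(f"expected mono/16-bit/8000 Hz, got {fmt}")
        pcm = wav.readframes(wav.getnframes())
    with open(raw_path, "wb") as raw:
        raw.write(pcm)
    return len(pcm) // 2  # two bytes per sample
```

    Calling `wav_to_raw("out.wav", "out.raw")` on the file written by ffmpeg gives a file that can be passed to c2enc directly.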

    TEST1: Sine waves

    It's easy to generate sine waves online and then downsample them to 8kHz (sox 440.wav -r 8000 440_8kHz.wav).  Unfortunately, pure sine waves are filtered out completely by codec2.
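    A test tone can also be generated directly at 8kHz, which avoids the download and the resampling step. A minimal sketch using only the standard library (the function name and the default values are illustrative):

```python
import math
import struct
import wave


def write_sine_wav(path: str, freq_hz: float = 440.0, seconds: float = 2.0,
                   rate: int = 8000, amplitude: float = 0.5) -> int:
    """Write a mono 16-bit sine wave to path and return the number of samples."""
    n = int(seconds * rate)
    peak = int(amplitude * 32767)
    samples = (int(peak * math.sin(2 * math.pi * freq_hz * i / rate))
               for i in range(n))
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 16-bit samples
        wav.setframerate(rate)
        wav.writeframes(b"".join(struct.pack("<h", s) for s in samples))
    return n
```

    For example, `write_sine_wav("440_8kHz.wav")` writes two seconds of a 440Hz tone.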

    Python

    When you're using Ubuntu, version > 19 is needed.

    sudo apt install python3-pip libcodec2-dev

    It might be better not to use "sudo" to avoid messing up the libraries that come with your Linux distribution. Alternatively, you can use PyCharm and use a virtual environment where all of these libraries get installed.

    sudo pip3 install Cython
    sudo pip3 install numpy
    sudo pip3 install pycodec2
    

    TEST2: offline encoding & decoding using python

    Download example.py:

    wget https://raw.githubusercontent.com/gregorias...

  • Python : audio capture & playback

    Christoph Tack • 08/10/2020 at 11:32 • 0 comments

    Requirements for pyaudio:

    sudo apt install libportaudio2 libportaudiocpp0 python3-pip portaudio19-dev
    sudo pip3 install pyaudio
    sudo pip3 install sounddevice

    Some simple test applications:

    • play sound from WAV-file: plays 8kHz files on Wandboard, but not on my PC, as 8kHz is not supported by the hardware.
    • record sound from mic/line-in to WAV-file
    • audio pass-through from mic/line-in to line-out, implemented in two ways: using the pyaudio library and using the sounddevice library.  On the Wandboard, the sounddevice library only seems to work for 30s or so.  After that, there's a sonic boom on the output.  The Wandboard needs to be power-cycled to restore sound output.  The pyaudio library doesn't seem to have that issue and keeps playing Judas Priest without problems.

    References:

    Audio resampling

    Codec2 works with a fixed sample rate of 8kHz.  The line-in and line-out work at 48kHz.  A sample rate conversion is needed.  Several options are considered:

    audioop.ratecv

    No libraries need to be installed.  The implemented filters are simple first order filters and it certainly sounds like that.

    This works both on the Core i7 8thGen and the Wandboard.

    play_sound_ratecv

    samplerate

    sudo pip3 install samplerate

    This library runs fine on a Core i7 8thGen and on the Wandboard.  Be sure to convert the output of this library to int16 before feeding it back to pyAudio.  Python doesn't complain about wrong types as a C++ program would, but the result sounds terrible.
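    The int16 conversion can be sketched as follows, assuming the resampler hands back floating-point samples. With NumPy arrays the usual equivalent is a clip followed by `astype(numpy.int16)`. This version sticks to the standard library, and `to_int16_bytes` and its `scale` parameter are hypothetical names.

```python
from array import array


def to_int16_bytes(samples, scale: float = 1.0) -> bytes:
    """Convert float samples to native-endian 16-bit PCM bytes.

    scale: factor applied before rounding. Use 1.0 when the floats are
    already in the int16 range, or 32767.0 for audio normalised to +/-1.0.
    Values outside the int16 range are clipped instead of wrapping around.
    """
    clipped = (max(-32768, min(32767, round(s * scale))) for s in samples)
    return array("h", clipped).tobytes()


print(len(to_int16_bytes([0.0, 0.5, -0.5], scale=32767.0)))  # 3 samples -> 6 bytes
```

    The returned bytes object can then be handed to the output stream in place of the float data.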

    The "sinc_best" filter is audibly the best-performing filter.

    play_sound_samplerate

    resampy

    Works fine on a Core i7 8thGen.  Takes more than two hours to build on a Wandboard.  This package has more than 300MB of dependencies.

    sudo apt install gfortran llvm-dev libblas3 liblapack3 liblapack-dev libblas-dev
    sudo pip3 install resampy

    play_sound_resampy : opens an 8kHz wave file, up-samples it to 48kHz and then plays it.  Works fine on a Core i7 8thGen.  On the Wandboard, it takes much longer to start and it hangs.  No sound is ever generated.

  • Data layers

    Christoph Tack • 08/09/2020 at 21:15 • 0 comments

    Physical layer

    LoRa

    • EU433 ISM-band (less stringent than EU863-870MHz ISM Band)
    • Maximum TX-power = 12.15dBm EIRP
    • TX-duty cycle < 10%

    This depends on the chipset and modulation used.  Typically a preamble is inserted here.  LoRaWAN uses an 8 symbol preamble.

    Data link layer

    LoRa

    The walkie-talkie application doesn't need that amount of overhead. We can make do with the OSI layer 2 provided by the wireless chipset:
    Semtech SX127x Layer 2, Variable length frame

    Depending on the chosen Codec2 bitrate, we'll have to transfer 8bytes 50 times/s down to 6bytes 25 times/s.
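    Whether such a packet rate is feasible depends on the time each packet spends on air. The sketch below uses the time-on-air formula from the Semtech SX127x datasheet as I understand it, so it should be verified against the datasheet before relying on it. The parameter combinations in the loop are arbitrary examples, not settings chosen for this project.

```python
import math


def lora_time_on_air_ms(payload_bytes: int, sf: int, bw_hz: float,
                        cr: int = 1, preamble_symbols: int = 8,
                        explicit_header: bool = True, crc: bool = True,
                        low_data_rate_opt: bool = False) -> float:
    """Time on air of one LoRa packet in milliseconds.

    cr: 1..4 for coding rates 4/5 .. 4/8.
    """
    t_sym = (2 ** sf) / bw_hz * 1000.0              # symbol duration in ms
    t_preamble = (preamble_symbols + 4.25) * t_sym
    de = 1 if low_data_rate_opt else 0
    ih = 0 if explicit_header else 1
    numerator = 8 * payload_bytes - 4 * sf + 28 + 16 * int(crc) - 20 * ih
    n_payload = 8 + max(math.ceil(numerator / (4 * (sf - 2 * de))) * (cr + 4), 0)
    return t_preamble + n_payload * t_sym


FRAME_INTERVAL_MS = 40  # one 6-byte codec2 packet every 40 ms
for sf, bw in [(7, 500e3), (7, 125e3), (9, 125e3)]:
    toa = lora_time_on_air_ms(6, sf, bw)
    print(f"SF{sf} BW={bw / 1e3:.0f} kHz: {toa:7.3f} ms on air, "
          f"{100 * toa / FRAME_INTERVAL_MS:5.1f}% of the frame interval")
```

    If the formula holds, sending every 40ms packet individually already occupies a large share of the channel at SF7, and at SF9 with 125kHz a packet takes longer than the frame interval it carries. Both are well above the 10% duty-cycle limit listed for the physical layer.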

    Libraries

    1. RadioLib
    2. RadioHead
    3. LoRaLayer2

  • Codec2 configuration

    Christoph Tack • 08/09/2020 at 14:48 • 0 comments

     As the walkie talkie will use digital voice transmission, we need a way to digitize speech.  Several open source speech codecs are available.  We will focus on low-bitrate codecs because we want long range.  Opus and Speex won't do.  There's one codec that excels : codec2 :

    • bitrates as low as 700bps possible (but not usable, see Codec2 evaluation)
    • open source
    • existing implementation on PC, STM32 and nRF52
    • used in ham radio applications (e.g. FreeDV)

    Codec2 technical details

    Audio input format

    16bit signed integer, 8kHz sample rate, mono

    Codec2 packet details

    Reference : codec2 source code

    Encoded data rate [bps]   Bits/packet   Bytes/packet   Time interval [ms]   Packets/s
    3200                      64            8              20                   50
    2400                      48            6              20                   50
    1600                      64            8              40                   25
    1400                      56            7              40                   25
    1200                      48            6              40                   25

    When using one of the three lowest data rates, the drawback is that losing a single packet will cost you 40ms of audio.
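    The packet sizes in the table follow from the bit rate and the frame interval alone. A short sketch that recomputes them (the mode list is copied from the table; the function name is illustrative):

```python
# (bit rate in bit/s, frame interval in ms), as listed in the table above
MODES = [(3200, 20), (2400, 20), (1600, 40), (1400, 40), (1200, 40)]


def packet_layout(bitrate_bps: int, interval_ms: int) -> tuple:
    """Return (bits per packet, bytes per packet, packets per second)."""
    bits = bitrate_bps * interval_ms // 1000
    return bits, (bits + 7) // 8, 1000 // interval_ms


for rate, interval in MODES:
    bits, nbytes, pps = packet_layout(rate, interval)
    print(f"{rate:5d} bit/s: {bits:2d} bits = {nbytes} bytes every "
          f"{interval} ms ({pps} packets/s)")
```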



Discussions

Christoph Tack wrote 11/12/2020 at 12:55 point

If I wanted to use Opus over wifi, the easiest solution would be to open WhatsApp on my smartphone and start a call, wouldn't it?  I want to try to improve on the common VHF/UHF HT.  Wifi isn't suitable for that because of its limited range and bad penetration through buildings.  You could use directional antennas, but how will you keep them aligned?  I opted for codec2 because it also works on very low bitrates (<6kbps).  Lower bitrates also lead to a longer range.
You're right that a mesh protocol won't do for voice comms.  There'll be too much latency and throughput will be an issue as well.  I'm aware of the Disaster Radio and meshtastic projects, but I think there's little I can reuse from them.


Daniel Dunn wrote 11/11/2020 at 20:00 point

What about using the Opus codec via WiFi?   It would be near-impossible to switch between them automatically in a multicast environment, but you could let the user decide.

Mesh infrastructure like BATMAN is probably going to be plenty fast for 48kbps Opus, and it will take so much of the load off 915MHz, which we all need to try hard not to totally trash.


Christoph Tack wrote 10/28/2020 at 19:48 point

I'll first try to get my hands on a STM32F4Discovery board (new or old version).  These seem to be out of stock everywhere.  I haven't made up my mind yet on the audio transducers.  I prefer to design in something that can easily be replicated.


Simon Merrett wrote 10/29/2020 at 08:17 point

How about taking a chance with https://uk-m.banggood.com/STM32F407VET6-Development-Board-Cortex-M4-STM32-Small-System-ARM-Learning-Core-Module-p-1460490.html

Or you could look at using a slightly different model (F411 for example). I do think a general port to more readily available microcontrollers would be fantastic. I know esp32 would be on many people's list but I would prefer SAMD51. 


Simon Merrett wrote 10/29/2020 at 09:03 point

Doh, that's the wrong one, no? Aren't you after the stm32f405? 


Christoph Tack wrote 11/01/2020 at 18:33 point

Because I had the ESP32, I started implementing it on that.  After finding a bug in Codec2 and tripling the ESP32's task memory, I now have an application that takes a 40ms audio frame, encodes it (takes 10ms) and then decodes it (takes 24ms).  So real-time use would be possible.  I still have to check if the decoded audio is ok.


Simon Merrett wrote 11/01/2020 at 18:59 point

Well done! May I ask what you had to change to make it work (specifically the bug)? 


Simon Merrett wrote 10/27/2020 at 21:32 point

Well found! The existing implementation is very interesting. The pdm mic filter is a handy addition. Will you try to recreate it yourself in your own hardware? 


Christoph Tack wrote 08/16/2020 at 11:13 point

Initially I'm experimenting on a Wandboard (iMX6Q) just because it happened to be in my cabinet.  I'm planning to use it on a Raspberry Pi Zero with python.  I might later use the (existing) implementation on a STM32F4, but I guess that will take a lot more effort.


Simon Merrett wrote 08/16/2020 at 14:54 point

I agree with you that it would be significant effort but illuminating to understand what the process looks like to get it into lower performance embedded systems. 


Simon Merrett wrote 08/11/2020 at 07:35 point

Codec2? I'm intrigued to see what processor you port this to. It would be fantastic to have a way of using it on more embedded devices. Very excited to follow your project. Thanks for posting it. 

