
digital-walkie-talkie

long range, low power, modular

The final goal is to build a long-range, low-power, low-bandwidth and low-cost radio with decent audio quality. Before attempting that build, some experimentation will be done around the main building blocks of this project:
* voice codec
* digital data transmission : reliability
* audio quality: needed bitrate
We'll start off using existing hardware: a laptop and a Wandboard. Both are equipped with audio inputs and outputs and a network connection, which is the minimal setup for the application. First experiments will be done in Python. It has the advantage that it's simple, popular and widely documented. The same Python code runs on the desktop and on the embedded platform.

Progress

Voice codec

☑ Selection of voice codec : Codec2
☑ Implementing voice codec on embedded platform : esp32-codec2
☐ Making unit test for voice codec
☐ Turning Codec2 into a standalone Arduino library, which will allow for easier integration by third parties.

Audio streaming

☑ Audio playback : Sine output by I²S on ESP32's internal DAC : esp32-dds (direct digital synthesis)
☑ Real time Codec2 decoding and audio output on ESP32's internal DAC : esp32-codec2-DAC
Audio capture (through I²S)
Output sine wave to external I2S Audio codec (i.e. SGTL5000)
Decode Codec2 packets in real time and output them on SGTL5000 headphone and line out. The Codec2 decoding and audio streaming is all done in tasks. The 'loop'-function has nothing to do.
Audio feed-through using SGTL5000 : it took some tweaking to adjust the input audio level to line-in levels of the SGTL5000 and headphone output volume settings.  I2S-peripheral works full duplex here, while ESP32 documentation only mentions half-duplex operation.
Real time codec2 encoding analog audio from SGTL5000's line input.  Codec packets are printed real time in base64 format to serial port
☐ Audio filtering in SGTL5000, which codec2 should benefit from.
☑ Half-duplex operation : every few seconds the codec switches between encoding and decoding.  It decodes packets stored in flash.  It encodes audio from the SGTL5000 codec.
☑ Refactoring encoding/decoding of packets.  Codec2-engine now has two separate queues for output and two separate queues for input.  Semaphores have been removed as they made the code unnecessarily complicated.

Wireless communication

☑ Generating some RTTTL music and transmitting it with the SX1278 FSK-modem on 434MHz.  The RSP1A decodes it fine using CubicSDR.  It's not very useful, but it's fun.  Using PDM, we might even be able to play rudimentary audio.
☑ SX1278 modules using RadioLib : LoRa, FSK and OOK.
☑ SI4463 module using RadioHead : 2GFSK
☑ SI4463 module using Zak Kemble's SI4463 library : 4GFSK in 6.25kHz channel spacing
☑ Adding SI4463 to RadioLib library (basic RX/TX works, but the code needs a lot of clean up)

Audio & wireless combined

☑ One way radio : transmitter station sends codec2 packets, while receiving station decodes them and plays them through the SGTL5000 on the headphone.
☑ Two way radio with PTT : both stations run the same code. When PTT-button is pushed, the station starts encoding audio from line-in of the SGTL5000 and broadcasts them using the SI4463. The other station receives the packets, decodes them, and plays the SGTL5000.  The custom main.cpp source file is less than 150 lines long.  The remainder of the code consists of reusable libraries.

Issues skipped

Issues listed here are of minor priority.  They are not blocking further development of the project.  A working prototype is considered more important.  If we spent too much time on the issues listed here, we could get bogged down and potentially never build something that works.

  1. Find out why codec2 on the ESP32 doesn't yield the same codec2 packets as the codec2 on the PC, although they're based on the same code base.  ESP32-codec2 packets can successfully be decoded by c2dec on the PC, so the two implementations are still compatible.
  2. Clean up SI4463 code (replace Silabs official header files) and add functionality (e.g. read RSSI, support for fixed length packets).

TK-3201(ET)-English.pdf

User Manual Kenwood TK-3201 PMR446 radio

Adobe Portable Document Format - 559.56 kB - 01/02/2021 at 18:55


youtube_codec2_1200.wav

Audio from a Youtube video run through codec2

x-wav - 1.83 MB - 09/09/2020 at 19:41


ve9qrp.txt

audio transcription

plain - 1.15 kB - 08/28/2020 at 18:54


ve9qrp.wav

original voice sample

x-wav - 1.72 MB - 08/09/2020 at 14:56


ve9qrp_codec2_1200.wav

voice sample passed through codec2 voice codec

x-wav - 1.72 MB - 08/09/2020 at 14:56


  • Security

    Christoph Tack4 days ago 0 comments

    The advantage of using digital communication over analog is that it's much easier to implement decent security measures.  Security deals with the following properties of the information:

    • secrecy or confidentiality
    • authentication
    • non-repudiation
    • integrity control

    We won't reinvent the wheel here.  So let's see what TLS 1.3 has to offer: TLS_CHACHA20_POLY1305_SHA256.  We'll use the CHACHA20_POLY1305 authenticated encryption scheme.  As an AEAD algorithm, it provides both secrecy and message integrity.  There are other AEAD options based on AES as well, but ChaCha lends itself better to use on an MCU.

    Implementation

    The drawback is that more info needs to be sent over the network.  In this case, a 24byte nonce and a 16byte MAC must be transmitted.  That adds 40 bytes per packet.
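To get a feeling for what those 40 bytes mean next to a 6-byte Codec2 frame, the overhead can be tabulated against the number of frames grouped into one packet. This is only a rough sketch: the sizes come from the text above, the helper names are made up for illustration, and the data link overhead (preamble, sync word, CRC) is not included.

```python
# Sizes taken from the text above; the helper names are illustrative only.
NONCE_BYTES = 24
MAC_BYTES = 16
CODEC2_FRAME_BYTES = 6   # one Codec2 1200bps frame


def secured_payload_bytes(n_frames):
    """Payload size once nonce and MAC are added to n Codec2 frames."""
    return n_frames * CODEC2_FRAME_BYTES + NONCE_BYTES + MAC_BYTES


def security_overhead_fraction(n_frames):
    """Share of the secured payload that is nonce + MAC rather than voice data."""
    return (NONCE_BYTES + MAC_BYTES) / secured_payload_bytes(n_frames)


for n_frames in (1, 2, 5, 10):
    print(n_frames, secured_payload_bytes(n_frames),
          f"{security_overhead_fraction(n_frames):.0%}")
```

Grouping more frames per packet lowers the relative overhead, at the cost of more latency and more audio lost per dropped packet.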

    We're left with a problem.  For the AEAD algorithm to work, both parties need to share a secret session key.  How to set this up over an insecure channel?  To be continued...

  • Housing

    Christoph Tack07/18/2021 at 20:25 0 comments

    As this project doesn't have a real use case yet, there's no requirement about the housing.

    The original idea was to build the electronics in a Tongboxin C803 radio.

    Unfortunately, there's very little room for electronics.  The left side of the housing is taken up by the speaker.  The bottom side is taken up by the 18650-cells.

    The LED-segment display is soldered onto the PCB and needs to be desoldered for taking the electronics from the housing.

    Another option is to use a power bank housing.  These are fairly cheap, already contain room for some 18650 cells.  The solar panel is an extra, but probably won't be of much use.  The LED-panel on the back might be replaced by some TFT-panel or LCD-panel.  It's transparent anyway.

    One of the things that would put me off is the probably low build quality.  There are no screws to hold it all together.

  • Data link layer

    Christoph Tack04/22/2021 at 18:08 3 comments

    Packeting

    Packet interval

    Codec2 at 1200bps has been selected.  It needs to be fed 6 bytes every 40ms.

    dPMR uses packets that are (Header (80ms) + 4* super frame(320ms) + end (20ms)) = 1.38s long!  Using such long packets has the advantage that the overhead is relatively small for the payload.  This also implies that the FIFO is refilled as the transmission is ongoing. 

    SCIP-210, Revision 3.6 §2.1.3 : Transport framing : All frames are split up in 20 byte frames, of which 13 bytes are data.

    Packet size

    The raw data rate of Codec2 is 1200bps.  If we assume that the raw data will only take up 25% of the total packet interval, then we'll need to send at 4800bps or more.  The remainder of the packet interval is taken up by:

    • inter-packet dead time
    • intra packet overhead for data link layer : preamble, sync word, CRC, ...
    • intra packet overhead for transport layer (security)
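The relation between codec rate, payload share and on-air rate is simple enough to check in Python. This is a minimal sketch; the function name and the extra payload fractions are illustrative assumptions, only the 1200bps and 25% figures come from the text.

```python
def required_air_rate_bps(codec_rate_bps, payload_fraction):
    """On-air bit rate needed when only a fraction of the air time carries codec data."""
    if not 0 < payload_fraction <= 1:
        raise ValueError("payload_fraction must be in (0, 1]")
    return codec_rate_bps / payload_fraction


# 25% is the assumption used in the text; the other values are for comparison only.
for fraction in (1.0, 0.5, 0.25):
    print(f"{fraction:.0%} payload -> {required_air_rate_bps(1200, fraction):.0f} bps")
```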

    If we want to adhere more or less to dPMR, we'll want to use 6.25kHz channels.  4800baud FSK needs more than 6.25kHz bandwidth, so we'll need more bits/symbol : 4(G)FSK.

    This only leaves the SI4463 and AX5043 as options.

    For the 1200bps, FSK and OOK are still options:

    1. SX1278 : FSK : 2.4kbps BR, 4.8kHz freq.dev., 7.8kHz Rx BW.
    2. SX1278 : OOK : 3.0kbps, 5.2kHz Rx BW.

    Is there a suitable library for the SI4463?

    • RadioHead library can send 4FSK data (with a suitable config file), but can't receive it. 
    • The #NPR New Packet Radio project is 2FSK as well as 4FSK, but it might be difficult to strip the radio code from the library.
    • Zak Kemble's library was the first one I got working with 4GFSK.  But it's interrupt based and many functions don't return a status code.
    • The official SiLabs WDS3 tool can create an example project.  Unfortunately the header files are nearly unusable.  A header file with commands is generated, which is about 3800(!) lines long.  Then there's also the header file listing the properties.  That one is 5800(!) lines long.  I spent more time finding the right "define" statement than it would have taken me to write the statement myself based on the HTML-documentation.

    So I decided to merge Zak's code and the official WDS3 code into my favorite radio library : RadioLib.

    Now with the library working (based on Zak Kemble's code), I noticed that sending the 10-byte packets from Zak Kemble's example takes 57ms.  That's measured from the end of the 0x31 START_TX command to the falling edge of the IRQ that signals PACKET_SENT.  For 1200bps, we need to send 6 bytes every 40ms.  If we can't get the TX time down, we'll have to group several codec2 frames into a single wireless packet.  Sending 6 bytes takes 51ms (verified with the logic analyser in the same way: time between end of START_TX and falling PACKET_SENT IRQ).  The difference matches the theoretical rate: 4 bytes fewer in 6ms less = 32 bits/6ms = 5.3kbps.  The radio is configured for 2.4ksymbols/s (=4.8kbps for 4GFSK).

    SI446x potential packet structure

    The following settings are used in Zak Kemble's library:

    1. Preamble : 8 bytes (sine wave) : 2.4kbps encoded, not 4.8kbps as the rest of the packet. 
    2. Sync word : 2 bytes
    3. Field 1 : 1 byte (length of the packet)
    4. CRC-Field 1 : 2 bytes
    5. Field 2 : data bytes (e.g. 6 bytes)
    6. CRC-Field 2 : 2 bytes

    So we have 15 bytes overhead for our packet.  With respect to time, we even have 23 bytes overhead, because the preamble is sent out at half the bit rate.  So the total packet time = (23 + N) * 8 / 4800 [s], where N is the number of data bytes. 

    Recording taken with RSP1A and CubicSDR opened in Audacity.  The selected length in the image is 51ms.

    It takes 48.3ms to send a packet with 6 data bytes.  Codec2_1200 generates 6 bytes every 40ms.  So Codec2 generates the packets faster than they are transmitted.
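The packet timing can be reproduced with a few lines of Python. This is a sketch under the packet structure listed above (23 equivalent overhead bytes at 4800bps); it ignores inter-packet dead time and any security overhead, and the function names are made up for illustration.

```python
BITRATE_BPS = 4800          # 4GFSK at 2.4 ksymbols/s
OVERHEAD_BYTES = 23         # equivalent overhead bytes (half-rate preamble counted twice)
FRAME_BYTES = 6             # Codec2 1200bps frame size
FRAME_INTERVAL_MS = 40.0    # Codec2 1200bps frame interval


def packet_time_ms(data_bytes, overhead_bytes=OVERHEAD_BYTES, bitrate_bps=BITRATE_BPS):
    """Air time of one packet in ms: (OH + N) * 8 / bitrate."""
    return (overhead_bytes + data_bytes) * 8 * 1000.0 / bitrate_bps


def min_frames_per_packet(max_frames=16):
    """Smallest number of Codec2 frames per packet whose air time fits the audio time."""
    for frames in range(1, max_frames + 1):
        if packet_time_ms(frames * FRAME_BYTES) <= frames * FRAME_INTERVAL_MS:
            return frames
    return None  # no grouping up to max_frames is fast enough


print(f"{packet_time_ms(6):.1f} ms for one frame per packet")
print(f"{min_frames_per_packet()} frames per packet needed for real-time operation")
```

With these assumptions a single frame per packet is too slow, while grouping two frames already fits; real margins will be smaller once dead time between packets is taken into account.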

    The following condition must be met:

    Which can be simplified and generalized to:

    N = number of data bytes, OH = number of equivalent bytes in overhead...

  • Audio line level

    Christoph Tack01/24/2021 at 19:35 0 comments

    Codec2 expects nominal signal levels to be able to decode data.  So for testing how the codec2 encodes our packets, the PC will generate speech audio on its line-out (what voltage level to use here?), which is connected to the left line-in of the SGTL5000 audio codec, which will convert the audio voltage levels to 16bit PCM signed samples.

    Maximum audio output voltage level

    As a laptop only has a headphone output and no line-out, I used an external sound device.  The cheapest possible USB audio card has been used here.  It only costs €2.

    To find the maximum amplitude it can deliver, we download a 1kHz sine wave 0dB file (maximum amplitude).  The values of the audio samples vary from -1 to +1.

    https://cdn.hackaday.io/images/7886221611515142020.png
    1kHz 0dB wave file in Audacity

    Play it and set your computer sound volume to maximum. Then measure the amplitude. If the wave form starts clipping, then there's a problem in your audio system.

    USB Soundcard at maximum amplitude, oscilloscope snapshot

    The unloaded headphone output of the Lenovo L580 Thinkpad even goes up to 1.68Vp (=3.32Vpp).  Note that the SGTL5000 only accepts line-in voltage levels up to 2.83Vpp (=1Vrms).

    Ok, so now we know that different audio sources have different maximum voltage settings.

    Nominal signal level

    Maximum signal level is -1 to +1, but what should we use as nominal signal level?  Let's download a speech sample from a news report, that one should be set correctly.

    Audacity view from news report, signal level limited from -0.7 to +0.7.

    SGTL5000 audio codec

    The analog gain stage before the ADC (controlled by the CHIP_ANA_ADC_CTRL register) of the SGTL5000 will need to be adjusted so that when a 0dB sine wave is played at maximum amplitude from the USB-sound card, it will result in 16bit samples that are also maximum amplitude.

    Let's take a 100Hz sine wave, 0dB so that we have at least 80 samples per cycle.  Remember we're using 8kHz sampling frequency because that's a codec2 requirement.  Of course we might sample at higher frequencies, but then the ESP32 would have to down sample again.

    The I2S samples could be printed to the Arduino serial plotter to get an idea of the amplitude.
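Before looking at real I2S data, it helps to know what the ideal samples should look like. The sketch below generates the expected 16-bit values for a 100Hz sine at 8kHz and reports the peak level relative to full scale; it runs on the PC, not on the ESP32, and the helper names are illustrative assumptions.

```python
import math

SAMPLE_RATE_HZ = 8000   # Codec2 input sample rate
FULL_SCALE = 32767      # largest positive 16-bit signed sample


def sine_samples(freq_hz, n_samples, amplitude=1.0):
    """Ideal 16-bit signed samples of a sine wave; amplitude 1.0 means 0dB full scale."""
    return [round(amplitude * FULL_SCALE *
                  math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE_HZ))
            for n in range(n_samples)]


def peak_dbfs(samples):
    """Peak level in dB relative to full scale (expects at least one non-zero sample)."""
    peak = max(abs(s) for s in samples)
    return 20 * math.log10(peak / FULL_SCALE)


one_cycle = sine_samples(100, SAMPLE_RATE_HZ // 100)
print(len(one_cycle), "samples per cycle, peak", max(one_cycle), "/", min(one_cycle))
print(f"{peak_dbfs(sine_samples(100, 80, amplitude=0.7)):.1f} dBFS at a 0.7 nominal level")
```

Captured samples that peak well below these values suggest the ADC gain is too low; flat tops near ±32767 suggest clipping.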

  • ESP32 with SGTL5000

    Christoph Tack01/03/2021 at 14:36 0 comments

    Hardware

    The SGTL5000 uses a virtual ground for the audio outputs.  This likely makes it unsuitable for use in smartphone headsets in which the ground of the microphone is shared with the audio output.  To be tested.

    1. NodeMCU-32S
    2. Adafruit 1780 : Adafruit Accessories Audio Adapter Board for Teensy

    Generating I2S

    The annoying thing about the Adafruit audio adapter is that it's not fully open source.  These are the supply voltages:

    • VDDD = 1.8V
    • VDDIO = 3.3V (powers line out)
    • VDDA = 3.3V (powers the headphone)
    https://cdn.hackaday.io/images/6563761610045017903.png
    I2S standard format : data is one bit delayed with respect to WS edges.

    On the SGTL5000 datasheet, this is the one bit delay on I2S format with respect to the left-justified format on Figure 10. I2S Port Supported Formats.  This seems to be normal I2S behavior. 

    The delay could be removed by setting the i2s_comm_format_t in the ESP32 to 0, but I'll just leave it to the standard setting.

    The SGTL5000 treats the 16bit data as signed.  Its analog output is inverted, which actually doesn't matter much for audio: Voutmax corresponds to 0x8000 = -32768, while Voutmin corresponds to 0x7FFF = 32767.

    The sample code to generate a 200Hz sine wave on the left channel of line-out and headphone can be found here.

    References

    1. SGTL5000 driver on Github (by PJRC)
    2. Interfacing an audio codec with ESP32 – Part 1 and Interfacing an Audio Codec with ESP32 – Part 2
    3. Audio Adaptor Boards for Teensy 3.x and Teensy 4.x
    4. ESP32 I2S Internet Radio (with software MP3 decoding inside ESP32)
    5. esp32_audio
    6. ESP32-2-Way-Audio-Relay

  • References

    Christoph Tack09/16/2020 at 19:43 0 comments

    Commercial products

    Kiwi-tec LAP-E01

    Prior art

    nRF24 based

    1. Long Range Arduino Based Walkie Talkie using nRF24L01 : many similar projects, all using the RF24Audio library.

    RFM12 based

    1. Walkie Talkie Duino using RFM12B "open source" (only to the Kickstarter backers).

    RFM22 based

    Codec2WalkieTalkie

    Analog FM based

    1. DRA818V analog FM module (lots of harmonics, will need a license to operate)
    2. Auctus A1846S : HamShield (by Casey Halverson), also available in Mini version.
      1. HamShield on Tindie
      2. Kickstarter
      3. Hackaday.io
      4. Instructables
      5. Github
      6. InductiveTwig

    Comparable projects

    KISS modem interface

    LoRa

    HamShield LoRa (by Casey Halverson)

    STM32

    ESP32

  • Hardware choices

    Christoph Tack09/15/2020 at 18:09 0 comments

    Platform

    Flash size requirements

    The codec2 library needs about 87KB, while the RadioLib needs about 10KB.  Then, there's also the base Arduino libraries.  And we still need to add our own code.  To be on the safe side, a device with at least 256KB of flash will be needed.

    ESP32

    Test application built, based on Arduino codec2 library, but it crashed.  This has been solved with esp32-codec2.

    Some presumably also got it to work before I did, but they are unwilling to share their source code:


    STM32

    STM32F4Discovery

    Rowetel/Dragino Tech SM1000

    NUCLEO-L432KC

    Runs at only 80MHz, might be an option to shrink size.

    More info on STM32 development

    nRF52

    64MHz

    github : Implements codec2 on a Adafruit Feather nRF52 Bluefruit LE.

    Codec2 has been modified so that it can be built using Arduino framework.  I doubt this implementation is working correctly.

    Audio IO

    I²S

    On ESP32, using I²S is definitely advantageous because it can use DMA, which off-loads the reading and writing audio data from the processor.

    As we're only processing low quality 8kHz speech here, a high-end audio codec like the SGTL5000 is not necessary, but it might be a good choice after all:

    1. open source support (pjrc)
    2. I²S sink & source in a single device.
    3. High quality audio might be useful for other projects and designs.
    4. Extra features:
      1. Input: Programmable MIC gain, Auto input volume control
      2. Output: 98dB SNR output, digital volume
    5. Development board price is acceptable.

    A cheaper alternative is the Waveshare WM8960 Audio HAT (technical info).

    PWM-DAC & ADC

    The SM1000 and NucleoTNC contain the analog circuitry we need.

    Adafruit Voice Changer also features some form of audio pass-through

  • Speech codec

    Christoph Tack08/28/2020 at 19:16 0 comments

    Codec options

    Codec2

    Opus

    • open source, royalty free
    • replacement for Speex
    • down to 6kbps
    • used in VoIP-applications (e.g. WhatsApp)

    MELPe

    • NATO standard
    • licensed & copyrighted

    Speech-to-text-to-speech

    Using a speech codec, data transmission can be brought down to about 1200bps.  But how can we reduce data even further?  Let's take a 1min52s mono 8kHz speech sample as an example

    1. ve9qrp.wav : 1.799.212 bytes : 128000bps
    2. ve9qrp.bin : codec2 1200bps encoded : 16.866 bytes : 1200bps
    3. ve9qrp.txt : audio transcription of ve9qrp.wav : 1.178 bytes : 85bps

    Using codec2, we get a 106/1 compression ratio, using text a 1527/1 compression ratio.  That's more than fourteen times better!
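The ratios follow directly from the file sizes. Here is a short Python check, using the byte counts and the 1min52s duration quoted above; the helper names are illustrative.

```python
DURATION_S = 112  # 1 min 52 s

# File sizes in bytes, as listed above.
SIZES_BYTES = {
    "ve9qrp.wav": 1_799_212,
    "ve9qrp.bin": 16_866,
    "ve9qrp.txt": 1_178,
}


def bitrate_bps(size_bytes, duration_s=DURATION_S):
    """Average bit rate needed to send the file in real time."""
    return size_bytes * 8 / duration_s


def compression_ratio(original_bytes, compressed_bytes):
    """How many times smaller the compressed representation is."""
    return original_bytes / compressed_bytes


wav_size = SIZES_BYTES["ve9qrp.wav"]
for name, size in SIZES_BYTES.items():
    print(f"{name}: {bitrate_bps(size):.0f} bps, "
          f"{compression_ratio(wav_size, size):.1f}:1 versus the WAV file")
```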

    If we use speech recognition on the transmit side, then send the transcription over and use speech synthesis on the receiving side, this might work.

    Speech recognition

    https://pypi.org/project/SpeechRecognition/

    Even though speech recognition engines are very good these days, they're still not as good as human beings.  The transcript text could be shown to the speaker during transmission.  In case of errors, the speaker could repeat the incorrect word or spell out its characters.

    A speech engine also requires a language preset.  That shouldn't be too much of a hurdle because most of us only commonly use a single language.

    Is there good speech recognition software that runs offline?

    Is speech recognition software not too power hungry?

    Speech synthesis

    Using Linux command line tools


    Codec2 Configuration

    As the walkie talkie will use digital voice transmission, we need a way to digitize speech.  Several open source speech codecs are available.  We will focus on low-bitrate codecs because we want long range.  Opus and Speex won't do.  There's one codec that excels : codec2 :

    • bitrates as low as 700bps possible (but not usable, see Codec2 evaluation)
    • open source
    • existing implementation on PC, STM32 and nRF52
    • used in ham radio applications (e.g. FreeDV)

    Codec2 technical details

    Audio input format

    16bit signed integer, 8kHz sample rate, mono

    Codec2 packet details

    Reference : codec2 source code

    Encoded data rate [bps]   Bits/packet   Bytes/packet   Time interval [ms]   Packets/s
    3200                      64            8              20                   50
    2400                      48            6              20                   50
    1600                      64            8              40                   25
    1400                      56            7              40                   25
    1200                      48            6              40                   25

    When using one of the three lowest data rates, there's the drawback that losing a single packet will cost you 40ms of audio.
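The packet figures listed above can be kept in a small lookup table, which also makes it easy to verify that they are self-consistent. This is a sketch for planning buffers on the PC side; the dictionary and helper names are illustrative and are not part of the codec2 API.

```python
SAMPLE_RATE_HZ = 8000

# bitrate [bps]: (bits per packet, packet interval [ms]), from the figures above
CODEC2_MODES = {
    3200: (64, 20),
    2400: (48, 20),
    1600: (64, 40),
    1400: (56, 40),
    1200: (48, 40),
}


def bytes_per_packet(bitrate_bps):
    return CODEC2_MODES[bitrate_bps][0] // 8


def packets_per_second(bitrate_bps):
    return 1000 // CODEC2_MODES[bitrate_bps][1]


def samples_per_packet(bitrate_bps):
    """Number of 8kHz PCM samples covered (and lost) per packet."""
    return SAMPLE_RATE_HZ * CODEC2_MODES[bitrate_bps][1] // 1000


for rate, (bits, interval_ms) in CODEC2_MODES.items():
    # bits per packet times packets per second must give back the bit rate
    consistent = bits * packets_per_second(rate) == rate
    print(rate, bytes_per_packet(rate), interval_ms, samples_per_packet(rate), consistent)
```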

  • Physical layer : wireless communication

    Christoph Tack08/26/2020 at 19:25 0 comments

    Theory

    Channel capacity

    Shannon-Hartley law: C = B * log2(S/N + 1), where C is channel capacity [bps], B is bandwidth [Hz] and S/N is signal/noise ratio.

    Example: dPMR (C = 4800bps, B=6250Hz).  So S/N must be at least 0.70.

    Number of bits/symbol needed

    Nyquist's Theorem : C = 2*B *log2(N)

    Example: dPMR (C = 4800bps, B=6250Hz).  So N must be at least 1.3 signal levels, i.e. log2(N) = 0.38 bits/symbol.
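Both examples can be verified by solving the two formulas for the unknown. This is a minimal Python sketch; the function names are illustrative.

```python
import math


def min_snr_linear(capacity_bps, bandwidth_hz):
    """Shannon-Hartley solved for S/N: S/N = 2^(C/B) - 1."""
    return 2 ** (capacity_bps / bandwidth_hz) - 1


def min_levels(capacity_bps, bandwidth_hz):
    """Nyquist solved for the number of signal levels: N = 2^(C / (2B))."""
    return 2 ** (capacity_bps / (2 * bandwidth_hz))


# dPMR example: 4800bps in a 6.25kHz channel
snr = min_snr_linear(4800, 6250)
levels = min_levels(4800, 6250)
print(f"S/N >= {snr:.2f} ({10 * math.log10(snr):.1f} dB)")
print(f"N >= {levels:.2f} levels ({math.log2(levels):.2f} bits/symbol)")
```

These are theoretical lower bounds for an ideal channel; a practical modem needs margin on top of them.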

    Noise floor

    –174 dBm is the thermal noise floor at room temperature in a 1-Hz bandwidth.

    e.g. for 10kHz bandwidth, the noise floor is -134dBm.

    (see Long-range RF communication: Why narrowband is the de facto standard, Texas Instruments)
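The noise floor for any channel width follows from adding 10*log10(bandwidth) to the 1Hz figure. A short Python sketch, using the -174dBm/Hz value quoted above and ignoring receiver noise figure; the function name is illustrative.

```python
import math

THERMAL_NOISE_DBM_PER_HZ = -174.0  # room temperature, 1Hz bandwidth


def noise_floor_dbm(bandwidth_hz):
    """Thermal noise power in dBm for the given bandwidth (no receiver noise figure)."""
    return THERMAL_NOISE_DBM_PER_HZ + 10 * math.log10(bandwidth_hz)


for bandwidth_hz in (6_250, 10_000, 125_000):
    print(f"{bandwidth_hz} Hz -> {noise_floor_dbm(bandwidth_hz):.1f} dBm")
```

Every tenfold reduction in bandwidth lowers the noise floor by 10dB, which is the argument for narrowband modulation in long-range links.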


    Legal limitations

    We can only make use of unlicensed bands.  Some bands only allow pre-certified equipment and fixed antennas.  Here are some options for unlicensed spectrum.  I left out the <100mW options and constrained myself to the sub-1GHz options.  If you're looking for a DIY-solution for 2.4GHz, have a look at the nRF24Audio library.

    27MHz

    1. Citizen band : 12W, SSB, 10kHz channels, 26.690MHz to 27.410MHz, some channels excluded
      1. Packet radio Germany : 27.235 MHz and 27.245 MHz
      2. Packet radio Netherlands: 27.235 MHz and 27.395(wikipedia)/27.405 MHz
      3. Packet radio Belgium : forbidden
    2. SRD : 100mW, 5 10kHz wide channels around 27MHz, <0.1% duty cycle

    169MHz

    1. SRD : 0.5W, 169.4MHz to 169.475MHz, 50kHz channels, <1% duty cycle

    446MHz

    1. PMR446 : 0.5W, 6.25kHz or 12.5kHz channels, 446MHz to 446.2MHz
      1. dPMR446 aka dPMR tier 1, ETSI TS 102 490 & ETSI TS 102 587.

    823-832MHz

    1. Intercom : 100mW, BW<200kHz

    865-868MHz & 874-874.4MHz & 917.3-918.9MHz

    1. SRD : 0.5W, BW<200kHz, divided into 4 allowable sub bands, <2.5% duty cycle

    869.4-869.65MHz

    1. SRD860 : 0.5W, BW<250kHz, <10% duty cycle

    Side note

    Polite spectrum access = listen before transmit (LBT) and adaptive frequency agility (AFA).

    As contradictory as it might seem, LBT+AFA is no benefit over 10% duty cycle.  It restricts the system to 100s per hour per 200kHz bandwidth (=2.8% duty cycle), a maximum transmission time of 4s and so on... (see ETSI EN 300 220-1 V3.1.1 (2017-02), 5.21.3.1).


    Modulation types

    LoRa

    LoRa is a wide band modulation (250kHz), which forces us to keep duty cycles <10%.  To get enough throughput, SF6 would have to be used.

    Background on LoRa

    Possible ICs : SX1278/RFM98

    Parameters

    The code used is here.  The client keeps sending data to the server.  The server acknowledges each packet.  Every 10s, the server prints out a report.

    The RSSI is low because both modules are connected to a u.fl/SMA cable assembly which ends in a 50ohm load.  These cable assemblies don't perform well.  The signal level could be dropped further by removing the cable assemblies.

    RadioHead library, reliable datagram

    • Bw = 125 kHz, Cr = 4/5, Sf7 = 128chips/symbol, CRC on :
      • 10 bytes/frame : Total bytes : 1160      Total packets : 116     Bitrate : 928bps   Average RSSI : -112.82  Average SNR : 6.78
      • 30 bytes/frame : Total bytes : 2700      Total packets : 90      Bitrate : 2160bps  Average RSSI : -114.97  Average SNR : 5.02
      • 60 bytes/frame : Total bytes : 3900      Total packets : 65      Bitrate : 3120bps  Average RSSI : -110.23  Average SNR : 7.68
    • Bw = 500 kHz, Cr = 4/5, Sf7 = 128chips/symbol, CRC on :
      • 10 bytes/frame : Total bytes : 4680      Total packets : 468     Bitrate : 3744bps  Average RSSI : -110.21  Average SNR : 0.70
      • 30 bytes/frame : Total bytes : 10200     Total packets : 340     Bitrate : 8160bps  Average RSSI : -108.82  Average SNR : 1.40
      • 60 bytes/frame : Total bytes : 14880     Total packets : 248     Bitrate : 11904bps Average RSSI : -108.08  Average SNR : 1.75
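The reported bitrates are the payload bytes received in one 10s report interval, converted to bits per second. The Python sketch below recomputes them from the numbers above and compares them with the 1200bps that Codec2 needs; the comparison threshold and the names are illustrative, and the figures apply to acknowledged datagrams only.

```python
REPORT_INTERVAL_S = 10    # the server prints a report every 10 s
CODEC2_RATE_BPS = 1200    # payload rate needed for Codec2 1200bps

# (bandwidth [kHz], bytes per frame, total bytes, total packets), from the reports above
REPORTS = [
    (125, 10, 1160, 116),
    (125, 30, 2700, 90),
    (125, 60, 3900, 65),
    (500, 10, 4680, 468),
    (500, 30, 10200, 340),
    (500, 60, 14880, 248),
]


def throughput_bps(total_bytes, interval_s=REPORT_INTERVAL_S):
    """Payload throughput over one report interval."""
    return total_bytes * 8 / interval_s


for bw_khz, frame_bytes, total_bytes, total_packets in REPORTS:
    rate = throughput_bps(total_bytes)
    verdict = "enough for Codec2 1200" if rate >= CODEC2_RATE_BPS else "too slow"
    print(f"BW {bw_khz} kHz, {frame_bytes} bytes/frame: {rate:.0f} bps ({verdict})")
```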

    Spreading...


  • Codec2 evaluation

    Christoph Tack08/15/2020 at 12:01 0 comments

    Evaluation of Codec2 will be done using command-line tools and python tools.

    Command line tools

    sudo apt install codec2

    Some experiments

    After installation of codec2, the raw audio file test samples are available in /usr/share/codec2/raw

    I tried playing with the 700bps bitrate, but that never yielded results that were easily understandable.  If you had a conversation with someone using this bitrate, you would frequently have to ask to repeat sentences.

    1200bps seems to me the minimum practically achievable bitrate.

    christoph@christoph-ThinkPad-L580:/usr/share/codec2/raw$ c2enc 1200 ve9qrp.raw ~/ve9qrp.bit --natural && c2dec 1200 ~/ve9qrp.bit ~/ve9qrp_codec2_1200.raw --natural && aplay -f S16_LE ~/ve9qrp_codec2_1200.raw 
    max_amp: 80 m_pitch: 320
    p_min: 20 p_max: 160
    Wo_min: 0.039270 Wo_max: 0.314159
    nw: 279 tw: 40
    max_amp: 80 m_pitch: 320
    p_min: 20 p_max: 160
    Wo_min: 0.039270 Wo_max: 0.314159
    nw: 279 tw: 40
    Playing raw data '/home/christoph/vk5qi_codec2_1200.raw' : Signed 16 bit Little Endian, Rate 8000 Hz, Mono
    christoph@christoph-ThinkPad-L580:/usr/share/codec2/raw$ 

    For your audio playback convenience, these raw files have been converted to WAV-file format using:

    sox -e signed-integer -b 16 -r 8000 -c 1 ve9qrp.raw ~/ve9qrp.wav

    File sizes:

    • ve9qrp.raw (original file) : 16bit samples, 8kHz sampling = 128kbps -> 1799168 bytes (WAV-file version)
    • ve9qrp.bit (codec2 1200 encoded) : 1.2kbps : 16866 bytes
    • ve9qrp_codec2_1200.raw (decoded) : 1799040 bytes (WAV-file version)

    So we see that codec2 achieves a 128k/1.2k = 106.7/1 compression ratio.  That's truly impressive.

    Of course, this compression ratio comes at a price : computational complexity.  There's no way you could pull this off in real time with an AVR-microcontroller.  You need at least an MCU with an FPU, such as the STM32F4.  An ESP32 doesn't seem suited either.  A Raspberry Pi could be used at the cost of higher current consumption and a lot longer startup time.  Would you want a walkie-talkie that you switch on and have to wait a minute before you can use it?  I wouldn't.

    TEST0: offline encoding & decoding using cli-tools

    The codec2 examples you can find on the internet are presumably specially chosen to go well with the algorithm.  Let's grab a video from Youtube using a Youtube Video Downloader.  This will give you an mp4-file.  Strip the audio from that video, convert it to 8kHz mono and cut it down to the first two minutes using only a single command:

    ffmpeg -i Who\ Invented\ the\ Food\ Pyramid\ and\ Why\ You\'d\ Be\ Crazy\ to\ Follow\ It.mp4 -acodec pcm_s16le -ac 1 -ar 8000 -t 00:02:00 out.wav

     Then using codec2 on that file is as simple as:

    c2enc 1200 ve9qrp.wav ve9qrp.bit
    c2dec 1200 ve9qrp.bit ve9qrp_decoded.raw
    ffmpeg -f s16le -ar 8k -ac 1 -i ve9qrp_decoded.raw ve9qrp_decoded.wav

    The sample from the Youtube video, after running it through codec2 sounds like this.  No, it doesn't sound great, but keep in mind that the original video has a 44.1kHz stereo signal.  Converting that to 8kHz mono already has an audible impact.  Passing it through a 1200bps codec2 tunnel is responsible for the other artifacts.

    TEST1: Sine waves

    It's easy to generate sine waves online and then downsample them to 8kHz (sox 440.wav -r 8000 440_8kHz.wav).  Unfortunately, pure sine waves are filtered out completely by codec2.

    Python

    When you're using Ubuntu, version > 19 is needed.

    sudo apt install python3-pip libcodec2-dev

    It might be better not to use "sudo" to avoid messing up the libraries that come with your Linux distribution. Alternatively, you can use PyCharm and use a virtual environment where all of these libraries get installed.

    sudo pip3 install Cython
    sudo pip3 install numpy
    sudo pip3 install pycodec2
    

    TEST1: offline encoding & decoding using python

    Download example.py:

    wget https://raw.githubusercontent.com/gregorias...


Discussions

Simon Merrett wrote 01/14/2021 at 21:45 point

It's coming along well - good work! 


Christoph Tack wrote 11/12/2020 at 12:55 point

If I wanted to use Opus over wifi, the easiest solution would be to open WhatsApp on my smartphone and start a call, wouldn't it?  I want to try to improve on the common VHF/UHF HT.  Wifi isn't suitable for that because of its limited range and bad penetration through buildings.  You could use directional antennas, but how will you keep them aligned?  I opted for codec2 because it also works on very low bitrates (<6kbps).  Lower bitrates also lead to a longer range.
You're right that a mesh protocol won't do for voice comms.  There'll be too much latency and throughput will be an issue as well.  I'm aware of the Disaster Radio and meshtastic projects, but I think there's little I can reuse from them.


Daniel Dunn wrote 11/11/2020 at 20:00 point

What about using the Opus codec via WiFi?   It would be near-impossible to switch between them automatically in a multicast environment, but you could let the user decide.

Mesh infrastructure like BATMAN is probably going to be plenty fast for 48kbps Opus, and it will take so much of the load off 915Mhz, which we all need to try hard not to totally trash.


Christoph Tack wrote 10/28/2020 at 19:48 point

I'll first try to get my hands on a STM32F4Discovery board (new or old version).  These seem to be out of stock everywhere.  I haven't made up my mind yet on the audio transducers.  I prefer to design in something that can easily be replicated.


Simon Merrett wrote 10/29/2020 at 08:17 point

How about taking a chance with https://uk-m.banggood.com/STM32F407VET6-Development-Board-Cortex-M4-STM32-Small-System-ARM-Learning-Core-Module-p-1460490.html

Or you could look at using a slightly different model (F411 for example). I do think a general port to more readily available microcontrollers would be fantastic. I know esp32 would be on many people's list but I would prefer SAMD51. 


Simon Merrett wrote 10/29/2020 at 09:03 point

Doh, that's the wrong one, no? Aren't you after the stm32f405? 


Christoph Tack wrote 11/01/2020 at 18:33 point

Because I had the ESP32, I started implementing it on that.  After finding a bug in Codec2 and tripling the ESP32's task memory, I now have an application that takes a 40ms audio frame, encodes it (takes 10ms) and then decodes it (takes 24ms).  So real time use would be possible.  I still have to check if the decoded audio is ok.


Simon Merrett wrote 11/01/2020 at 18:59 point

Well done! May I ask what you had to change to make it work (specifically the bug)? 


Simon Merrett wrote 10/27/2020 at 21:32 point

Well found! The existing implementation is very interesting. The pdm mic filter is a handy addition. Will you try to recreate it yourself in your own hardware? 


Christoph Tack wrote 08/16/2020 at 11:13 point

Initially I'm experimenting on a Wandboard (iMX6Q) just because it happened to be in my cabinet.  I'm planning to use it on a Raspberry Pi Zero with python.  I might later use the (existing) implementation on a STM32F4, but I guess that will take a lot more effort.


Simon Merrett wrote 08/16/2020 at 14:54 point

I agree with you that it would be significant effort but illuminating to understand what the process looks like to get it into lower performance embedded systems. 


Simon Merrett wrote 08/11/2020 at 07:35 point

Codec2? I'm intrigued to see what processor you port this to it would be fantastic to have a way of using it on more embedded devices. Very excited to follow your project. Thanks for posting it. 

