It works! 2-Way Network Audio Relay

I'm happy to report that the 2-way network audio interface branch of this project is fully operational!

I have to admit, I dove into this quite naively, expecting it to be trivial to splice my audio-in and audio-out demos together with some simple logic to handle transitions between transmitting and receiving modes. That mode switching logic did fall into place quite easily, but there were many other issues that made this an uphill battle.

First, when I embarked on developing this audio relay method, I was planning to take advantage of both of the ESP32's I2S engines, assigning one to audio input and the other to audio output. It turns out the onboard DAC and ADCs are only accessible through the base I2S engine, I2S0. The fun thing is, if you try to assign I2S1 to the built-in DAC, the ESP32 crashes, with useful debugging info in the backtrace, whereas if you assign I2S1 to the built-in ADC, it doesn't crash; it just silently doesn't work. This set me back several days. After finally realizing that both streams would need to go through I2S0, it took several more days to work out the right sequence of function calls (and timing!) to reliably hot-swap the I2S0 interface between the DAC and the ADC.

The need to thread both audio-in and audio-out through a single I2S engine forced me to confront mode switching sooner than I had planned, but on the plus side, that turned out to be quite easy. In a nutshell, the switching logic is embedded in the UDP-receive loop: it switches to transmitting mode when it receives non-zero data from the PC, and otherwise falls back to receiving mode. In pseudocode:

For each iteration of the UDP-receive loop,

    If UDP received && valid VBAN packet && contains non-zero data,
        reset no-data counter
        if mode==RECEIVING,
            transition to TRANSMITTING
            send [N] zeroes to DAC to transmit quiescent tx-delay
        send packet to DAC

    Otherwise (no UDP or not VBAN or packet contains all zeroes),
        increment no-data counter
        if mode==TRANSMITTING && no-data counter >= persistence,
            transition to RECEIVING
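To make the switching logic above a bit more concrete, here's a minimal host-side C sketch of the same state machine. It assumes 16-bit PCM audio and the standard 28-byte VBAN header layout from the VB-Audio spec; the names (`relay_on_udp`, the `ACTION_*` codes, `persistence`) are hypothetical and not taken from the repo. The function only decides what to do; the caller would do the actual I2S and PTT work.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* VBAN header: "VBAN", format_SR, format_nbs, format_nbc, format_bit,
   streamname[16], nuFrame (uint32) = 28 bytes, followed by the samples. */
#define VBAN_HEADER_SIZE    28
#define VBAN_PROTOCOL_MASK  0xE0   /* upper 3 bits of format_SR */
#define VBAN_PROTOCOL_AUDIO 0x00
#define VBAN_DATATYPE_MASK  0x07   /* lower 3 bits of format_bit */
#define VBAN_DATATYPE_INT16 0x01

typedef enum { MODE_RECEIVING, MODE_TRANSMITTING } relay_mode_t;

typedef enum {
    ACTION_NONE,      /* nothing to do this iteration */
    ACTION_PLAY,      /* already transmitting: write packet to DAC */
    ACTION_START_TX,  /* key PTT, switch I2S0 to DAC, send tx-delay zeroes, then packet */
    ACTION_START_RX   /* release PTT, switch I2S0 back to ADC */
} relay_action_t;

typedef struct {
    relay_mode_t mode;
    unsigned no_data_count;
    unsigned persistence;  /* hypothetical: empty iterations before dropping to RX */
} relay_state_t;

/* True if buf holds a complete 16-bit PCM VBAN audio packet. */
static bool vban_is_int16_audio(const uint8_t *buf, size_t len)
{
    if (buf == NULL || len < VBAN_HEADER_SIZE || memcmp(buf, "VBAN", 4) != 0)
        return false;
    if ((buf[4] & VBAN_PROTOCOL_MASK) != VBAN_PROTOCOL_AUDIO) return false;
    if ((buf[7] & VBAN_DATATYPE_MASK) != VBAN_DATATYPE_INT16) return false;
    size_t samples  = (size_t)buf[5] + 1;  /* format_nbs stores samples - 1 */
    size_t channels = (size_t)buf[6] + 1;  /* format_nbc stores channels - 1 */
    return len >= VBAN_HEADER_SIZE + samples * channels * sizeof(int16_t);
}

/* True if any payload byte is non-zero, i.e. the PC is actually sending audio. */
static bool has_signal(const uint8_t *payload, size_t nbytes)
{
    for (size_t i = 0; i < nbytes; i++)
        if (payload[i] != 0) return true;
    return false;
}

/* One iteration of the UDP-receive loop. buf may be NULL if nothing arrived. */
static relay_action_t relay_on_udp(relay_state_t *s, const uint8_t *buf, size_t len)
{
    assert(s != NULL);
    bool got_audio = vban_is_int16_audio(buf, len) &&
                     has_signal(buf + VBAN_HEADER_SIZE, len - VBAN_HEADER_SIZE);
    if (got_audio) {
        s->no_data_count = 0;
        if (s->mode == MODE_RECEIVING) {
            s->mode = MODE_TRANSMITTING;
            return ACTION_START_TX;
        }
        return ACTION_PLAY;
    }
    s->no_data_count++;
    if (s->mode == MODE_TRANSMITTING && s->no_data_count >= s->persistence) {
        s->mode = MODE_RECEIVING;
        return ACTION_START_RX;
    }
    return ACTION_NONE;
}
```

Returning an action instead of touching I2S directly means the same logic can be exercised on a PC with canned packets before flashing anything.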

In the above pseudocode, each "transition to" line is a call to a mode transition handler, which basically shuts down I2S, clears the DMA buffer, triggers or releases PTT depending on the commanded new mode, and restarts I2S with either the DAC driver for transmitting or the ADC driver for receiving. Note that immediately after transitioning to TRANSMITTING mode, I send several DMA buffer widths of zeroes to I2S so that it streams silence for a brief period after triggering PTT and before writing packet data to analog-out. This is basically a crude, hardcoded-for-now implementation of what's commonly called "tx-delay". I'll parameterize this later and make it configurable via the HTML interface.
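Since the tx-delay is currently a hardcoded count of zero-filled DMA buffers, one way to parameterize it is to convert a delay in milliseconds into a whole number of buffers. Here's a minimal sketch under that assumption; the function names and example values are hypothetical, and the real buffer width and sample rate would come from however I2S0 is configured.

```c
#include <assert.h>
#include <stdint.h>

/* Number of zero-filled DMA buffers to queue after keying PTT so that at
   least delay_ms of silence plays before real audio (rounded up). */
static uint32_t tx_delay_buffers(uint32_t delay_ms, uint32_t sample_rate_hz,
                                 uint32_t dma_buf_len)
{
    assert(dma_buf_len > 0);
    uint64_t samples = ((uint64_t)delay_ms * sample_rate_hz + 999) / 1000; /* ceil */
    return (uint32_t)((samples + dma_buf_len - 1) / dma_buf_len);          /* ceil */
}

/* Silence actually produced by nbuffers buffers, in whole milliseconds (rounded down). */
static uint32_t tx_delay_actual_ms(uint32_t nbuffers, uint32_t sample_rate_hz,
                                   uint32_t dma_buf_len)
{
    assert(sample_rate_hz > 0);
    return (uint32_t)((uint64_t)nbuffers * dma_buf_len * 1000 / sample_rate_hz);
}
```

For example, a 100 ms delay at 22050 Hz with 128-sample buffers works out to 18 buffers, or about 104 ms of silence. Rounding up means the delay is never shorter than requested, at the cost of up to one extra buffer of dead air.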

Another problem I ran into was configuring the DMA buffer lengths. In the outgoing audio stream (PC->ESP32), VBAN automatically selects packet widths ranging from 64 samples per packet at a sampling rate of 11025 Hz up to 256 samples per packet at 44100 Hz or faster. So VBAN sampling rates of 11025, 22050, and 44100 Hz each produce a power-of-two number of samples per packet, whereas even-thousands sampling rates (e.g. 12000, 24000, 32000 Hz) generate packets with non-power-of-two sample counts. Why does this matter? It seems I2S works best with power-of-two DMA buffer widths; otherwise the audio stutters. So if I set the VBAN sampling rate to 12000 Hz, VBAN sends 70 or so (I don't recall precisely) samples per packet, and if I pipe those directly into DMA without interim buffering, I2S stutters. It's possible that this apparent pattern is due to something I misconfigured during my experimentation, but I basically concluded that this should only be used with VBAN sampling rates of 11025, 22050, 44100 Hz, or higher (all sample rates above 44100 Hz produce 256 samples per packet).
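An alternative to restricting the sample rates would be the interim buffering mentioned above: a small FIFO that accepts VBAN packets of any length (such as the ~70-sample packets at 12000 Hz) and hands I2S fixed, power-of-two blocks. This is an untested sketch, not what the posted code does; `CHUNK_LEN`, `FIFO_CAP`, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CHUNK_LEN 64    /* assumed DMA buffer width, in samples (power of two) */
#define FIFO_CAP  1024  /* FIFO capacity, in samples */

typedef struct {
    int16_t data[FIFO_CAP];
    size_t head;   /* index of the oldest stored sample */
    size_t count;  /* number of samples currently stored */
} sample_fifo_t;

static bool is_power_of_two(size_t n) { return n != 0 && (n & (n - 1)) == 0; }

static void fifo_init(sample_fifo_t *f)
{
    assert(is_power_of_two(CHUNK_LEN) && CHUNK_LEN <= FIFO_CAP);
    f->head = 0;
    f->count = 0;
}

/* Appends up to n samples; anything that doesn't fit is dropped.
   Returns the number of samples actually stored. */
static size_t fifo_push(sample_fifo_t *f, const int16_t *src, size_t n)
{
    size_t free_slots = FIFO_CAP - f->count;
    if (n > free_slots) n = free_slots;
    for (size_t i = 0; i < n; i++) {
        f->data[(f->head + f->count) % FIFO_CAP] = src[i];
        f->count++;
    }
    return n;
}

/* Removes exactly CHUNK_LEN samples into out, or returns false if fewer are buffered. */
static bool fifo_pop_chunk(sample_fifo_t *f, int16_t out[CHUNK_LEN])
{
    if (f->count < CHUNK_LEN) return false;
    for (size_t i = 0; i < CHUNK_LEN; i++)
        out[i] = f->data[(f->head + i) % FIFO_CAP];
    f->head = (f->head + CHUNK_LEN) % FIFO_CAP;
    f->count -= CHUNK_LEN;
    return true;
}
```

The idea would be to push each packet's samples as it arrives, then loop on `fifo_pop_chunk()` and hand each full chunk to I2S. The catch is that this adds up to one chunk of latency, and if the PC's sample clock and the ESP32's I2S clock drift apart, the FIFO will slowly fill or drain, so a real version would need some overflow/underrun handling.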

The working code is posted here:
https://github.com/rkinnett/ESP32-2-Way-Audio-Relay

Here's the schematic for the audio ins-and-outs and PTT:

Not much to it. Note that the received-audio bias trim pot should be manually adjusted so that the quiescent voltage on pin 33 is about 1.65 V, the middle of the ADC's ~3.3 V range. Alternatively, this pot can be replaced by a voltage divider with the upper resistor roughly 2x the value of the lower resistor. I arbitrarily trimmed the Mic-in bias pot to roughly 50%, and that seemed to work. When I listened to transmissions from the ESP32-UV5R pair on a second radio, the audio level sounded just fine: not too quiet and not over-deviating. I did not use a coupling capacitor on Mic-In to remove the inherent 1.65 V DC bias from the ESP32 DAC, and I'm not sure why that works unless the UV-5R has an internal DC-blocking cap.
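A quick check on the divider alternative, using the standard unloaded voltage-divider formula: a 2:1 upper-to-lower ratio lands near 1.65 V only if the divider is fed from roughly 5 V (for example, a dev board's 5 V pin). That supply voltage is an assumption here, since the schematic isn't reproduced in this text; fed from the 3.3 V rail, equal resistors give the midpoint instead. Loading by the radio's audio output is ignored.

```latex
\begin{aligned}
V_{\mathrm{bias}} &= V_{\mathrm{supply}}\,\frac{R_{\mathrm{bot}}}{R_{\mathrm{top}} + R_{\mathrm{bot}}} \\
R_{\mathrm{top}} = 2R_{\mathrm{bot}}:\quad V_{\mathrm{bias}} &= \tfrac{1}{3}V_{\mathrm{supply}} \approx 1.67\ \mathrm{V}\ \text{(5 V supply)},\quad 1.1\ \mathrm{V}\ \text{(3.3 V supply)} \\
R_{\mathrm{top}} = R_{\mathrm{bot}}:\quad V_{\mathrm{bias}} &= \tfrac{1}{2}V_{\mathrm{supply}} = 1.65\ \mathrm{V}\ \text{(3.3 V supply)}
\end{aligned}
```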

On the PC side, I'm using VBAN Banana (more on that later...) to receive and send audio over Wi-Fi to and from the ESP32, and VBAN Banana routes the incoming and outgoing streams to and from soundmodem, which handles the AFSK packet modulation/demodulation. I then connect either Winlink Express or PinPoint APRS to soundmodem.

After dialing in the biases and working out some kinks in PTT control, I finally tried reaching the nearest Winlink gateway, which is ~10 miles away with hills in the way: not trivial for a 5W HT, but doable. It took a few hours of fiddling, but eventually I got a response! I kept working it and tweaking things until Winlink Express was able to maintain a link, and then I sent a position report and was thrilled to see my position marker show up on the Winlink map. Success! That was one successful message transfer out of dozens of attempts, so this still has a long way to go.

Next, I moved my setup outside so I could point my Arrow-II Yagi toward the Winlink gateway, but discovered a new problem: the UDP streams became far too unreliable after moving 30 feet away from my router. That's slightly disappointing, but not really a surprise, I guess. That led me down another rabbit hole: setting up the ESP32 as an access point. That by itself isn't difficult, but for the life of me I could not get 2-way UDP streaming working when I connected my laptop to the ESP32 AP. After wasting a couple of hours on this, I eventually found an isolated note in a forum somewhere saying that VBAN Voicemeeter doesn't play well with non-router access points. Didn't see that coming! Fortunately, it was an easy fix: VBAN Banana doesn't have this problem.

It was starting to feel like forces were conspiring against me on this project...

Anyways, after getting things working again with the ESP32 as its own AP, I set Winlink aside and started playing with PinPoint APRS through soundmodem, and got that working fairly quickly. That is fun! I'm easily picking up other people's APRS beacons from 30 miles away, and even hundreds of miles away with a digipeater in between. My own beacons are reliably picked up by an iGate about 9 miles away.

So, in summary, 2-way network audio is working semi-reliably and playing nicely with APRS, both receiving and transmitting. I have at least one existence-proof data point with Winlink, and I think that will work reliably too when I get back to it with a beam antenna. I might try this during an upcoming Winlink practice session with my local ARES group.

I'm also very much looking forward to bouncing some APRS messages off satellites!

I'm planning to write a separate log to recap progress and work-to-go. I'm thinking about stopping here and just using the ESP32 as a network audio relay, without finishing the onboard modulation and demodulation.
