Audio streaming over serial + USB

A project log for 1 dollar TinyML

Can we build Machine Learning enabled sensor for under 1 USD?

Jon Nordby, 06/17/2024 at 18:16

TLDR: Audio data can be streamed to a computer over serial, via a serial-to-USB adapter. Using a virtual device in ALSA (or similar), we can then record from the device as if it were a proper audio soundcard/microphone.

In a previous post we described the audio input of the prototype board, using the Puya PY32F003 microcontroller. It consists of a 10 cent analog MEMS microphone, a 10 cent operational amplifier, and the internal ADC of the PY32. To check the audio input, we need to be able to record some audio data that we can analyze.

The preferred way to record audio from a small microcontroller system would be to implement audio over USB using the Audio Device Class, and record on a PC (or embedded device like RPi). This ensures plug & play with all operating systems, without needing any drivers. Alternatively, one could output the audio from the microcontroller on a standard audio protocol such as I2S, and then use a standard I2S to USB device to get the data onto the computer. Example: MiniDSP USBStreamer.
However, the Puya PY32F003 (like most other sub 1 USD microcontrollers) supports neither USB nor I2S. So instead we will stream the audio over serial, and use a serial-to-USB adapter to get it on the PC. This requires some custom code, as there are no standards for this (to my knowledge at least).

Streaming audio over serial

Since the serial stream is also our primary logging stream, it is useful to keep it as readable text. This means that binary data, such as the audio PCM, must be encoded. There are several options here. I just went with the most widely supported, base64. It is a bit wasteful (33% increase), but it is good enough for our purposes.
The default baudrate of 115200 in the PY32 examples, on the other hand, will not do. The bandwidth needed for an 8 kHz sample rate of 16 bit PCM, base64 encoded, is at least 2*8000*(4/3)*8 = about 171 kbaud (ignoring overheads for message framing). Furthermore, the standard printf/serial communication is blocking, so any time spent on sending serial data is time the CPU cannot spend on other tasks.
It would probably be possible to set up DMA buffering here, but that would be additional complexity.
I tested the PY32 together with an FTDI serial-to-USB cable. It worked at least up to 921600 baud, which is ample.
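
The bandwidth estimate above can be reproduced with a few lines of Python. This is only a sketch: the function name `required_baud` and its parameters are made up for illustration, and the second figure assumes 8N1 UART framing (one start bit and one stop bit per byte), which is common but not stated for this setup.

```python
def required_baud(sample_rate_hz, bytes_per_sample=2, bits_per_uart_byte=8):
    """Minimum serial rate in bits/s for base64-encoded PCM.

    Ignores message framing (prefixes, newlines) on top of the payload.
    """
    raw_bytes_per_s = sample_rate_hz * bytes_per_sample
    # base64 emits 4 output characters for every 3 input bytes
    encoded_bytes_per_s = raw_bytes_per_s * 4 / 3
    return encoded_bytes_per_s * bits_per_uart_byte


# Payload bits only, as in the estimate in the text
payload_only = required_baud(8000)
# Assumption: 8N1 framing, so each byte occupies 10 bit times on the wire
with_8n1 = required_baud(8000, bits_per_uart_byte=10)

print(f"payload only: {payload_only / 1000:.0f} kbaud")
print(f"with 8N1:     {with_8n1 / 1000:.0f} kbaud")
print(f"headroom at 921600 baud: {921600 / with_8n1:.1f}x")
```

Under these assumptions, 115200 baud is too slow either way, while 921600 baud leaves room for log messages and framing overhead.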

The messages sent look like this going over the serial port. The data part is base64 encoded PCM for a single chunk of int16 PCM audio.


Receiving the data is done with a Python script, using pyserial. The script identifies which of the serial messages are PCM audio chunks, and then decodes and processes them. Other messages from the microcontroller are logged out as-is.
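
To make the encode/decode round trip concrete, here is a minimal sketch of how text lines could be split into audio chunks and ordinary log messages. The `audio-block ` prefix is a hypothetical marker chosen for this example, not the actual message format used by the firmware, and the function names are likewise illustrative. In a real script the lines would come from a pyserial port (for example via `readline()`), which is left out here so the example runs without hardware.

```python
import base64
import struct

# Hypothetical marker for audio lines; NOT the project's actual message format
AUDIO_PREFIX = "audio-block "


def encode_chunk(samples):
    """Mimic the sending side: int16 samples -> one line of readable text."""
    pcm = struct.pack(f"<{len(samples)}h", *samples)  # little-endian int16
    return AUDIO_PREFIX + base64.b64encode(pcm).decode("ascii")


def decode_line(line):
    """Return a list of int16 samples for audio lines, or None for other lines."""
    line = line.rstrip("\r\n")
    if not line.startswith(AUDIO_PREFIX):
        return None  # ordinary log message
    payload = line[len(AUDIO_PREFIX):]
    try:
        pcm = base64.b64decode(payload, validate=True)
    except ValueError:
        return None  # corrupted or truncated payload
    if len(pcm) % 2 != 0:
        return None  # not a whole number of 16 bit samples
    return list(struct.unpack(f"<{len(pcm) // 2}h", pcm))


for line in ["boot ok\r\n", encode_chunk([0, 1, -1, 32767, -32768]) + "\r\n"]:
    samples = decode_line(line)
    if samples is None:
        print("log:", line.strip())
    else:
        print("audio:", samples)
```

Rejecting malformed payloads instead of raising keeps the receiver running when a line is cut short, which can happen if the port is opened mid-message.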

Virtual soundcard using loopback

Getting the audio into our script on the PC side is useful. But preferably, we would like to use standard audio tools, and not have to invent everything ourselves. So the processing script takes the received audio data, and writes it to an output sound device, using the sounddevice library. This allows playing it back on our speaker, which allows for simple spot checking. But even more useful is to use a loopback device, to get a virtual sound card for our device.

I tested this using ALSA loopback, which creates a pair of ALSA devices. The script can then write to one device, and a standard program that supports ALSA (which is practically everything on Linux) can read the audio stream from the other device.

# read data from serial, output to ALSA virtual device
python User/ --serial /dev/ttyUSB0 --sound 'hw:3,0'

# record audio from ALSA virtual device
arecord -D hw:3,1 -f S16_LE -c 1 -r 8000 recording.wav

Note: There is nothing ALSA specific about the Python script, so this approach should also work with other sound systems that support virtual devices, such as PulseAudio/PipeWire on Linux, or the equivalents on macOS or Windows.
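
The output side of such a script can be sketched roughly as follows, assuming the sounddevice library is installed. The helper names `pcm_bytes` and `play_chunks` are made up for this example and this is not the project's actual script. On Linux, the loopback devices are typically provided by the `snd-aloop` kernel module, and the card number in `hw:3,0` will differ between machines.

```python
import array


def pcm_bytes(samples):
    """Pack int16 samples as native-endian bytes, matching dtype='int16'."""
    return array.array("h", samples).tobytes()


def play_chunks(chunks, device, samplerate=8000):
    """Write an iterable of int16 sample lists to an output sound device.

    `device` can be a device index or a name substring, e.g. 'hw:3,0'.
    """
    # Third-party dependency, imported here so pcm_bytes() works without it
    import sounddevice as sd

    with sd.RawOutputStream(samplerate=samplerate, channels=1,
                            dtype="int16", device=device) as stream:
        for samples in chunks:
            stream.write(pcm_bytes(samples))


# Helper can be checked without any sound hardware
print(len(pcm_bytes([0, 1000, -1000])), "bytes for 3 samples")
```

A call like `play_chunks(decoded_chunks, device="hw:3,0")` would then feed one side of the loopback pair, where `decoded_chunks` stands for whatever iterable of sample blocks the serial receiver produces.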

Audio recording using ADC with PY32

Audio recording must be done at high samplerates (8kHz+) and with precise timing (no/minimal jitter). For that, we use the timer peripheral in the PY32, and wire it up directly to the DMA subsystem. This way, the CPU and our program are not involved at all in sampling, and we get the data as convenient blocks in a size we specify (ex: 64 samples). Each block is pushed onto a queue in the DMA interrupt, and can be processed at a leisurely pace in the main loop.
We used this example in the PY32F0 template repository as a starting point: ADC SingleConversion Trigger DMA.
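
The hand-off between the DMA interrupt and the main loop can be modelled on the host as a bounded queue of sample blocks. The sketch below is a Python model for reasoning about the design, not the firmware code (which runs in C on the PY32); the class name `BlockQueue` and the drop-when-full policy are assumptions for illustration. Counting dropped blocks is one way to check whether queue overflows explain dropouts in a recording.

```python
from collections import deque


class BlockQueue:
    """Fixed-capacity FIFO of sample blocks that counts drops when full."""

    def __init__(self, capacity):
        self.blocks = deque()
        self.capacity = capacity
        self.dropped = 0

    def push(self, block):
        """Producer side, standing in for the DMA-complete interrupt."""
        if len(self.blocks) >= self.capacity:
            self.dropped += 1  # main loop fell behind, block is lost
            return False
        self.blocks.append(block)
        return True

    def pop(self):
        """Consumer side, standing in for the main loop. None when empty."""
        return self.blocks.popleft() if self.blocks else None


# Simulate a consumer that only keeps up with every second block
queue = BlockQueue(capacity=2)
for i in range(6):
    queue.push([i] * 64)  # 64-sample block, as in the text
    if i % 2 == 1:
        queue.pop()
print("dropped blocks:", queue.dropped)
```

In this model a larger capacity only delays the first drop when the consumer is consistently slower than the producer, so extra buffering helps with bursts but not with a sustained shortfall.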

Recorded audio

The following audio was recorded by playing back a song on a phone, with the headphone jack connected to the ADC of a standard PY32F003 development board.

Audio recording of speech (as spectrogram)

In the spectrogram, we can see the voice relatively clearly. However, there is also a bunch of noise. In particular, there is a lot of tonal noise, and occasional dropouts.

The tonal noise disappeared in one of the recordings, so it might be electromagnetic interference, for example over the USB, or from the multitude of switching power supplies around the device under test.
The occasional dropouts might be due to overflows in the audio processing queue, either on the microcontroller or on the host side. More buffering might fix it.


Now we can run more tests of the audio input to debug these issues, with a clean power supply and an oscilloscope at hand. Eventually, we also need to include the on-board amplifier and microphone in the tests. It would be good to have some basic measurements of the frequency response, sensitivity, and noise floor.
And we also need to write firmware that can process the audio stream and run a Machine Learning classifier.