A project log for Jetson Nano Convolution Reverb

Turn your unused Jetsons into real-time audio effect modules with CUDA

E/S Pronk 01/23/2022 at 20:17

I started working with NVIDIA Jetson a few years ago, when the Nano had just come out. I learned CUDA and loved it. Soon I outgrew the Nano, and after I got my first Xavier NX I never looked back. The Xavier NX was in turn shelved when I moved to the AGX Xavier.

What a waste, though. They are wonderful little boards in their own right; I just need to find a nice job for them to do. I’ve been wondering if CUDA could be of any help for real-time audio processing. I figured that the buffer sizes would most likely be too small to offset the overhead of copying the data to and from the GPU, and even if they weren’t, you’d have so little data that your grid size would probably be 1.

On top of that, many filters simulate (to some extent) electronics or physics, where the new state depends on the previous state, so they have to be calculated serially in the time domain. These algorithms are not easily parallelized. Filters like this are generally known as Infinite Impulse Response filters, or IIR. The only way to make proper use of the GPU for these filters is to have a whole stack of them, all independent of each other, so that each filter still runs serially but the stack as a whole runs in parallel. I played with the idea of simulating all strings in a piano, or a symphonic orchestra with each violin processed in parallel.
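To make the serial dependency concrete, here is a minimal sketch in plain Python. The one-pole low-pass filter and the names `one_pole_lowpass` and `run_filter_bank` are my own illustration, not code from this project. Inside one filter, each output needs the previous output, so the inner loop cannot be split up. The filters in the bank do not depend on each other, so the outer loop is the part that could map to one GPU thread per filter.

```python
def one_pole_lowpass(samples, alpha, state=0.0):
    """y[n] = y[n-1] + alpha * (x[n] - y[n-1]).

    Every output depends on the previous output, so this loop is serial.
    """
    out = []
    for x in samples:
        state = state + alpha * (x - state)
        out.append(state)
    return out, state


def run_filter_bank(channels, alphas):
    """Run one independent filter per channel.

    The channels share no state, so on a GPU each one could be handled
    by its own thread (hypothetical mapping, not measured here).
    """
    return [one_pole_lowpass(ch, a)[0] for ch, a in zip(channels, alphas)]


impulse = [1.0, 0.0, 0.0, 0.0]
bank_out = run_filter_bank([impulse, impulse], [0.5, 0.25])
print(bank_out[0])  # the decaying tail never reaches exactly zero: "infinite" response
```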

The counterpart of IIR is FIR, Finite Impulse Response filters. For long impulse responses, it usually entails transforming a signal from the time domain to the frequency domain using Fast Fourier Transforms, then doing some magic math and transforming the result back. When you multiply in the frequency domain you get convolution in the time domain and vice versa. This can be used in audio effects to simulate anything from amplifiers to stereo fields in a large church. These effects convolve your plain audio with the recorded response (echo) of an impulse (clap, snap, tick) that was played inside a large hall or through a guitar amplifier, for instance. This causes the resulting audio to sound like it was recorded right there in that hall or with that amplifier.
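The "multiply in the frequency domain" trick can be checked with a small sketch in plain Python. It uses a naive O(N²) DFT instead of a real FFT to stay dependency-free, and the function names and the toy `dry` and `impulse_response` values are made up for illustration. Both inputs are zero-padded to the full output length so that the circular convolution the DFT gives you matches ordinary linear convolution.

```python
import cmath


def dft(x, inverse=False):
    """Naive discrete Fourier transform; an FFT gives the same result faster."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def direct_convolve(signal, ir):
    """Time-domain convolution: every input sample triggers a scaled copy of the IR."""
    out = [0.0] * (len(signal) + len(ir) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(ir):
            out[i + j] += s * h
    return out


def spectral_convolve(signal, ir):
    """Same result via the frequency domain: transform, multiply, transform back."""
    n = len(signal) + len(ir) - 1  # zero-pad to avoid circular wrap-around
    a = dft(list(signal) + [0.0] * (n - len(signal)))
    b = dft(list(ir) + [0.0] * (n - len(ir)))
    product = [p * q for p, q in zip(a, b)]
    return [v.real for v in dft(product, inverse=True)]


dry = [1.0, 0.5, -0.25, 0.0]        # toy "plain audio"
impulse_response = [1.0, 0.0, 0.3]  # toy "hall": direct sound plus one echo
wet_direct = direct_convolve(dry, impulse_response)
wet_spectral = spectral_convolve(dry, impulse_response)
```

The two results agree up to floating-point rounding. The payoff of the frequency-domain route only shows up for long impulse responses, where the direct double loop becomes far too slow.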

Now, remember that GPUs are very good at certain AI algorithms, for instance CNNs, or Convolutional Neural Nets. Yup, that is the same convolution mentioned above, just generally 2D for images. GPUs love doing convolution, so it should come as no surprise that NVIDIA has a CUDA library for the Fourier transforms behind it, cuFFT. The way Fourier transforms can be parallelized makes it work equally well for one large 1D transform or a bunch of small ones (like scanlines). If we get a buffer with 256 samples and want to add about 1.5 sec of echo, the convolution would have a size of 65536, equivalent to a 256x256 image. A convolution can be compared to a single layer of a model, and I’m expecting to do about 200 buffers x 3 convolutions. How many filters could we run in parallel, and what latency can we expect? Initial tests on the Jetson Nano suggest 1 or 2 filters in parallel with a latency of 1.5 milliseconds. Looks promising!!
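For a rough sanity check on those numbers, here is the arithmetic as a short Python sketch. The 44.1 kHz sample rate is my assumption for illustration (the log does not state one), and the variable names are made up.

```python
SAMPLE_RATE_HZ = 44_100  # assumption: sample rate is not stated in the log
BUFFER_SAMPLES = 256     # samples per audio buffer
IR_SAMPLES = 65_536      # convolution size, same count as a 256x256 image

ir_seconds = IR_SAMPLES / SAMPLE_RATE_HZ            # length of the echo tail
buffer_ms = 1000 * BUFFER_SAMPLES / SAMPLE_RATE_HZ  # time one buffer represents
ir_blocks = IR_SAMPLES // BUFFER_SAMPLES            # buffer-sized slices in the tail

print(f"echo tail: {ir_seconds:.2f} s")
print(f"one buffer: {buffer_ms:.2f} ms")
print(f"buffer-sized blocks in the tail: {ir_blocks}")
```

At this assumed rate, 65536 samples is just under 1.5 seconds, and a 256-sample buffer covers a bit less than 6 ms of audio. If the 1.5 ms figure is the processing time per buffer, it would fit inside that window with room to spare.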