Jetson Nano Convolution Reverb

Turn your unused Jetsons into real-time audio effect modules with CUDA



  • real-time convolution reverb
  • 150+ impulse responses
  • 2 to 4 parallel channels on Nano
  • per channel impulse response selection
  • per channel predelay selection
  • per channel separate dry and wet mixer
  • per channel separate dry and wet panning
  • interpolates between impulse responses, with settable interpolation time
  • configurable MIDI mapping


  • NVIDIA Jetson (Nano, Xavier NX, AGX Xavier) or Desktop / Laptop with NVIDIA GPU
  • External sound card (required for Jetson, optional for other setups). This project has been tested with a Focusrite Scarlett 2i2 and a Roland TR-6S (which also presents itself as a sound card)
  • MIDI controller (optional, can be software controller). This project was tested with a Novation Launch Control, and as such has mappings for this controller in settings.txt
  • Some dark shades to look cool while you're convolving those beats...

Setting up JACK

Install jackd and probably qjackctl using the package manager. Try to find your sound device. Nope... Reboot a few times and cross your fingers. Get loads of xruns. Cry a little and smack it with a hammer. Your JACK setup should now magically work.

The Jetson kernel ships without ALSA sequencer support, so start JACK with -Xnone. The convolution code talks to raw MIDI devices directly, without going through JACK.

Building the source code

MIDI Controls

There are 8 MIDI controls per channel. I've mapped them to 8 knobs and change the MIDI channel to switch between channels. You can also map a single knob to multiple channels to keep them in sync, which is particularly useful if you have a stereo input: map everything except the pan controls to the same knobs, then pan the dry signals (and optionally the wet signals) to left and right.

  • SELECT - select one of the impulse responses to convolve with. The effect interpolates between impulse responses at a rate set by SPEED
  • PREDELAY - add a delay at the start of the wet signal. This is not interpolated and will cause clicks if changed during live playback (for now)
  • DRY - dry signal level. (The unaltered original signal)
  • WET - wet signal level. (The convolved signal, delayed by predelay)
  • SPEED - interpolation speed between impulse responses when a new one is selected
  • DRY PAN - pan (left to right) of the dry signal in the output mix
  • WET PAN - pan (left to right) of the wet signal in the output mix
  • LEVEL - level of the output mix (dry + wet) but not the residual
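How the raw MIDI messages might map onto these controls can be sketched as follows. This is an illustrative sketch, not the project's actual code: the controller numbers (21-28) and the CONTROLS ordering are hypothetical; the real assignments live in settings.txt and depend on your controller.

```python
# Minimal sketch of decoding a MIDI Control Change message and mapping it to
# one of the 8 per-channel controls. CC numbers and ordering are hypothetical.

CONTROLS = ["SELECT", "PREDELAY", "DRY", "WET", "SPEED", "DRY_PAN", "WET_PAN", "LEVEL"]
CC_BASE = 21  # hypothetical first CC number on the knob row

def parse_cc(msg):
    """Decode a 3-byte MIDI CC message into (channel, control name, 0..1)."""
    status, cc, value = msg
    if status & 0xF0 != 0xB0:
        return None  # not a Control Change message
    channel = status & 0x0F
    index = cc - CC_BASE
    if not 0 <= index < len(CONTROLS):
        return None  # CC number outside our mapped range
    return channel, CONTROLS[index], value / 127.0

# CC 23 on MIDI channel 2, value 127 -> DRY level at full
print(parse_cc([0xB2, 23, 127]))  # → (2, 'DRY', 1.0)
```

Switching the MIDI channel on the controller changes the first element of the tuple, which is what selects the reverb channel being edited.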

  • No more ALSA seq requirement

    E/S Pronk 02/06/2022 at 03:52 0 comments

    I have been trying to get the code working on a production Jetson Xavier NX, but it has a custom board and rebuilding / signing the kernel is a minefield I would rather not cross on my weekend off. I'd rather bypass seq altogether.

    So I tried to get JACK to detect raw MIDI devices... No luck.

    So now I have just written a rawmidi interface myself. It is pretty basic and has some potential issues if you want to do advanced routing, but for now it should remove the requirement for the kernel rebuild.

    (Still need to test on an actual Jetson, working on my laptop right now)

    UPDATE: Now tested and functioning on the Jetson Xavier NX.

    PS: When testing audio applications, always test with headphones and wear them around your neck. Then monitor the signal by putting one of the cups on your ear, DJ-style, just in case a bug is causing noise. I once had a bug that produced so much noise the headphones were vibrating. Protect your hearing!

  • Jetson Nano Test

    E/S Pronk 01/25/2022 at 20:53 0 comments

    Finished testing on the Nano, works great!

    When running with a single instance, the convolution time is 1.9ms for 2 input channels. When running 2 instances in a single process the convolution time doubles, and this seems to be the maximum I can get out of the Nano. As soon as I start 3 instances, the screen starts flickering and the audio sounds like a buzzer. Yikes...

    So, to sum up: 4 input channels / 4 output channels should be doable! I tested with a (JACK) buffer size of 512, which is as low as the TR-6S will go.

  • Testing the limits

    E/S Pronk 01/25/2022 at 20:15 0 comments

    | NVIDIA-SMI 495.29.05    Driver Version: 495.29.05    CUDA Version: 11.5     |
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |                               |                      |               MIG M. |
    |   0  Quadro P2000 wi...  On   | 00000000:01:00.0 Off |                  N/A |
    | N/A   61C    P0    N/A /  N/A |   1882MiB /  4040MiB |     69%      Default |
    |                               |                      |                  N/A |

    There are still some optimizations I could do, but they would require a bit of effort. I tried to see how many instances I could run in parallel before I would start to get xruns. Interestingly, on my laptop at least, if I run all instances in the same process, the whole thing breaks down at 4 instances, producing a lot of noise. The GPU seems capped at 50%.

    If, however, I run 2 instances per process and start multiple processes, I can go up to 5 processes, 10 instances = 20 (!) parallel convolution reverbs. GPU usage then tops out at 70%. When starting 4 instances over 2 processes, GPU usage is only 25%, compared to 50% in a single process. (Each instance has its own stream(s), so I'm not sure how this is happening.)

    Could it be that there are multiple engines that can only be used from different CUDA contexts, and that run in parallel so the usage is halved?

    So I'm quite pleased with the performance so far, and don't see a good reason to start optimizing. Let's see how it performs on the Jetson.

  • Concept

    E/S Pronk 01/23/2022 at 20:17 0 comments

    I started working with NVIDIA Jetson a few years ago, when the Nano just came out. Learned CUDA and loved it. Soon I outgrew the Nano and after I got my first Xavier NX I never looked back. The Xavier NX was in turn shelved when I moved to the AGX Xavier.

    What a waste though, they are wonderful little boards in their own right, I just need to find a nice job for them to do. I’ve been wondering if CUDA could be of any help for real time audio processing. I figured that the buffer sizes would most likely be too small to offset the overhead of copying the data to and from the GPU, and even if they weren’t, you’d have so little data that your grid size would probably be 1. 

    On top of that, many filters simulate (to some extent) electronics or physics, where the new state depends on the previous state, and thus calculate serially in the time domain. These algorithms are not easily parallelized. They are generally known as Infinite Impulse Response filters, or IIR. The only way to make proper use of the GPU for these filters is to have a whole stack of them, all independent of each other, and run them in parallel, each one still serial internally. I played with the idea of simulating all the strings in a piano, or a symphonic orchestra with each violin processed in parallel.

    The counterpart of IIR is FIR: Finite Impulse Response filters. These usually entail transforming a signal from the time domain to the frequency domain using Fast Fourier Transforms, doing some magic math, and transforming the result back. When you multiply in the frequency domain you get convolution in the time domain, and vice versa. This can be used in audio effects to simulate anything from amplifiers to the stereo field of a large church: they convolve your plain audio with the recorded response (echo) of an impulse (a clap, snip, or tick) played inside a large hall or through a guitar amplifier, for instance. This causes the resulting audio to sound as if it was recorded right there in that hall or with that amplifier.
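The "magic math" is just a pointwise multiply. A minimal sketch of FFT-based convolution, using NumPy on the CPU as a stand-in for what cuFFT does on the GPU:

```python
# Multiplication in the frequency domain equals convolution in the time domain:
# convolve a dry buffer with an impulse response by multiplying their spectra.
import numpy as np

def fft_convolve(signal, ir):
    """Convolve signal with an impulse response via the frequency domain."""
    n = len(signal) + len(ir) - 1          # full linear convolution length
    S = np.fft.rfft(signal, n)             # zero-padded forward transforms
    H = np.fft.rfft(ir, n)
    return np.fft.irfft(S * H, n)          # pointwise multiply, then invert

rng = np.random.default_rng(0)
dry = rng.standard_normal(256)             # one audio buffer
ir = rng.standard_normal(1024)             # a short impulse response
wet = fft_convolve(dry, ir)
print(np.allclose(wet, np.convolve(dry, ir)))  # → True: matches direct convolution
```

Zero-padding to the full length before transforming is what makes the circular convolution of the FFT behave like the linear convolution we want.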

    Now, remember that GPUs are very good at certain AI algorithms, for instance CNNs, or Convolutional Neural Nets. Yup, that is the same convolution mentioned above, just generally 2D for images. GPUs love doing convolution, so it should come as no surprise that NVIDIA has a CUDA library just for that: cuFFT. The way Fourier transforms can be parallelized makes them work equally well for one large 1D transform or for a bunch of small ones (like scanlines). If we get a buffer with 256 samples and want to add about 1.5 sec of echo, the convolution would have a size of 65536, equivalent to a 256x256 image. A convolution can be compared to a single layer of a model, and I'm expecting to do about 200 buffers x 3 convolutions. How many filters could we run in parallel, and what latency can we expect? Initial tests on the Jetson Nano suggest 1 or 2 filters in parallel with a latency of 1.5 milliseconds. Looks promising!!
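Processing a long echo against short incoming buffers implies block-based convolution. One standard way to do that is overlap-add, sketched here with NumPy; this is an illustration of the general technique under assumed sizes, not the project's actual partitioning scheme:

```python
# Overlap-add sketch: convolve an audio stream block by block with a long
# impulse response, so each 256-sample buffer can be processed as it arrives.
import numpy as np

def stream_convolve(signal, ir, block=256):
    """Convolve a stream with a long IR, one block at a time (overlap-add)."""
    out = np.zeros(len(signal) + len(ir) - 1)
    n = block + len(ir) - 1                # per-block FFT size
    H = np.fft.rfft(ir, n)                 # IR spectrum, precomputed once
    for start in range(0, len(signal), block):
        chunk = signal[start:start + block]
        wet = np.fft.irfft(np.fft.rfft(chunk, n) * H, n)
        out[start:start + n] += wet[: len(out) - start]  # tails overlap and add
    return out

rng = np.random.default_rng(1)
sig = rng.standard_normal(4 * 256)         # four consecutive buffers
ir = rng.standard_normal(2000)
print(np.allclose(stream_convolve(sig, ir), np.convolve(sig, ir)))  # → True
```

In practice the FFT size would be rounded up to a power of two, and the IR would be split into partitions to keep per-buffer latency low, but the overlapping-tails idea is the same.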





Sl_Postmann wrote 01/30/2022 at 07:30

I'm very interested in the work done, but what's the point of using advanced math reverb when the Nvidia board has enough RAM and the reverb can be done arithmetically?


E/S Pronk wrote 02/01/2022 at 02:42

I’m not sure what you mean by arithmetically; a convolution can be done on the CPU as well, but it would be slower. And I’m using a big chunk of RAM, actually: all the impulse responses are precomputed and stored in memory…

But for me the main reason to do this is that I really like working with CUDA, and I wanted a silent, passively cooled board so as not to have any noise in the small room I record in, as well as something portable. And second, I would ask myself: why not? It would be a waste NOT to use the GPU on an NVIDIA board, and FFTs are a nice fit :) Leave the CPU purely for JACK and the real-time audio handling.

