
Manipulating audio sample data

A project log for openSampler

a hardware sampler based on the raspberry pi running bare metal (so no Linux)

Nick Verlinden 07/10/2020 at 13:46

Last time we figured out how sound works in code, so now we can get to the fun stuff and do some manipulation on that sound! 

Now that we know that a sample is the amplitude at a point in time, it was easy to figure out that by dividing or multiplying the value you can make it quieter or louder. Now let's see what our sampler needs to be able to do.
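Just to make that concrete, here is a minimal sketch of what gain looks like on a single sample. It uses the same Sound, sPos and nLevel names that show up in the snippets further down; gain is just an illustrative variable.

// gain: scale the amplitude of one signed 16-bit sample
float gain = 0.5f;                       // 0.5f = half as loud, 2.0f = twice as loud
int scaled = int(Sound[sPos] * gain);

// clamp, because anything outside the 16-bit range would wrap around and distort
if (scaled >  32767) scaled =  32767;
if (scaled < -32768) scaled = -32768;
nLevel = short(scaled);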

When you use a sample on an Akai MPC1000, you can specify its tuning during recording and manipulate the pitch after recording. On the MPC1000 this can sound quite grungy. I did not immediately know how pitch works in code, so I let it rest for a moment and went on to re-watch some episodes of Community. My brain kept doing some thinking work while I was watching. I don't know if this will make sense to you, or if I'm even explaining this in a way that makes it clear, but let's have a shot. I figured, from music theory, that for a tone to be an octave higher, the wave needs to play back at double the speed. So, when you want it to be an octave lower, it needs to be half the speed.

Could it be really that simple?

It was already late, but I ran to the basement where my 'lab' is to test my theory. I came up with this piece of code:

nLevel = Sound[int(sPos++ * pitch)]; // pitch scales how fast we step through the sample data

In case you haven't figured it out, this needs to be inside a loop, with sPos stepping through the sample data index.
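For completeness, a rough sketch of that loop. Here buffer, bufferSize and soundLength are stand-ins for however you hand samples to the audio device and however long the sample data is; the real code in the project looks different.

for (int i = 0; i < bufferSize; i++) {
    int idx = int(sPos++ * pitch);      // step through the source faster or slower
    if (idx >= soundLength) break;      // don't read past the end of the sample data
    nLevel = Sound[idx];
    buffer[i] = nLevel;
}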

And you know what? To my big surprise, this actually sounds just like it would on the MPC1000. There is a reason for that: I suspect the MPC1000 does not have a low-pass filter or any fancy algorithm correcting the aliasing that occurs when resampling the audio. So the artefacts that you hear are imperfections introduced by modifying the length of the wave sample data.

Figure out how pitching works: CHECK.

If you have listened to my music, you will probably have noticed that I often like to use aliasing as an effect. Take this song for instance: 'https://open.spotify.com/album/5rpGsYSNGsEcHBqkMMOj1d?si=qn2PIBugQymtz5FKivAzqA'. You can hear it very clearly in the vocal buildup in the middle of the song, at 1:40.

With the newly gained knowledge about how and when aliasing occurs, we can create a decimator effect (also often present in bit crusher style effects). This is what I came up with:

// decimator
float decimate = 1.0f;        // 1.0f = off, smaller values = more grit
if (decimate != 1) {
    // quantise the playback position so the same source sample gets repeated,
    // which is what produces the gritty aliasing
    int idx = int(int(sPos++ * decimate) / decimate);
    nLevel = Sound[int(idx * pitch)];
}

Now I'm not going to explain this in detail; just have a look at it and try to figure it out, knowing that 'Sound' is the signed 16-bit sample data, and nLevel is a signed short (in other words a 16-bit signed int) that is going to be sent to the audio device's output. If you use a value of 0.1f for decimate, you will get a really lo-fi, gritty sound. Just the way I like it.

Talking about bit crushing, how do we approach that? Simple: by removing bits, like this:

// bit reduction
int bits = 9;                 // number of low bits to throw away
nLevel = nLevel >> bits;      // shifting right drops the lowest bits...
nLevel = nLevel << bits;      // ...shifting back left leaves them zeroed

But on the MPC1000 the effect is called 'Bit Grunger', and it does not sound like the bit reduction technique in the code above. Instead, I think they drive the sound by compressing the quiet parts so that it fits into the new bit depth. Think of it as rescaling the wave so it fits in the newly specified bit depth. Our code above just throws the bottom part away, but the code below adds some 'drive' to the sound.

// drive
float depth = 0;              // 0 = off, higher values = more drive
if (depth > 0) {
    // tanh soft-clips the sample: quiet parts get pushed up, loud parts get squashed
    nLevel = 32767 * (tanh((float(nLevel) / 32767) * depth));
}

If you do the driving part before the bit reduction, it will sound more like the 'Bit Grunger' effect on the MPC1000. By the way, at this point I would like to thank Heikki Rasilo for helping me out with the math part of the drive. I did not even know what a tangent function was, and couldn't have done it without him. You know, I suck at math. In high school I even had extra after-school classes for algebra, but it just didn't work; I failed maths that year and went on to the lower-grade wood workshop education course. That said, I do have to take full credit for the pitching, decimator and bit reduction. Even though it's pretty insignificant, I'm pretty proud of my small accomplishment there.
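Putting the two together in that order looks roughly like this (same nLevel, bits and depth variables as above, tanh from <cmath>). This is just my sketch of the idea, not the MPC1000's actual algorithm:

// 'Bit Grunger'-ish: drive first, then reduce the bit depth
float depth = 4.0f;           // amount of drive
int bits = 9;                 // number of low bits to throw away

// drive: soft-clip so quiet parts get pushed up before we lose resolution
nLevel = short(32767 * tanh((float(nLevel) / 32767) * depth));

// bit reduction: zero out the lowest 'bits' bits
nLevel = nLevel >> bits;
nLevel = nLevel << bits;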

The last thing I want to show you in this log is pitch shifting. Pitching also changes the playback speed: if you pitch a vocal down by an octave, it will take twice as long to play back from start to end. If you want to change the pitch but keep the timing, you can do that using fancy algorithms. But I was just toying with my own quick and dirty technique. It introduces some clicking at certain points, but it'll do for now; it's just a proof of concept.

// shift
//   we could add some sort of crossfading to minimize clicking artifacts
if (pitch != 1 && shift) {
    int size = 128 * 32; // a very small chunk size such as 4 will result in a decimation effect too
    int mult = sPos / size;          // which chunk we are in
    int offset = size * mult;        // where that chunk starts
    int idx = int(offset + ((sPos - offset) * pitch)); // pitch within the chunk only
    nLevel = Sound[idx] * gain;
}

What the code basically does is chop the sound up into very small chunks, and repeat or throw away bits of those little chunks so the sound keeps its original length while being pitched.

I also tried to see if we could do some EQ on the sound, and found this handy class: 'https://www.kvraudio.com/forum/viewtopic.php?t=521184'. I am by no means capable of making that myself, so I'm going to kindly borrow it :-). Thanks to 'malcolm20' for writing it. He got the formulas from an audio cookbook known as the RBJ Cookbook. In that cookbook I saw that there were also formulas for more types of parametric filters, so maybe someone with better math skills could help me out with that (there's a rough sketch of what those formulas boil down to below).

We'll see what we need in the future, but for now I can do some early benchmarking to see what the Pi is capable of. I didn't really have an idea of the polyphony. Well, I did some quick tests, and a single core allows me to have 48 (mono) voices playing simultaneously, while gaining, pitching, EQing and mixing samples without drops or glitches. All the code above relies on floating point operations, and those are expensive, so I can imagine a lot of optimisation could be done by people who know more about this kind of stuff. These are still very early days, but at least we know that it's not just one voice, but a lot more. We have 4 cores available to do stuff, so if we can get two cores each doing 48 voices simultaneously, that would be awesome.
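Back to the EQ for a second: the filters in that class boil down to a biquad, where the cookbook gives you the coefficients and every sample then runs through a handful of multiplies and adds. Below is my own rough reading of the peaking EQ formulas from the RBJ cookbook, not the class from the forum post, and the names are just placeholders:

#include <math.h>

// RBJ cookbook peaking EQ, processing one signed 16-bit sample at a time
struct PeakingEQ {
    float b0, b1, b2, a1, a2;             // normalised coefficients
    float x1 = 0, x2 = 0, y1 = 0, y2 = 0; // previous inputs/outputs

    void setup(float sampleRate, float freq, float Q, float gainDb) {
        float A     = powf(10.0f, gainDb / 40.0f);
        float w0    = 2.0f * 3.14159265f * freq / sampleRate;
        float alpha = sinf(w0) / (2.0f * Q);
        float a0    = 1.0f + alpha / A;
        b0 = (1.0f + alpha * A) / a0;
        b1 = (-2.0f * cosf(w0)) / a0;
        b2 = (1.0f - alpha * A) / a0;
        a1 = (-2.0f * cosf(w0)) / a0;
        a2 = (1.0f - alpha / A) / a0;
    }

    short process(short in) {
        float x = float(in);
        float y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2;
        x2 = x1; x1 = x;                  // shift the filter memory along
        y2 = y1; y1 = y;
        if (y >  32767.0f) y =  32767.0f; // keep it inside the 16-bit range
        if (y < -32768.0f) y = -32768.0f;
        return short(y);
    }
};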

By the way, mixing samples is also something special. The 'correct' way is actually to divide each sample by two and sum them, so the result can never clip. But that would mean that when one of the sounds is silent, the other would only be half as loud as it should be, and that is not the way our perception works. There is in fact a trick for this, and for 16-bit signed integers we could do this:

// source: http://atastypixel.com/blog/how-to-mix-audio-samples-properly-on-ios/
// TYPE is the intermediate type; it needs to be wider than 16 bits so the
// multiplication of two 16-bit samples can't overflow
typedef int TYPE;

short MixSamples(short a, short b) {
    return
        // If both samples are negative, mixed signal must have an amplitude between
        // the lesser of A and B, and the minimum permissible negative amplitude
        a < 0 && b < 0 ?
            ((TYPE)a + (TYPE)b) - (((TYPE)a * (TYPE)b) / (-32768)) :

        // If both samples are positive, mixed signal must have an amplitude between
        // the greater of A and B, and the maximum permissible positive amplitude
        (a > 0 && b > 0 ?
            ((TYPE)a + (TYPE)b) - (((TYPE)a * (TYPE)b) / 32768)

        // If samples are on opposite sides of the 0-crossing, mixed signal should
        // reflect that the samples cancel each other out somewhat
        :
            a + b);
}

I tried many different formulas I found on the internet, but the one above is the one that works best. Still, some clipping can occur, so if anybody wants to pitch in, feel free!
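For what it's worth, using it is just a matter of calling it per sample; voiceA, voiceB and out below are placeholder buffers for two playing voices and the output:

// mix two voices into the output buffer, one sample at a time
for (int i = 0; i < bufferSize; i++) {
    out[i] = MixSamples(voiceA[i], voiceB[i]);
}
// more voices can be folded in the same way: out[i] = MixSamples(out[i], voiceC[i]);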

Currently I'm working on implementing a mixer and a UI running on independent cores. I have received the Raspberry Pi 7-inch touchscreen display and am working with that. But I like where this is going already! I also ordered a couple of Audio Injector Octo sound cards that work over I2S. There is another user on hackaday.io who has already written a lot of useful code using these, which I'm going to have a look at. You can find the project here: 'https://hackaday.io/project/165696-rpi-bare-metal-vguitar-rig/log/165126-first-ui-and-bare-metal-multi-core'. Thanks Patrick, love your work, it's going to help me a lot!
