Audio Synthesis Engine Implementation

A project log for Vintage Toy Synthesiser

A wooden toy piano converted into a standalone digital synthesiser.

Liam Lacey • 08/31/2018 at 17:15

(Original post date: 14/02/16)

Since my log a couple of weeks back, where I highlighted the design for my audio synthesis engine, I've been hard at work implementing it using the C++ audio synthesis library Maximilian. I'm now at a stage where I have a working and controllable synthesis engine, so this seems a good time to talk about how I've done it. I've managed to implement most of my original design plus a few extra parameters; however, I've still got a few small things to implement, as well as some niggling bugs to iron out.

Before I go on, I should mention that the final code for the project may differ slightly from the examples shown here, so for the up-to-date, full code see the GitHub repository for this project.

The Synthesis Engine Application

In my last log on software architecture I briefly introduced the vintageSoundEngine application, which is the program running on the BeagleBone Black that generates the sound for my synth. This application has two main tasks: receiving note and control messages and forwarding them on to the correct 'voice', and mixing together the audio output from each voice and sending it to the main audio output. This is all done within the main piece of code for the application, vintageSoundEngine.cpp; however, the code that handles the audio processing for each voice is implemented as a C++ class, vintageVoice, and multiple instances of this object are created depending on the polyphony value of the synth. While I'm on the subject of polyphony, at the moment I've only got a polyphony value of two due to the high CPU usage of each voice, though I'm hoping to increase this before the end of the project.
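
As a rough illustration, creating the voices might look something like this (a minimal sketch, not the project's exact setup code; the VintageVoice constructor argument is an assumption):

    #include <cstdint>
    #include "vintageVoice.h"

    #define NUM_OF_VOICES 2 //current polyphony value

    //pointers to the voice objects
    VintageVoice *vintageVoice[NUM_OF_VOICES];

    int main()
    {
        //create each voice object, passing in its voice number
        for (uint8_t voice = 0; voice < NUM_OF_VOICES; voice++)
            vintageVoice[voice] = new VintageVoice (voice);

        //... set up the audio stream and run the MIDI message loop ...

        return 0;
    }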

Processing Note and Control Messages

As mentioned in my last blog post, it is the vintageBrain application that handles voice allocation, so vintageSoundEngine doesn't have to do anything complicated in order to forward MIDI note messages to the correct voice - it just uses the MIDI channel of the note message to determine the voice number. The same goes for MIDI control/CC messages, except that here MIDI channel 15 specifies that a message needs to go to all voices. Once the program knows which voice the message needs to go to, it calls a specific function within the desired voice to forward the message. Here is a snippet of the current code that handles this:

//================================
//Process note-on messages
if (input_message_flag == MIDI_NOTEON)
{
    //channel relates to voice number
    uint8_t voice_num = input_message_buffer[0] & MIDI_CHANNEL_BITS;

    vintageVoice[voice_num]->processNoteMessage (1, input_message_buffer[1], input_message_buffer[2]);

} //if (input_message_flag == MIDI_NOTEON)

//================================
//Process note-off messages
else if (input_message_flag == MIDI_NOTEOFF)
{
    //channel relates to voice number
    uint8_t voice_num = input_message_buffer[0] & MIDI_CHANNEL_BITS;

    vintageVoice[voice_num]->processNoteMessage (0, 0, 0);

} //if (input_message_flag == MIDI_NOTEOFF)

//================================
//Process CC/param messages
else if (input_message_flag == MIDI_CC)
{
    //channel relates to voice number. Channel 15 means send to all voices
    uint8_t voice_num = input_message_buffer[0] & MIDI_CHANNEL_BITS;

    for (uint8_t voice = 0; voice < NUM_OF_VOICES; voice++)
    {
        //if we want to send this message to voice number 'voice'
        if (voice_num == 15 || voice_num == voice)
        {
            //TODO: check if this param/CC num is a sound param, and in range.
            //At this point it always should be, but it may be best to check anyway.

            //set the parameter's voice value
            vintageVoice[voice]->setPatchParamVoiceValue (input_message_buffer[1], input_message_buffer[2]);

        } //if (voice_num == 15 || voice_num == voice)

    } //for (uint8_t voice = 0; voice < NUM_OF_VOICES; voice++)

} //if (input_message_flag == MIDI_CC)

Mixing Voices

Mixing the audio output of the voice objects is done in the audio callback function, which is called for each audio sample by the audio streaming thread of the application, handled by the RtAudio API. This is done in the same way as in the Maximilian examples, except that their code for generating and controlling audio was not split into separate objects. Here is the current code that handles this:

void play (double *output)
{
    double voice_out[NUM_OF_VOICES];
    double mix = 0;

    //process each voice
    for (uint8_t voice = 0; voice < NUM_OF_VOICES; voice++)
    {
        vintageVoice[voice]->processAudio (&voice_out[voice]);
    }

    //mix all voices together (for some reason this won't work if done in the above for loop...)
    for (uint8_t voice = 0; voice < NUM_OF_VOICES; voice++)
    {
        mix += voice_out[voice];
    }

    //set output
    for (uint8_t i = 0; i < maxiSettings::channels; i++)
    {
        output[i] = mix;
    }
}

The code is fairly simple, and just does three things:

  1. Calls the audio processing function of each voice, passing in the variable that the voice's audio sample will be stored in
  2. Mixes the audio samples of each voice into a single sample
  3. Puts the sample into all channels of the audio output buffer

Voice Design Implementation

Now I'm going to talk about the more interesting code - the code that generates and controls the synthesised audio within each voice. As stated above, this is all within the vintageVoice class, and relies mostly on the Maximilian library for the implementation of the essential components of the synthesis engine. When reading about the features here, remember that everything described applies to each voice individually.

To implement the synthesis engine I needed the following Maximilian objects:

  1. maxiOsc - six instances: one for each of the five oscillators, plus one for the LFO
  2. maxiEnv - two instances: one for the amplitude envelope and one for the filter envelope
  3. maxiSVF - the state variable filter used as the voice filter
  4. maxiDistortion - for the distortion effect
  5. convert - for converting MIDI note numbers into frequencies

As previously mentioned, vintageSoundEngine is a multithreaded application. The main thread handles the receiving and processing of MIDI messages, while the second thread handles all the audio streaming and processing.

Processing Control Messages

As stated above, MIDI CC messages are sent to the voices to control the parameters of the sound. When the CC messages get to a voice they are converted into values that the voice parameters understand (e.g. from the typical MIDI CC range of 0-127 to the typical filter cutoff range of 20-20000Hz), and then stored in an array of parameter data that is used throughout the rest of the code, most importantly within the audio processing callback function. For certain CC messages other things need to be done as well; e.g. if it is an oscillator coarse tune control message, the pitch of the oscillator needs to be updated. To make developing the audio processing code easier, macros are used instead of raw parameter numbers, and the array of parameter values is stored as part of a struct that contains variables for storing other data about each parameter, such as the range of its value (see the globals.h file for the full details); a rough sketch of this kind of struct is shown below.
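
Something like the following, where the field names are taken from the code used throughout this post, but the struct name and array size are assumptions:

    #include <cstdint>

    struct PatchParameterData
    {
        uint8_t user_val;     //raw value set by the user (e.g. a 0-127 CC value)
        uint8_t user_min_val; //minimum raw user value
        uint8_t user_max_val; //maximum raw user value
        double voice_val;     //scaled value used by the voice code (e.g. 20-20000 for cutoff)
        double voice_min_val; //minimum voice value
        double voice_max_val; //maximum voice value
    };

    //one entry per sound parameter, indexed using macros such as PARAM_FILTER_FREQ
    PatchParameterData patchParameterData[128];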

This task is handled in the main thread. Here is the current code that processes MIDI CC messages:

//==========================================================
//==========================================================
//==========================================================
//Sets a parameter's voice value based on the parameter's current user value

void VintageVoice::setPatchParamVoiceValue (uint8_t param_num, uint8_t param_user_val)
{
    patchParameterData[param_num].user_val = param_user_val;
    //FIXME: this could probably be done within vintageSoundEngine.cpp instead of within the voice object,
    //as each voice will probably be given the same value most of the time, so it would save CPU
    //to only have to do this once instead of for each voice.
    patchParameterData[param_num].voice_val = scaleValue (patchParameterData[param_num].user_val,
                                                          patchParameterData[param_num].user_min_val,
                                                          patchParameterData[param_num].user_max_val,
                                                          patchParameterData[param_num].voice_min_val,
                                                          patchParameterData[param_num].voice_max_val);
    
    //==========================================================
    //Set certain things based on the received param num
    
    if (param_num == PARAM_AEG_ATTACK)
    {
        envAmp.setAttack (patchParameterData[param_num].voice_val);
    }
    
    else if (param_num == PARAM_AEG_DECAY)
    {
        envAmp.setDecay (patchParameterData[param_num].voice_val);
    }
    
    else if (param_num == PARAM_AEG_SUSTAIN)
    {
        envAmp.setSustain (patchParameterData[param_num].voice_val);
    }
    
    else if (param_num == PARAM_AEG_RELEASE)
    {
        envAmp.setRelease (patchParameterData[param_num].voice_val);
    }
    
    else if (param_num == PARAM_FEG_ATTACK)
    {
        envFilter.setAttack (patchParameterData[param_num].voice_val);
    }
    
    else if (param_num == PARAM_FEG_DECAY)
    {
        envFilter.setDecay (patchParameterData[param_num].voice_val);
    }
    
    else if (param_num == PARAM_FEG_SUSTAIN)
    {
        envFilter.setSustain (patchParameterData[param_num].voice_val);
    }
    
    else if (param_num == PARAM_FEG_RELEASE)
    {
        envFilter.setRelease (patchParameterData[param_num].voice_val);
    }
    
    else if (param_num == PARAM_OSC_SINE_NOTE)
    {
        convert mtof;
        oscSinePitch = mtof.mtof (rootNoteNum + (patchParameterData[param_num].voice_val - 64));
    }
    
    else if (param_num == PARAM_OSC_TRI_NOTE)
    {
        convert mtof;
        oscTriPitch = mtof.mtof (rootNoteNum + (patchParameterData[param_num].voice_val - 64));
    }
    
    else if (param_num == PARAM_OSC_SAW_NOTE)
    {
        convert mtof;
        oscSawPitch = mtof.mtof (rootNoteNum + (patchParameterData[param_num].voice_val - 64));
    }
    
    else if (param_num == PARAM_OSC_PULSE_NOTE)
    {
        convert mtof;
        oscPulsePitch = mtof.mtof (rootNoteNum + (patchParameterData[param_num].voice_val - 64));
    }
    
    else if (param_num == PARAM_OSC_SQUARE_NOTE)
    {
        convert mtof;
        oscSquarePitch = mtof.mtof (rootNoteNum + (patchParameterData[param_num].voice_val - 64));
    }
    
    else if (param_num == PARAM_OSC_PHASE_SPREAD)
    {
        //FIXME: I need to properly understand what the phase value represents in order to implement a definitive algorithm here.
        //But basically what it does is, the higher the param value, the more spread the phases are of each oscillator from one another.
        //Sine will always stay at 0, tri will change of a small range, saw over a slightly bigger range, and so on.
        
        oscSine.phaseReset(0.0);
        oscTri.phaseReset (patchParameterData[param_num].voice_val * 0.002);
        oscSaw.phaseReset (patchParameterData[param_num].voice_val * 0.004);
        oscPulse.phaseReset (patchParameterData[param_num].voice_val * 0.006);
        oscSquare.phaseReset (patchParameterData[param_num].voice_val * 0.008);
    }
    
    else if (param_num == PARAM_MOD_VEL_AMP)
    {
        //vel->amp env modulation
        velAmpModVal = getModulatedParamValue (param_num, PARAM_AEG_AMOUNT, voiceVelocityValue);
    }
    
    else if (param_num == PARAM_MOD_VEL_FREQ)
    {
        //vel->cutoff modulation
        velFreqModVal = getModulatedParamValue (param_num, PARAM_FILTER_FREQ, voiceVelocityValue);
    }
    
    else if (param_num == PARAM_MOD_VEL_RESO)
    {
        //vel->resonance modulation
        velResoModVal = getModulatedParamValue (param_num, PARAM_FILTER_RESO, voiceVelocityValue);
    }
}
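
As an aside, I haven't shown the scaleValue() and boundValue() helper functions used above and in the audio code. Plausible implementations look like this (a sketch assuming a simple linear mapping and clamp; the project's actual helpers are in the repo and may differ, e.g. a parameter like filter cutoff might want an exponential mapping):

    double scaleValue (double value, double in_min, double in_max, double out_min, double out_max)
    {
        //linearly map value from the input range to the output range
        return out_min + (((value - in_min) / (in_max - in_min)) * (out_max - out_min));
    }

    double boundValue (double value, double min_val, double max_val)
    {
        //clamp value to the given range
        if (value < min_val)
            return min_val;
        if (value > max_val)
            return max_val;
        return value;
    }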

Processing Note Messages

Processing MIDI note messages within the voices is a little more complicated than processing MIDI CC messages.

The following main things happen for each note message:

  1. If a note-on message:
    1. The pitches of the five oscillators are set based on the received MIDI note number as well as the oscillators' coarse tune values
    2. The MIDI note velocity value (0-127) is converted into a voice amplitude value (0-1)
    3. Velocity modulation depth parameter values are used to generate the realtime parameter modulation values that need to be added to the parameter patch values
    4. The LFO oscillator phase is reset to 0
  2. The amplitude envelope trigger value is set. For a note-on message this opens the envelope and causes sound to start playing in the audio thread, whereas for a note-off message it triggers the envelope to go into its release phase, eventually silencing the audio.
  3. The filter envelope trigger value is set.

Again this task is handled in the main thread. Here is the function that handles this:

//==========================================================
//==========================================================
//==========================================================
//Function that does everything that needs to be done when a new
//note-on or note-off message is sent to the voice.

void VintageVoice::processNoteMessage (bool note_status, uint8_t note_num, uint8_t note_vel)
{
    //==========================================================
    //if a note-on
    if (note_status == true)
    {
        //============================
        //store the root note num
        rootNoteNum = note_num;
        
        //============================
        //set the oscillator pitches
        convert mtof;
        oscSinePitch = mtof.mtof (rootNoteNum + (patchParameterData[PARAM_OSC_SINE_NOTE].voice_val - 64));
        oscTriPitch = mtof.mtof (rootNoteNum + (patchParameterData[PARAM_OSC_TRI_NOTE].voice_val - 64));
        oscSawPitch = mtof.mtof (rootNoteNum + (patchParameterData[PARAM_OSC_SAW_NOTE].voice_val - 64));
        oscPulsePitch = mtof.mtof (rootNoteNum + (patchParameterData[PARAM_OSC_PULSE_NOTE].voice_val - 64));
        oscSquarePitch = mtof.mtof (rootNoteNum + (patchParameterData[PARAM_OSC_SQUARE_NOTE].voice_val - 64));
        
        //TODO: vintage amount parameter - randomly detune each oscillator and/or the overall voice tuning
        //on each note press, with the vintage amount value determining the amount of detuning.
        
        //============================
        //set the note velocity
        voiceVelocityValue = scaleValue (note_vel, 0, 127, 0., 1.);
        
        //============================
        //work out velocity modulation values
        
        //vel->amp env modulation
        velAmpModVal = getModulatedParamValue (PARAM_MOD_VEL_AMP, PARAM_AEG_AMOUNT, voiceVelocityValue);
        
        //vel->cutoff modulation
        velFreqModVal = getModulatedParamValue (PARAM_MOD_VEL_FREQ, PARAM_FILTER_FREQ, voiceVelocityValue);
        
        //vel->resonance modulation
        velResoModVal = getModulatedParamValue (PARAM_MOD_VEL_RESO, PARAM_FILTER_RESO, voiceVelocityValue);
        
        //============================
        //reset LFO osc phase
        lfo.phaseReset(0.0);
        
    } //if (note_status == true)
    
    //==========================================================
    //if a note-off
    else if (note_status == false)
    {
        //reset aftertouch value
        aftertouchValue = 0;
    }
    
    //==========================================================
    //set trigger value of envelopes
    envAmp.trigger = note_status;
    envFilter.trigger = note_status;
}
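
As an aside, the convert::mtof call used above performs the standard equal-temperament MIDI-note-to-frequency conversion, with A4 (MIDI note 69) at 440Hz. Maximilian's implementation is mathematically equivalent to the following:

    #include <cmath>

    //MIDI note number to frequency in Hz (equal temperament, A4 = 440Hz)
    double mtof (double midi_note)
    {
        return 440.0 * pow (2.0, (midi_note - 69.0) / 12.0);
    }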

Generating and Processing Audio

As previously mentioned, all audio processing is handled within an audio callback function, which is called repeatedly by the audio processing thread for each sample in the audio stream. Here I'm going to outline each section of the audio callback function within the voice class, which relies heavily on the Maximilian library.

LFO

The LFO is generated and set in the following way:

  1. An output sample of an oscillator object is generated using the following parameters:
    1. LFO shape controls which maxiOsc shape is used
    2. LFO rate controls the frequency/pitch of the maxiOsc object
  2. The oscillator output (-1 to +1) is converted into the range needed for an LFO (0 to 1)
  3. The LFO output sample is multiplied by the LFO depth parameter value

    //==========================================================
    //process LFO...
    
    //set shape and rate
    //FIXME: for LFO rate it would be better if we used an LFO rate table (an array of 128 different rates).
    if (patchParameterData[PARAM_LFO_SHAPE].voice_val == 0)
        lfoOut = lfo.sinewave (patchParameterData[PARAM_LFO_RATE].voice_val);
    else if (patchParameterData[PARAM_LFO_SHAPE].voice_val == 1)
        lfoOut = lfo.triangle (patchParameterData[PARAM_LFO_RATE].voice_val);
    else if (patchParameterData[PARAM_LFO_SHAPE].voice_val == 2)
        lfoOut = lfo.saw (patchParameterData[PARAM_LFO_RATE].voice_val);
    else if (patchParameterData[PARAM_LFO_SHAPE].voice_val == 3)
        lfoOut = lfo.square (patchParameterData[PARAM_LFO_RATE].voice_val);
    
    //convert the osc wave into an lfo wave (multiply and offset)
    lfoOut = ((lfoOut * 0.5) + 0.5);
    
    //set depth
    lfoOut = lfoOut * patchParameterData[PARAM_LFO_DEPTH].voice_val;
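
Regarding the FIXME in the above code, an LFO rate table could be built by precomputing 128 exponentially spaced rates and indexing them with the raw 0-127 user value. A quick sketch (the 0.05Hz-20Hz range here is an assumption that would need tuning by ear):

    //build a table of 128 exponentially spaced LFO rates (in Hz)
    double lfoRateTable[128];
    for (int i = 0; i < 128; i++)
        lfoRateTable[i] = 0.05 * pow (20.0 / 0.05, i / 127.0);

    //...which would then be used in the audio callback like so:
    //lfoOut = lfo.sinewave (lfoRateTable[patchParameterData[PARAM_LFO_RATE].user_val]);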

Amplitude Envelope

The amplitude envelope is generated and set in the following way:

  1. The LFO->amplitude modulation depth parameter value is used to generate the realtime parameter modulation value that needs to be added to the amplitude envelope amount parameter value
  2. The envelope amount value is worked out by adding the realtime amplitude modulation values (generated by both velocity and LFO modulation) to the amplitude envelope amount parameter value
  3. An output sample of the envelope is generated using a maxiEnv object, passing in the envelope amount value to control the depth, and the envelope trigger value that was set by the last received MIDI note message to set the current phase of the envelope.

    //==========================================================
    //Amp envelope stuff...
    
    //process LFO->amp env modulation
    double amp_lfo_mod_val = getModulatedParamValue (PARAM_MOD_LFO_AMP, PARAM_AEG_AMOUNT, lfoOut);
    
    //Add the amp modulation values to the patch value, making sure the produced value is in range
    double amp_val = patchParameterData[PARAM_AEG_AMOUNT].voice_val + amp_lfo_mod_val + velAmpModVal;
    amp_val = boundValue (amp_val, patchParameterData[PARAM_AEG_AMOUNT].voice_min_val, patchParameterData[PARAM_AEG_AMOUNT].voice_max_val);
    
    //generate the amp envelope output using amp_val as the envelope amount
    envAmpOut = envAmp.adsr (amp_val, envAmp.trigger);
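
I haven't shown getModulatedParamValue() here, but conceptually it scales a 0-1 modulation source (velocity or LFO output) by a modulation depth parameter, and expresses the result as an offset within the destination parameter's value range. A guess at a plausible implementation (the signature matches the calls in this post, but the body is an assumption - see the repo for the real code):

    double VintageVoice::getModulatedParamValue (uint8_t depth_param_num, uint8_t dest_param_num, double mod_source)
    {
        //the destination parameter's full value range
        double dest_range = patchParameterData[dest_param_num].voice_max_val - patchParameterData[dest_param_num].voice_min_val;

        //scale the modulation source by the (assumed bipolar) depth value,
        //returning an offset to be added to the destination parameter's patch value
        return mod_source * patchParameterData[depth_param_num].voice_val * dest_range;
    }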

Filter Envelope

This is generated in essentially the same way as the amplitude envelope, except that it uses a different maxiEnv object and a static value of 1 as the envelope depth.

    //==========================================================
    //process filter envelope
    envFilterOut = envFilter.adsr (1.0, envFilter.trigger);

Oscillators

The oscillators are generated and set in the following way:

  1. An output sample of each of the five oscillator objects is generated using the following parameters:
    1. Each oscillator uses a different shape of the maxiOsc class
    2. The frequency/pitch of each oscillator is set to the pitch value generated with the last received MIDI note-on message
    3. The oscillator mix/level parameters multiply the output sample
    4. For the pulse oscillator, the pulse amount is set using the pulse amount parameter
  2. The five samples are mixed into a single sample, and divided by the number of oscillators to prevent clipping.

This is the point in the audio processing callback function at which sound is initially generated.

//==========================================================
    //process oscillators
    oscSineOut = oscSine.sinewave (oscSinePitch) * patchParameterData[PARAM_OSC_SINE_LEVEL].voice_val;
    oscTriOut = (oscTri.triangle (oscTriPitch) * patchParameterData[PARAM_OSC_TRI_LEVEL].voice_val);
    oscSawOut = (oscSaw.saw (oscSawPitch) * patchParameterData[PARAM_OSC_SAW_LEVEL].voice_val);
    oscPulseOut = (oscPulse.pulse (oscPulsePitch, patchParameterData[PARAM_OSC_PULSE_AMOUNT].voice_val) * patchParameterData[PARAM_OSC_PULSE_LEVEL].voice_val);
    oscSquareOut = (oscSquare.square (oscSquarePitch) * patchParameterData[PARAM_OSC_SQUARE_LEVEL].voice_val);
    
    //mix oscillators together
    oscMixOut = (oscSineOut + oscTriOut + oscSawOut + oscPulseOut + oscSquareOut) / 5.;

Filter

The filter is generated, set, and used in the following way:

  1. The LFO->cutoff modulation depth parameter value is used to generate the realtime parameter modulation value that needs to be added to the cutoff parameter value
  2. The filter cutoff value is worked out by adding the realtime cutoff modulation values (generated by both velocity and LFO modulation) to the filter cutoff parameter value
  3. The maxiSVF object cutoff value is set using the cutoff value multiplied by the current output sample of the filter envelope
  4. The LFO->resonance modulation depth parameter value is used to generate the realtime parameter modulation value that needs to be added to the resonance parameter value
  5. The filter resonance value is worked out by adding the realtime resonance modulation values (generated by both velocity and LFO modulation) to the filter resonance parameter value
  6. The maxiSVF object resonance value is set using the resonance value
  7. An output sample of the filter applied to the mixed oscillator sample is generated by calling play() on the maxiSVF object using the following parameters:
    1. The passed in audio sample is the output of the oscillators
    2. The filter LP, BP, HP, and notch mix parameters are used to set the mix of the filter

    //==========================================================
    //process filter (pass in oscOut, return filterOut)
    
    //================================
    //process LFO->cutoff modulation
    double cutoff_lfo_mod_val = getModulatedParamValue (PARAM_MOD_LFO_FREQ, PARAM_FILTER_FREQ, lfoOut);
    
    //Add the cutoff modulation values to the patch value, making sure the produced value is in range
    double cutoff_val = patchParameterData[PARAM_FILTER_FREQ].voice_val + cutoff_lfo_mod_val + velFreqModVal;
    cutoff_val = boundValue (cutoff_val, patchParameterData[PARAM_FILTER_FREQ].voice_min_val, patchParameterData[PARAM_FILTER_FREQ].voice_max_val);
    
    //set cutoff value, multiplied by filter envelope
    filterSvf.setCutoff (cutoff_val * envFilterOut);
    
    //================================
    //process LFO->reso modulation
    double reso_lfo_mod_val = getModulatedParamValue (PARAM_MOD_LFO_RESO, PARAM_FILTER_RESO, lfoOut);
    
    //Add the reso modulation values to the patch value, making sure the produced value is in range
    double reso_val = patchParameterData[PARAM_FILTER_RESO].voice_val + reso_lfo_mod_val + velResoModVal;
    reso_val = boundValue (reso_val, patchParameterData[PARAM_FILTER_RESO].voice_min_val, patchParameterData[PARAM_FILTER_RESO].voice_max_val);
    
    //set resonance value
    filterSvf.setResonance (reso_val);
    
    //================================
    //Apply the filter
    
    filterOut = filterSvf.play (oscMixOut,
                                patchParameterData[PARAM_FILTER_LP_MIX].voice_val,
                                patchParameterData[PARAM_FILTER_BP_MIX].voice_val,
                                patchParameterData[PARAM_FILTER_HP_MIX].voice_val,
                                patchParameterData[PARAM_FILTER_NOTCH_MIX].voice_val);

Distortion

The current implementation of applying distortion to the voices is as follows:

  1. An output sample of distorted audio is generated by passing the filtered audio sample into the maxiDistortion::atanDist function with a static shape value of 200.
  2. The distorted audio sample is mixed with the undistorted filtered audio sample, using the distortion amount parameter value to set the gain/mix of each audio sample

    //==========================================================
    //process distortion...
    //FIXME: should PARAM_FX_DISTORTION_AMOUNT also change the shape of the distortion?
    distortionOut = distortion.atanDist (filterOut, 200.0);
    
    //process distortion mix
    //FIXME: is this (mixing dry and wet) the best way to apply distortion? Or should I just always be running the main output through the distortion function?
    //FIXME: probably need to reduce the distortionOut value so bringing in distortion doesn't increase the overall volume too much
    effectsMixOut = (distortionOut * patchParameterData[PARAM_FX_DISTORTION_AMOUNT].voice_val) + (filterOut * (1.0 - patchParameterData[PARAM_FX_DISTORTION_AMOUNT].voice_val));

However, as per the comments in the above code, I may change this implementation so that I don't mix a 'dry' audio sample with the distorted sample, and instead just use the distortion amount parameter value to control the shape of the distortion.
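
If I do go down that route, it might look something like the following (a sketch; the mapping from the 0-1 distortion amount value to atanDist's shape argument is an assumption that would need tuning by ear):

    //use the distortion amount to control the shape of the distortion directly,
    //rather than mixing dry and wet samples
    double shape = 1.0 + (patchParameterData[PARAM_FX_DISTORTION_AMOUNT].voice_val * 199.0);
    effectsMixOut = distortion.atanDist (filterOut, shape);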

Output

Lastly, the generated audio sample needs to be applied to the audio sample that goes to the main audio output. This is done by setting the output sample to be the generated audio sample multiplied by the current output sample of the amplitude envelope.

    //==========================================================
    //apply amp envelope, making all channels the same (pass in effectsMixOut, return output)
    for (uint8_t i = 0; i < maxiSettings::channels; i++)
    {
        output[i] = effectsMixOut * envAmpOut;
    }

Changes from the Initial Synthesis Engine Design

As can be seen from the above, I've managed to implement the majority of my initial design; however, there have been a few changes:

  1. I've added coarse tune parameters for each of the oscillators
  2. Due to the last point, I've renamed the sub oscillator to just be called the square oscillator
  3. I've added a 'phase spread' parameter to the oscillators, allowing the phases of the oscillators to differ from each other by varying amounts
  4. I've added velocity->cutoff and velocity->resonance modulation
  5. I've removed all aftertouch modulation (for now), as currently the audio glitches fairly badly when attempting to process aftertouch messages. However, I'm hoping to put this back in eventually if I have time to figure out what the issue is.

What's Next

There are a couple of parameters within my initial synth engine design that I haven't mentioned here, simply because I haven't yet implemented them. These include:

Also there are a couple of bugs I need to address, the main one being frequent random audio glitches. I'm not sure whether this is related to CPU usage, audio buffer size, thread priority, or something else, but it's the main thing holding me back from putting out some audio examples of my synthesis engine.
