
Setting Up PWM Output

A project log for Look Who's Talking 0256

A BluePill Driver/Simulator/Emulator for the GI SP0256-AL2

ziggurat29, 05/16/2020 at 21:50

Summary

For the first step of attempting to simulate the SP0256-AL2 (i.e., no physical chip needed), I wanted to test the PWM output by sending pre-recorded data.  For the second step, I wanted to full-on simulate the physical chip.

Deets

First, I did a quicky experiment to prove that the hardware was configured correctly.  This involved setting up PWM on TIM3, and using TIM4 for pacing the samples out via interrupt.  I set up TIM3 to have a similar frequency to the SP0256 (about 45 kHz), and 8-bit resolution (which matched my audio files).  As such, I should be able to use the existing output filter from the SP0256 and just move the jumper between the two.  I did an initial test with my scope to verify both that PWM was working and that TIM4 was interrupting at the expected rate.
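As a rough sanity check on those timer numbers, here is a minimal sketch of the arithmetic.  It relies on the standard STM32 relationship f_update = f_clk / ((PSC + 1) * (ARR + 1)).  The 72 MHz timer clock and the specific prescaler/period values are assumptions for illustration (the log does not list the actual register settings), and the helper names are hypothetical.

```c
#include <stdint.h>

/* Update (overflow) rate of an STM32 general-purpose timer, in Hz:
   f_update = f_clk / ((PSC + 1) * (ARR + 1)) */
static uint32_t timer_update_hz(uint32_t clk_hz, uint32_t psc, uint32_t arr)
{
    return clk_hz / ((psc + 1u) * (arr + 1u));
}

/* Nearest ARR value that gives target_hz for a given prescaler. */
static uint32_t timer_arr_for(uint32_t clk_hz, uint32_t psc, uint32_t target_hz)
{
    uint32_t ticks = clk_hz / (psc + 1u);               /* counter clock */
    return (ticks + target_hz / 2u) / target_hz - 1u;   /* rounded divide */
}
```

Under those assumptions, an 8-bit PWM (ARR = 255) with PSC = 5 comes out at 46,875 Hz, which is in the "about 45 kHz" neighborhood, and an 11025 sps pacing timer would want ARR = 6530 with no prescaling.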

I modded my Python prototype to do a couple of short text-to-speech runs once again, but this time emitting a C source file with the data in both PCM and ADPCM formats.  These were very short recordings of just the word "Hello", so I didn't really have to worry about flash space.  I tested the PCM first, since that was a no-brainer; i.e., in my HAL_TIM_PeriodElapsedCallback() I just add the clause

  /* USER CODE BEGIN Callback 1 */
	else if (htim->Instance == TIM4)
	{
		//quicky sample PCM test
		extern const uint8_t g_abyPCM[];	//len = 8027
		static int sl_nIdx = 0;
		__HAL_TIM_SET_COMPARE(&htim3, TIM_CHANNEL_3, g_abyPCM[sl_nIdx]);
		++sl_nIdx;
		if ( 8027 == sl_nIdx ) {	//loopy-doopy
			sl_nIdx = 0;
		}
	}
  /* USER CODE END Callback 1 */

so, it just outputs the current sample via PWM (__HAL_TIM_SET_COMPARE), advances the index to the next sample, and loops around at the end of the data.  This worked fine, so now it was time to try out the ADPCM approach.

First, I ported the ADPCM decoder routine and set it up to do the same thing with the ADPCM data.  This was slightly more complicated.  The ADPCM data is by nybble, rather than byte, and the algorithm is designed for signed 16-bit samples.  But it wasn't too bad:

  /* USER CODE BEGIN Callback 1 */
	else if (htim->Instance == TIM4)
	{
		//quicky sample ADPCM test
		struct ADPCMstate
		{
			int prevsample;
			int previndex;
		};
		int adpcm_decode_sample ( int code, struct ADPCMstate* state );

		//len = 4014, origlen = 8027
		extern const uint8_t g_abyADPCM[];
		static struct ADPCMstate state = { 0, 0 };
		static int sl_nIdxNyb = 0;
		int code = g_abyADPCM[sl_nIdxNyb>>1];
		if ( sl_nIdxNyb & 1 )
			code &= 0x0f;
		else	//upper nybble first
			code >>= 4;

		int samp = adpcm_decode_sample ( code, &state );
		uint8_t samp8 = (uint8_t) ( ( samp / 256 ) + 128 );
		__HAL_TIM_SET_COMPARE(&htim3, TIM_CHANNEL_3, samp8);

		++sl_nIdxNyb;
		if ( 8027 == sl_nIdxNyb ) {	//loopy-doopy
			sl_nIdxNyb = 0;
			state.prevsample = 0;
			state.previndex = 0;
		}
	}
  /* USER CODE END Callback 1 */

The sample data was half the size of the PCM (about 4 KiB), and the binary size increase was about 5 KiB over the baseline, so I infer that the ADPCM decoder incurs about 1 KiB of code.  I can work with that.
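For reference, here is a minimal sketch of what a decoder behind adpcm_decode_sample() could look like.  It assumes the common IMA/DVI 4-bit ADPCM scheme; the log does not say which ADPCM variant was actually ported, so treat the tables and arithmetic as an illustration rather than the project's code.  The sample16_to_pwm8() helper is a hypothetical name for the 16-bit to 8-bit conversion done inline in the callback.

```c
#include <stdint.h>

struct ADPCMstate {
    int prevsample;   /* last decoded signed 16-bit sample (the predictor) */
    int previndex;    /* last index into the step-size table */
};

/* Step-index adjustment per 4-bit code (standard IMA table). */
static const int8_t k_index_adj[16] = {
    -1, -1, -1, -1, 2, 4, 6, 8,
    -1, -1, -1, -1, 2, 4, 6, 8
};

/* Standard IMA step-size table (89 entries). */
static const int16_t k_step[89] = {
    7, 8, 9, 10, 11, 12, 13, 14, 16, 17,
    19, 21, 23, 25, 28, 31, 34, 37, 41, 45,
    50, 55, 60, 66, 73, 80, 88, 97, 107, 118,
    130, 143, 157, 173, 190, 209, 230, 253, 279, 307,
    337, 371, 408, 449, 494, 544, 598, 658, 724, 796,
    876, 963, 1060, 1166, 1282, 1411, 1552, 1707, 1878, 2066,
    2272, 2499, 2749, 3024, 3327, 3660, 4026, 4428, 4871, 5358,
    5894, 6484, 7132, 7845, 8630, 9493, 10442, 11487, 12635, 13899,
    15289, 16818, 18500, 20350, 22385, 24623, 27086, 29794, 32767
};

/* Decode one 4-bit code into a signed 16-bit sample, updating the state. */
int adpcm_decode_sample(int code, struct ADPCMstate *state)
{
    int step  = k_step[state->previndex];
    int diffq = step >> 3;                  /* reconstruct quantized delta */
    if (code & 4) diffq += step;
    if (code & 2) diffq += step >> 1;
    if (code & 1) diffq += step >> 2;

    int pred = state->prevsample + ((code & 8) ? -diffq : diffq);
    if (pred > 32767)  pred = 32767;        /* clamp to 16-bit range */
    if (pred < -32768) pred = -32768;

    int index = state->previndex + k_index_adj[code & 0x0f];
    if (index < 0)  index = 0;              /* clamp to table bounds */
    if (index > 88) index = 88;

    state->prevsample = pred;
    state->previndex  = index;
    return pred;
}

/* Map a signed 16-bit sample onto an unsigned 8-bit PWM duty value. */
uint8_t sample16_to_pwm8(int samp)
{
    return (uint8_t)((samp / 256) + 128);
}
```

The per-sample cost is a handful of shifts, adds, and two table lookups, which is why decoding in (or near) the sample interrupt is plausible on a 72 MHz part.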

Now it's time to get the hands really dirty, and emulate the SP0256-AL2.  This was actually a bit of work -- both in coding and debugging.

The first thing I did was create a notion of 'mode' for the speech processor.  There are two modes: 'physical' and 'simulated'.  The existing task_sp0256 had a tight coupling with the hardware, but really there were just four points of contact:  resetting the synth, strobing in data, determining if 'Load Request' (nLRQ) is asserted, and being notified when LRQ transitions from a negated to an asserted state.  I factored the code for the first three cases into generic methods that do one or the other based on the current mode; the last case was inbound and already decoupled from hardware specifics.  Now it was time to put flesh on the bones.

I declared another circular buffer to represent the fifo on the chip.  Strictly, I don't think the chip actually has a fifo -- I think it just has a single incoming byte that it can process, but it does offload that processing and allows accepting another phoneme before the first has finished processing, so it's vaguely like a two-phoneme-deep fifo.  I had already coded to assume that there was a fifo, so it was straightforward for me to implement one of an arbitrary depth of 4.  I don't know why I chose that, and really the overhead of the circular buffer greatly dwarfs the buffer size, so maybe I should make it bigger.  I didn't want to make it too big, because in a way it is redundant relative to the existing buffering; however, I did want to make a fifo specific to the emulation because otherwise there would be coupling that would complicate the existing physical chip management.  I have RAM to spare, so there went 20 or so bytes -- I suspect a reasonable sacrifice for the sake of maintainability.
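To make the idea concrete, here is a minimal sketch of a depth-4 phoneme fifo with an emulated 'Load Request' check.  The type and function names are hypothetical and the project's actual circular buffer code is not shown in this log, so this is only one plausible shape for it.

```c
#include <stdint.h>
#include <stdbool.h>

#define PHONEME_FIFO_DEPTH 4u   /* the arbitrary depth mentioned above */

typedef struct {
    uint8_t data[PHONEME_FIFO_DEPTH];
    uint8_t head;    /* next slot to write */
    uint8_t tail;    /* next slot to read */
    uint8_t count;   /* number of queued phonemes */
} PhonemeFifo;

/* Queue a phoneme; returns false if the fifo is full. */
static bool fifo_push(PhonemeFifo *f, uint8_t phoneme)
{
    if (f->count == PHONEME_FIFO_DEPTH) return false;
    f->data[f->head] = phoneme;
    f->head = (uint8_t)((f->head + 1u) % PHONEME_FIFO_DEPTH);
    f->count++;
    return true;
}

/* Dequeue the oldest phoneme; returns false if the fifo is empty. */
static bool fifo_pop(PhonemeFifo *f, uint8_t *out)
{
    if (f->count == 0u) return false;
    *out = f->data[f->tail];
    f->tail = (uint8_t)((f->tail + 1u) % PHONEME_FIFO_DEPTH);
    f->count--;
    return true;
}

/* Emulated LRQ: the simulated chip can accept data while there is room. */
static bool fifo_load_request(const PhonemeFifo *f)
{
    return f->count < PHONEME_FIFO_DEPTH;
}
```

In a real build, pushes and pops that can race between tasks or interrupts would need the same kind of protection as any other shared state.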

Next I needed to get the PWM outputting data.  As per usual, I prefer to do less work in an ISR than more, and moreover in FreeRTOS there are complications when using synchronization objects.  I decided to use a two-buffer approach, where there is one buffer that is 'active' (i.e. the ISR is clocking samples out of it), and a 'next' buffer that is available to be filled at one's leisure.  These buffers hold PCM samples -- not ADPCM -- so the ISR really is just plucking values from the 'active' buffer and pushing them into the PWM, and then switching over to the 'next' buffer when it runs out.  This did still require a couple of shared state variables between the 'task' (user-mode thread) and the ISR (supervisor mode code).  In this case, I chose to use 'critical sections' to make the access to the relevant data atomic.  These (OK, they're not a distinct thing in FreeRTOS) are much lighter weight but much more heavy handed than semaphores or whatnot.  Basically they are a 'disable all interrupts' operation.  Since I lock only around a couple of state variables so as to read-modify-write (with a little logic in the modify), it seemed like a reasonable approach.

The design is still 'Hail Mary' with respect to being able to reliably produce pending buffers of samples such that the sample driving ISR is always fed so long as there are things to feed it.  I think this is appropriate because if you fundamentally cannot provide buffers fast enough, then the system is a fail overall, anyway.  However, you can amortize the cost of production on the part of the producer over time.  So I added a couple task notifications:  one for 'half-way done' and one for 'completed'.  I didn't wind up using the latter for anything -- by the time you get that you are already too late.  But the 'half-way done' notification is a useful heads-up that now would be a good time to produce another buffer, if possible, to have at the ready for the sample driver to automatically switch to when the current (active) buffer is completed.
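The two-buffer handoff and the 'half-way done' heads-up described above can be sketched roughly as follows.  All names here are hypothetical.  CRIT_ENTER()/CRIT_EXIT() are no-op stand-ins so the sketch builds off-target; on the BluePill they would presumably map to FreeRTOS's taskENTER_CRITICAL()/taskEXIT_CRITICAL().  A plain flag stands in for the task notification.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>

#define SAMPLE_BUF_LEN 512u

/* Stand-ins (assumption): replace with real critical sections on target. */
#define CRIT_ENTER() ((void)0)
#define CRIT_EXIT()  ((void)0)

typedef struct {
    uint8_t buf[2][SAMPLE_BUF_LEN];
    size_t  len[2];              /* valid samples in each buffer */
    volatile uint8_t active;     /* buffer the ISR is draining */
    volatile size_t  pos;        /* read position in the active buffer */
    volatile bool    next_ready; /* 'next' buffer is filled and waiting */
    volatile bool    half_done;  /* stands in for the task notification */
} SampleQueue;

/* ISR side: fetch the next sample; returns false when there is nothing to play. */
static bool sq_isr_next(SampleQueue *q, uint8_t *out)
{
    if (q->pos >= q->len[q->active]) {        /* active buffer exhausted */
        if (!q->next_ready) return false;     /* starved */
        q->active ^= 1u;                      /* switch to the 'next' buffer */
        q->pos = 0;
        q->next_ready = false;
    }
    *out = q->buf[q->active][q->pos++];
    if (q->pos == q->len[q->active] / 2u) q->half_done = true;
    return true;
}

/* Task side: get the 'next' buffer to fill, or NULL if one is already pending. */
static uint8_t *sq_task_acquire(SampleQueue *q)
{
    uint8_t *p = NULL;
    CRIT_ENTER();
    if (!q->next_ready) p = q->buf[q->active ^ 1u];
    CRIT_EXIT();
    return p;
}

/* Task side: publish n freshly written samples in the 'next' buffer. */
static void sq_task_commit(SampleQueue *q, size_t n)
{
    if (n == 0u || n > SAMPLE_BUF_LEN) return;
    CRIT_ENTER();
    q->len[q->active ^ 1u] = n;
    q->next_ready = true;
    q->half_done  = false;
    CRIT_EXIT();
}
```

The filling itself happens outside the critical section; only the few state variables are touched with interrupts masked, in keeping with the "lock only around a couple state variables" idea.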

An additional complication is that there are two buffers of samples, but samples are not bounded by phonemes.  So the goal is to fill the sample buffer as much as possible, possibly (probably as it turns out) not completing a phoneme, and then later picking up where one left off in a particular phoneme before moving on to the start of another phoneme.  So there's a little bit of state that keeps track of where the buffer filling routine left off in a previous call.  It also has the logic to try to pluck more phonemes from the fifo to carry on filling the buffer to the maximum extent possible.
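A minimal sketch of that "pick up where you left off" logic is below.  To keep it short it copies already-decoded samples rather than running the ADPCM decoder, and the Phoneme, FillState, and ArraySource types are hypothetical stand-ins, not the project's actual structures.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: one phoneme's worth of ready-to-play samples. */
typedef struct { const uint8_t *samples; size_t len; } Phoneme;

/* Hypothetical phoneme source: returns NULL when nothing is queued. */
typedef const Phoneme *(*NextPhonemeFn)(void *ctx);

/* State carried from one fill call to the next. */
typedef struct {
    const Phoneme *cur;   /* phoneme currently being rendered, or NULL */
    size_t         off;   /* how far into cur the previous call got */
} FillState;

/* Fill dst with up to cap samples; returns the number actually written. */
static size_t fill_buffer(FillState *st, uint8_t *dst, size_t cap,
                          NextPhonemeFn next, void *ctx)
{
    size_t n = 0;
    while (n < cap) {
        if (st->cur == NULL || st->off >= st->cur->len) {
            st->cur = next(ctx);            /* try to pluck another phoneme */
            st->off = 0;
            if (st->cur == NULL) break;     /* source empty: short buffer */
        }
        size_t room = cap - n;
        size_t left = st->cur->len - st->off;
        size_t take = (left < room) ? left : room;
        memcpy(dst + n, st->cur->samples + st->off, take);
        n += take;
        st->off += take;
    }
    return n;
}

/* Simple array-backed source, only for exercising the sketch. */
typedef struct { const Phoneme *items; size_t count; size_t idx; } ArraySource;

static const Phoneme *array_next(void *ctx)
{
    ArraySource *s = (ArraySource *)ctx;
    return (s->idx < s->count) ? &s->items[s->idx++] : NULL;
}
```

With a 3-sample and a 4-sample phoneme and a 5-sample buffer, the first call fills all 5 slots (ending partway through the second phoneme) and the second call resumes from there and returns the remaining 2.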

Because this device is strapped for RAM, I have provided for just two sample buffers of 512 bytes each.  This means that at the present 11025 sps rate they hold just over 46 ms worth of samples, and since there is a 'half complete' notification, that is about 23 ms heads-up that you need to get a buffer ready.  I did a rudimentary profile of the '_prepareNextBufferLoad()' routine using the Cortex debug unit's CPU cycle counter, DWT->CYCCNT, and found that it took 112,263 cycles to fill a 512-byte buffer.  Running at 72 MHz, that translates to about 1.6 ms, and means there are about 220 clocks per sample.  This is doing all the ADPCM stuff.  I did repeat the test with all the various optimization options (except -O0 since that was too big for flash) and got very similar results.  (Since this is a live system, there is noise in this test since background interrupts would still be happening.)  Either way, this seems like more than fast enough to keep up.  I can't remember the scheduling quantum on FreeRTOS, but I think it is something like 1 ms, so the task should definitely have an adequate opportunity to get notified of the need for a new bufferload, with enough headroom to actually produce it.  Mostly for my own comfort, I decided to raise the priority of the SP0256 task to 'high' so that it will take precedence over the monitor task, though I don't think this is really necessary.
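The timing budget above can be reproduced with a couple of small helpers.  This is just the arithmetic from the paragraph (512 samples, 11025 sps, 112,263 cycles, 72 MHz) in integer form; the function names are hypothetical.

```c
#include <stdint.h>

/* Playback time of n_samples at rate_hz, in microseconds (truncated). */
static uint32_t samples_to_us(uint32_t n_samples, uint32_t rate_hz)
{
    return (uint32_t)(((uint64_t)n_samples * 1000000u) / rate_hz);
}

/* Execution time of a cycle count at cpu_hz, in microseconds (truncated). */
static uint32_t cycles_to_us(uint32_t cycles, uint32_t cpu_hz)
{
    return (uint32_t)(((uint64_t)cycles * 1000000u) / cpu_hz);
}
```

With these, a full buffer lasts samples_to_us(512, 11025) = 46439 us, the half-way heads-up leaves samples_to_us(256, 11025) = 23219 us, and filling a buffer costs cycles_to_us(112263, 72000000) = 1559 us, so producing a buffer uses well under a tenth of the time available after the notification.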

I got all that stuff wired in (and debugged! lol) and tested.  Now I can do phoneme and text-to-speech directly from the BluePill -- no physical SP0256-AL2 chip required!  It uses the same LPF that the SP0256 uses, namely two 33 K resistors and two 0.022 uF caps.  That output then goes through a DC blocking cap of 1 uF into an LM386 amp.  I already had this on my breadboard, so it was just a matter of moving a wire from the SP0256 to the PB0 pin.

All built (in debug) results in:

arm-none-eabi-size "BluePillSP0256AL2.elf"
   text	   data	    bss	    dec	    hex	filename
 112092	   2012	  16424	 130528	  1fde0	BluePillSP0256AL2.elf

So there is still some 18 KB flash left for improvements.
