Look Who's Talking 0256

A BluePill Driver/Simulator/Emulator for the GI SP0256-AL2

This is a mini project for testing out some SP0256-AL2 simulation. The work is for another project, but this seemed like something that might be useful for other folks, so I decided to make a separate project of it.

I have a separate project which is a TRS-80 Model I emulator.  Back in the day, there was a voice synthesizer for it.  I have been meaning to add that emulation, and was recently spurred into action by another hackadayer [Michael Wessel] who had produced a hardware version for those with physical machines.  While in my case I am making an emulator, his input  was still useful to me because I didn't have the phoneme list for the original synth.  Anyway, I needed to do a bunch of exploratory work before producing the final code, and part of that involved hooking up a physical SP0256-AL2 to a BluePill so I could record phonemes.  That activity seems potentially useful to others irrespective of my TRS-80 code, so I decided to make it a separate project.

For the first iteration I am simply hooking up a SP0256-AL2 to a BluePill as a USB-serial-to-SP0256 bridge.  This is to set up for recording phonemes, since I am planning to play pre-recorded signals rather than recreate the LPC filter and whatnot.  As such I consider this a 'simulator' rather than an 'emulator'.

It did then occur to me that if that works out, I might be able to put the simulator in toto into the Blue Pill, and not even require the physical '0256.  We'll see how that goes -- the BluePill is strapped for space, and even at low bit rates the sound files will eat it up quickly.  I will experiment with some compression -- probably ADPCM.  We'll see how much the quality suffers.

I also have some Text-to-Speech code from years back, so I will try my hand at porting that over, too.


Schematic of the project (but I'm sure you could have managed already)

Adobe Portable Document Format - 43.17 kB - 05/20/2020 at 17:11



pre-built firmware; UART version on PA9 (TX) PA10 (RX). By default, expects physical chip to be present, so remember to 'set spmode simulated' if you want that.

hex - 257.47 kB - 05/20/2020 at 15:30



pre-built firmware; USB CDC version. By default, expects physical chip to be present, so remember to 'set spmode simulated' if you want that.

hex - 255.83 kB - 05/20/2020 at 15:29



how to connect and use

plain - 5.31 kB - 05/19/2020 at 22:02



demo ADPCM of doors 'hello'

Waveform Audio File Format (WAV) - 1.38 MB - 05/15/2020 at 23:18



  • 1 × BluePill the (in?)famous $3 minimal development board
  • 1 × STLink-v2 One of those cheap Chinese clones works just fine. Be sure it is actually an STLink, and not an 'STC Autoprogrammer'. Several folks have gotten burned on this.
  • 1 × sundry passives there's various resistors and capacitors in the audio path, and you'll probably want some bypass caps, and I am using an external transistor for reset of the SP0256 (OK, that's not passive, but whatever)
  • 1 × LM386 this is an audio-out project, so you'll need a minimal amp to hear the zounds
  • 1 × SP0256-AL2 if you're going physical, you'll need the venerated and mostly unobtainium GI synthesizer chip. Fortunately, I have a few on-hand.


  • Downsampling the Audio to 8 Ksps

    ziggurat29 • 05/18/2020 at 17:24 • 0 comments


    It occurred to me that the original 11025 sps files probably represent a higher-than-needed sampling rate.  I downsampled to 8 ksps and gained back about 15 K flash, along with some relaxed execution requirements.


    It had bugged me for a bit that the sampling rate for the existing audio is 11025 sps.  This is quite appropriate for the SP0256 in general, which is advertised as having a 5 kHz output bandwidth; however, on the -AL2 I find the signal to use less than that.

    I discovered that an audio processing application I use has a 'batch convert' mode, so I batch converted all the samples to an 8 kHz sampling rate (appropriate anti-aliasing filtering is done prior).  The app I am using is 'Cool Edit', which to my knowledge is long since discontinued (and it has a kind of interesting back story), but I would suspect that free alternatives such as Audacity would be similarly useful.

    This reduced the audio size by about 25%, and it seems to sound more-or-less the same.  I modded the Python PoC to re-compute the ADPCM version and added that to the project.  The sample rate timer, TIM4, needed to be adjusted.  Everything else worked as-is.

    This allowed me to gain back 15 K flash, and also relaxed the responsiveness requirements for preparing sample buffers (since they now last 64 ms, up from 46).  I wasn't in a crisis in either of those two areas, but it's still nice to have a little more wiggle room.
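
    The timing figures can be checked with a little arithmetic.  This is only a sketch: the 512-sample buffer size is an assumption (it is not stated here, but it is the size that reproduces the quoted 46 ms and 64 ms), and the function names are made up for illustration.

```c
#include <stdint.h>

/* Duration of one sample buffer in microseconds.
   The caller supplies the buffer size; 512 samples is an ASSUMPTION
   inferred from the 46 ms / 64 ms figures, not a value from the project. */
uint32_t bufferDurationUs(uint32_t nSamples, uint32_t nSampleRate)
{
    return (uint32_t)(((uint64_t)nSamples * 1000000u) / nSampleRate);
}

/* Whole-percent reduction in sample data when dropping from
   nOldRate to nNewRate (truncated toward zero). */
uint32_t percentSaved(uint32_t nOldRate, uint32_t nNewRate)
{
    return (100u * (nOldRate - nNewRate)) / nOldRate;
}
```

    With those assumptions, a buffer lasts about 46.4 ms at 11025 sps and exactly 64 ms at 8000 sps, and the data shrinks by roughly 27%, which is in line with the 'about 25%' observed.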


    Who knows?  Is it done, now?

  • Using DMA for the PWM

    ziggurat29 • 05/17/2020 at 17:50 • 0 comments


    As an improvement, I switched to using DMA for driving the sample data output instead of code in an ISR.


    The system worked and seemed responsive on the monitor, but it tweaked my spidey sense to have so much stuff going on in the ISR.  The ISR gets triggered every sample -- 11025 times per second -- and each time context has to be saved/restored, and code run.  The STM32F103 has a lovely DMA unit, so I might as well make use of it!

    Initial Tests

    Getting DMA running is a multi-step process of fiddling with stuff in CubeMX (in this case, setting the DMA tab in the TIM4 resource, configuring it as memory-to-peripheral with the input size as 'byte' and the output size as 'half word', and setting the memory address to increment, but not the peripheral address).  This is all obscure because the documentation for CubeMX leaves a lot to be desired, so a bunch of experimentation had to be done to figure out the appropriate recipe.

    After that, then the right code incantations need to be spoken.  Much as with CubeMX, the HAL library's documentation leaves a lot to be desired -- it has a 'doxygen' taste to it, so I find that it's more or less fruitless to go to the documentation, and rather I wind up reverse-engineering the source to see what the methods actually do.  Really, I think that just programming at the register level and commenting the code would be clearer in almost all cases /except/ for USB, which is a beast.  It definitely would cut down on the flash and ram usage.  Anyway, at length I found the relevant incantations, which in general are 'turn on the timer 4 clock', then 'set up the DMA controller to transfer', then 'set timer 4 to trigger the DMA' (this starts output immediately).  There are several interrupts from the DMA that you need to handle:
    • transfer complete -- you'll want to send another buffer
    • half-transfer complete -- this is your heads-up to start preparing another buffer so you'll be ready to go when 'transfer complete' comes in
    • transfer aborted -- if you care
    • error -- if you care

    The DMA has a nifty 'circular' mode that seems like it should be useful, but I can't figure out how I can actually make use of it in this project.

    I did a trivial test of using DMA with the PCM test data:  start a transfer, then start another in the 'transfer complete' event.  I used the PCM 'hello' sample set from before, and it worked as expected.

    On to Implementation...

    My existing interrupt-driven design was based on receiving a 'half-complete' event that alerted the producing task to prepare additional buffer loads.  This worked out neatly because the DMA similarly generates a 'half-complete' event and a 'fully complete event'.  Most of the code from the timer ISR was reused/restructured into the 'fully complete' event, and the 'half-complete' simply sent a task notification to the SP0256 task.  Actually, a bunch of code was deleted, since we didn't have to clock out individual samples -- we just needed to start a new transfer when the present one completed.  The only addition was to 'kick start' the first transfer.  This wasn't needed in the previous design because it was effectively always polling in the ISR as to whether there was a buffer to transfer at all.  In this case, once it's done it's done, so you have to explicitly start the first transfer to kick off the process.  So long as you keep producing buffers in a timely manner, the process is continuous.
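
    The hand-off described above can be modelled on a desktop machine without any HAL calls.  The following is a minimal sketch under stated assumptions: the two-buffer ping-pong arrangement, the tiny buffer size, and every name in it are hypothetical, and the places where real code would start a DMA transfer or notify the task are only marked by comments.

```c
#include <stdint.h>
#include <string.h>

#define BUF_SAMPLES 8   /* tiny, for illustration only */

typedef struct {
    uint8_t abyBuf[2][BUF_SAMPLES]; /* hypothetical ping-pong pair */
    int     nActive;     /* buffer being sent by the "DMA", -1 when idle */
    int     bNextReady;  /* producer has filled the other buffer */
    int     nNotify;     /* how many times the producer was asked for more */
} PlayState;

void playInit(PlayState* ps)
{
    memset(ps, 0, sizeof *ps);
    ps->nActive = -1;
}

/* Producer side: fill the free buffer.  If nothing is playing, this is
   the 'kick start'; real code would start the first DMA transfer here. */
void playSubmit(PlayState* ps, const uint8_t* pbyData)
{
    int nFill = (ps->nActive < 0) ? 0 : 1 - ps->nActive;
    memcpy(ps->abyBuf[nFill], pbyData, BUF_SAMPLES);
    if (ps->nActive < 0)
        ps->nActive = nFill;
    else
        ps->bNextReady = 1;
}

/* 'half-transfer complete': just a heads-up to the producing task. */
void onHalfComplete(PlayState* ps)
{
    ps->nNotify++;   /* real code would send a task notification */
}

/* 'transfer complete': start the next buffer if one is ready,
   otherwise go idle until the producer kick-starts again. */
void onTransferComplete(PlayState* ps)
{
    if (ps->bNextReady) {
        ps->nActive = 1 - ps->nActive;
        ps->bNextReady = 0;
    } else {
        ps->nActive = -1;
    }
}
```

    The point of the model is the last branch: unlike the polling ISR, nothing restarts output once the chain breaks, so the submit path has to notice the idle state and start the transfer itself.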

    Most of the time was spent figuring out how to use the DMA peripheral appropriately (the initial test), and the porting of the actual design fortunately went rather quickly since I had done the interrupt-driven design in a sort of DMA-like way.  The build is bigger; thanks HAL:

    arm-none-eabi-size "BluePillSP0256AL2.elf"
       text    data     bss     dec     hex filename
     113328    2024   16496 131848  ...

  • Setting Up PWM Output

    ziggurat29 • 05/16/2020 at 21:50 • 0 comments


    For the first step of attempting to simulate the SP0256-AL2 (i.e., no physical chip needed), I wanted to test the PWM output, sending pre-recorded data.  For the second step, I wanted to full-on simulate the physical chip.


    First, I did a quicky experiment to prove that the hardware was configured correctly.  This involved setting up PWM on TIM3, and using TIM4 for pacing the samples out via interrupt.  I set up TIM3 to have a similar frequency to the SP0256 (about 45 kHz), and for 8-bit resolution (which matched my audio files).  As such, I should be able to use the existing output filter from the SP0256 and just move the jumper between the two.  I did an initial test with my scope both to verify that PWM was working, and that TIM4 was interrupting at the expected rate.
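
    For reference, the timer settings follow from the usual STM32 relationship f = f_clk / ((PSC + 1) * (ARR + 1)).  The sketch below assumes a 72 MHz timer clock, which is typical for a BluePill but is not stated here, and the function names and the prescaler values used in the example are illustrative rather than taken from the project.

```c
#include <stdint.h>

/* Update (overflow) frequency of a timer given its input clock,
   prescaler register value and auto-reload register value. */
uint32_t timerFreqHz(uint32_t nClkHz, uint32_t nPsc, uint32_t nArr)
{
    return nClkHz / ((nPsc + 1u) * (nArr + 1u));
}

/* Auto-reload value giving the update rate nearest to nRateHz
   for a given prescaler register value. */
uint32_t timerArrFor(uint32_t nClkHz, uint32_t nPsc, uint32_t nRateHz)
{
    uint32_t nTicks = nClkHz / (nPsc + 1u);
    return (nTicks + nRateHz / 2u) / nRateHz - 1u;
}
```

    Under that 72 MHz assumption, an 8-bit PWM (ARR = 255) with a prescaler value of 5 runs at 46875 Hz, close to the 45 kHz target, and a sample-pacing timer needs ARR = 6530 for 11025 sps or ARR = 8999 for 8000 sps.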

    I modded my Python prototype to do a couple of short text-to-speech runs once again, but this time emitting a C source file with data in both the PCM format and ADPCM format.  These were very short recordings of just the word "Hello", so I didn't really have to worry about flash space.  I tested the PCM first, since that was a no-brainer; i.e. in my HAL_TIM_PeriodElapsedCallback() I just added the clause

      /* USER CODE BEGIN Callback 1 */
    	else if (htim->Instance == TIM4)
    	{
    		//quicky sample PCM test
    		extern const uint8_t g_abyPCM[];	//len = 8027
    		static int sl_nIdx = 0;
    		//output the current sample as the PWM duty cycle
    		__HAL_TIM_SET_COMPARE(&htim3, TIM_CHANNEL_3, g_abyPCM[sl_nIdx]);
    		++sl_nIdx;	//advance to the next sample
    		if ( 8027 == sl_nIdx ) {	//loopy-doopy
    			sl_nIdx = 0;
    		}
    	}
      /* USER CODE END Callback 1 */

    so, it just outputs the current sample via PWM (__HAL_TIM_SET_COMPARE), increments the index to the current sample, and loops around.  This worked fine, so now it was time to try out the ADPCM approach.

    First, I ported the ADPCM decoder routine and setup to do the same thing with the ADPCM data.  This was slightly more complicated.  The ADPCM is by nybble, rather than byte, and the algorithm as designed is created for signed 16-bit samples.  But it wasn't too bad:

      /* USER CODE BEGIN Callback 1 */
    	else if (htim->Instance == TIM4)
    	{
    		//quicky sample ADPCM test
    		struct ADPCMstate
    		{
    			int prevsample;
    			int previndex;
    		};
    		int adpcm_decode_sample ( int code, struct ADPCMstate* state );
    		//len = 4014, origlen = 8027
    		extern const uint8_t g_abyADPCM[];
    		static struct ADPCMstate state = { 0, 0 };
    		static int sl_nIdxNyb = 0;
    		int code = g_abyADPCM[sl_nIdxNyb>>1];
    		if ( sl_nIdxNyb & 1 )
    			code &= 0x0f;
    		else	//upper nybble first
    			code >>= 4;
    		int samp = adpcm_decode_sample ( code, &state );
    		//signed 16-bit to unsigned 8-bit for the PWM
    		uint8_t samp8 = (uint8_t) ( ( samp / 256 ) + 128 );
    		__HAL_TIM_SET_COMPARE(&htim3, TIM_CHANNEL_3, samp8);
    		++sl_nIdxNyb;	//advance to the next nybble
    		if ( 8027 == sl_nIdxNyb ) {	//loopy-doopy
    			sl_nIdxNyb = 0;
    			state.prevsample = 0;
    			state.previndex = 0;
    		}
    	}
      /* USER CODE END Callback 1 */

    The sample data was half the size of the PCM (about 4 KiB), and the binary size increase was about 5 KiB over the baseline, so I infer that the ADPCM decoder incurs about 1 KiB of code.  I can work with that.
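
    The decoder itself is not shown in this log.  For readers who want something concrete, here is a sketch of a 4-bit decoder assuming the common IMA/DVI ADPCM variant; that choice is an assumption, and the names ima_decode_sample, ImaState, ima_get_nybble and to_pwm8 are invented for this example rather than taken from the project.

```c
#include <stdint.h>

struct ImaState {
    int prevsample;  /* last decoded signed 16-bit sample (the predictor) */
    int previndex;   /* last index into the step-size table */
};

/* Standard IMA ADPCM index adjustments, indexed by the 4-bit code. */
static const int8_t s_anIndexAdj[16] = {
    -1, -1, -1, -1, 2, 4, 6, 8,
    -1, -1, -1, -1, 2, 4, 6, 8
};

/* Standard IMA ADPCM step-size table (89 entries). */
static const int16_t s_anStep[89] = {
    7, 8, 9, 10, 11, 12, 13, 14, 16, 17,
    19, 21, 23, 25, 28, 31, 34, 37, 41, 45,
    50, 55, 60, 66, 73, 80, 88, 97, 107, 118,
    130, 143, 157, 173, 190, 209, 230, 253, 279, 307,
    337, 371, 408, 449, 494, 544, 598, 658, 724, 796,
    876, 963, 1060, 1166, 1282, 1411, 1552, 1707, 1878, 2066,
    2272, 2499, 2749, 3024, 3327, 3660, 4026, 4428, 4871, 5358,
    5894, 6484, 7132, 7845, 8630, 9493, 10442, 11487, 12635, 13899,
    15289, 16818, 18500, 20350, 22385, 24623, 27086, 29794, 32767
};

/* Decode one 4-bit code into a signed 16-bit sample, updating the state. */
int ima_decode_sample(int code, struct ImaState* state)
{
    int step = s_anStep[state->previndex];

    /* Rebuild the quantized difference from the three magnitude bits. */
    int diff = step >> 3;
    if (code & 4) diff += step;
    if (code & 2) diff += step >> 1;
    if (code & 1) diff += step >> 2;

    /* Bit 3 is the sign; clamp the predictor to 16 bits. */
    int samp = state->prevsample + ((code & 8) ? -diff : diff);
    if (samp > 32767) samp = 32767;
    else if (samp < -32768) samp = -32768;

    /* Adapt the step size for the next sample. */
    int index = state->previndex + s_anIndexAdj[code & 0x0f];
    if (index < 0) index = 0;
    else if (index > 88) index = 88;

    state->prevsample = samp;
    state->previndex = index;
    return samp;
}

/* Fetch nybble number nIdxNyb from packed data, upper nybble first. */
int ima_get_nybble(const uint8_t* pby, int nIdxNyb)
{
    int code = pby[nIdxNyb >> 1];
    return (nIdxNyb & 1) ? (code & 0x0f) : (code >> 4);
}

/* Signed 16-bit sample to unsigned 8-bit PWM duty value. */
uint8_t to_pwm8(int samp)
{
    return (uint8_t)(samp / 256 + 128);
}
```

    The two tables are most of the footprint (89 half-words plus 16 bytes), which is consistent in spirit with a decoder costing on the order of 1 KiB once the code is included.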

    Now it's time to get the hands really dirty, and emulate the SP0256-AL2.  This was actually a bit of work -- both in coding and debugging.

    The first thing I did was create a notion of 'mode' for the speech processor.  There are two modes: 'physical' and 'simulated'.  The existing task_sp0256 had a tight coupling with the hardware, but really there were just four points of contact:  resetting the synth, strobing in data, determining if 'Load Request' (nLRQ) is asserted, and being notified if nLRQ transitions from a negated to asserted state.  I factored the code for the first three cases into generic methods that do one or the other based on current mode, and the last case was inbound and already decoupled from hardware specifics.  Now it was time to put flesh on the bones.
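
    As an illustration of that factoring, here is a minimal sketch of mode-dependent reset/strobe/LRQ helpers.  Everything in it is hypothetical: the type and function names, the FIFO depth, and the counter that stands in for real pin activity are placeholders, not the project's code.

```c
#include <stdint.h>

typedef enum { SPMODE_PHYSICAL, SPMODE_SIMULATED } SpMode;

#define SIM_FIFO_DEPTH 4   /* arbitrary depth for this sketch */

typedef struct {
    SpMode  eMode;
    uint8_t abyFifo[SIM_FIFO_DEPTH]; /* stand-in for the chip's FIFO */
    int     nCount;                  /* entries currently queued */
    int     nGpioWrites;             /* stand-in for real pin wiggling */
} SpState;

/* Reset the synth: pulse the reset line, or flush the simulated FIFO. */
void spReset(SpState* sp)
{
    if (SPMODE_PHYSICAL == sp->eMode)
        sp->nGpioWrites++;   /* real code would drive the reset transistor */
    else
        sp->nCount = 0;
}

/* Nonzero if the synth can accept another phoneme. */
int spIsLrq(const SpState* sp)
{
    if (SPMODE_PHYSICAL == sp->eMode)
        return 1;            /* real code would read the nLRQ pin (active low) */
    return sp->nCount < SIM_FIFO_DEPTH;
}

/* Strobe one phoneme in; returns 1 if accepted, 0 if the synth is busy. */
int spStrobe(SpState* sp, uint8_t byPhoneme)
{
    if (!spIsLrq(sp))
        return 0;
    if (SPMODE_PHYSICAL == sp->eMode)
        sp->nGpioWrites++;   /* real code would set the address lines, pulse nALD */
    else
        sp->abyFifo[sp->nCount++] = (uint8_t)(byPhoneme & 0x3f);
    return 1;
}
```

    Callers only ever see the three helpers, so the task logic stays the same whichever mode is selected.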

    I declared another circular buffer to represent the fifo on the chip.  Strictly,...


  • Compressing the Phoneme Data with ADPCM

    ziggurat29 • 05/15/2020 at 20:04 • 0 comments


    The phoneme data in its raw form is about 116 KB.  This doesn't leave enough room for code on even a 128 KB BluePill.  ADPCM should reduce that size by half, but will it sound good enough?  Back to Python land to prototype and find out.


    As a stretch goal, I wanted to see if the BluePill could function as the SP0256-AL2 itself, without requiring the physical part.  Way back I had proven that concatenation of pre-recorded phonemes was a viable approximation to the actual digital filter originally used, but the recordings are too big for flash.  Even with a 128 KiB BluePill, the inclusion of the uncompressed data would leave only 12 KiB flash for everything else, which would not be enough for what we have now.  So some sort of compression would be useful.  But before that, you will need to have a 128 KiB BluePill.  Do you have one?  Almost certainly you do.  Some background...

    Getting 128 KiB Flash on the STM32F103C8

    The 'C8 is spec'd as having 64 KiB flash, and this is what it reports over the debug (SWD/JTAG) interface.  However, it is an open secret that the device actually has 128 KiB.  I have never heard a definitive answer as to why this is, but it seems plausible that it was simply a marketing decision on ST's part to offer the lower-capacity device, but scrimp on wafer NRE and merely burn a fuse that causes it to report one way or the other.  Maybe at one time there were actual 64 KiB devices -- it's a rather old part number -- but I've never seen one in life.

    Also, one should be cautioned that there are counterfeits out there.  China has a different viewpoint on trademarks and part numbers, and generally is of the opinion that if something behaves the same as a part number XXX, then you get to market it as a part number XXX.  To a degree, I can appreciate this viewpoint regarding part numbers, however /marking/ the chip in a way that visually causes brand confusion strikes me as overtly deceptive.  At any rate, I mention it because it seems plausible that a counterfeit could actually only have 64 KiB, since that is the advertised capability.

    Some folks fairly recently made a tester to prove that you do have a 128 KiB device, and also to exercise the extra flash to give some confidence that it is reliable.  I won't go into elaborate details other than to say you simply flash it on your BluePill and connect via USB CDC with a terminal and drive a menu.  More details and download at this link:

    Once you have satisfied yourself that you do have a closeted 128 KiB BluePill, you just need to make a couple hacks to make use of it.

    1. modify your linker definition file to indicate the 128 KiB
      For example, in this project the file is 'STM32F103C8Tx_FLASH.ld', and early in it is the line
          FLASH (rx)      : ORIGIN = 0x8000000, LENGTH = 64K
      which is fairly obvious that you should change to
          FLASH (rx)      : ORIGIN = 0x8000000, LENGTH = 128K
      And that's it for modifying your project!  But this is not enough in itself, because your toolchain probably needs modifications.  If you're using OpenOCD (and who isn't these days, at least under the covers), then you need to do some more.
    2. modify your toolchain to ignore what the device reports
      In this case, OpenOCD is being used by System Workbench for STM32, and deep within its bowels is the config file
      which must be modified.  Note that for some reason there are two of these.  Only one of them is active -- I can't remember which.  You can modify both to be certain (or modify one at a time to figure out which one is the one System Workbench actually uses -- I think it's the one with 'st_scripts' in the name).
      Way down around line 65 or so is where the flash stuff is kept.  There will be some comments that will make sense. ...

  • TTS Rulez Redux

    ziggurat29 • 05/14/2020 at 17:46 • 4 comments


    With the 'compact' form of the rules in-hand, it is time to use them.


    I ported the code that processes the rules into C.  This was a bit more trouble than I anticipated because the Python version uses some conveniences of that environment -- especially dynamically sized arrays and string concatenation.  Since this code is going to be running in an embedded environment, I wanted to avoid copying to temporary and dynamically allocated buffers as much as possible, and rather try to process directly out of any buffers or constant definitions.  Additionally, there was a hack in the original rules that required a space to be prepended and appended to the word.  This hack allowed using the space as a meta-character for 'Nothing', which was used to indicate that a context pattern needed to be at the very beginning or end of the text.  I wound up creating a separate meta-character for that, '$', and updated all the rules accordingly.  That addition caused me to generate a new distinct string, so I incurred a two-byte penalty, bringing the compactified rules to 9385 bytes.

    Incrementally building the code shows these numbers for flash usage:

    • 40816 baseline
    • 50208 rules included; delta = 9392
    • 51908 tts code; delta = 1700
    • 51964 simple test code to use TTS to translate a sentence; delta = 56

    So this is not too bad; about 2 KB for the actual code, and the simple test (which is fairly representative of how it would be used in practice) is quite small at 56 bytes.

    This means that there is about 12 KB more flash for code growth before the next crisis.  I think this might be OK for the remaining stuff I have planned.  I've got a little more than 7 KB RAM left, and I think this will be enough, too, to finish things up.

    The simple test code:

    static const char achGettysburg[] = 
    "four score and seven years ago our fathers brought forth on this continent \
    a new nation, conceived in liberty, and dedicated to the proposition that all \
    men are created equal.";
    const char* pszText = achGettysburg;
    int nTextLen = COUNTOF(achGettysburg);
    //quicky test running through text
    const char* pchWordStart, * pchWordEnd;
    int eCvt;
    while ( 0 == ( eCvt = pluckWord ( pszText, nTextLen, 
            &pchWordStart, &pchWordEnd ) ) )
    {
        int nWordLen = pchWordEnd - pchWordStart;
        static uint8_t sl_abyPhon[64];    //semi-arbitrarily sized long word
        int nProduced = ttsWord(pchWordStart, nWordLen,
                g_abyTTS, sl_abyPhon, COUNTOF(sl_abyPhon) );
        //stick on a space between words if there is not already a pause
        if ( sl_abyPhon[nProduced-1] > 4 )    //all pauses are code 0 - 4
        {
            sl_abyPhon[nProduced++] = '\x03';
            sl_abyPhon[nProduced++] = '\x02';
        }
        //push the phonemes to the synth, waiting whenever it is full
        size_t nIdxPhon = 0;
        size_t nRemaining = nProduced;
        while ( nRemaining > 0 )
        {
            size_t nConsumed = SP0256_push ( &sl_abyPhon[nIdxPhon], nRemaining );
            nRemaining -= nConsumed;
            nIdxPhon += nConsumed;
            if ( 0 != nRemaining )
            {
                osDelay ( 200 );    //sleep a little to let the synth catch up
            }
        }
        //advance past the word just handled
        nTextLen -= pchWordEnd - pszText;
        pszText = pchWordEnd;
    }

    So the gist of using it is to crack the text word-by-word (there is a convenience function pluckWord() provided for this), and then for each word 'plucked' from the buffer, push it into ttsWord() to translate it into a phoneme sequence.  You can then send this sequence off to the SP0256 task (or whatever).
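
    The word splitter itself is not listed in this log.  Purely as a guess at its shape, here is a sketch with the same calling convention as the test code above (returns 0 when a word was found).  The name pluckWordSketch, the choice to split on whitespace only (so punctuation stays attached to the word), and the treatment of an embedded NUL as a separator are all assumptions.

```c
#include <ctype.h>
#include <stddef.h>

/* Find the next word in [pszText, pszText + nTextLen).
   On success returns 0 and sets *ppchStart / *ppchEnd to the word's
   first character and one past its last; returns -1 if no word remains. */
int pluckWordSketch(const char* pszText, int nTextLen,
        const char** ppchStart, const char** ppchEnd)
{
    const char* pch = pszText;
    const char* pchLimit = pszText + nTextLen;

    /* Skip leading separators (whitespace, or a NUL terminator that
       was counted in the length). */
    while (pch < pchLimit && ('\0' == *pch || isspace((unsigned char)*pch)))
        ++pch;
    if (pch == pchLimit)
        return -1;
    *ppchStart = pch;

    /* The word runs up to the next separator or the end of the buffer. */
    while (pch < pchLimit && '\0' != *pch && !isspace((unsigned char)*pch))
        ++pch;
    *ppchEnd = pch;
    return 0;
}
```

    Because it only hands back pointers into the caller's buffer, nothing is copied, which matches the stated goal of avoiding temporary buffers.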

    I added some debug code to make it send the plucked word and text-to-speeched phoneme sequence to the serial for debugging.  E.g. for the first sentence of the Gettysburg address:

    four    28 35 33 03 02      
        FF OW ER2 PA4 PA3
    score   37 08 35 33 03 02   
        SS KK3 OW ER2
    and     1a 0b 15 03 02      
        AE NN1 DD1
    seven   37 07 23 07 0b 03 02    
        SS EH VV EH NN1
    years   0c 13 33 2b 03 02   
        IH IY ER2 ZZ
    ago     1a 3d 35 03 02      
        AE GG2 OW
    our     20 33 03 02         
        AW ER2
    fathers 28 1a 36 01 34 2b 03 02     
        FF AE DH2 PA2 ER2 ZZ
    brought 1c 27 17 0d 03 02   
        BB1 RR2 AO TT2
    forth   28 17 17 33 1d 03 02    
        FF AO AO ER2 TH
    on 17 0b 03 02 

  • Text-to-Speech Rulez!

    ziggurat29 • 05/12/2020 at 20:18 • 0 comments


    For today's goose-chase, I am porting over the text-to-speech rules.  Some effort was put forth towards reducing their flash footprint.


    Having realized the primary impetus of the project, I'm faced with several other directions to take it next.  Semi-arbitrarily, I decided to try getting text-to-speech capability in place.  As mentioned in a previous post, I have some old TTS code which I ported to Python for a sanity check, and now I am porting it to C for inclusion on the BluePill.  The first step is just encoding the rules as static data to be burned into flash.  Transcoding the rules took a little over a day of mundane reformatting and some considerations of how to work with the C language itself, e.g. there's a bunch of variable-length arrays -- in the other languages the length is an intrinsic property of the array object, but that's not the case in C.  For strings, there is the implicit NUL-terminator, but that is not the case for any other array.  Eventually, I worked out some macros that exploit string-merging to fake it enough to have a result that looks manageable.

    Straightforward inclusion of the rules as C-defined structures shows that they take 19,300 bytes of flash.  This is too much.  When I had originally written this code (and by 'written' I mean 'ported some existing work and extended'; credits in the source), it was for a platform called 'dotNet Micro Framework'.  It was somewhat interesting, but it lacked a lot of const-friendliness, and tended to put things in live objects (i.e. in RAM) no matter how much 'readonly' qualifier you would apply.  So in that case I pre-processed the rules into an alternative form that would cause the compiler to leave almost all the stuff in flash.  On that platform, I had an abundance of flash (and comparatively an abundance of RAM, too) relative to here.  Those transformations are not meaningful here, but I wanted to see if a similar compactification could reduce the footprint.  The gist would then be that the desktop app would be the 'master' copy of the text-to-speech rules, encoded in C-structs/arrays in a straightforward way, and then they would be pre-processed into the compact form for embedded.  That way the rules can continue to be developed and maintained in a sane way, albeit with the additional pre-processing step.

    First I did some basic statistics including raw counts and distinct counts:

    Rules: 706
    strs: 2118, bins: 706
    dstrs: 484, dbins: 400

     So, 706 rules, 2118 strings (the various 'contexts') and 706 phoneme sequences.  Of the 2118 strings 484 were distinct, and of the 706 phoneme sequences 400 were distinct.  This suggests that the strings could be reduced to about 25%, but really that is just the count.  The devil is in the details.  Truthfully, a lot of the strings are for exception cases, and these tend to be longer.  So deduping short strings might not really squeeze that much.  Having the program tabulate the lengths showed:

    Rules: 706
    strs: 2118, bins: 706
    dstrs: 484, dbins: 400
    strlen: 2783, binlen: 1688
    dstrlen: 1549, dbinlen: 1246

     So, 1549/2783 really reduces about 44% rather than the hoped 75%.  But that's still an improvement.  A similar story is told for the phoneme binaries at 26% rather than 43%.  But it occurred to me that this is not considering the nul-terminators, so I reworked it:

    Rules: 706
    strs: 2118, bins: 706
    dstrs: 484, dbins: 400
    strlen: 4901, binlen: 2394
    dstrlen: 2033, dbinlen: 1646

     Here, the space reduction is better (58% vs 44%, and 31% vs 26%), but wow! the size taking into consideration nul-terminators really added some overhead!  That's what a bunch of single-character strings/bins will do.  But another tale is to be told:  even disregarding de-duping, the total of strings and binaries is 4901+2394 = 7295.  But comparing the flash size before and after including...
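
    The percentages quoted in this post can be reproduced from the tabulated counts and lengths.  This is just a checking aid; the helper names are invented, and it assumes one terminator byte per string or phoneme sequence, which is what the difference between the two tables implies.

```c
/* Whole-percent reduction going from nBefore bytes to nAfter bytes
   (truncated, which matches the figures quoted in the text). */
unsigned percentReduction(unsigned nBefore, unsigned nAfter)
{
    return (100u * (nBefore - nAfter)) / nBefore;
}

/* Total size once each of nCount items carries one terminator byte. */
unsigned withTerminators(unsigned nTotalLen, unsigned nCount)
{
    return nTotalLen + nCount;
}
```

    For example, the 2118 strings total 2783 characters, or 4901 bytes with terminators, while the 484 distinct strings total 1549 characters, or 2033 bytes with terminators; those pairs give the 44% and 58% reductions mentioned above.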


  • Speech Commands 101

    ziggurat29 • 05/11/2020 at 18:32 • 0 comments


    A command 'ph' is implemented to spew phonemes to the synthesizer, enabling basic experiments over the terminal.


    Having gotten the SP0256 task in place, now it's time to use it.  My eventual plan is to have the BluePill accept a binary stream over the serial port and direct it into the SP0256, but that will mean making a client side app (I guess I could make one quickly enough with Python -- surely there is serial IO capability there).

    In the short term, though, I decided I can implement a command on the command processor.  This command takes a hex sequence which is the stream of phonemes.  Since it's on the command processor, it's quite limited (the command processor has a hard line-length limit of 128 chars), but it's quite serviceable for interactive testing.

    Because of the line length limitation, I shortened the name of the command to 'ph' and it takes a contiguous stream of hex chars.  E.g., sending:

    ph 1B072D350302 

    will cause the synth to speak 'hello'.
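
    The argument handling for a command like this is simple enough to sketch.  The code below is not the project's command handler; the name parsePhonemeHex, the return convention, and the choice to reject codes above 0x3F (the chip has 64 phoneme codes) are assumptions made for illustration.

```c
#include <stdint.h>
#include <stddef.h>

/* Value of one hex digit, or -1 if the character is not a hex digit. */
static int hexNybble(char ch)
{
    if (ch >= '0' && ch <= '9') return ch - '0';
    if (ch >= 'a' && ch <= 'f') return ch - 'a' + 10;
    if (ch >= 'A' && ch <= 'F') return ch - 'A' + 10;
    return -1;
}

/* Convert a contiguous run of hex digit pairs into phoneme codes.
   Parsing stops at the end of the string or at the first space.
   Returns the number of codes written, or -1 on malformed input,
   an out-of-range code, or an output buffer that is too small. */
int parsePhonemeHex(const char* pszHex, uint8_t* pbyOut, size_t nOutMax)
{
    size_t nCount = 0;
    while ('\0' != pszHex[0] && ' ' != pszHex[0]) {
        int nHi = hexNybble(pszHex[0]);
        int nLo = (nHi < 0) ? -1 : hexNybble(pszHex[1]);
        if (nHi < 0 || nLo < 0)
            return -1;              /* odd digit count or junk character */
        int nCode = (nHi << 4) | nLo;
        if (nCode > 0x3f || nCount >= nOutMax)
            return -1;
        pbyOut[nCount++] = (uint8_t)nCode;
        pszHex += 2;
    }
    return (int)nCount;
}
```

    Fed the argument from the example above, it yields the six codes 1B 07 2D 35 03 02, which could then be pushed to the synth task.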


    In the strictest sense, I have now achieved what was my original motivation for doing this project:  set up to cause the real synth to generate audio that I can record, but of course the project has grown beyond that motivation and now I'm off chasing geese.


    Chase geese, maybe binary streaming of phonemes over the serial port (instead of the textual way here), or maybe text-to-speech.

  • Implementing the SP0256 'Task'

    ziggurat29 • 05/10/2020 at 15:40 • 0 comments


    A FreeRTOS 'task' (aka thread) is used to manage the interface with the physical SP0256 and stream phonemes from a circular buffer.


    Since this project is ultimately going to have several functions, including the previously described command processor, but also the phoneme receiver and text-to-speech component, I decided to make the handler for the physical SP0256 a task-oriented component.  The gist is that there is an interface where you 'push' phoneme data to your heart's content, and internally there is a 'thread' that removes that data and sends it on to the chip in a way that works with the chip's hardware flow control signals.  I've used this approach in some other projects, and it helps keep the design/implementation modular and more loosely coupled.

    In this case, there are several hardware resources that the SP0256 Task manages:
    • the 'address' lines.  Since there are only 64 phonemes, I decided to relinquish the top two bits for other purposes, so this wound up using PA 0-5.  These are not 5V tolerant, but being as they are strictly output, this is OK.
    • the 'not Address Load' (nALD) line.  This is what strobes the data into the SP0256.  This is put on PB 1, which similarly is not 5V tolerant, but since it's going out of the BluePill and into the SP0256, this is OK.
    • the 'not Load Request' (nLRQ) line.  This is part of the hardware handshaking, and it goes low to indicate that it is OK to send data to the SP0256.  Since this is an input, it needs to be on a 5V tolerant pin, and it is put on PB 11.
    • the 'Standby' (SBY) line.  This indicates that the SP0256 is finished with all phonemes, and could be put into low(er!)-power mode.  I don't plan on using it, but nonetheless I wired it to PB 10 in case I change my mind.
    • I also decided to manage the reset line explicitly, and I put that on PA 6 with an NPN transistor open collector.  The data sheet seems to imply that Reset needs to go up to 5V, not just be a digital high, so that's why I did this.
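
    How the address pins get written is not shown here.  One common approach on the STM32F1 is a single write to the port's BSRR register, where the low 16 bits set pins and the high 16 bits reset them, so all six lines change together without disturbing PA 6 and up.  The helper below only computes that register value; its name is invented and using BSRR at all is an assumption about the implementation.

```c
#include <stdint.h>

/* BSRR value that puts a 6-bit phoneme code on pins 0-5 of a port:
   bits 0-5 set the pins that should be high, bits 16-21 reset the
   pins that should be low.  Pins 6-15 are left untouched. */
uint32_t addrBsrrValue(uint8_t byPhoneme)
{
    uint32_t nSet = byPhoneme & 0x3Fu;
    uint32_t nClr = (~nSet) & 0x3Fu;
    return nSet | (nClr << 16);
}
```

    On the target, the result would be written to GPIOA->BSRR before pulsing nALD.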

    This chip is really slow, and we are wiggling the lines programmatically, so I use some delay loops.  One way I tend to do that on these ARM parts when possible is use the 'Debug module'.  This is an optional module intended for debugging, but one handy thing it has is a cycle counter.  This is a 32-bit up counter that is clocked by the CPU clock.  By using this (if available on your particular part) I can avoid using the timer resources.  For short delays and even profiling code it can be quite handy.  The module has to be explicitly enabled, and that is done very early in main().
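
    A sketch of the arithmetic behind such a cycle-counter delay follows.  The helper names are invented and the 72 MHz figure used in the example is an assumption; the register names in the closing comment are the standard CMSIS ones for the Cortex-M3 DWT unit, shown only as a comment so that the helpers can be compiled and checked off-target.

```c
#include <stdint.h>

/* Convert microseconds to CPU cycles for a given core clock. */
uint32_t usToCycles(uint32_t nUs, uint32_t nCpuHz)
{
    return (uint32_t)(((uint64_t)nUs * nCpuHz) / 1000000u);
}

/* Nonzero once at least nCycles have passed between nStart and nNow.
   Unsigned subtraction keeps this correct across counter wrap-around. */
int cyclesElapsed(uint32_t nStart, uint32_t nNow, uint32_t nCycles)
{
    return (uint32_t)(nNow - nStart) >= nCycles;
}

/* On the target, with CMSIS headers, enabling and using the counter
   would look roughly like this:
     CoreDebug->DEMCR |= CoreDebug_DEMCR_TRCENA_Msk;
     DWT->CYCCNT = 0;
     DWT->CTRL |= DWT_CTRL_CYCCNTENA_Msk;
     uint32_t nStart = DWT->CYCCNT;
     while ( !cyclesElapsed ( nStart, DWT->CYCCNT, nCycles ) ) { }
*/
```

    At an assumed 72 MHz core clock the 32-bit counter wraps about every 59.6 seconds, so the wrap-safe comparison matters for anything long-running.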

    I use a circular buffer to receive the phonemes from outside this module.  This is some common code I have written that I use across projects.  Since this is manipulated by two threads, I protect it with a mutex.  OK, some things about FreeRTOS:  many functions have two variants:  an 'ordinary' variant, and an 'ISR-friendly' variant.  The synchronization-related stuff in particular is in this class.  Mutexes in FreeRTOS are a kind of 'binary semaphore' (with priority inheritance added), and you use the semaphore-related functions to acquire and release them.  HOWEVER, for reasons that are not clear to me, mutexes are incompatible with ISRs.  If you really need to do mutual exclusion within an ISR, you must use the plain binary semaphore.  FreeRTOS suggests that mutexes are useful for 'simple mutual exclusion'.  Well, I think my application is 'simple' so I am going with the mutex, but I put a caveat in the comments on the API that the various methods are NOT to be called from an ISR.  This isn't a problem for my project, but one day I may re-use this and forget and somehow deadlock the system and have to spend time debugging.  Best to comment.

    Speaking of ISRs, there is presently one interrupt source used:  the nLRQ line is configured as an EXTI source, on falling edge. ...


  • Physical UART 'Monitor'

    ziggurat29 • 05/09/2020 at 18:06 • 0 comments


    Mini-update:  debugging while connected via USB CDC drives me bonkers, so I made an alternative configuration where a physical UART is used.


    USB requires a lot of stuff to go on to maintain the connection between host and device.  The on-chip peripheral handles some of that, but other parts are handled in code.  Fortunately, I don't have to write most of that -- it's provided in the libraries; however, it does have to be running to maintain the connection.  If it isn't, the host gives up trying to talk to the apparently malfunctioning device.

    And that's the rub:  when debugging (i.e. single stepping) the code, the servicing of the USB is concomitantly halted, and the host gives up on the connection.  Some productive debugging can continue in many cases, however you will ultimately be faced with having to disconnect/reconnect the device to the host (i.e. 're-enumerate') to get the host to recognize it.  Moreover, whatever application was using the USB will likely need to be restarted as well, because the old device handles are of no use anymore.

    Because of this, I decided to spend a little time getting a physical UART running, and connecting it to a separate USB-to-serial bridge device (I keep a handful of FTDI boards on-hand for these sorts of things).  In that way, the board can reset, cycle power, whatever, and not require futzing around on the host machine, because it's connected to the external device, not the actual project.

    I had already made an adapter of the STM UART library code to my stream abstraction in a separate project, so I merged that in.  At that point it was straightforward to simply bind the UART stream to the monitor instead of the USB CDC stream.  I modded the project to switch between the two based on a preprocessor definition.  I expect that USB CDC would be what is used in 'production' and the UART would be used in 'development', but who knows?  There's probably general use for both.
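The preprocessor switch can be sketched roughly like this.  Everything here is hypothetical:  the `io_stream_t` type, the stub write functions and the `MONITOR_ON_UART` symbol are stand-ins for illustration, not the names used in the actual source.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stream abstraction: a name plus a write function. */
typedef struct {
    const char *name;
    size_t (*write)(const void *buf, size_t len);
} io_stream_t;

/* Stubs standing in for the real USB CDC and UART back ends. */
size_t usb_cdc_write(const void *buf, size_t len) { (void)buf; return len; }
size_t uart_write(const void *buf, size_t len) { (void)buf; return len; }

const io_stream_t g_usb_cdc_stream = { "usb-cdc", usb_cdc_write };
const io_stream_t g_uart_stream = { "uart", uart_write };

/* Build with -DMONITOR_ON_UART for the debugger-friendly configuration;
 * leave it undefined to get USB CDC. */
#ifdef MONITOR_ON_UART
#define MONITOR_STREAM (&g_uart_stream)
#else
#define MONITOR_STREAM (&g_usb_cdc_stream)
#endif

/* The monitor only ever talks to MONITOR_STREAM, so it does not care
 * which transport was selected at build time. */
size_t monitor_puts(const char *s)
{
    assert(s != NULL);
    return MONITOR_STREAM->write(s, strlen(s));
}
```

Because the choice is made at compile time, the unused transport costs nothing at run time, though whether its code is dropped from the image depends on the linker settings.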


    Back to implementing the SP0256 interface.

  • Building the Basic BluePill Interface

    ziggurat29 • 05/08/2020 at 22:18 • 0 comments


    For my first amazing feat, I am going to make the interface as originally planned:  i.e. as a USB CDC to SP0256 'bridge'.


    The original motivation of this mini project was to make a USB CDC to SP0256 'bridge' so I could record individual phoneme signals.  An application on the PC would send the phoneme sequence over serial and then also record the output.  As mentioned, I found a collection of recorded phonemes, but I'm going to carry on with this.  Maybe I'll get some better recordings.  Anyway, I have future plans beyond the original bridge, but this is a good first step towards those.

    Of HALs and Hacks and Heaps

    I've covered this before in other projects, but I'm not a big fan of the STM32 HAL libraries.  I still use them anyway for projects like these because they are convenient, but they strike me as bloated.  For example, after configuring the chip and generating the project and doing a build:

    debug 'optimize for debug'

    arm-none-eabi-size "BluePillSP0256AL2.elf"
       text    data     bss     dec     hex filename
      36140    1156   13576   50872    c6b8 BluePillSP0256AL2.elf

    so, about 36 K of flash (out of 64K) is consumed, and something like 14.4 K of RAM (out of 20K) is used.  Yikes!  Even doing a release build:

    release, 'minimize size'

    arm-none-eabi-size "BluePillSP0256AL2.elf"
       text    data     bss     dec     hex filename
      30856    1148   13552   45556    b1f4 BluePillSP0256AL2.elf

    doesn't improve it much.  But c'est la vie.  For things like USB the HAL library is pretty much the only option unless maybe you went with an alternative library.  The USB peripheral is a beast.
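For anyone wondering how the flash and RAM figures fall out of the `arm-none-eabi-size` columns:  flash holds `text` plus the initial values of `data`, and RAM holds `data` plus `bss`.  The small sketch below just does that arithmetic on the two reports quoted above; the struct and function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* One row of arm-none-eabi-size output (Berkeley format), in bytes. */
typedef struct {
    uint32_t text;
    uint32_t data;
    uint32_t bss;
} size_report_t;

/* Flash holds the code and constants plus the initializers for .data. */
uint32_t flash_bytes(size_report_t r) { return r.text + r.data; }

/* RAM holds .data (copied from flash at startup) plus zero-filled .bss. */
uint32_t ram_bytes(size_report_t r) { return r.data + r.bss; }

/* Whole-number percentage of a memory region in use, rounded down. */
uint32_t percent_used(uint32_t used, uint32_t capacity)
{
    assert(capacity != 0);
    return (used * 100u) / capacity;
}

/* The two builds quoted in the text. */
const size_report_t debug_build = { 36140u, 1156u, 13576u };
const size_report_t release_build = { 30856u, 1148u, 13552u };
```

With these numbers the debug build works out to 37296 bytes of flash and 14732 bytes of RAM, and the release build to 32004 and 14700.  The size optimization helps flash a little and RAM hardly at all, which fits with most of the RAM being statically allocated buffers rather than anything the optimizer can shrink.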

    The HAL also has quirks, so I have a set of hacks which I use that make USB CDC and UART work the way I want them to.  But since the code is generated, those hacks get overwritten.  I have a batch file that re-applies them every time I re-generate the code.  You will regenerate often in the beginning, because you'll change your mind about peripherals, etc.  The batch file makes it tolerable.

    The last hack is an enhancement -- I have my own heap (i.e. malloc) implementation.  I needed this a while back for a library that required realloc(), and the one that comes with FreeRTOS does not provide that.  Additionally, I added some debugging enhancements that let me do a 'heapwalk' of the blocks and also to fill blocks with a pattern so they are easier to visually inspect.  The last bit of legerdemain with the heap is using a nifty feature of the gcc linker.  You can tell the linker to redirect symbols to a 'wrapper' function that you must provide.  This trick allows me to re-direct calls to malloc() that are even in pre-compiled code (e.g. libc) into my implementation.  This is important, because otherwise you will have two heaps:  the one you conscientiously use that you provide, and the default implementation that is in libc.  I am less fond of the libc implementation because it will 'grow' the heap as needed upwards into the stack.  There isn't a hard limit on the arena size (well, up until crashing -- that's a hard limit!).
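The linker feature in question is GNU ld's `--wrap` option.  Linking with `-Wl,--wrap=malloc` makes undefined references to `malloc` resolve to `__wrap_malloc`, while the original stays reachable as `__real_malloc`.  The sketch below shows only the shape of the wrapper functions.  The allocator behind them is a throwaway bump allocator over a hypothetical 1 KiB arena, not the heap implementation from the project, and the exact list of symbols that need wrapping depends on the C library (the real list is documented in main.c).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed link flags: -Wl,--wrap=malloc -Wl,--wrap=free
 * With those, calls to malloc()/free() from other object files, including
 * prebuilt library code, land in the __wrap_ functions below. */

#define ARENA_SIZE 1024u /* hypothetical fixed arena size, in bytes */
#define ARENA_ALIGN 8u   /* assumed alignment for every block */

static uint64_t arena[ARENA_SIZE / sizeof(uint64_t)]; /* 8-byte aligned */
static size_t arena_used; /* bytes handed out so far */

/* Placeholder allocator: carve the next aligned block off the arena.
 * Unlike a heap that grows toward the stack, this has a hard limit
 * and returns NULL when the arena is exhausted. */
void *__wrap_malloc(size_t size)
{
    size_t rounded = (size + ARENA_ALIGN - 1u) / ARENA_ALIGN * ARENA_ALIGN;
    if (size == 0 || rounded < size || rounded > ARENA_SIZE - arena_used) {
        return NULL;
    }
    void *p = (uint8_t *)arena + arena_used;
    arena_used += rounded;
    assert(arena_used <= ARENA_SIZE);
    return p;
}

/* A bump allocator never reclaims memory; a real heap would free here. */
void __wrap_free(void *p)
{
    (void)p;
}
```

The point of the fixed arena is the hard limit:  an allocation that does not fit fails cleanly instead of quietly running into the stack.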

    Some folks have asked for my heap implementation.  It's in the source (in the GitHub project in the links section).  It goes where FreeRTOS normally places heap_4.c.

    The 'fixup.bat' deletes heap_4.c and replaces it with heap_x.c.  The linker stuff is documented in main.c near the top.

    OK with all that setup, I generally start with a common design where the 'default' task (which apparently cannot be deleted via the tool, so I just work with it) handles things like the LEDs (of which this board has just one on the...



  • 1
    Way 1: burn pre-built firmware

    If you just want to kick the tires, there are two pre-built firmware images in the 'files' section.  Pick one of them (USB CDC or UART).  You can burn this with whatever flashing tool you have, e.g. I use the "STM32 ST-LINK Utility" with a ST-Link-V2.
    There is also a way of doing this over the UART with the built-in bootloader, though I've never done that myself.  Google "Programming an STM32F103 board using its USB port (Blue Pill)" and you'll find some instructions if you want to give that a whirl.

  • 2
    Way 2: build the firmware yourself

    The project is built with "System Workbench for STM32", which is a free toolchain from ST Microelectronics.  You'll need to install that.  The source code is complete and you should be able to build out-of-box, except note that I also hacked the OpenOCD config file to allow me to use the 128 KiB flash.  (there's a post on doing that)
    Then you can burn the image you built with whatever.
    You might be able to alternatively use PlatformIO -- it looks like it can import the STM project, but I've never tried this myself.

  • 3

    The firmware presents a command-line interface over whichever serial mechanism you have chosen as per build options or pre-built firmware image.  The interface is described in detail in 'manual.txt' in the files section.
    Note that the firmware by default will operate in 'physical' mode -- i.e. it expects an actual SP0256-AL2 to be connected.  You can change the mode to 'simulated' which will allow you to play with the synthesizer with just the BluePill alone (well, you still need your LPF and audio amp).  If you do this, you probably will want to 'persist' the settings so that the choice will survive a reboot.




Frank N. Stein wrote 08/23/2020 at 20:22

Thank you very much for this fantastic project! I am a great fan of this old TTS stuff! SP0256 is just as great as S.A.M.! (Do you know this version for a PIC32MX170?)

Your USB version did not work for me - my Blue-Pill is not recognized by my PC! Do you need a special USB-driver for this?

You should mention that your UART version works with 115200 bps/8/N/1 - this version works great for me!! Could you please also create a version with 9600 or 19200 bps? That would be very useful for small (slow) microcontrollers!

Your TRS80 on the Duinomite is just as great! Does the UART work there to connect your SP0256?



Dan Maloney wrote 05/05/2020 at 16:55

That picture gave me a 1980s nostalgia overdose.

Honestly, I always coveted the Speak and Spell. Seriously wanted one of those things to experiment with.


ziggurat29 wrote 05/05/2020 at 17:00

I did too back in the day! But I had to make do with the SP0256. Anyway, I couldn't resist the aesthetic of ca. 1982 geek chic when I saw the ad, even though I'm not actually using the TI chip in this project.

