
Compressing the Phoneme Data with ADPCM

A project log for Look Who's Talking 0256

A BluePill Driver/Simulator/Emulator for the GI SP0256-AL2

ziggurat29 05/15/2020 at 20:04

Summary

The phoneme data in its raw form is about 116 KB.  This doesn't leave enough room for code on even a 128 KB BluePill.  ADPCM should cut that size in half, but will it sound good enough?  Back to Python land to prototype and find out.

Deets

As a stretch goal, I wanted to see if the BluePill could function as the SP0256-AL2 itself, without requiring the physical part.  Way back I had proven that concatenation of pre-recorded phonemes was a viable approximation to the actual digital filter originally used, but the recordings are too big for flash.  Even with a 128 KiB BluePill, the inclusion of the uncompressed data would leave only 12 KiB flash for everything else, which would not be enough for what we have now.  So some sort of compression would be useful.  But before that, you will need to have a 128 KiB BluePill.  Do you have one?  Almost certainly you do.  Some background...

Getting 128 KiB Flash on the STM32F103C8

The 'C8 is spec'd as having 64 KiB flash, and this is what it reports over the debug (SWD/JTAG) interface.  However, it is an open secret that the device actually has 128 KiB.  I have never heard a definitive answer as to why this is, but it seems plausible that it was simply a marketing decision on ST's part to offer the lower-capacity device, but scrimp on wafer NRE and merely burn a fuse that causes it to report one way or the other.  Maybe at one time there were actual 64 KiB devices -- it's a rather old part number -- but I've never seen one in life.

Also, one should be cautioned that there are counterfeits out there.  China has a different viewpoint on trademarks and part numbers, and generally is of the opinion that if something behaves the same as a part number XXX, then you get to market it as a part number XXX.  To a degree, I can appreciate this viewpoint regarding part numbers, however /marking/ the chip in a way that visually causes brand confusion strikes me as overtly deceptive.  At any rate, I mention it because it seems plausible that a counterfeit could actually only have 64 KiB, since that is the advertised capability.

Some folks fairly recently made a tester to prove that you do have a 128 KiB device, and also to exercise the extra flash to give some confidence that it is reliable.  I won't go into elaborate details other than to say you simply flash it on your BluePill and connect via USB CDC with a terminal and drive a menu.  More details and download at this link:
stm32f103c8-diagnostics

Once you have satisfied yourself that you do have a closeted 128 KiB BluePill, you just need to make a couple hacks to make use of it.

  1)  modify your linker definition file to indicate the 128 KiB
    For example, in this project the file is 'STM32F103C8Tx_FLASH.ld', and early in it is the line
        FLASH (rx)      : ORIGIN = 0x8000000, LENGTH = 64K
    which, fairly obviously, you should change to
        FLASH (rx)      : ORIGIN = 0x8000000, LENGTH = 128K
    And that's it for modifying your project!  But this is not enough in itself, because your toolchain probably needs modifications.  If you're using OpenOCD (and who isn't these days, at least under the covers), then you need to do some more.
  2)  modify your toolchain to ignore what the device reports
    In this case, OpenOCD is being used by System Workbench for STM32, and deep within its bowels is the config file
        stm32f1x.cfg
    which must be modified.  Note that for some reason there are two of these.  Only one of them is active -- I can't remember which.  You can modify both to be certain (or modify one at a time to figure out which one is the one System Workbench actually uses -- I think it's the one with 'st_scripts' in the name).
    Way down around line 65 or so is where the flash stuff is kept.  There will be some comments that will make sense.  There is a line that by default is:
        flash bank $_FLASHNAME stm32f1x 0x08000000 0 0 0 $_TARGETNAME
    The first '0' is what tells OpenOCD 'ask the chip how much we have'.  You can change that value to explicitly state that there is 128 KiB like this:
        flash bank $_FLASHNAME stm32f1x 0x08000000 0x20000 0 0 $_TARGETNAME
    And that's it for modifying your toolchain!  This only needs to be done once, and all your projects will be affected.  Note that the version number of System Workbench appears to be in the path name, so I suspect that if you upgrade System Workbench you might have to re-apply this patch.
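
Since the patch may need re-applying after an upgrade, the edit could be scripted.  What follows is only a sketch: the function name 'force_flash_size' is made up for illustration, it is not part of OpenOCD or System Workbench, and it assumes the 'flash bank' line looks like the default quoted above.  You would still have to locate the right stm32f1x.cfg yourself and feed its text through it.

```python
import re

# Matches everything on an stm32f1x 'flash bank' line up to (and including the
# whitespace before) the size field, so only the size token gets replaced.
_FLASH_BANK = re.compile(
    r"^([ \t]*flash bank[ \t]+\S+[ \t]+stm32f1x[ \t]+\S+[ \t]+)\S+",
    re.MULTILINE,
)

def force_flash_size(cfg_text, size_bytes=0x20000):
    """Return cfg_text with the flash size field set explicitly (hypothetical helper)."""
    return _FLASH_BANK.sub(lambda m: m.group(1) + hex(size_bytes), cfg_text)

# The default line as quoted in this log; '0' means 'ask the chip'.
default_line = "flash bank $_FLASHNAME stm32f1x 0x08000000 0 0 0 $_TARGETNAME"
patched_line = force_flash_size(default_line)
print(patched_line)
```

Running it a second time on already-patched text leaves the line unchanged, so it should be harmless to re-run.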

With that done, you can continue on merrily with 128 KiB flash available.  Let's fill it up!

Back to Compressing Audio Files

One simple compression is 'differential pulse-code modulation' ('DPCM').  The PCM part just means digitally sampled signal, so the interesting bit is the 'differential' part.  The idea is that instead of storing the samples themselves, you store the differences between consecutive samples.  If the signal doesn't change too wildly sample-to-sample, the differences will be much smaller than the magnitude of the signal itself, and so each difference can be encoded in fewer bits.  This sort of works, but audio signals sometimes can have wild swings, so a modification is 'adaptive differential pulse-code modulation' ('ADPCM').  The 'adaptive' part means that the magnitude of a change can be greater or smaller; e.g. a '1' might mean a change of +1 now, but later on it might mean a change of +256.  The adaptation works by changing that step size based on how far off one was in the previous prediction.  This scheme is typically a fixed compromise, and so there is no additional data in the stream indicating when to make a change in step size -- both the encoder and decoder will make the adaptation in the same way.  So again, only the differences are transmitted.
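
To make the 'differential' part concrete, here is a minimal Python sketch of plain (non-adaptive) DPCM.  The function names and the sample values are made up for illustration; this is not the project's code.

```python
def dpcm_encode(samples):
    """Store each sample as its difference from the previous one (first vs. 0)."""
    diffs = []
    previous = 0
    for sample in samples:
        diffs.append(sample - previous)
        previous = sample
    return diffs

def dpcm_decode(diffs):
    """Rebuild the samples by accumulating the differences."""
    samples = []
    value = 0
    for diff in diffs:
        value += diff
        samples.append(value)
    return samples

# A slowly varying, made-up signal
samples = [100, 104, 109, 113, 115, 114, 110, 105]
diffs = dpcm_encode(samples)
print(diffs)  # [100, 4, 5, 4, 2, -1, -4, -5]
```

After the first value, every difference here fits in 4 signed bits even though the samples themselves need 7 or more.  A sudden jump in the signal would not fit, and that is the problem the adaptive step size is meant to handle.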

The algorithm has been around a while (e.g. CCITT G.721), and typically involved floating point calculations for the adaptation curves.  However, the Interactive Multimedia Association (IMA) came up with a version that uses table lookups instead of logarithms, so it is more tractable for embedded -- especially with no FPU!
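
For reference, here is a compact sketch of the table-driven IMA scheme in Python.  It follows the commonly published IMA ADPCM reference (4-bit codes, 89-entry step table) and is not necessarily identical to the PoC script used for this project.  The function names, the 16-bit sample range, the zero initial state, and the low-nibble-first packing are all assumptions made for illustration.

```python
import math

# Step sizes and index adjustments from the published IMA ADPCM reference.
STEP_TABLE = [
    7, 8, 9, 10, 11, 12, 13, 14, 16, 17,
    19, 21, 23, 25, 28, 31, 34, 37, 41, 45,
    50, 55, 60, 66, 73, 80, 88, 97, 107, 118,
    130, 143, 157, 173, 190, 209, 230, 253, 279, 307,
    337, 371, 408, 449, 494, 544, 598, 658, 724, 796,
    876, 963, 1060, 1166, 1282, 1411, 1552, 1707, 1878, 2066,
    2272, 2499, 2749, 3024, 3327, 3660, 4026, 4428, 4871, 5358,
    5894, 6484, 7132, 7845, 8630, 9493, 10442, 11487, 12635, 13899,
    15289, 16818, 18500, 20350, 22385, 24623, 27086, 29794, 32767,
]
INDEX_TABLE = [-1, -1, -1, -1, 2, 4, 6, 8]  # indexed by the 3 magnitude bits

def _apply_code(code, predicted, index):
    """Update (predicted, index) for one 4-bit code; shared by both directions."""
    step = STEP_TABLE[index]
    delta = step >> 3
    if code & 4:
        delta += step
    if code & 2:
        delta += step >> 1
    if code & 1:
        delta += step >> 2
    predicted += -delta if code & 8 else delta
    predicted = max(-32768, min(32767, predicted))
    index = max(0, min(88, index + INDEX_TABLE[code & 7]))
    return predicted, index

def adpcm_encode(samples):
    """Encode signed 16-bit samples into a list of 4-bit codes (0..15)."""
    predicted, index, codes = 0, 0, []
    for sample in samples:
        step = STEP_TABLE[index]
        diff = sample - predicted
        code = 8 if diff < 0 else 0          # bit 3 is the sign
        diff = abs(diff)
        if diff >= step:
            code |= 4
            diff -= step
        if diff >= step >> 1:
            code |= 2
            diff -= step >> 1
        if diff >= step >> 2:
            code |= 1
        codes.append(code)
        # Track exactly what the decoder will reconstruct.
        predicted, index = _apply_code(code, predicted, index)
    return codes

def adpcm_decode(codes):
    """Decode 4-bit codes back into signed 16-bit samples."""
    predicted, index, samples = 0, 0, []
    for code in codes:
        predicted, index = _apply_code(code, predicted, index)
        samples.append(predicted)
    return samples

def pack_nibbles(codes):
    """Pack two codes per byte, low nibble first (an arbitrary choice here)."""
    padded = list(codes) + [0] * (len(codes) % 2)
    return bytes(padded[i] | (padded[i + 1] << 4) for i in range(0, len(padded), 2))

# Made-up test signal: a 200 Hz tone sampled at 8 kHz
pcm = [int(8000 * math.sin(2 * math.pi * 200 * n / 8000)) for n in range(400)]
codes = adpcm_encode(pcm)
decoded = adpcm_decode(codes)
packed = pack_nibbles(codes)
```

Note that the encoder runs the same reconstruction step as the decoder, which is what keeps the two in lockstep without any side information in the stream.  Each sample costs 4 bits, so relative to 8-bit PCM the packed data is half the size.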

Much as before, to prove the concept I used Python to flesh out the implementation of the encoder and decoder.  Then I encoded all the phoneme data in ADPCM, and decoded it back to PCM.  With this second version of the audio, I ran through the same test case to hear the straight-PCM version side-by-side against the ADPCM version.  This is a lossy compression scheme, so the actual numbers are different, but audibly it sounds pretty much the same.  E.g.: testcase_001a.wav  So that's promising!

After that, I added some code to the Python PoC script to emit C source that contains all the ADPCM data as constant arrays, and included it in the project to see what the flash impact is.  It turns out to be about 58 KB.  So there's hope; that will take up most of the newly unlocked flash, with a few K to spare.  That with the 12 or so K we already have is probably enough to get the project completed.

Building with the ADPCM data shows we have broken the 64 KiB boundary, with about 19 K of the 128 KiB left to spare (flash use is text + data = 111,976 bytes):

arm-none-eabi-size "BluePillSP0256AL2.elf"
   text    data     bss     dec     hex filename
 109992    1984   15360 127336   1f168 BluePillSP0256AL2.elf
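
As a sanity check on those numbers, here is the arithmetic spelled out in Python.  In this output format, 'text' and 'data' both occupy flash, while 'data' and 'bss' occupy RAM.  The 20 KiB RAM figure is the STM32F103C8 datasheet value, not something reported by the build.

```python
FLASH_TOTAL = 128 * 1024   # the unlocked flash size, in bytes
RAM_TOTAL = 20 * 1024      # STM32F103C8 SRAM per the datasheet

# Values copied from the arm-none-eabi-size output above
text, data, bss = 109992, 1984, 15360

flash_used = text + data           # code/constants plus initializers for .data
ram_used = data + bss              # initialized plus zero-initialized variables
flash_free = FLASH_TOTAL - flash_used
over_64k = flash_used - 64 * 1024  # how far past the advertised 64 KiB we are

print(f"flash used: {flash_used} bytes, free: {flash_free} bytes")
print(f"past the 64 KiB mark by: {over_64k} bytes")
print(f"RAM used: {ram_used} of {RAM_TOTAL} bytes")
```

That works out to 111,976 bytes of flash used and 19,096 bytes (roughly 18.6 KiB) free.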

Next

PWM output of the sound data.
