On Phoneme Coding

A project log for μTTS

Speech synthesis on a microcontroller. Talking projects, cheap as chips!

Greg Kennedy, 03/17/2016 at 01:42

Not all is forgotten here...

The process of coding pronunciation has a long history. You can't just start with the letters "HELLO" and know which sounds the mouth should form to speak it. Instead, you break the word down into phonemes, then speak them in sequence. So how to notate phonemes?

Dictionary publishers each have their own particular system, typically borrowing letters from the language and adding some extras where needed to disambiguate (like the schwa - ə - for the sound of the "a" in "about"). The Merriam-Webster pronunciation for HELLO is

hə-ˈlō or he-ˈlō

which retains most English characters (h,e,l,o) and is easy for English speakers to understand.

If you ever used S.A.M. for C64 / A800 / etc, you may be familiar with ArpaBet. This describes phonemes with one- or two-letter codes, space separated, using ASCII uppercase letters. A digit appended to a vowel denotes stress (accent... aka "loud"); in the CMU dictionary, 0 means unstressed, 1 primary stress and 2 secondary stress. Invented in the 1970s, it's a bit primitive but still functional, and the CMU Pronouncing Dictionary (http://www.speech.cs.cmu.edu/cgi-bin/cmudict) uses it. For example, HELLO in ArpaBet is:

HH AH0 L OW1
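
Because the codes are space separated and the stress digit only ever trails a vowel, a string like this is easy to pull apart. Here is a minimal sketch in Python; the function name `parse_arpabet` is made up for illustration and is not part of µTTS or of the CMU dictionary tools.

```python
def parse_arpabet(text):
    """Split an ArpaBet string into (phoneme, stress) pairs.

    stress is an int when the code carries a trailing stress digit,
    and None for codes without one (consonants).
    """
    result = []
    for token in text.split():
        if token[-1].isdigit():
            # Vowel with a stress marker, e.g. "OW1" -> ("OW", 1)
            result.append((token[:-1], int(token[-1])))
        else:
            # Consonant, e.g. "HH" -> ("HH", None)
            result.append((token, None))
    return result


print(parse_arpabet("HH AH0 L OW1"))
# [('HH', None), ('AH', 0), ('L', None), ('OW', 1)]
```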

These systems have one thing in common, though: they are tied to a language. If you want to go truly universal (well, human universal anyway) you need to use the International Phonetic Alphabet. This system aims to cover every distinctive sound used in spoken human languages. To do so requires lots of funny symbols. Hello becomes something like:

hɛˈloʊ̯ or həˈloʊ̯

which is why many English dictionaries, Merriam-Webster among them, don't use IPA: it's hard to read and way overkill for English. Wouldn't it be better to have "e" instead of "ɛ"? Well, if µTTS is to be useful everywhere, it should speak IPA. Only... that's hard, because most IPA characters fall outside ASCII and need Unicode, and who wants Unicode in their microcontroller?

Enter X-SAMPA (https://en.wikipedia.org/wiki/X-SAMPA). This is a system which encodes IPA as 7-bit ASCII strings. Some characters are the same, others are substitutions, and more complicated IPA symbols are "decomposed" into multiple X-SAMPA characters by adding modifier characters (typically "\" or "_"). Best of all, it's a quasi-standard, so tools already exist to convert between IPA and X-SAMPA. So the IPA HELLO above, in X-SAMPA, is:

hE"loU_^ or h@"loU_^
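
To make the substitution idea concrete, here is a rough Python sketch that maps the X-SAMPA above back to IPA. The table is deliberately tiny (only the five symbols these HELLO examples need), and the names `XSAMPA_TO_IPA` and `xsampa_to_ipa` are my own placeholders; a real converter would need the full X-SAMPA table.

```python
# Tiny subset of the X-SAMPA table: just enough for the HELLO examples.
XSAMPA_TO_IPA = {
    "_^": "\u032f",  # non-syllabic diacritic (combining mark under the vowel)
    '"': "\u02c8",   # primary stress mark
    "E": "\u025b",   # open-mid front vowel, as in "bed"
    "@": "\u0259",   # schwa
    "U": "\u028a",   # near-close near-back rounded vowel
}


def xsampa_to_ipa(text):
    """Convert an X-SAMPA string to IPA using the subset table above.

    Characters missing from the table are passed through unchanged. That is
    right for h, l, o and friends, but only because the table is incomplete.
    """
    out = []
    i = 0
    while i < len(text):
        # Try two-character symbols first so "_^" is not split apart.
        for width in (2, 1):
            chunk = text[i:i + width]
            if chunk in XSAMPA_TO_IPA:
                out.append(XSAMPA_TO_IPA[chunk])
                i += len(chunk)
                break
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


# Both inputs are plain 7-bit ASCII; the outputs are the IPA forms of HELLO.
print(xsampa_to_ipa('hE"loU_^'))
print(xsampa_to_ipa('h@"loU_^'))
```

Matching the longer symbol first is the one design choice that matters here: modifiers like "_^" have to be consumed as a unit rather than character by character.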


All this is informing design decisions for µTTS: it should take X-SAMPA as input and produce audio as output. Since X-SAMPA fits in 7-bit ASCII, I can use the top bit of each byte for "control" sequences (pitch, speed, etc). I sort of envision it all fitting together like this: X-SAMPA text and control bytes go in over something like a 9600-8-N-1 serial link, and audio comes out.
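
The top-bit trick is easy to sketch on the host side. In the Python below, any byte with bit 7 set is treated as a control byte and everything else as X-SAMPA text. The control codes `CTRL_PITCH` and `CTRL_SPEED`, their values, and the function names are all hypothetical; nothing about the actual control set has been decided.

```python
CONTROL_BIT = 0x80

# Hypothetical control codes, invented for this example only.
CTRL_PITCH = 0x01
CTRL_SPEED = 0x02


def control(code):
    """Return a single control byte: the 7-bit code with the top bit set."""
    if not 0 <= code < 0x80:
        raise ValueError("control code must fit in 7 bits")
    return bytes([CONTROL_BIT | code])


def split_stream(data):
    """Split a byte stream into ('text', str) and ('control', int) events."""
    events = []
    run = bytearray()
    for byte in data:
        if byte & CONTROL_BIT:
            # Flush any pending X-SAMPA text before the control event.
            if run:
                events.append(("text", run.decode("ascii")))
                run = bytearray()
            events.append(("control", byte & 0x7F))
        else:
            run.append(byte)
    if run:
        events.append(("text", run.decode("ascii")))
    return events


stream = control(CTRL_PITCH) + b'hE"loU_^' + control(CTRL_SPEED)
print(split_stream(stream))
# [('control', 1), ('text', 'hE"loU_^'), ('control', 2)]
```

An 8-N-1 link carries all eight bits of each byte, so the control bit survives the trip without any escaping of the X-SAMPA text.
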
Well, that's enough for now. Time to get coding!
