Low bit-rate Speech playback for AVR

Description

There are several ways of making a computer talk. The simplest is to record whatever you want to say and play it back at about 8000 samples/sec. The problem with this approach is that it takes a lot of memory and is not very flexible. Another way is to compress the speech so that the MCU can decompress it on the fly. I used differential pulse-code modulation (DPCM). The motivation for sending samples of the first derivative of the speech signal, rather than the signal itself, is that the derivative changes relatively little between samples, so fewer bits are required. I implemented a DPCM scheme with 4:1 compression (relative to 8-bit samples), which sends 2-bit derivative samples. It sounds acceptable, but a little scratchy. I also implemented 8:1 compression (1-bit derivatives). The quality is lower, but still understandable most of the time.

Details

DPCM (2-bit samples)

A version of the DPCM algorithm can be implemented using very little processing time. A 2-bit/sample compressor/decompressor was written in Matlab to encode the speech, write a packed C header file, and then do a test decode. Note that I made up the quantization breakpoints and reconstruction values. You can change them, but the encoder and decoder must use the same values. An optimization (program + function) based on the histogram of first derivatives suggests that quantization breakpoints of [-0.05, 0, 0.05] and reconstruction values of [-0.16, -0.026, 0.026, 0.16] are about right for the demo wav file given below. A decoder compiled with GCC for the Mega644 uses the packed code format to generate speech. At 8000 samples/sec and 2 bits/sample, each second of speech takes 2 kByte of flash.
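As a rough illustration of how the breakpoints and reconstruction values fit together, here is a minimal C sketch of a 2-bit quantizer with a matching encoder and decoder. It is not the Matlab program described above. The function names are hypothetical, the treatment of a difference that lands exactly on a breakpoint is an assumption, and the encoder here compares each sample to the decoder's running reconstruction (a common DPCM arrangement that may differ from the original encoder).

```c
#include <assert.h>
#include <stddef.h>

/* Breakpoints and reconstruction values quoted in the text. */
static const float kBreak[3] = { -0.05f, 0.0f, 0.05f };
static const float kRecon[4] = { -0.16f, -0.026f, 0.026f, 0.16f };

/* Map a difference to a 2-bit code 0..3 (hypothetical helper).
   A value exactly on a breakpoint goes to the upper bin (assumption). */
static unsigned dpcm2_quantize(float diff)
{
    unsigned code = 0;
    while (code < 3 && diff >= kBreak[code]) {
        code++;
    }
    return code;
}

/* Encode n samples (roughly -1..1) into one 2-bit code per sample.
   The encoder tracks the value the decoder will reconstruct, so
   quantization error does not build up (assumption, see lead-in). */
static void dpcm2_encode(const float *x, unsigned char *codes, size_t n)
{
    assert(x != NULL && codes != NULL);
    float recon = 0.0f;
    for (size_t i = 0; i < n; i++) {
        codes[i] = (unsigned char)dpcm2_quantize(x[i] - recon);
        recon += kRecon[codes[i]];
    }
}

/* Decode by summing the reconstruction value selected by each code. */
static void dpcm2_decode(const unsigned char *codes, float *y, size_t n)
{
    assert(codes != NULL && y != NULL);
    float recon = 0.0f;
    for (size_t i = 0; i < n; i++) {
        recon += kRecon[codes[i] & 3u];
        y[i] = recon;
    }
}
```

Changing either table changes the sound, but only the pairing matters for correctness: the decoder has to add exactly the values the encoder assumed.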

To use this system:

1. If you want to have the Mega644 just speak the numerical digits, skip this list and use the code in the next paragraph.
2. Get some clean, noise-free speech. You could record your own voice or use this TextToSpeech demo.
3. Make sure the audio sample rate is 8 kHz and save it in a wav file. This little Matlab program downsamples a wav file by 2:1. If you use the text-to-speech demo in step (2) you will need to downsample.
4. Run the Matlab compressor on the wav file. The compressor output file will be a table in C header format. You could, of course, have several short compressed tables in flash, or you could index into a long table to say just one word.
5. Resynthesize on the Mega644.
   1. Include the compressor output file from step (4) in your C program.
   2. Attach PORTB.3 to a low-pass filter, and then to an audio amplifier. The low-pass filter should cut off at about 18,000 radians/sec (3000 Hz). Sometimes you can skip the filter and rely on the input characteristics of the audio amp to do the low-pass filtering.
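The resynthesis step can be sketched in plain C as unpacking four 2-bit codes from each byte and accumulating them into an 8-bit output level. This is only an illustration, not the Mega644 decoder itself. The bit order within a byte (first sample in the top two bits), the integer step table (the reconstruction values scaled by about 127 and rounded), and all function names are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reconstruction values [-0.16, -0.026, 0.026, 0.16] scaled to 8-bit
   output units (assumed scale of about 127 per unit of amplitude). */
static const int8_t kStep[4] = { -20, -3, 3, 20 };

/* Fetch the 2-bit code for sample i.
   ASSUMPTION: the first sample of each byte sits in bits 7..6. */
static uint8_t dpcm2_code_at(const uint8_t *packed, size_t i)
{
    unsigned shift = 6u - 2u * (unsigned)(i % 4u);
    return (uint8_t)((packed[i / 4u] >> shift) & 0x3u);
}

/* Apply one code to the current output level, clamped to 0..255. */
static uint8_t dpcm2_next_duty(uint8_t duty, uint8_t code)
{
    int v = (int)duty + kStep[code & 0x3u];
    if (v < 0) v = 0;
    if (v > 255) v = 255;
    return (uint8_t)v;
}

/* Decode n samples into duty[], starting from midscale (128). */
static void dpcm2_decode_to_duty(const uint8_t *packed, size_t n,
                                 uint8_t *duty)
{
    assert(packed != NULL && duty != NULL);
    uint8_t level = 128;
    for (size_t i = 0; i < n; i++) {
        level = dpcm2_next_duty(level, dpcm2_code_at(packed, i));
        duty[i] = level;
    }
}
```

On the Mega644, PORTB.3 is the timer 0 compare output (OC0A), so a real decoder would presumably write each level to the PWM duty register from an 8 kHz timer interrupt instead of filling an array. That hardware setup is omitted here.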

The file <a href="http://people.ece.cornell.edu/land/courses/ece4760/Speech/GCC644/DPCMAllDigits.h">DPCMAllDigits.h</a> has a GCC flash array for the digits zero to nine. If you include this in a test program, you have available all the spoken digits. The sample index boundaries for the digits in the array are given below. Using this table you can speak individual digits by decompressing only part of the flash array.
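To show how sample index boundaries translate into positions in a packed array, here is a small C sketch. The boundary numbers in it are placeholders invented for illustration and are not the real values for DPCMAllDigits.h. The struct, the function name, and the packing order (four samples per byte, first sample in the top two bits) are likewise assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* HYPOTHETICAL sample boundaries: digit d spans samples
   [kDigitStart[d], kDigitStart[d + 1]). Substitute the real table. */
static const uint32_t kDigitStart[11] = {
    0, 3998, 8000, 12000, 16000, 20000,
    24000, 28000, 32000, 36000, 40000
};

typedef struct {
    uint32_t first_sample; /* index of the digit's first sample       */
    uint32_t num_samples;  /* how many samples to decode              */
    uint32_t first_byte;   /* byte offset into the packed flash array */
    uint8_t  first_shift;  /* bit shift of the first 2-bit code       */
} DigitSpan;

/* Locate one digit inside a 2-bit packed array (4 samples per byte). */
static DigitSpan digit_span(unsigned digit)
{
    assert(digit <= 9);
    DigitSpan s;
    s.first_sample = kDigitStart[digit];
    s.num_samples  = kDigitStart[digit + 1] - kDigitStart[digit];
    s.first_byte   = s.first_sample / 4u;
    s.first_shift  = (uint8_t)(6u - 2u * (s.first_sample % 4u));
    return s;
}
```

Because DPCM stores differences, a decoder that starts in the middle of the array has no running value to build on. Resetting the accumulator to midscale at the start of each digit is one plausible approach, and it should work best when the boundaries fall in the silence between words.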

<a href="http://people.ece.cornell.edu/land/courses/ece4760/Speech/GCC644/DPCMAllDigits.h">DPCMAllDigits.h</a> is based on the TextToSpeech demo page using the simulated voice "Claire". Commas were placed between the digit names for synthesis. The original synthesis result (wav at 16 Ksamples/sec) and reduced rate result (wav at 8 Ksamples/sec) used as input to the compressor are included for reference.

DPCM (1-bit samples)

A version of the encoder was written that simply sends one bit/sample depending on the sign of the first derivative. The reconstructed speech has noticeably higher noise than the 2-bit version, but is still understandable. The 8 Ksample/sec speech waveform (from the TextToSpeech demo page using the simulated voice "Mike") is compressed with a Matlab program to produce a C header file, which is included in a Mega644 test program. At 1 bit/sample, each second of speech takes 1 kByte, so about 60 seconds of speech should fit into the 64 kByte flash of a Mega644. Attach PORTB.3 to a low-pass filter, and then to an audio amplifier. The low-pass filter should cut off at about 18,000 radians/sec (3000 Hz). Sometimes you can skip the filter and rely on the input characteristics of the audio amp to do the low-pass filtering.
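The 1-bit scheme can be illustrated with a short C sketch of delta modulation: one bit per sample says whether the output should step up or down. This is not the Matlab compressor described above. The step size, the bit order (first sample in the most significant bit), and the function names are assumptions, and the encoder compares each sample with the decoder's running estimate instead of the previous input sample, which is the usual delta-modulation arrangement and may differ from the original.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DM_STEP 4 /* HYPOTHETICAL step size, in output units */

/* Encode n samples into ceil(n / 8) bytes, first sample in bit 7.
   Returns the number of bytes written. */
static size_t dm_encode(const int16_t *x, size_t n, uint8_t *out)
{
    assert(x != NULL && out != NULL);
    size_t nbytes = (n + 7u) / 8u;
    int est = 0; /* the value the decoder will have reconstructed */
    for (size_t b = 0; b < nbytes; b++) {
        out[b] = 0;
    }
    for (size_t i = 0; i < n; i++) {
        int up = (x[i] >= est);
        if (up) {
            out[i / 8u] |= (uint8_t)(0x80u >> (i % 8u));
        }
        est += up ? DM_STEP : -DM_STEP;
    }
    return nbytes;
}

/* Decode n samples: step up on a 1 bit, down on a 0 bit. */
static void dm_decode(const uint8_t *in, size_t n, int16_t *y)
{
    assert(in != NULL && y != NULL);
    int est = 0;
    for (size_t i = 0; i < n; i++) {
        int up = (in[i / 8u] >> (7u - (i % 8u))) & 1;
        est += up ? DM_STEP : -DM_STEP;
        y[i] = (int16_t)est;
    }
}
```

With a fixed step the output can never hold still, so silence decodes as a small alternating ripple, and fast rises are followed only as quickly as the step allows. Both effects are consistent with the higher noise described above.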

Discussions

Jim Thompson wrote 06/11/2019 at 10:18

I took your 1-bit compression and ran it through Audacity's noise reduction filter, and cranked the reduction up to 30 dB. This sounded pretty muddy, so I ran it through a high-pass filter at 3000 Hz with a 3 dB/octave slope, then normalized that. The result was a marked improvement over the original. It did have some artifacts that made it sound like it was on the verge of oscillating, but this was less objectionable than the noise.

Now, I don't know how sophisticated Audacity's noise reduction filter is, so I don't have any idea whether this is within the realm that an 8-bit AVR can handle, but the high-pass filter is pretty easy. Point is, you can sometimes get away with more compression if you can afford some additional processing power.

Jim Thompson wrote 06/11/2019 at 09:17

Does anybody know if the AT&T demo is still available anywhere online? I'm glad I saved recordings of some of the things I managed to get it to say (I got "Claire" to speak English with a French accent, for example.)

Daniel wrote 06/12/2017 at 12:31

Assembly code to include a file without using a header (works in GCC):

    .global Foobar
    .global FoobarSize
    Foobar:
    .incbin "foobar.wav"
    FoobarSize = . - Foobar

Then, in C, "extern int Foobar" and take the address of Foobar to find the data.

Yvan256 wrote 01/15/2017 at 22:52

By only storing phonetics you might be able to store a lot into only a few KB of space. You might want to check out how the Intellivoice (https://en.wikipedia.org/wiki/Intellivoice) works.

Leonard wrote 04/04/2015 at 13:23

Ah! the Natural Voices AT&T demo, fabulous speech :) Nice exploration of audio on an mcu!

Bruce Land wrote 04/04/2015 at 15:07

Thanks. It is amazing how low a bit rate can support speech.
