Inexpensive Lightweight Speech Recognition

A mostly hardware-based solution suitable for microcontrollers and other small systems.

This is a lightweight audio and speech recognition system. It is just that: a hardware-based solution that, paired with a small microcontroller, can perform some of the heavy lifting.

When you fire up a spectrum analyzer and look at various audio frequencies, you will see that some peaks are higher than others for a given sound. These peaks are what I will call an audio signature. The question is, how much "resolution" do you really need to differentiate one audio signature from another?

That is where this system comes in. By utilizing a commonly available part, namely a graphic equalizer chip found in stereos, we can take a coarse sample of the audio spectrum (the kind of information an FFT would otherwise provide) at set intervals. This snapshot can then be compared to known values, such as phonemes, with the use of a small microcontroller and a lookup table. That controller can then act upon this signature by either performing the task itself or relaying that information to another system.

NOTE: I know this can be done easily in other ways, and in fact the project that brought this up has a pi 3 on board, more than enough power to do audio recognition on its own. This is a more "can I do it this way.. ?" project :)

The novelty of this approach is the use of the spectrum analyzer IC commonly used in graphic equalizer circuits found in your common stereo. The chip I will be using for initial testing of the concept is the MSGEQ7. This particular IC finds the peaks for 7 bands, then outputs those as an analog signal every “clock tick”, sequentially stepping through the bands. That analog voltage is then sensed by the microcontroller and decoded. By using this chip, the micro does not need to do the heavy lifting of performing an FFT. It simply needs to compare the values to a lookup table stored inside the controller. To account for volume differences, the peaks will be reduced to ratios relative to one another. These ratios are then compared to the tables, taking into account a margin of error.
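The read-out loop described above can be sketched roughly as follows. This is a desktop-style illustration in Python rather than PIC firmware, and it is not the project's actual code: `read_msgeq7`, its three pin callables, and the `settle_s` delay are hypothetical names and values chosen for the example. The reset and strobe ordering follows commonly published MSGEQ7 example code, and the band centers are the ones listed in the MSGEQ7 datasheet; exact timing should be checked against the datasheet.

```python
import time
from typing import Callable, List

# Band center frequencies as listed in the MSGEQ7 datasheet, lowest first.
BAND_CENTERS_HZ = [63, 160, 400, 1000, 2500, 6250, 16000]


def read_msgeq7(
    set_reset: Callable[[bool], None],
    set_strobe: Callable[[bool], None],
    read_adc: Callable[[], int],
    settle_s: float = 40e-6,  # assumed settling delay; verify against the datasheet
) -> List[int]:
    """Return one scan: seven ADC readings, one per band, lowest band first.

    The three callables are hypothetical stand-ins for whatever pin and ADC
    access the target microcontroller provides.
    """
    # Pulse RESET so the chip's internal multiplexer starts again at band 0.
    set_reset(True)
    set_reset(False)

    scan = []
    for _ in BAND_CENTERS_HZ:
        set_strobe(False)        # select the next band
        time.sleep(settle_s)     # let the analog output settle
        scan.append(read_adc())  # sample the band's peak level
        set_strobe(True)         # finish this clock tick
    return scan
```

On real hardware the three callables would wrap GPIO writes and an ADC read; on a PC they can be replaced by fakes that replay recorded values, which makes it possible to try out matching logic before the board exists.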

A couple examples of an audio spectrum for two sounds.

The first is me whistling.

The next two are snapshots of me saying different words.

The Microchip PIC16F1459 that I am using includes a 10-bit analog-to-digital converter. By reading each of the 7 analog signals, we can think of the whole thing as a two-dimensional array of seven bands, each containing a voltage value:

signal[band number][10 bit voltage value]

Figure: frequency response of the MSGEQ7.

The resulting numbers are a simplified visual of the full spectrum, such as those seen above.

We then reduce these 7 values to ratios relative to one another, so that overall volume drops out, before consulting the lookup table.

This gives you a huge variety of signatures from which to choose. In the initial design, the goal is to recognize common words and output the phonemes via USB to the host system.
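One possible reading of the ratio step and the table lookup is sketched below. It assumes that "lowest ratio" means dividing every band by the loudest band, and that the margin of error is a fixed per-band tolerance; both are assumptions for illustration, as are the names `normalize` and `match_signature` and the reference values used in the usage note.

```python
from typing import Dict, Optional, Sequence, Tuple


def normalize(bands: Sequence[int]) -> Tuple[float, ...]:
    """Scale the band readings so the loudest band is 1.0 (volume-independent)."""
    peak = max(bands)
    if peak == 0:
        return tuple(0.0 for _ in bands)  # silence: nothing to scale
    return tuple(b / peak for b in bands)


def match_signature(
    bands: Sequence[int],
    table: Dict[str, Sequence[float]],
    tolerance: float = 0.15,  # assumed margin of error per band
) -> Optional[str]:
    """Return the name of the closest table entry within tolerance, else None."""
    ratios = normalize(bands)
    best_name, best_err = None, tolerance
    for name, reference in table.items():
        # Worst-case difference across the seven bands.
        err = max(abs(r - t) for r, t in zip(ratios, reference))
        if err <= best_err:
            best_name, best_err = name, err
    return best_name
```

With a made-up table such as `{"whistle": (0.0, 0.0, 0.1, 1.0, 0.3, 0.0, 0.0)}`, a quiet reading `[0, 0, 50, 500, 150, 0, 0]` and a loud reading `[0, 0, 100, 1000, 300, 0, 0]` reduce to the same ratios and so match the same entry. On a small microcontroller the same idea would more likely use integer ratios than floating point.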

This project came about as a component in P.A.L (https://hackaday.io/project/12383-pal-self-programming-ai-robot). From comments and questions, I have noticed that people have a hard time separating the various parts of the project. So much goes into designing and building an autonomous robot from scratch that I feel the best solution will be to break out each of the major systems into smaller project chunks for easier digestion.


Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 United States License.

More to come! :)

  • The PCBs arrived yesterday in the mail, and etching thoughts.

    ThunderSqueak, 08/22/2016 at 08:04, 1 comment

    Well, the blanks did anyway. I know I could have just ordered this from a board house and been done with it, but what fun is that :) I am going to be using a toner transfer method to create the board layouts. I have not decided on which etching method I will use. In the past I have readily utilized ferric chloride. However, I am looking at other options that may have less of an environmental impact. One possible solution is to use cupric chloride, which can be reused. There is a nice article on it here: http://www.instructables.com/id/Stop-using-Ferric-Chloride-etchant!--A-better-etc/?ALLSTEPS

    Before I jump into mixing chemicals listed on a website, I plan on talking to a friend of mine who is a chemistry professor at the University of Alaska to see what he thinks about the above solution and if it really is "more friendly".

    For those who are not familiar with toner transfer PCB etching, see http://www.dr-lex.be/hardware/tonertransfer.html

    Some of you out there may say "why bother with all this, why not just order them?" Part of that is cost: I can etch many PCBs for a few dollars in components. The other reason is that this is a hobby and how I spend my free time. I enjoy the entire process of creating an object, and much of it is about the journey. Without the journey, you don't learn. Learning is where the most enjoyment comes from.

    I will get images up later after the board is etched as well as the results of the initial testing.

    Cheers!

  • Thoughts on external band pass filters

    ThunderSqueak, 08/22/2016 at 07:52, 0 comments

    I am curious to know if a multiplexer, along with a set of external band pass filters, could be used to create the same effect as the commonly available ICs on the market. If each filter goes high when its tone is "heard", and a simple looping binary counter steps through them in a circular pattern, could we create enough points to differentiate very subtle differences in the audio world around the "ear"?

    For those not familiar with band pass filters I recommend reading https://en.wikipedia.org/wiki/Band-pass_filter

    For those not familiar with a Multiplexer or MUX, I recommend reading https://en.wikipedia.org/wiki/Multiplexer

    The counter could be powered by either a micro-controller or you could create one using several flip flops and a clock pulse.

    If you look at the above links, you can see the simplicity of this sort of system. No idea if it will work, but I am making a note here to remind myself to give it a try :)

  • Round Robin approach.

    ThunderSqueak, 08/22/2016 at 07:39, 0 comments

    In initial tests using a spectrum analyzer, I was able to differentiate between several different sound occurrences based upon the spectrum of each of those components. I was considering the way that the chip I am using captures the audio. It scans the spectrum in a round robin fashion, with each band read a fraction of a millisecond after the previous one. This might be useful when trying to capture words directly, as well as other sounds, since within a word there is definitely a progression of different tones that go together to create a recognizable phonetic pattern.

  • Created a quick and dirty board layout

    ThunderSqueak, 08/22/2016 at 07:34, 0 comments

    After looking at the wiring that needs to be done, I decided that using a veroboard would be mildly annoying. In the spirit of DIY I have decided to try etching a simple double sided PCB. I took a look at the current free software offerings both in price and "in spirit" and gave several of the PCB tools out there a quick try.

    Eventually I ended up using EAGLE, simply because it had the parts libraries already there. I did look at several other tools, including but not limited to KiCad, Fritzing, and CircuitMaker. They all have their ups and downs. After the board is etched and tested, I will be uploading the PCB files used.
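The multiplexer idea from the "external band pass filters" log above can be modeled in a few lines before any hardware is built. This is only a thought-experiment sketch: the function names, the eight-filter bank, and the choice to pack the results into a bit pattern are all assumptions made for illustration, not something the project has implemented.

```python
from typing import Sequence


def scan_filter_bank(filter_outputs: Sequence[bool], start: int = 0) -> int:
    """Step a wrapping counter across the multiplexer inputs and pack the bits.

    filter_outputs[i] is True when band pass filter i "hears" its tone.
    Bit i of the result is set when filter i was high during the scan.
    """
    count = len(filter_outputs)
    signature = 0
    counter = start
    for _ in range(count):
        if filter_outputs[counter]:      # multiplexer output for this channel
            signature |= 1 << counter
        counter = (counter + 1) % count  # looping binary counter
    return signature


def signature_distance(a: int, b: int) -> int:
    """Number of filters whose state differs between two signatures."""
    return bin(a ^ b).count("1")
```

For example, a bank of eight filters with filters 0, 2, and 3 high yields the bit pattern 0b00001101. Two sounds could then be compared by counting how many filter states differ, which gives a rough measure of how subtle a difference such a bank can resolve.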


Discussions

Jon Buford wrote 04/15/2017 at 12:09

Sensory and then a few other companies had VR happening on DSPs and 16-bit MCUs back around 2000. The drawback was that they were pretty sensitive to noise and they didn't have a wide range of matching on a trained word. The best responses were when the known word list was short, within 5-10 words, and for each to have some unique features that allowed it to be different enough from the others on the list. 

Paul Stoffregen wrote 04/05/2017 at 23:28

Just curious, did you make any progress on this?  Still interested?

ThunderSqueak wrote 04/07/2017 at 19:12

Have not lost interest, just in the middle of moving to a new home. :)

Paul Stoffregen wrote 04/09/2017 at 14:07

There's a thread over on Arduino's forum where someone claims to have Matlab code which matches sequences of FFTs to templates for words.  So far, no code or algorithm details shared....

shlonkin wrote 07/18/2016 at 23:32

I love the idea, but doubt the implementation. Seven bands will not give you anywhere near enough resolution for distinguishing phonemes, I think. Please prove me wrong. If you can pull it off you deserve a prize. 

Why not choose a microcontroller capable of quick FFT like the Teensy 3.2? You could even add the matching audio shield to handle things like microphone amplification and high resolution, fast sampling. I know it takes some of the fun out of hardware design, but you'll still have a huge software challenge left. Good luck.

K.C. Lee wrote 07/19/2016 at 00:01

BTW I am running real time FFT + sampling on my audio switch project using a lowly $0.44 STM32F030F4 ARM. It actually doesn't need much horsepower for that.

I will be using FFT for my Alarm Recognition project for 40Hz resolution in the 0-5kHz band.  I am more than happy to share code and hardware design.

ThunderSqueak wrote 07/19/2016 at 01:51

Sure, right now though this is more of an experiment  to see if it can be done :)  I figured, why not try it and see what happens.  If nothing else, I learn something.  The project this is for actually has a pi3 on board, so it has enough power to do audio recognition on its own. ^^

Yann Guidon / YGDES wrote 07/20/2016 at 01:13

Well, using an external analog filterbank *IS* a hack :-)

An even better hack would be to run another bank or two at shifted frequencies to increase the frequency resolution ^_^

wickedcody123456 wrote 09/26/2016 at 06:05

pls
