The Visual Ear

My goal for this project is to create a Visual version of the function performed by the Cochlear structure of the inner ear.


The Cochlea performs the task of converting pressure waves into electrical impulses that the human brain can use to recognize sounds like music and speech. Because of its macro and micro structure, the Cochlea actually detects audio frequencies (rather than raw pressure waves), and this is the information that it passes to the brain.

So, my idea was to see how far I could push the concept of a visual audio spectrum analyzer. That is, to enable a hearing-impaired person to “SEE” sounds. Essentially, to create a Visual Ear (VE).

A search for related projects

I always start out new projects by making sure I don't re-invent the wheel.

A quick check of ongoing Hackaday (and other) projects returned many examples of visual spectrum analyzers.  Some of them are simple LED strips, and others are full Matrix displays.  Some use a microphone for their input; others use line inputs, Bluetooth or even audio files.  Other major variations between different analyzers were the number of specific frequency bands that are measured, and how the signal strengths are displayed.  

None of the ones I found had anywhere near enough Bands for my liking, so I decided it was worth continuing with my project. 

Sometimes it's more fun to show where I'm at now, and then explain how I got there.  So here are my current specs.

  1. Bands:  59 Bands, spanning 55 Hz - 17163 Hz (8.29 Octaves)
  2. Audio Samples per FFT:  4096 samples @ 36 kHz
  3. FFT Frequency Bins: 2048 bins (8.79 Hz each)
  4. Display Update interval: 27 mSec (37 Hz)
  5. Audio to Visual Latency:  Max 40 mSec (14 mSec acquisition + 26 mSec FFT & display)
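The spec figures above follow from a handful of simple formulas. As a quick cross-check, here is a short host-side Python sketch (not code from the project; the constant names are illustrative). It derives the bin width, bin count, Nyquist limit, update rate and band spacing from the listed sample rate, FFT size and band range.

```python
import math

# Values copied from the spec list; the names are illustrative only.
SAMPLE_RATE_HZ = 36_000
FFT_SIZE = 4096
LOW_BAND_HZ = 55.0
HIGH_BAND_HZ = 17_163.0
NUM_BANDS = 59
UPDATE_INTERVAL_MS = 27

bin_width_hz = SAMPLE_RATE_HZ / FFT_SIZE          # spacing between adjacent FFT bins
num_bins = FFT_SIZE // 2                          # bins from DC up to (not including) Nyquist
nyquist_hz = SAMPLE_RATE_HZ / 2                   # highest frequency the FFT can represent
octaves = math.log2(HIGH_BAND_HZ / LOW_BAND_HZ)   # span from lowest to highest band centre
bands_per_octave = (NUM_BANDS - 1) / octaves      # band-to-band steps per octave
update_rate_hz = 1000 / UPDATE_INTERVAL_MS        # display refresh rate

print(f"bin width:        {bin_width_hz:.2f} Hz")
print(f"bins:             {num_bins}")
print(f"Nyquist:          {nyquist_hz:.0f} Hz")
print(f"octaves spanned:  {octaves:.2f}")
print(f"bands per octave: {bands_per_octave:.2f}")
print(f"update rate:      {update_rate_hz:.1f} Hz")
```

The band range works out to about 8.29 octaves with almost exactly 7 Bands per octave. That is a ratio of 2^(1/7), roughly 1.104, between neighbouring Bands. The top Band at 17163 Hz sits just under the 18 kHz Nyquist limit.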

But, going back to the beginning... I find it useful to start my projects with a rough set of system Requirements.  This helps me to clarify my goals, and it gives me a place to return to when it's time for me to evaluate how well (or otherwise) the project is going.

So my first Project log will start there, and then we'll see how it goes.

  • Log 2. Things I learned along the way.

    Phil Malone, 2 days ago

    Required processing power:

    My first concern was how long the FFT calculation was going to take.  I knew that this algorithm requires floating point calculations, and also some Sine, Cosine and Square root calls.  So whatever processor I chose to use would definitely need a hardware Floating Point Unit (FPU). 

    My initial choice was the ESP32, since I was familiar with this MCU from other projects, and it has an FPU.  The ESP32 also supports both I2S and PDM inputs (for the microphone) and the FastLED library.  I was pleasantly surprised to also find an FFT library that I could use with the ESP32. 

    I’ve used both the Arduino IDE and ESP-IDF programming interfaces on other projects, but since this was likely going to be an open-source project, I chose to use the Arduino IDE to make it a bit more “mainstream”.  I found some projects using an FFT to process audio input data, so I started there.  

    My first test was to determine how long an FFT took to run.  I started with a 1024-sample input buffer, and discovered that it took about 37 mS to run one conversion cycle.  I doubled the size of the input buffer to 2048 samples, and now it clocked in at 74 mS.  So I was pleased to see that the relationship between input buffer size and processing time was roughly linear, and not exponential.  Since the FFT input buffer has to be a power of 2 in size, to get more samples you need to at least double the buffer size.  I was a bit concerned that there might not be “enough” processing power, as I was already eating into my 50 mS latency goal.
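Strictly speaking, an FFT's cost grows as N·log2(N) rather than purely linearly. Going from 1024 to 2048 samples should therefore take about 2.2 times as long, which at these sizes is close enough to the measured doubling to treat as linear. The power-of-2 restriction comes from the radix-2 structure, which splits the buffer in half at every stage.

The textbook recursive Cooley-Tukey sketch below shows both points. It is plain Python for illustration only, not the FFT library used on the ESP32.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of 2."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("radix-2 FFT needs a power-of-2 length")
    if n == 1:
        return list(x)
    even = fft(x[0::2])   # FFT of the even-indexed samples (half-size problem)
    odd = fft(x[1::2])    # FFT of the odd-indexed samples (half-size problem)
    out = [0j] * n
    for k in range(n // 2):
        # Twiddle factor e^(-2*pi*i*k/n) rotates the odd half before combining.
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def relative_cost(n):
    """Work done by a radix-2 FFT grows in proportion to n * log2(n)."""
    return n * math.log2(n)


# Usage: an 8-sample cosine with 2 cycles per buffer lands in bins 2 and 6.
samples = [math.cos(2 * math.pi * 2 * n / 8) for n in range(8)]
magnitudes = [abs(v) for v in fft(samples)]
print([round(m, 3) for m in magnitudes])
print(relative_cost(2048) / relative_cost(1024))
```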

    Bins and Bands.

    Next I started looking at how the results of the FFT (frequency Bins) get turned into an LED display (frequency Bands).  The first thing I discovered here was that there is NOT a 1:1 ratio between Bins and Bands.  It turns out that humans sense audio frequencies logarithmically.  For example, in the Do, Re, Mi, Fa, Sol, La, Ti, Do scale, the second Do has twice the frequency of the first Do, and this frequency doubling is repeated for each Octave.  So after 8 octaves, the frequency is 256 times what you started out with.  But with an FFT, the frequency Bins that are generated have equal frequency spacing (they increase linearly, not exponentially).  To map Bins into Bands you need to figure out which bins fit into each Band.  I found a great spreadsheet for doing this.  I utilized a tweaked version of this spreadsheet extensively in my progression to more and more Bands.
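One way to automate the Bin-to-Band allocation is sketched below in Python. The spreadsheet's exact method isn't described here, so the rules are assumptions:

- Band centres are spaced by a constant ratio, 2^(1/bands_per_octave).
- Each Band's edges sit halfway between neighbouring centres on a log scale.
- A Band takes every bin whose centre frequency lies inside its edges.
- A Band too narrow to contain any bin centre falls back to its nearest bin.

The function allocate_bins is hypothetical.

```python
import math


def allocate_bins(sample_rate, fft_size, f_low, bands_per_octave, num_bands):
    """Return, for each log-spaced Band, the list of FFT bin indices it covers."""
    bin_width = sample_rate / fft_size
    top_bin = fft_size // 2 - 1                 # last bin below Nyquist
    ratio = 2 ** (1 / bands_per_octave)         # constant step between Band centres
    half_step = math.sqrt(ratio)                # log-scale midpoint to the next centre
    bands = []
    for k in range(num_bands):
        centre = f_low * ratio ** k
        lo, hi = centre / half_step, centre * half_step
        first = math.ceil(lo / bin_width)                    # first bin at or above lo
        last = min(math.ceil(hi / bin_width) - 1, top_bin)   # last bin below hi
        if first <= last:
            bands.append(list(range(first, last + 1)))
        else:
            # Band narrower than one bin: reuse the bin nearest its centre (assumed rule).
            bands.append([min(round(centre / bin_width), top_bin)])
    return bands


# Usage with the spec-list numbers: 36 kHz, 4096-point FFT, 55 Hz start, 7 Bands per octave.
bands = allocate_bins(36_000, 4096, 55.0, 7, 59)
for k in (0, 1, 2, 3, 58):
    print(f"Band {k:2d}: bins {bands[k][0]}..{bands[k][-1]} ({len(bands[k])} bins)")
```

With these numbers the lowest Bands get a single bin each. Bands 2 and 3 both land on bin 8, so they would always show the same level, which is the "several Bands will look exactly the same" problem described under the rules of thumb. At the other end, the top Band spans about 190 bins and is clipped at the Nyquist limit.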

    Rules of thumb.

    As I slowly increased the number of Bands in my Visual Ear, I developed some basic rules of thumb which provided the best allocation of Bins to Bands.  Remember:  Bins are what get generated by the FFT and Bands are what get displayed using LEDs.

    1. More Bins give you better Band clarity.   When allocating Bins into Bands, if you don’t have enough Bins, then at the lower Band frequencies you find that there is LESS than one Bin per Band.  This really doesn’t work well, because it means that several Bands will look exactly the same.  At a bare minimum, you need at least one Bin per Band.
    2. Don’t sample your audio too fast or too slow.  For a given audio sample rate, the resulting Bins will span from DC to ½ the sampling frequency.  So you want to choose a sample rate that is just higher than 2 times your highest Band frequency.  For example, if your top band is for 16 kHz, then you should sample just a bit faster than 32 kHz.  If you choose a rate that is too high, there will be a bunch of unused Bins at the top of the frequency spectrum.  If you choose a rate that is too low, then there won’t be any Bins that go high enough to be included in your upper bands.
    3. Don’t start your bottom Band too low.  The frequency of your first band will define how close your first few Bands are to each other.  This affects...
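Rules 1 and 2 can be checked numerically. Each Band is a fixed fraction of an octave wide, so its width in Hz is proportional to its centre frequency. That makes the lowest Band the narrowest and the one most likely to get less than one Bin.

The hypothetical helpers below (illustrative names, assuming log-midpoint Band edges) do two checks:

- estimate how many Bins fit in the lowest Band;
- test whether the top Band sits below the Nyquist frequency (half the sample rate).

```python
import math


def bins_in_lowest_band(f_low, bands_per_octave, sample_rate, fft_size):
    """Approximate number of FFT Bins that fit inside the lowest Band (rule 1)."""
    half_step = 2 ** (1 / (2 * bands_per_octave))        # centre-to-edge frequency ratio
    band_width_hz = f_low * (half_step - 1 / half_step)  # edge-to-edge width in Hz
    return band_width_hz / (sample_rate / fft_size)


def top_band_below_nyquist(f_high, sample_rate):
    """Rule 2: the highest Band must lie below half the sample rate."""
    return f_high < sample_rate / 2


for f_low in (27.5, 55.0, 110.0):
    n = bins_in_lowest_band(f_low, 7, 36_000, 4096)
    print(f"start at {f_low:6.1f} Hz: {n:.2f} bins in the lowest Band")
print(top_band_below_nyquist(17_163, 36_000), top_band_below_nyquist(17_163, 32_000))
```

With 7 Bands per octave and a 4096-point FFT at 36 kHz, a 55 Hz starting Band is only about 0.62 Bins wide. Each octave you move the starting Band up doubles that figure, which appears to be the trade-off rule 3 is getting at.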

  • Log 1. System Requirements



    To begin my journey, I decided on my primary requirements:

    1. The VE should employ a Display Strategy that uses all possible visual cues.  This includes LED a) Position, b) Color and c) Intensity.
    2. I want the VE to respond to ambient sounds, so I need to use a microphone.
    3. The VE should respond to the full human audio spectrum, so I need a high quality microphone with a flat response over as wide a range as possible.
    4. In order to discriminate specific sounds, I want a LOT of visible frequency Bands.  If you consider the keys of a piano as discrete notes, then to be able to perceive a piano tune, you would need 88 spectral bands (covering just over 7 octaves).
    5. To be able to perceive rhythms as well as melodies, the sound-to-vision response time must be very short.  I want the VE to be able to show a fast percussive beat, like a drum roll.  However, at some point our eye’s “Persistence of Vision” will mask rapid visual changes, so I’d like to get as responsive as possible without going overboard.
    6. Since sound levels vary wildly (by many orders of magnitude), the user should be able to adjust the visual “gain” (intensity) of the display.  Some means of automatic gain control may be possible and desirable.
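For requirement 6, one common way to get automatic gain control is to track a slowly decaying peak of recent Band levels and scale each frame against it. A loud frame raises the reference immediately, while during quiet passages the reference decays so the display brightens again.

The class below is a hypothetical sketch of that idea. PeakAGC, decay and floor are illustrative names and defaults, not part of the project.

```python
class PeakAGC:
    """Hypothetical peak-tracking automatic gain control for Band levels."""

    def __init__(self, decay=0.99, floor=1e-6):
        self.decay = decay    # per-frame decay of the reference peak (closer to 1 = slower)
        self.floor = floor    # keeps the reference above zero during silence
        self.peak = floor

    def update(self, levels):
        """Scale one frame of non-negative Band levels into the range 0..1."""
        if not levels:
            return []
        frame_peak = max(levels)
        # A louder frame raises the reference at once; otherwise it decays slowly.
        self.peak = max(frame_peak, self.peak * self.decay, self.floor)
        return [min(level / self.peak, 1.0) for level in levels]


agc = PeakAGC()
loud = agc.update([0.5, 1.0, 0.25])   # the loud frame sets the reference peak
for _ in range(1000):
    quiet = agc.update([0.1, 0.05])   # the reference decays toward the quieter level
print(loud, quiet)
```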

    Requirement 1: Display Strategy

    To keep things simple at first, I decided to use a single RGB LED Strip with high LED density for the display. 

    The LED strip would represent the human-audible spectrum range, and each LED would display one frequency band.  Each LED frequency band will have a “Center Frequency” (CF) and each CF will be a constant multiplier of the previous band’s CF.  This approach provides a very popular logarithmic frequency display. 
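Written out using the 59-Band figures from the spec list (illustrative notation), the centre frequency of Band k and the constant multiplier r are:

```latex
f_k = f_0 \, r^{k}, \quad k = 0, 1, \dots, N-1,
\qquad
r = \left(\frac{f_{N-1}}{f_0}\right)^{\frac{1}{N-1}}
  = \left(\frac{17163\ \text{Hz}}{55\ \text{Hz}}\right)^{\frac{1}{58}}
  \approx 1.104 \approx 2^{1/7}
```

So each octave contains seven Bands.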

    Since not all LEDs would be on at any one time, the color of each LED will reinforce its position in the spectrum.  The lowest frequency would be Red, and the highest frequency would be Violet.  The classic rainbow ROYGBIV color spectrum would be spread across the audio spectrum. 

    The actual strength of each frequency band will be indicated by the brightness of the LED.
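A minimal Python sketch of this display strategy makes two assumptions:

- an HSV colour wheel where hue 0.0 is red and about 0.75 is violet;
- a simple linear mapping from Band strength to brightness.

Both are illustrative choices. band_colour is a hypothetical helper, not the FastLED API.

```python
import colorsys

VIOLET_HUE = 0.75  # assumption: hue 0.0 = red, 0.75 (270 degrees) = violet


def band_colour(band_index, num_bands, magnitude, full_scale):
    """Return an (r, g, b) tuple, 0-255, for one Band's LED."""
    hue = VIOLET_HUE * band_index / (num_bands - 1)       # position -> colour, red to violet
    level = max(0.0, min(magnitude / full_scale, 1.0))   # strength -> brightness, clamped
    r, g, b = colorsys.hsv_to_rgb(hue, 1.0, level)
    return round(r * 255), round(g * 255), round(b * 255)


print(band_colour(0, 59, 1.0, 1.0))    # lowest Band at full strength
print(band_colour(29, 59, 0.5, 1.0))   # middle Band at half strength
print(band_colour(58, 59, 1.0, 1.0))   # highest Band at full strength
```

Perceived brightness is far from linear, so a real display would probably map Band strength on a logarithmic (dB) scale instead.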

    Requirement 2:  The Microphone.

    I’m not a big fan of analog circuitry, and I know that to get a good audio input for Analog to Digital conversion you need to make sure your pre-amplification circuitry is well designed.  So to avoid that problem I researched some available digital microphones (the type you might find in a good phone).  I discovered that there were two basic interface types: Inter-IC Sound (I2S) and Pulse Density Modulation (PDM).

    I2S is a serial data stream that sends a series of bits that make up binary words representing instantaneous sound levels.  PDM is also a digital stream, but it works more like Pulse Width Modulation (PWM): the proportion of 1s in the single-bit stream (the pulse density) indicates the sound level.  Both methods are best handled by a processor that has a native hardware interface for them. 
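To make the two formats concrete, here is a small Python illustration. All function names are hypothetical.

- The I2S helper assumes a common layout: a 24-bit two's-complement sample left-justified in a 32-bit slot. Check the microphone datasheet for the real format.
- The PDM part uses a first-order delta-sigma modulator to generate a bit stream, then recovers the level by simply averaging blocks of bits. Real PDM decoders use proper decimation filters.

```python
def i2s_slot_to_sample(word, bits=24):
    """Assumed layout: a signed `bits`-wide sample left-justified in a 32-bit I2S slot."""
    raw = (word >> (32 - bits)) & ((1 << bits) - 1)   # keep the top `bits` bits
    if raw & (1 << (bits - 1)):                        # sign bit set -> negative value
        raw -= 1 << bits
    return raw


def pdm_modulate(levels):
    """First-order delta-sigma modulator: levels in [-1, 1] -> stream of 1-bit pulses."""
    bits, integrator, feedback = [], 0.0, 0.0
    for x in levels:
        integrator += x - feedback
        bit = 1 if integrator >= 0 else 0
        feedback = 1.0 if bit else -1.0               # the 1-bit output, seen as +1 or -1
        bits.append(bit)
    return bits


def pdm_to_levels(bits, factor):
    """Crude decimation: the density of 1s in each block of `factor` bits gives a level."""
    levels = []
    for start in range(0, len(bits) - factor + 1, factor):
        ones = sum(bits[start:start + factor])
        levels.append(2.0 * ones / factor - 1.0)       # map density 0..1 to level -1..1
    return levels


print(i2s_slot_to_sample(0xFFFFFF00), i2s_slot_to_sample(0x7FFFFF00))
stream = pdm_modulate([0.5] * 64)
print(sum(stream) / len(stream))       # density of 1s for an input level of 0.5
print(pdm_to_levels(stream, 16))
```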

    I found an example of each microphone type, both mounted on a basic breakout board for easier prototyping. 

    I2S:  Adafruit I2S MEMS Microphone Breakout - SPH0645LM4H (PRODUCT ID: 3421)

    PDM: Adafruit PDM MEMS Microphone Breakout – SPK0415HM4H (PRODUCT ID: 3492)

    Requirement 3: Flat Spectral Response

    Both of the previously identified microphones show a relatively flat response (+/- 2 dB) from 100 Hz to 10 kHz, with a degraded response continuing out to 50 Hz at the low end and 16 kHz at the high end.  Until it’s demonstrated otherwise, these microphones will be suitable for prototype testing.
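For a sense of scale, here is +/- 2 dB expressed as an amplitude ratio (decibels of a pressure or voltage signal use 20·log10):

```latex
10^{+2/20} \approx 1.26, \qquad 10^{-2/20} \approx 0.79
```

So within the flat region, the microphone's output for a given sound pressure stays within roughly -21% to +26% of nominal.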

    [Figure: frequency response of the I2S microphone, SPH0645LM4H]


