
On Air

Keyword-activated status display

I am training a TensorFlow speech detection model so I can drive a status display.

Files:

  • IMG_2737.jpg - Display stairwell (JPEG, 1.87 MB)
  • IMG_2739.jpg - Display new case and hanging (JPEG, 1.21 MB)
  • IMG_2740.jpg - Display new case and hanging (JPEG, 1.64 MB)

  • 1 × ESP-EYE
  • 1 × Adafruit PyPortal

  • Getting there

    Alexander Hagerman, a day ago

    After spending some time orchestrating training and data generation to make the process flow as fast as possible, I spent this week integrating the model, the control flow, and the final network call that fires when the right sequence of words is detected. I ran into a few different issues with the network call. Earlier in the project I verified the HTTPS PUT behavior using the examples in the esp-idf project repo. Since then `tcpip_adapter_init` has been replaced with `esp_netif_init`. Additionally, I haven't spent much time using a modern compiler with C++ calling C code, so I spent some time chasing down struct initialization errors going from the pure C example to calling `esp_wifi` and `esp_netif` from C++ code with the TF Lite template.

    Outside of working through those errors I spent some time looking at the difference between the model's score on words and when the model sets `new_command` to `true`. Aside from one last HTTP bug, I think the project is ready for a `0.1` version status. After that I'm going to take stock of what I've learned and the gaps I identified in my knowledge, and figure out what to hack on next.
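
    For reference, a minimal sketch of what the C++-friendly station bring-up can look like with the `esp_netif` API. This isn't the project's exact code: `WIFI_SSID` and `WIFI_PASS` are placeholders, event handlers are omitted, and error handling is reduced to `ESP_ERROR_CHECK`.

    // Sketch: Wi-Fi station bring-up from C++ with the esp_netif API (IDF 4.x).
    // WIFI_SSID and WIFI_PASS are placeholders for however credentials are provided.
    #include <cstring>

    #include "esp_event.h"
    #include "esp_netif.h"
    #include "esp_wifi.h"

    #define WIFI_SSID "placeholder-ssid"
    #define WIFI_PASS "placeholder-pass"

    static void wifi_start() {
        ESP_ERROR_CHECK(esp_netif_init());                // replaces tcpip_adapter_init()
        ESP_ERROR_CHECK(esp_event_loop_create_default());
        esp_netif_create_default_wifi_sta();

        wifi_init_config_t init_cfg = WIFI_INIT_CONFIG_DEFAULT();
        ESP_ERROR_CHECK(esp_wifi_init(&init_cfg));

        // The C example's nested designated initializers don't compile as C++ here,
        // so value-initialize the struct and copy the credentials in instead.
        wifi_config_t wifi_cfg = {};
        std::strncpy(reinterpret_cast<char*>(wifi_cfg.sta.ssid), WIFI_SSID,
                     sizeof(wifi_cfg.sta.ssid));
        std::strncpy(reinterpret_cast<char*>(wifi_cfg.sta.password), WIFI_PASS,
                     sizeof(wifi_cfg.sta.password));

        ESP_ERROR_CHECK(esp_wifi_set_mode(WIFI_MODE_STA));
        ESP_ERROR_CHECK(esp_wifi_set_config(WIFI_IF_STA, &wifi_cfg));
        ESP_ERROR_CHECK(esp_wifi_start());
        // The IDF example calls esp_wifi_connect() from a WIFI_EVENT_STA_START handler;
        // simplified here.
        ESP_ERROR_CHECK(esp_wifi_connect());
    }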

    Project code continues to be updated and made available in this repo.

    I'll also have a new post up on my blog in the next few days detailing a couple of other topics wrapping up this learning adventure.

  • End of the week progress update

    Alexander Hagerman, 03/22/2020 at 22:18

    Last week I was able to get some work done on the ESP-EYE and to start generating synthetic voice data. This week I focused in on the voice model itself, and that's been an adventure. TensorFlow has an example micro_speech project that I figured I would be able to use as a starting point. Working through the demo exposed some quirks, such as the code relying on TF 1.x modules while the micro libraries are in TF 2.x. Also, the first time through the micro_speech demo it didn't pick up audio on the EYE. That led me to consider alternatives, which is where I found the ESP-Skainet project. Eventually I made my way back to the ESP-WHO application, which the EYE comes flashed with. I spent some time with esp-idf and ESP-WHO to get that working on the board again, partially to confirm the microphone was picking up audio correctly since TF didn't get any input when I flashed that demo. Along with that adventure I dug into esp-adf and esp-sr a bit to see if those might be better options. While both are interesting, the sr component lacks a training step (which is great for most projects, but not for one where I'm working to learn more about training :D), and they add new complexities. They're starred and marked to revisit another day.

    Eventually I came back to TensorFlow, blew away the local repo, did a fresh clone, created a new Python venv, and went back through micro_speech. This time the audio was picked up, and this gave me a good starting point. I started modifying the code to load my custom model, but ran into some issues. Instead of digging in, I thought it was a good time to backtrack through the week. I have made a lot of notes that will turn into future posts. I've created some shell alias commands to assist in my esp workflow, and I turned a lot of separate data generation, transformation, and training steps into a series of shell scripts.

    I also decided to train a new model taking "hi" as my synthetic word and "on" from the TensorFlow prelabeled keywords. This gives me the same dimensions as the micro_speech demo, so I can easily load this model onto the EYE and see what kind of performance I get with the synthetic word recognition. If that goes well, the next step will be training the model with both synthetic words, then working back through the TF Micro docs to set up loading the model on the EYE, at which point the last thing to do will be wiring up the HTTP POST (found in /sig) on the right series of words. If the synthetic words don't perform well, I'll debate recording my own words, modifying this project to run locally, or using the commands available to train a new model and change the word detection.

    The code updates for training, the esp-eye projects, and the portal are available here.

  • Creating words

    Alexander Hagerman, 03/19/2020 at 01:00

    After getting the display and worker up and running, I started down the path of training my model for keyword recognition. Right now I've settled on the wake words `Hi Smalltalk`. After the wake word is detected, the model will then detect `silence`, `on`, `off`, or `unknown`.
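
    To give that flow a concrete shape, here is a small state-machine sketch of how the labels could gate the eventual network call. The label strings and `trigger_status_update()` are placeholders, not the project's code.

    // Sketch: only fire the status update after "hi" -> "smalltalk" -> "on"/"off".
    // Labels and trigger_status_update() are placeholders, not project code.
    #include <cstring>

    enum class ListenState { kWaitHi, kWaitSmalltalk, kWaitCommand };

    static ListenState state = ListenState::kWaitHi;

    void trigger_status_update(bool busy);  // e.g. the HTTP call, defined elsewhere

    void on_keyword(const char* label) {
        if (std::strcmp(label, "silence") == 0) return;  // ignore silence between words
        switch (state) {
            case ListenState::kWaitHi:
                if (std::strcmp(label, "hi") == 0) state = ListenState::kWaitSmalltalk;
                break;
            case ListenState::kWaitSmalltalk:
                state = std::strcmp(label, "smalltalk") == 0 ? ListenState::kWaitCommand
                                                             : ListenState::kWaitHi;
                break;
            case ListenState::kWaitCommand:
                if (std::strcmp(label, "on") == 0)  trigger_status_update(true);
                if (std::strcmp(label, "off") == 0) trigger_status_update(false);
                state = ListenState::kWaitHi;  // start over after any command word
                break;
        }
    }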

    My starting point for training the model was the [`micro_speech`](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite/micro/examples/micro_speech) and [`speech_commands`](https://github.com/tensorflow/docs/blob/master/site/en/r1/tutorials/sequences/audio_recognition.md) tutorials that are part of the TensorFlow project. One of the first things I noticed while planning out this step was the lack of good wake words in the speech commands dataset. There are [many](https://github.com/jim-schwoebel/voice_datasets) voice datasets available online, but most are unlabeled or conversational. Since digging didn't turn up much in the way of open labeled word datasets, I decided to use `on` and `off` from the speech commands [dataset](https://ai.googleblog.com/2017/08/launching-speech-commands-dataset.html), which gave me a baseline for comparison with my custom words. After recording myself saying `hi` and `smalltalk` fewer than ten times, I knew I did not want to generate my own samples at the scale of the other labeled keywords.

    Instead of giving up on my wake word combination, I started digging around for options and found an interesting [project](https://github.com/JohannesBuchner/spoken-command-recognition) where somebody had started down the path of generating labeled words with text-to-speech. After reading through the repo I ended up using [espeak](http://espeak.sourceforge.net/) and [sox](http://sox.sourceforge.net/) to generate my labeled dataset.
    The first step was to generate the [phonemes](https://en.wikipedia.org/wiki/Phoneme) for the wake words:

    $ espeak -v en -X smalltalk
     sm'O:ltO:k
    


    I then stored the phonemes in a `words` file that will be used by `generate.sh`.

    $ cat words
    hi 001 [[h'aI]]
    busy 002 [[b'Izi]]
    free 003 [[fr'i:]]
    smalltalk 004 [[sm'O:ltO:k]]

    After modifying `generate.sh` from the spoken command repo (eliminating some extra commands and extending the loop to generate more samples), I had everything I needed to synthetically generate a new labeled word dataset.

    #!/bin/bash
    # Generate synthetic keyword samples with espeak, varying the dialect, pitch,
    # and speaking rate on each pass, then convert each sample with sox into the
    # 16 kHz, 16-bit little-endian WAV format that TensorFlow expects.

    lastword=""

    cat words | while read -r word wordid phoneme
    do
        echo $word
        mkdir -p db/$word

        # Restart the sample counter whenever we move on to a new word.
        if [[ $word != $lastword ]]; then
            versionid=0
        fi

        lastword=$word

        # Generate voices with various dialects
        for i in english english-north en-scottish english_rp english_wmids english-us en-westindies
        do
            # Change the pitch in each iteration
            for k in $(seq 1 99)
            do
                # Change the speed in words per minute
                for j in 80 100 120 140 160
                do
                    echo $versionid "$phoneme" $i $j $k
                    echo "$phoneme" | espeak -p $k -s $j -v $i -w db/$word/$versionid.wav
                    # Resample and convert for TensorFlow
                    sox db/$word/$versionid.wav -b 16 --endian little db/$word/tf_$versionid.wav rate 16k
                    ((versionid++))
                done
            done
        done
    done

    After the run I have samples and labels with a volume comparable to the other words provided by Google. The pitch, speed, and tone of voice change with each loop, which will hopefully provide enough variety to make this dataset useful in training. Even if this doesn't work out, learning about `espeak` and `sox` was interesting, and I've already got some ideas on how to use them in the future. If it does work, the ability to generate training data on demand seems incredibly useful.

    Next up: training the model and loading it onto the ESP-EYE. The code, docs, images, etc. for the project can be found [here](https://git.sr.ht/~n0mn0m/on-air), and I'll be posting updates to [HackadayIO](https://hackaday.io/project/170228-on-air) and this blog as I continue along. If you have any questions or ideas, reach [out](mailto:alexander@unexpextedeof.net...


  • Weekend update

    Alexander Hagerman, 03/09/2020 at 20:16

    Over the weekend I had some time to work with the ESP-EYE and start talking with my signal endpoint. For now it's not "smart", but I have a button that lets me set the signal status manually, and the PyPortal updates accordingly. This at least proves out the MVP of the data flow between the systems and got me more comfortable with the ESP-IDF tools and libraries.
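
    As a rough idea of what that status call can look like on the ESP side, here is a trimmed `esp_http_client` sketch. The URL and JSON body are placeholders, and the TLS certificate setup is omitted; this is not the project's actual endpoint or payload.

    // Sketch: PUT the busy/free status to the signal endpoint with esp_http_client.
    // The URL and body are placeholders; TLS cert setup is omitted.
    #include <cstring>

    #include "esp_http_client.h"
    #include "esp_log.h"

    static void send_status(bool busy) {
        esp_http_client_config_t config = {};
        config.url = "https://example.local/signal";  // placeholder endpoint

        esp_http_client_handle_t client = esp_http_client_init(&config);
        const char* body = busy ? "{\"status\": \"busy\"}" : "{\"status\": \"free\"}";

        esp_http_client_set_method(client, HTTP_METHOD_PUT);
        esp_http_client_set_header(client, "Content-Type", "application/json");
        esp_http_client_set_post_field(client, body, std::strlen(body));

        esp_err_t err = esp_http_client_perform(client);
        if (err != ESP_OK) {
            ESP_LOGE("signal", "PUT failed: %s", esp_err_to_name(err));
        }
        esp_http_client_cleanup(client);
    }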

    One bump I ran into was a `RuntimeError` from the ESP32 on the PyPortal. For now I'm using the CircuitPython `supervisor` module to reload when this happens. Since it's a read-only operation, the only idiosyncrasy is that the screen loads green for the default background, then switches on the status fetch if the endpoint indicates busy. I may remove loading a default background to prevent this, and longer term look into what's happening with the ESP32 on the PyPortal.

    Next up is training my own custom speech model and running the keyword detection model on the ESP-EYE.

    Source available.

  • The fun begins

    Alexander Hagerman, 03/06/2020 at 02:17

    So far on my blog I have documented the initial work for my Train all the Things project. With the initial research and setup done, I'm off into the unknown. I got my ESP-EYE last week and I've been able to set up the esp-idf toolchain. So far I'm liking it; I don't feel locked into certain editors and tools like I have with other boards. I was able to get it working as a station and making some basic HTTP calls. This weekend/next week I plan to have it sending status signals to my endpoint and to figure out any TLS road bumps that may be hiding. After that I should be able to focus solely on TensorFlow. I've done a little bit of early model training and testing. So far things are promising, and it helps that this is just for me, so if the model overtrains to my voice that's not a problem like it would be in many real-world applications (although I will try to avoid that). I am curious whether I'll send the HTTP POST from inside my model code's FreeRTOS task, or whether I'll be able to set up a separate FreeRTOS task and pass messages between the two. That seems to be one of the bigger unknowns so far, and while FreeRTOS is really interesting, it's a whole new thing with a lot to learn.
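
    For the message-passing option, the common FreeRTOS pattern is a queue: the detection task pushes an event, and a separate task blocks on the queue and owns the network call. A minimal sketch under that assumption (the names, stack size, and priority here are made up, not project code):

    // Sketch: hand detections to a separate network task over a FreeRTOS queue.
    #include "freertos/FreeRTOS.h"
    #include "freertos/queue.h"
    #include "freertos/task.h"

    void send_status(bool busy);  // the blocking HTTP call, defined elsewhere

    static QueueHandle_t status_queue;

    // Called from the keyword-detection task when the command sequence is heard.
    void queue_status(bool busy) {
        xQueueSend(status_queue, &busy, 0);  // don't block the audio loop
    }

    // Separate task that owns the blocking network call.
    static void network_task(void*) {
        bool busy;
        for (;;) {
            if (xQueueReceive(status_queue, &busy, portMAX_DELAY) == pdTRUE) {
                send_status(busy);
            }
        }
    }

    void start_network_task() {
        status_queue = xQueueCreate(4, sizeof(bool));
        xTaskCreate(network_task, "network", 4096, nullptr, 5, nullptr);
    }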

    So far so good, stay tuned and feel free to reach out.
