End of the week progress update

A project log for On Air

Keyword activated status display

Alexander Hagerman 03/22/2020 at 22:18

Last week I was able to get some work done on the ESP-EYE and start generating synthetic voice data. This week I focused on the voice model itself, and that's been an adventure. TensorFlow has an example micro_speech project that I figured I would be able to use as a starting point. Working through the demo exposed some quirks, such as the code relying on TF 1.x modules while the micro libraries are in TF 2.x. Also, the first time through the micro_speech demo, it didn't pick up audio on the EYE. That led me to consider alternatives, which is how I found the ESP Skainet project. Eventually I made my way back to the ESP-WHO application that the EYE comes flashed with. I spent some time with esp-idf and esp-who to get that working on the board again, partially to confirm the microphone was picking up audio properly, since TF didn't get any audio input when I flashed it. Along the way I dug into esp-adf and esp-sr a bit to see if those might be better options. While both are interesting, the sr component lacks a training step (which is great for most projects, but not for one where I'm working to learn more about training :D), and they add new complexities. They're starred and marked for a revisit another day.
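For reference, the reflash-and-check loop looks roughly like this. It's a minimal sketch assuming a recent esp-idf with the idf.py tool; the serial port and the location of the local esp-who clone (PORT and WHO_DIR) are assumptions about my setup, not values from the repo.

```shell
# Sketch of putting the stock ESP-WHO firmware back on the board to
# confirm the microphone works. PORT and WHO_DIR are assumptions.
PORT="${PORT:-/dev/ttyUSB0}"
WHO_DIR="${WHO_DIR:-$HOME/esp/esp-who}"   # local esp-who clone

reflash_who() {
  cd "$WHO_DIR" || return 1       # run from the example app you want to flash
  idf.py set-target esp32         # the ESP-EYE is ESP32 based
  idf.py build                    # compile the firmware
  idf.py -p "$PORT" flash monitor # flash, then tail the serial console
}
```

Watching the monitor output after a wake word is the quickest way to confirm the mic is actually feeding audio to the app.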

Eventually I came back to TensorFlow, blew away the local repo, did a fresh clone, created a new Python venv, and went back through micro_speech. This time the audio was picked up, which gave me a good starting point. I started modifying the code to load my custom model, but ran into some issues. Instead of digging in, I decided it was a good time to backtrack through the week. I've made a lot of notes that will turn into future posts, I've created some shell aliases to assist in my esp workflow, and I've turned a lot of separate data generation, transformation, and training steps into a series of shell scripts.
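The wrapper-script idea is roughly this: one entry point that chains the separate steps, with a dry-run mode for sanity checking. The individual script names here are placeholders, not the actual files in the repo.

```shell
# Sketch of chaining the data generation, transformation and training
# steps. Script names are placeholders; DRY_RUN defaults to 1 so this
# is safe to source and preview.
DRY_RUN="${DRY_RUN:-1}"

run() {
  # Echo the command instead of executing it when dry-running.
  if [ "$DRY_RUN" = "1" ]; then
    echo "would run: $*"
  else
    "$@"
  fi
}

pipeline() {
  run sh generate_voices.sh                 # synthesize keyword WAV samples
  run sh transform_audio.sh                 # resample/trim clips for training
  run python train.py --wanted_words=hi,on  # retrain the keyword model
}

pipeline
```

Setting DRY_RUN=0 runs the real steps; the default just prints what would happen.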

I also decided to train a new model using "hi" as my synthetic word and "on" from the TensorFlow prelabeled keywords. This gives me the same dimensions as the micro_speech demo, so I can easily load this model onto the EYE and see what kind of performance I get with synthetic word recognition. If that goes well, the next step will be training the model with both synthetic words, then working back through the TF Micro docs to set up loading of the model on the EYE, at which point the last thing to do will be wiring up the HTTP POST (found in /sig) on the right series of words. If the synthetic words don't perform well, then I'll debate recording words myself, modifying this project to run locally, or using the commands available to train a new model and change the word detection.
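The retraining step amounts to pointing the TensorFlow speech_commands training script at the new keyword list. This is a hedged sketch based on the micro_speech training docs: the data_dir layout (one folder per keyword), architecture, and step counts are illustrative assumptions, not values from my repo.

```shell
# Sketch: training a two-keyword model with the TF speech_commands
# script. Paths, architecture and step counts are assumptions.
train_cmd() {
  echo python tensorflow/examples/speech_commands/train.py \
    --wanted_words=hi,on \
    --model_architecture=tiny_conv \
    --preprocess=micro \
    --data_dir=./speech_dataset \
    --how_many_training_steps=12000,3000
}

# Preview the command; drop the echo wrapper to actually train.
train_cmd
```

Keeping the same tiny_conv architecture and preprocessing as the demo is what keeps the model dimensions compatible with the code already running on the EYE.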

The code updates for training, the esp-eye projects, and the portal updates are available here.