Experiments with the endpointer

A project log for Android offline speech recognition natively on PC

Porting the Android on-device speech recognition found in GBoard to TensorFlow Lite or LWTNN

biemster, 03/26/2019 at 19:25

My focus at the moment is on the endpointer, because I can brute-force the parameters of its signal processing much faster than I can with the complete dictation graph. I added a script to the GitHub repo that should initialize the endpointer properly. As a guide I'm using a research paper that I believe describes the endpointer used in these models, so I switched from the plain power spectrum I used before to log-Mel filterbank energies.
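To make the switch from the power spectrum to log-Mel filterbank energies concrete, here is a minimal sketch for a single frame, in plain Python so it runs without dependencies. The parameters are assumptions for illustration, not values taken from the paper or the GBoard model: 16 kHz audio, a 400-sample (25 ms) frame, a periodic Hann window, 40 triangular filters on the HTK mel scale, and a small floor before the log. A naive DFT stands in for an FFT to keep it self-contained; a real pipeline would use something like `numpy.fft.rfft`.

```python
import cmath
import math


def hz_to_mel(f):
    # HTK-style mel scale (an assumption; other mel formulas exist)
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def power_spectrum(frame):
    """|X[k]|^2 for bins 0..N/2 of a real frame (naive O(N^2) DFT)."""
    n = len(frame)
    # Periodic Hann window to reduce spectral leakage
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / n))
                for i, x in enumerate(frame)]
    spec = []
    for k in range(n // 2 + 1):
        acc = sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        spec.append(abs(acc) ** 2)
    return spec


def mel_filterbank(num_filters, n_fft, sample_rate, f_min=0.0, f_max=None):
    """Triangular filters: one list of n_fft//2+1 bin weights per filter."""
    f_max = f_max if f_max is not None else sample_rate / 2.0
    num_bins = n_fft // 2 + 1
    mel_lo, mel_hi = hz_to_mel(f_min), hz_to_mel(f_max)
    # num_filters + 2 edge points, equally spaced on the mel scale
    hz_points = [mel_to_hz(mel_lo + i * (mel_hi - mel_lo) / (num_filters + 1))
                 for i in range(num_filters + 2)]
    bin_freqs = [k * sample_rate / n_fft for k in range(num_bins)]
    filters = []
    for j in range(1, num_filters + 1):
        left, center, right = hz_points[j - 1], hz_points[j], hz_points[j + 1]
        weights = []
        for f in bin_freqs:
            if left < f <= center:
                weights.append((f - left) / (center - left))    # rising edge
            elif center < f < right:
                weights.append((right - f) / (right - center))  # falling edge
            else:
                weights.append(0.0)
        filters.append(weights)
    return filters


def log_mel_energies(frame, sample_rate, num_filters=40, floor=1e-10):
    """Log of the power captured by each mel filter for one frame."""
    spec = power_spectrum(frame)
    fb = mel_filterbank(num_filters, len(frame), sample_rate)
    # Floor avoids log(0) for silent frames or empty low-frequency filters
    return [math.log(max(sum(w * p for w, p in zip(weights, spec)), floor))
            for weights in fb]
```

Compared with the plain power spectrum, the filterbank pools the N/2+1 DFT bins into a few dozen perceptually spaced bands and the log compresses their dynamic range, so the net sees a much smaller, better-scaled input per frame. Frame length, hop, filter count and floor are exactly the kind of parameters that can be brute-forced against the endpointer.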

I believe the endpointer net outputs two probabilities, p(speech) and p(non-speech), as shown in this diagram from the paper:

The endpointer results are still a bit underwhelming:

So more experiments are needed. I'll update this log when I have more endpointer results.
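Given the two per-frame probabilities p(speech) and p(non-speech) described above, one simple way to turn them into an end-of-speech decision is sketched below. This is a hypothetical rule for experimenting, not what the paper or the GBoard endpointer is known to do: `softmax2` assumes the net exposes two raw logits (skip it if the outputs are already probabilities), and the `threshold` of 0.9 and `hangover` of 10 frames are illustrative values only.

```python
import math


def softmax2(logit_speech, logit_nonspeech):
    """Turn two logits into (p_speech, p_nonspeech) summing to 1."""
    m = max(logit_speech, logit_nonspeech)  # subtract max for numerical stability
    e_s = math.exp(logit_speech - m)
    e_n = math.exp(logit_nonspeech - m)
    total = e_s + e_n
    return e_s / total, e_n / total


def find_endpoint(frame_probs, threshold=0.9, hangover=10):
    """Return the frame index where end-of-speech is declared, or None.

    frame_probs: list of (p_speech, p_nonspeech) per frame (hypothetical input).
    An endpoint is declared after `hangover` consecutive frames with
    p_nonspeech > threshold, but only once some speech has been seen,
    so leading silence never triggers it.
    """
    seen_speech = False
    silent_run = 0
    for i, (p_speech, p_nonspeech) in enumerate(frame_probs):
        if p_speech > p_nonspeech:
            seen_speech = True
        if seen_speech and p_nonspeech > threshold:
            silent_run += 1
            if silent_run >= hangover:
                return i
        else:
            silent_run = 0  # any non-silent frame resets the counter
    return None
```

The hangover counter keeps a short pause between words from triggering an endpoint; a larger hangover trades extra latency for fewer premature cut-offs.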