Finding the models

A project log for Android offline speech recognition natively on PC

Porting the Android on-device speech recognition found in GBoard to TensorFlow Lite or LWTNN

biemsterbiemster 03/15/2019 at 09:550 Comments

The update to the on-device speech recognition comes as an option to GBoard, called "Faster voice typing" but is only available on Pixel phones as of now. I downloaded the latest version of the GBoard app, extracted it with apktool and started grepping for word like "faster" and "ondevice".

After a while the following link came up during my searches:

Following this link presented me with a small json file with a single link to a zip file of 82 MB containing the following files:

Well this looks promising! The size is about correct as mentioned in the blog post, and there seem to be two encoders, a joint and a decoder, just like in the described model.

More things that can be speculated are:

The encoder network is supposed to be four times as large as the prediction network, which is called the decoder. In the file list the 'enc1' file is about four times the 'dec' file, so my guess is the 'dec' file is the predictior network and the 'enc1' is the encoder on the bottom of the diagram. The 'joint' file is almost certain the Joint Network in the middle, and that would leave the 'enc0' file being the Softmax layer on top.

Fortunately the dictation.config file seems to specify certain parameters on how to read all files listed here, so my focus will be on how to interpret this config file with some TensorFlow Lite loader.