SODA finally landed, client working

UPD: Wine is no longer necessary.

So SODA finally landed, sort of, and apparently already a couple of weeks ago. I had been on the lookout for the Linux library, since that is my preferred environment and I was under the impression that development was taking place on that platform. But I was wrong: the Windows and macOS libraries have been available since late November.

Since I'm much more capable on a Linux machine, I searched for (and found!) a way to use either of the available libraries. In my last post I reported on a quite successful project with the Google TTS library, which resulted in a very lightweight client for it. Fortunately the same can be said for the SODA client: it has a very small code base, with the library as its only dependency. This made it possible to run it under Wine and pipe the data straight from whatever Linux application I wanted to use into the Windows DLL.

Just issue the following command (ecasound records 16-bit, mono, 16 kHz audio from ALSA and writes it to stdout):

$ ecasound -f:16,1,16000 -i alsa -o:stdout | wine gasr.exe

and watch your conversations roll over the screen:

W1215 22:58:43.683654      44 soda_async_impl.cc:390] Soda session starting (require_hotword:0, hotword_timeout_in_millis:0)
>>> hello
>>> hello from
>>> hello from
>>> hello from sod
>>> hello from soda
>>> hello from soda
>>> final: hello from soda
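The ecasound options above request signed 16-bit (s16_le), mono, 16 kHz audio, and -o:stdout writes it into the pipe as headerless raw PCM, so the receiving end only needs to read fixed-size blocks from stdin. A minimal Python sketch of such a reading loop, not gasr's actual code; `read_chunks` and the 2048-byte default are illustrative assumptions:

```python
import io

# ecasound -f:16,1,16000 -> 16-bit samples, 1 channel, 16000 Hz,
# so one frame (one sample per channel) is 2 bytes.
BYTES_PER_FRAME = 2


def read_chunks(stream: io.BufferedIOBase, chunk_bytes: int = 2048):
    """Yield raw PCM chunks from a binary stream until EOF.

    A blocking buffered read(n) returns n bytes unless EOF is reached,
    so only the final chunk can be short; a trailing partial frame
    (an odd byte) at the end of the stream is dropped.
    """
    if chunk_bytes <= 0 or chunk_bytes % BYTES_PER_FRAME:
        raise ValueError("chunk_bytes must be a positive multiple of the frame size")
    while True:
        chunk = stream.read(chunk_bytes)
        if not chunk:
            return  # EOF: the producer closed the pipe
        usable = len(chunk) - len(chunk) % BYTES_PER_FRAME
        if usable:
            yield chunk[:usable]
```

On the receiving side of the pipe this would be used as `for chunk in read_chunks(sys.stdin.buffer): ...`, with each chunk then handed to the recognizer.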

The SODA client I wrote is developed in a separate repository (gasr), as it is mostly just a tool for fully reverse engineering the RNN and transducer. But having an actual working implementation will greatly improve my ability to figure out the inner workings of the models.

Using Wine as an intermediary is still far from ideal, but I expect the Linux library to pop up soon as well, considering ChromeOS would depend on it.

UPDATE:

As @a1is pointed out, the Linux library is already out there as well, so there's no need to go the Wine way anymore. And as an added bonus, the Gboard models work with these libraries too! That opens up a whole world of experimentation, since quite a few of those have already been spotted in the wild!

UPDATE2:

Now with a Python client in the repo, for easier integration with home automation and such.
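For scripting around the client, the console output shown earlier is easy to consume: partial hypotheses are printed with a `>>> ` prefix and the finished utterance with `>>> final: `. A minimal sketch of a line parser a home automation script could sit on, assuming the client prints exactly those prefixes to stdout; the function names are hypothetical and not part of the gasr Python client:

```python
from typing import Iterable, Iterator, Optional, Tuple

# Prefixes as they appear in the console output shown above (assumed stable).
PARTIAL_PREFIX = ">>> "
FINAL_PREFIX = ">>> final: "


def parse_line(line: str) -> Optional[Tuple[str, str]]:
    """Classify one line of client output.

    Returns ("final", text) or ("partial", text); anything else, such as the
    library's own W/I/E log lines, yields None.
    """
    line = line.rstrip("\r\n")
    # Check the longer prefix first: every final line also starts with ">>> ".
    if line.startswith(FINAL_PREFIX):
        return ("final", line[len(FINAL_PREFIX):])
    if line.startswith(PARTIAL_PREFIX):
        return ("partial", line[len(PARTIAL_PREFIX):])
    return None


def final_transcripts(lines: Iterable[str]) -> Iterator[str]:
    """Yield only the finished utterances, e.g. to hand to an automation hook."""
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None and parsed[0] == "final":
            yield parsed[1]
```

With the client piped into the script, `for text in final_transcripts(sys.stdin): ...` would then see only "hello from soda" for the session shown above.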

Discussions

Prateek Xaxa wrote 02/23/2022 at 15:35

Hi @biemster, I'm using Chrome Version 98.0.4758.102 (Official Build) (64-bit), and the APIs in "SODA.dll" have changed.

Is there a way I can get an older version of Google Chrome, or the DLLs? Thanks

tamburini.fabio wrote 11/29/2021 at 15:39

This project is exactly what I was looking for, compliments!

I downloaded and installed google-chrome-stable_90.0.4430.72-1 for Linux, but I did not find "libsoda.so" anywhere. I am pretty sure that once I get the library I will be able to do the job, but... Is an older version-90 build than that one available online?
Please give me some pointers for finding the library... :P
Thanks!

Khaled wrote 04/13/2021 at 03:12

I have been working on patching the DLL, but I am getting the following:

W0413 03:04:16.243065 6984 soda_async_impl.cc:260] Creating soda_impl
E0413 03:04:16.321433 6984 soda_impl.cc:258] SODA needs a positive sample rate in mics audio format.
E0413 03:04:16.327124 6984 descriptor.cc:4079] Invalid proto descriptor for file "":
E0413 03:04:16.327283 6984 descriptor.cc:4082] : Missing field: FileDescriptorProto.name.
F0413 03:04:16.328347 6984 generated_message_reflection.cc:3158] Check failed: file != nullptr
*** Check failure stack trace: ***
@ 00007FFD5B41672F (unknown)
@ 00007FFD5B415FE0 (unknown)
@ 00007FFD5B416CAB (unknown)
@ 00007FFD5B23C440 (unknown)
@ 00007FFD5B23E487 (unknown)
@ 00007FFD5B23C1D7 (unknown)
@ 00007FFD5B13FA92 (unknown)
@ 00007FFD58C90B30 (unknown)
@ 00007FFD58C83198 (unknown)
@ 00007FFD58C82B85 (unknown)
@ 00007FFD58C834F4 (unknown)
@ 00007FFD58C83328 (unknown)
@ 00007FFD58C8565F (unknown)
@ 00007FFDA8434461 (unknown)
@ 00007FFDA843418D (unknown)
@ 00007FFDA8434042 (unknown)
@ 00007FFDA5862CC5 (unknown)
@ 00007FFDA5862A27 (unknown)
@ 00007FFDA586267C (unknown)
@ 00007FFD5C6F0735 (unknown)
@ 00007FFD5C6EE0D4 (unknown)
@ 00007FFD5C6E95A3 (unknown)
@ 00007FFD5C6AB815 (unknown)
@ 00007FFD5C69B49B (unknown)
@ 00007FFD5C69B3F9 (unknown)
@ 00007FFD5C69B11A (unknown)
@ 00007FFD5C69B09A (unknown)
@ 00007FFD5C72E4AF (unknown)
@ 00007FFD5C72E11C (unknown)
@ 00007FFD5C72EBF3 (unknown)
@ 00007FFD5C730860 (unknown)
@ 00007FFD5C730E7E (unknown)

I am using the Python wrapper. Is this because I didn't bypass the call stack verification, or is there something else I am missing?

biemster wrote 04/14/2021 at 14:42

I've never seen this error before; there seems to be something wrong with the SodaConfig you are feeding it. This likely has to do with the way you patched it, but that's a wild guess, really.

haroldfinch wrote 02/26/2021 at 07:39

First of all, you are a genius, man! Great work ♥

I followed your instructions on GitHub (I'm using Windows).

I downloaded the Chrome Canary build and put the soda.dll file and the SODA models folder in the project directory.

I used Snowman with the IDA disassembler on the DLL file to get the API key (I have no idea if this is how we get it or how to locate the key; I just picked the first key-like string I could find xD).

I ran the Python file, but I'm getting the following error:

W0226 07:26:04.505873 16308 soda_async_impl.cc:261] Creating soda_impl
W0226 07:26:04.692540 16308 terse_processor.cc:278] TISID disabled.
W0226 07:26:04.704322 16308 soda_async_impl.cc:420] Soda session starting (require_hotword:0, hotword_timeout_in_millis:0)
Traceback (most recent call last):
2 got <gasr.LP_c_byte object at 0x000002A5D0DF7CC0>
File "app.py", line 8, in <module>
client.start()
File "C:\Users\Username\Desktop\My Projects\google_stt\gasr\gasr.py", line 39, in start
self.sodalib.ExtendedSodaStart(self.handle)
OSError: exception: access violation reading 0x00000000D0EC6D98

Am I heading in the right direction with the IDA thingy? Any help is appreciated.

biemster wrote 02/27/2021 at 21:17

You're definitely on the right track here, nicely done! However, that key-like string is not the actual API key. And even if you did manage to deduce the correct key somehow (I couldn't), there is still a call stack verification in the library that will prevent you from running it (the library wants to be called only by a Chrome process). So you should find these checks using your disassembler (there are three in a row) and patch the binary so those if statements are either skipped or rendered non-functional (use something like a NOP sled or similar). And don't forget to initialize the result of those checks to true, since it is initialized to false in the original library.

That said, I never tested the Python wrapper on Windows, so you might run into additional issues. If possible, it might be best to start testing with the C code using MinGW.

Good luck!

woopdio wrote 12/31/2020 at 00:11

Do you happen to know whether the API key is the same across platforms, and whether ChromeOS already includes SODA?

I'm on Mac OS X/Hackintosh, and after a lot of trial and error I've got the SODA and en_us model components downloaded and the Verify calls patched out of libsoda, so hopefully I'm only missing the API key at this point.

I tried getting it from the .so alone, but my reverse engineering knowledge has turned out to be too limited for that so far, so I thought getting it from the actual call via a debugger would be easier. However, it doesn't look like the Mac Chrome Canary build has it yet. So I'm just wondering whether it would even be worth setting up a ChromeOS VM and tracing it there, or whether I should just wait for the Mac release.

Also, thank you for this project and for sticking with it!

woopdio wrote 01/19/2021 at 21:18

Canary 90 is out now, still without macOS SODA it seems :(

Time to wait some more, I guess.

biemster wrote 01/21/2021 at 14:03

SODA on macOS has been out there for a while as far as I know, just behind some experimental flags. The API key check should be skipped the same way as the other verification calls; did you maybe forget to initialize the result of the 3 checks to 'true'?

I think it should already be in ChromeOS as well, but that will not help you on OS X since the binaries are not compatible. (Plus, I haven't actually found the ChromeOS lib yet either.)

Jude Ashly wrote 12/25/2020 at 17:38

That's an incredible amount of persistence and hard work! I need some help: I'm stuck with lagging issues here.

ecasound -f:16,1,16000 -i alsa -o:stdout | ./gasr
********************************************************************************
* ecasound v2.9.3 (C) 1997-2020 Kai Vehmanen and others
********************************************************************************
(eca-chainsetup) Chainsetup "untitled-chainsetup"
(eca-chainsetup) NOTE: Real-time configuration, but insufficient privileges to utilize real-time scheduling (SCHED_FIFO). With small buffersizes, this may cause audible glitches during processing.
(eca-chainsetup) "rt" buffering mode selected.
(eca-chainsetup) Opened input "alsa", mode "read". Format: s16_le, channels 1, srate 16000, interleaved.
(audioio-raw) Outputting to standard output [rw].
(eca-chainsetup) Opened output "stdout", mode "read/write (update)". Format: s16_le, channels 1, srate 16000, interleaved.
[* Connected chainsetup: "untitled-chainsetup" *]
[* Controller/Starting batch processing *]
[* Engine - Driver start *]
WARNING: Logging before InitGoogle() is written to STDERR
W1225 10:35:31.357862 49008 soda_async_impl.cc:231] Creating soda_impl
I1225 10:35:31.357999 49008 soda_impl.cc:275] Maximum audio history (ms): 30000
I1225 10:35:31.358021 49008 soda_impl.cc:304] Adding Resampler from 16000 to 16000
I1225 10:35:31.358100 49008 soda_impl.cc:482] Enabling power evaluator.
I1225 10:35:31.358103 49008 soda_impl.cc:492] Adding preamble processor.
I1225 10:35:31.358106 49008 soda_impl.cc:512] Enabling On Device ASR
W1225 10:35:31.358137 49008 language_pack_utils.cc:103] Error reading from ./SODAModels/configs: error 'No such file or directory' while opening directory './SODAModels/configs': No such file or directory
I1225 10:35:31.358155 49008 terse_processor.cc:634] Config file: ./SODAModels/dictation.config
I1225 10:35:31.358282 49008 terse_processor.cc:163] Loaded PipelineDef.
I1225 10:35:31.358297 49008 dir_path.cc:52] Checking FileExists: ./ep
I1225 10:35:31.358303 49008 dir_path.cc:57] Not Found FileExists: ./ep
I1225 10:35:31.358306 49008 dir_path.cc:52] Checking FileExists: ./SODAModels/ep
I1225 10:35:31.358313 49008 dir_path.cc:54] Found FileExists: ./SODAModels/ep
I1225 10:35:31.358318 49008 neural_network_resource.cc:71] Initializing for TENSORFLOW_LITE
I1225 10:35:31.358575 49008 dir_path.cc:52] Checking FileExists: ./ep_mean_stddev
I1225 10:35:31.358595 49008 dir_path.cc:57] Not Found FileExists: ./ep_mean_stddev
I1225 10:35:31.358599 49008 dir_path.cc:52] Checking FileExists: ./SODAModels/ep_mean_stddev
I1225 10:35:31.358605 49008 dir_path.cc:54] Found FileExists: ./SODAModels/ep_mean_stddev
I1225 10:35:31.358643 49008 dir_path.cc:52] Checking FileExists: ./syms
I1225 10:35:31.358651 49008 dir_path.cc:57] Not Found FileExists: ./syms
I1225 10:35:31.358655 49008 dir_path.cc:52] Checking FileExists: ./SODAModels/syms
I1225 10:35:31.358662 49008 dir_path.cc:54] Found FileExists: ./SODAModels/syms
I1225 10:35:31.358807 49008 dir_path.cc:52] Checking FileExists: ./embedded_fix_ampm.mfar
I1225 10:35:31.358815 49008 dir_path.cc:57] Not Found FileExists: ./embedded_fix_ampm.mfar
I1225 10:35:31.358820 49008 dir_path.cc:52] Checking FileExists: ./SODAModels/embedded_fix_ampm.mfar
I1225 10:35:31.358826 49008 dir_path.cc:54] Found FileExists: ./SODAModels/embedded_fix_ampm.mfar
I1225 10:35:31.358892 49008 dir_path.cc:52] Checking FileExists: ./embedded_class_denorm.mfar
I1225 10:35:31.358901 49008 dir_path.cc:57] Not Found FileExists: ./embedded_class_denorm.mfar
I1225 10:35:31.358905 49008 dir_path.cc:52] Checking FileExists: ./SODAModels/embedded_class_denorm.mfar
I1225 10:35:31.358911 49008 dir_path.cc:54] Found FileExists: ./SODAModels/embedded_class_denorm.mfar
I1225 10:35:31.358959 49008 dir_path.cc:52] Checking FileExists: ./embedded_normalizer.mfar
I1225 10:35:31.358968 49008 dir_path.cc:57] Not Found FileExists: ./embedded_normalizer.mfar
I1225 10:35:31.358972 49008 dir_path.cc:52] Checking FileExists: ./SODAModels/embedded_normalizer.mfar
I1225 10:35:31.358978 49008 dir_path.cc:54] Found FileExists: ./SODAModels/embedded_normalizer.mfar
I1225 10:35:31.359079 49008 dir_path.cc:52] Checking FileExists: ./embedded_replace_annotated_punct_words_dash.mfar
I1225 10:35:31.359088 49008 dir_path.cc:57] Not Found FileExists: ./embedded_replace_annotated_punct_words_dash.mfar
I1225 10:35:31.359093 49008 dir_path.cc:52] Checking FileExists: ./SODAModels/embedded_replace_annotated_punct_words_dash.mfar
I1225 10:35:31.359101 49008 dir_path.cc:54] Found FileExists: ./SODAModels/embedded_replace_annotated_punct_words_dash.mfar
I1225 10:35:31.359147 49008 dir_path.cc:52] Checking FileExists: ./offensive_word_normalizer.mfar
I1225 10:35:31.359155 49008 dir_path.cc:57] Not Found FileExists: ./offensive_word_normalizer.mfar
I1225 10:35:31.359158 49008 dir_path.cc:52] Checking FileExists: ./SODAModels/offensive_word_normalizer.mfar
I1225 10:35:31.359164 49008 dir_path.cc:54] Found FileExists: ./SODAModels/offensive_word_normalizer.mfar
I1225 10:35:31.359206 49008 dir_path.cc:52] Checking FileExists: ./enc0
I1225 10:35:31.359213 49008 dir_path.cc:57] Not Found FileExists: ./enc0
I1225 10:35:31.359216 49008 dir_path.cc:52] Checking FileExists: ./SODAModels/enc0
I1225 10:35:31.359221 49008 dir_path.cc:54] Found FileExists: ./SODAModels/enc0
I1225 10:35:31.359225 49008 neural_network_resource.cc:71] Initializing for TENSORFLOW_LITE
I1225 10:35:31.381130 49008 dir_path.cc:52] Checking FileExists: ./enc1
I1225 10:35:31.381161 49008 dir_path.cc:57] Not Found FileExists: ./enc1
I1225 10:35:31.381164 49008 dir_path.cc:52] Checking FileExists: ./SODAModels/enc1
I1225 10:35:31.381168 49008 dir_path.cc:54] Found FileExists: ./SODAModels/enc1
I1225 10:35:31.381170 49008 neural_network_resource.cc:71] Initializing for TENSORFLOW_LITE
I1225 10:35:31.458703 49008 dir_path.cc:52] Checking FileExists: ./dec
I1225 10:35:31.458773 49008 dir_path.cc:57] Not Found FileExists: ./dec
I1225 10:35:31.458781 49008 dir_path.cc:52] Checking FileExists: ./SODAModels/dec
I1225 10:35:31.458795 49008 dir_path.cc:54] Found FileExists: ./SODAModels/dec
I1225 10:35:31.458802 49008 neural_network_resource.cc:71] Initializing for TENSORFLOW_LITE
I1225 10:35:31.488069 49008 dir_path.cc:52] Checking FileExists: ./joint
I1225 10:35:31.488111 49008 dir_path.cc:57] Not Found FileExists: ./joint
I1225 10:35:31.488116 49008 dir_path.cc:52] Checking FileExists: ./SODAModels/joint
I1225 10:35:31.488123 49008 dir_path.cc:54] Found FileExists: ./SODAModels/joint
I1225 10:35:31.488127 49008 neural_network_resource.cc:71] Initializing for TENSORFLOW_LITE
I1225 10:35:31.489495 49008 dir_path.cc:52] Checking FileExists: ./input_mean_stddev
I1225 10:35:31.489523 49008 dir_path.cc:57] Not Found FileExists: ./input_mean_stddev
I1225 10:35:31.489528 49008 dir_path.cc:52] Checking FileExists: ./SODAModels/input_mean_stddev
I1225 10:35:31.489534 49008 dir_path.cc:54] Found FileExists: ./SODAModels/input_mean_stddev
I1225 10:35:31.489595 49008 terse_processor.cc:173] Initialized ResourceManager.
I1225 10:35:31.489660 49008 terse_processor.cc:184] Initialized GoogleRecognizer.
W1225 10:35:31.489709 49008 terse_processor.cc:242] TISID disabled.
I1225 10:35:31.489715 49008 terse_processor.cc:718] Domain: CAPTION
E1225 10:35:31.490875 49008 pie_rnn_fst_decoder_graph.cc:25] Using deprecated decoder_graph_type RNN_FST. Use decoder_graph_type DUAL and DualFstDecoderParams instead.
I1225 10:35:31.492369 49008 terse_processor.cc:1293] Resetting Terse Processor
I1225 10:35:31.492396 49008 terse_processor.cc:838] Cancelling session.
W1225 10:35:31.492675 49008 decoder_endpointer_stream.cc:35] Acoustic ep reader thread cancelled.
I1225 10:35:31.492849 49008 terse_processor.cc:755] Setup completed
I1225 10:35:31.492860 49008 soda_impl.cc:558] Server ASR Disabled
I1225 10:35:31.492867 49008 soda_impl.cc:606] Initializing audio logger
W1225 10:35:31.492877 49008 soda_async_impl.cc:390] Soda session starting (require_hotword:0, hotword_timeout_in_millis:0)
I1225 10:35:31.492881 49008 soda_async_impl.cc:577] Session parameters updated. Reconfiguring SODA.
I1225 10:35:31.696233 49013 terse_processor.cc:1199] No terse session, starting a new one on input audio.
E1225 10:35:31.701219 49013 pie_rnn_fst_decoder_graph.cc:25] Using deprecated decoder_graph_type RNN_FST. Use decoder_graph_type DUAL and DualFstDecoderParams instead.
W1225 10:35:32.685130 49013 lag_detector.cc:30] Pipeline lagging by 827 ms. Continue processing samples.
W1225 10:35:33.138113 49013 lag_detector.cc:30] Pipeline lagging by 1060 ms. Continue processing samples.
W1225 10:35:33.505378 49013 lag_detector.cc:30] Pipeline lagging by 1227 ms. Continue processing samples.
W1225 10:35:33.871842 49013 lag_detector.cc:30] Pipeline lagging by 1393 ms. Continue processing samples.
I1225 10:35:33.963726 49013 soda_async_impl.cc:1090] Forcing a SODA sync to get the final event and reset ASR.
I1225 10:35:33.963874 49013 terse_processor.cc:1046] Flushing pending events ..
I1225 10:35:33.964031 49013 terse_processor.cc:1063] Start longform loop on remaining audio: 1.16s
I1225 10:35:33.965874 49031 pipeline.cc:49] [Threadname 'audio_level_eve'] Finished run.
I1225 10:35:33.966170 49028 pipeline.cc:49] [Threadname 'endpointer_even'] Finished run.
I1225 10:35:33.989725 49027 pipeline.cc:49] [Threadname 'rnnt_encoder0'] Finished run.
I1225 10:35:34.008353 49026 pipeline.cc:49] [Threadname 'rnnt_encoder1'] Finished run.
I1225 10:35:34.123794 49030 pipeline.cc:49] [Threadname 'end_of_utteranc'] Finished run.
I1225 10:35:34.123921 49033 terse_processor.cc:469] Final recognition has been created.
I1225 10:35:34.123928 49013 terse_processor.cc:1558] Longform resets session because secs in this session are: 0.84
I1225 10:35:34.123950 49013 terse_processor.cc:838] Cancelling session.
I1225 10:35:34.123941 49033 pipeline.cc:49] [Threadname 'recognition_eve'] Finished run.
W1225 10:35:34.124055 49013 log_creator-internal.cc:423] Failed to merge results for logging:
UNKNOWN: Result times overlap [type.googleapis.com/util.ErrorSpacePayload='SpeechErrorSpace::SpeechError(-73560)']
E1225 10:35:34.124817 49013 pie_rnn_fst_decoder_graph.cc:25] Using deprecated decoder_graph_type RNN_FST. Use decoder_graph_type DUAL and DualFstDecoderParams instead.
I1225 10:35:34.129116 49071 pipeline.cc:49] [Threadname 'audio_level_eve'] Finished run.
I1225 10:35:34.129125 49068 pipeline.cc:49] [Threadname 'endpointer_even'] Finished run.
I1225 10:35:34.159723 49067 pipeline.cc:49] [Threadname 'rnnt_encoder0'] Finished run.
I1225 10:35:34.196678 49066 pipeline.cc:49] [Threadname 'rnnt_encoder1'] Finished run.
I1225 10:35:34.269877 49069 pipeline.cc:49] [Threadname 'end_of_utteranc'] Finished run.
I1225 10:35:34.269959 49072 terse_processor.cc:469] Final recognition has been created.
I1225 10:35:34.269974 49072 pipeline.cc:49] [Threadname 'recognition_eve'] Finished run.
I1225 10:35:34.269974 49013 terse_processor.cc:1088] Stop looping and end session. Audio left: 50ms
I1225 10:35:34.270332 49013 soda_impl.cc:1035] Got pipeline signal out
I1225 10:35:34.270389 49013 terse_processor.cc:1199] No terse session, starting a new one on input audio.
E1225 10:35:34.270869 49013 pie_rnn_fst_decoder_graph.cc:25] Using deprecated decoder_graph_type RNN_FST. Use decoder_graph_type DUAL and DualFstDecoderParams instead.
W1225 10:35:34.272062 49013 lag_detector.cc:30] Pipeline lagging by 1614 ms. Continue processing samples.
W1225 10:35:34.601753 49013 lag_detector.cc:30] Pipeline lagging by 1743 ms. Continue processing samples.
^C[* Controller/Batch processing finished (0) *]
[* Engine exiting *]
(eca-control-objects) Disconnecting chainsetup: "untitled-chainsetup".

biemster wrote 12/26/2020 at 14:49

Thanks! It's always nice to be rewarded with a working end product after such a long project :)

From your log I see that you're using a model with a 'dictation.config', so I guess you're trying a Gboard model? (I did not know that libsoda would pick those up automatically, though, so no need for symlinking anymore :))

When I get a lagging pipeline, it is usually because I'm calling AddAudio with the wrong parameters. Maybe your len parameter is off by a factor of 2? Or maybe this specific model requires 32-bit or 8-bit input?

It would help if you'd specify which RNNT model you're trying to run.
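The factor-of-2 suspicion is easy to check with a little arithmetic: at 16 kHz, 16-bit mono, one second of audio is 16000 × 2 = 32000 bytes. A minimal sketch showing how the same chunk maps to different durations depending on how its length is interpreted; the helper name is hypothetical, and whether AddAudio's len counts bytes or samples is not stated in this thread:

```python
def chunk_ms(n_bytes: int, sample_rate: int = 16000,
             bytes_per_sample: int = 2, channels: int = 1) -> float:
    """Duration in milliseconds of a raw PCM chunk of n_bytes.

    Defaults match ecasound -f:16,1,16000 (s16_le, mono, 16 kHz).
    """
    frame = bytes_per_sample * channels
    if n_bytes % frame:
        raise ValueError(f"{n_bytes} bytes is not a whole number of frames")
    return 1000.0 * (n_bytes // frame) / sample_rate


# A 1024-byte chunk of 16-bit mono audio holds 512 samples.
print(chunk_ms(1024))                      # prints 32.0
# Read as 8-bit samples (same as a byte count taken as a 16-bit sample count): twice as long.
print(chunk_ms(1024, bytes_per_sample=1))  # prints 64.0
# Read as 32-bit samples: half as long.
print(chunk_ms(1024, bytes_per_sample=4))  # prints 16.0
```

If the library and the caller disagree on one of these parameters, the audio it believes it received per call is off by exactly such a factor.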

Jude Ashly wrote 12/26/2020 at 17:00

I've changed chunk_size to 1024. Now my log looks like this:

77857 pie_rnn_fst_decoder_graph.cc:25] Using deprecated decoder_graph_type RNN_FST. Use decoder_graph_type DUAL and DualFstDecoderParams instead.
I1226 09:55:02.418223 77857 terse_processor.cc:1293] Resetting Terse Processor
I1226 09:55:02.418240 77857 terse_processor.cc:838] Cancelling session.
W1226 09:55:02.418537 77857 decoder_endpointer_stream.cc:35] Acoustic ep reader thread cancelled.
I1226 09:55:02.418698 77857 terse_processor.cc:755] Setup completed
I1226 09:55:02.418710 77857 soda_impl.cc:558] Server ASR Disabled
I1226 09:55:02.418720 77857 soda_impl.cc:606] Initializing audio logger
W1226 09:55:02.418736 77857 soda_async_impl.cc:390] Soda session starting (require_hotword:0, hotword_timeout_in_millis:0)
I1226 09:55:02.418743 77857 soda_async_impl.cc:577] Session parameters updated. Reconfiguring SODA.
I1226 09:55:02.461880 77862 terse_processor.cc:1199] No terse session, starting a new one on input audio.
E1226 09:55:02.462976 77862 pie_rnn_fst_decoder_graph.cc:25] Using deprecated decoder_graph_type RNN_FST. Use decoder_graph_type DUAL and DualFstDecoderParams instead.
I1226 09:55:03.161415 77862 soda_async_impl.cc:1090] Forcing a SODA sync to get the final event and reset ASR.
I1226 09:55:03.161525 77862 terse_processor.cc:1046] Flushing pending events ..
I1226 09:55:03.161595 77862 terse_processor.cc:1063] Start longform loop on remaining audio: 1.16s
I1226 09:55:03.162907 77875 pipeline.cc:49] [Threadname 'audio_level_eve'] Finished run.
I1226 09:55:03.163285 77882 pipeline.cc:49] [Threadname 'endpointer_even'] Finished run.
I1226 09:55:03.196991 77878 pipeline.cc:49] [Threadname 'rnnt_encoder0'] Finished run.
I1226 09:55:03.232416 77880 pipeline.cc:49] [Threadname 'rnnt_encoder1'] Finished run.
I1226 09:55:03.341455 77877 pipeline.cc:49] [Threadname 'end_of_utteranc'] Finished run.
I1226 09:55:03.341546 77876 terse_processor.cc:469] Final recognition has been created.
I1226 09:55:03.341556 77862 terse_processor.cc:1558] Longform resets session because secs in this session are: 0.84
I1226 09:55:03.341573 77876 pipeline.cc:49] [Threadname 'recognition_eve'] Finished run.
I1226 09:55:03.341574 77862 terse_processor.cc:838] Cancelling session.
W1226 09:55:03.341698 77862 log_creator-internal.cc:423] Failed to merge results for logging:
UNKNOWN: Result times overlap [type.googleapis.com/util.ErrorSpacePayload='SpeechErrorSpace::SpeechError(-73560)']
E1226 09:55:03.342467 77862 pie_rnn_fst_decoder_graph.cc:25] Using deprecated decoder_graph_type RNN_FST. Use decoder_graph_type DUAL and DualFstDecoderParams instead.
I1226 09:55:03.346847 77884 pipeline.cc:49] [Threadname 'endpointer_even'] Finished run.
I1226 09:55:03.346826 77891 pipeline.cc:49] [Threadname 'audio_level_eve'] Finished run.
I1226 09:55:03.380860 77886 pipeline.cc:49] [Threadname 'rnnt_encoder0'] Finished run.
I1226 09:55:03.404026 77887 pipeline.cc:49] [Threadname 'rnnt_encoder1'] Finished run.
I1226 09:55:03.479328 77888 pipeline.cc:49] [Threadname 'end_of_utteranc'] Finished run.
I1226 09:55:03.479369 77885 terse_processor.cc:469] Final recognition has been created.
I1226 09:55:03.479385 77885 pipeline.cc:49] [Threadname 'recognition_eve'] Finished run.
I1226 09:55:03.479379 77862 terse_processor.cc:1088] Stop looping and end session. Audio left: 50ms
I1226 09:55:03.479728 77862 soda_impl.cc:1035] Got pipeline signal out
I1226 09:55:03.479788 77862 terse_processor.cc:1199] No terse session, starting a new one on input audio.
E1226 09:55:03.480237 77862 pie_rnn_fst_decoder_graph.cc:25] Using deprecated decoder_graph_type RNN_FST. Use decoder_graph_type DUAL and DualFstDecoderParams instead.
I1226 09:55:03.799547 77862 soda_async_impl.cc:1090] Forcing a SODA sync to get the final event and reset ASR.
I1226 09:55:03.799596 77862 terse_processor.cc:1046] Flushing pending events ..
I1226 09:55:03.799660 77862 terse_processor.cc:1063] Start longform loop on remaining audio: 1.33s
I1226 09:55:03.800333 77862 terse_processor.cc:1088] Stop looping and end session. Audio left: 1.33s
I1226 09:55:03.800760 77895 pipeline.cc:49] [Threadname 'audio_level_eve'] Finished run.
I1226 09:55:03.801030 77897 pipeline.cc:49] [Threadname 'endpointer_even'] Finished run.
I1226 09:55:03.817305 77900 pipeline.cc:49] [Threadname 'rnnt_encoder0'] Finished run.
I1226 09:55:03.850583 77893 pipeline.cc:49] [Threadname 'rnnt_encoder1'] Finished run.
I1226 09:55:04.015813 77894 pipeline.cc:49] [Threadname 'end_of_utteranc'] Finished run.
I1226 09:55:04.015937 77899 terse_processor.cc:469] Final recognition has been created.
I1226 09:55:04.015959 77899 pipeline.cc:49] [Threadname 'recognition_eve'] Finished run.
W1226 09:55:04.016060 77862 log_creator-internal.cc:423] Failed to merge results for logging:
UNKNOWN: Result times overlap [type.googleapis.com/util.ErrorSpacePayload='SpeechErrorSpace::SpeechError(-73560)']
I1226 09:55:04.016357 77862 soda_impl.cc:1035] Got pipeline signal out
I1226 09:55:04.016403 77862 terse_processor.cc:1199] No terse session, starting a new one on input audio.
E1226 09:55:04.016863 77862 pie_rnn_fst_decoder_graph.cc:25] Using deprecated decoder_graph_type RNN_FST. Use decoder_graph_type DUAL and DualFstDecoderParams instead.
^C[* Controller/Batch processing finished (0) *]

I'm using the lp_rnnt-20181012 model.

biemster wrote 12/26/2020 at 17:04

I see a couple of deprecation errors; do you need to use this specific (old) model? There are some much better new ones out there.

0xEb0 wrote 12/17/2020 at 19:35

Thanks a lot for sharing!

In the gasr repo you say there's also an fr_fr model, but I can't get my hands on it.

The latest Gboard/Recorder APKs (from APKMirror) seem to reference only the en_us package.
Any hints, perhaps? :)

P.S. Am I on the right track with xxx_cb8c332f_8612b6f5_392665af_xxx?

biemster wrote 12/18/2020 at 13:40

You're definitely on the right track! The block around that ctx needs a little nudge. The link to the fr_fr model was accidentally added to a Gboard superpack JSON; did you figure out how to search for those?

0xEb0 wrote 12/19/2020 at 14:33

I thought I had figured out how to get them, but the fr_fr one seems well hidden. It may be in a specific/unique Gboard version. I tried several (at random) from APKMirror.

What I do: Gboard APKs > apktool > grep superpacks-manifests > get some JSON files that lead me to URLs like xxx/en_us/ondevice_recognizer/lp_rnnt-<date>.zip.

But these are always en_us / 2019 versions.
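The grep step in that pipeline can be scripted. A minimal Python sketch, assuming the manifests are JSON files that contain the model file names literally in the lp_rnnt-<date>.zip form with an eight-digit YYYYMMDD date (as in lp_rnnt-20181012 above); `MODEL_RE`, `model_names_in_text` and `scan_tree` are hypothetical names, and the elided URL prefix (xxx) is not needed for this step:

```python
import os
import re

# Model archives as referenced in the manifests, e.g. lp_rnnt-20181012.zip
# (assumed: the date is always eight digits, YYYYMMDD).
MODEL_RE = re.compile(r"lp_rnnt-(\d{8})\.zip")


def model_names_in_text(text: str) -> list:
    """Return the distinct lp_rnnt-<date>.zip names found in text, oldest first."""
    dates = sorted(set(MODEL_RE.findall(text)))  # YYYYMMDD sorts chronologically
    return [f"lp_rnnt-{d}.zip" for d in dates]


def scan_tree(root: str) -> dict:
    """Map each .json file under root to the model names it mentions.

    Files that mention no model are left out; a missing root yields {}.
    """
    hits = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".json"):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="ignore") as f:
                names = model_names_in_text(f.read())
            if names:
                hits[path] = names
    return hits
```

Pointing `scan_tree` at the apktool output directory (or at a folder of downloaded manifests) lists which dates each file references.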

biemster wrote 12/19/2020 at 20:37

@0xEb0 try building a script that sweeps through all the <dates>; you'll find some nice surprises! One of those is significantly better than what's included in SODA, or the 2019 Gboard model.
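A sweep like the one suggested can start from a list of candidate file names. A minimal sketch that enumerates one lp_rnnt-YYYYMMDD.zip name per day in a date range, assuming the date is always the eight-digit form seen in lp_rnnt-20181012; the base URL is a placeholder you would have to supply (the real prefix stays elided as xxx here), and checking which candidates actually exist is left out:

```python
from datetime import date, timedelta


def candidate_names(start: date, end: date):
    """Yield lp_rnnt-YYYYMMDD.zip for every day from start to end, inclusive."""
    day = start
    while day <= end:
        yield f"lp_rnnt-{day:%Y%m%d}.zip"
        day += timedelta(days=1)


def candidate_urls(base_url: str, start: date, end: date) -> list:
    """Join each candidate name onto a caller-supplied (hypothetical) base URL."""
    base = base_url.rstrip("/")
    return [f"{base}/{name}" for name in candidate_names(start, end)]
```

For example, `candidate_names(date(2018, 1, 1), date(2020, 12, 31))` gives 1096 names to try.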

Abraham Devos wrote 12/17/2020 at 15:50

This is amazing progress!

I followed in your footsteps on GitHub: gtts works great (confirmed); gasr does not (see below).

I managed to download libsoda and the us-en model.

Biemster, could it be that my versions are off compared to your setup? (Mine: SODA 0.08, SODAModel us-en 0.04.)

With gtts I noticed that a single minor version difference meant the difference between working and not working.

W1217 14:34:36.006657 6252 soda_async_impl.cc:231] Creating soda_impl
W1217 14:34:36.084269 6252 terse_processor.cc:242] TISID disabled.
W1217 14:34:36.108929 6252 soda_async_impl.cc:390] Soda session starting (require_hotword:0, hotword_timeout_in_millis:0)
W1217 14:34:36.135319 11444 soda_async_impl.cc:765] Soda session stopped due to: STOP_CALLED
E1217 14:34:36.136378 6252 mapped-file.cc:44] Failed to unmap region: 0
E1217 14:34:36.170122 6252 mapped-file.cc:44] Failed to unmap region: 0
E1217 14:34:36.170666 6252 mapped-file.cc:44] Failed to unmap region: 0
E1217 14:34:36.185747 6252 mapped-file.cc:44] Failed to unmap region: 0
E1217 14:34:36.187252 6252 mapped-file.cc:44] Failed to unmap region: 0
W1217 14:34:36.187736 6252 soda_async_impl.cc:793] Deleting soda_impl

biemster wrote 12/17/2020 at 17:42

Thanks! Your SODA versions are the same as mine, but most likely the API key and call stack verification are blocking you now. Time to whip out your disassembler!

a1is wrote 12/17/2020 at 00:47

The Linux libsoda.so is already available, but I'm stuck on the quest to find the API key...
Can you give more info?

biemster wrote 12/17/2020 at 10:04

Seriously, you found the Linux library? I still don't see it when I use the extension tools; are you sure it's not the placeholder file you found? As for the API key, this is where the project enters a gray area, which I outlined in the code repo:

https://github.com/biemster/gasr/issues/1

The google speech team is not fond of "unauthorized repurposing" of their work, which is understandable.

goddade wrote 12/16/2020 at 10:50

Significant progress!

Can you tell me how to get the libsoda file?

I searched the chrome directory, but did not find libsoda.

biemster wrote 12/16/2020 at 17:35

I've just now addressed this in an issue in the gasr repository on GitHub, but unfortunately I can't go into too much detail, as the speech team at Google does not want this to be repurposed. So I can't post it here in public.
