
Transcript: Hacking Voice UIs

An event log for Hacking Voice UIs

Nadine Lessio joins us to talk about hacking voice interfaces

Stephen Tranovich 07/13/2018 at 20:02

Stephen Tranovich12:05 PM
Hello hello hello, everybody, and welcome to another epic Hack Chat. This week we're talking about hacking voice user interfaces with @Nadine !

Thom12:05 PM
Thank you for putting this chat together.

Stephen Tranovich12:05 PM
@Nadine can you start by giving us a little bit of background about yourself and why you want to share this topic with us?

Stephen Tranovich12:05 PM
You're welcome, @Thom !

Nadine12:06 PM
Sure! I'm a designer / technologist working out of Toronto. I've done quite a bit of installation work, prototyping and DIY game peripherals, and lately I've been combining that w/ voice interfaces.

Nadine12:07 PM
I'm really interested in how we can use voice in non-standard ways.

Nadine12:07 PM
ie: in more art-like contexts, or how they work in an installation context, or even just doing some DIY stuff with them vs just home automation.

Stephen Tranovich12:08 PM
Could you share with us some of the applications you've been hacking voice UIs into recently?

Nadine12:09 PM
Well right now, its mostly controlling different peripherals. so I've been looking at how to control stuff like printers, or making applications where maybe you have to bring the device specific objects (nfc). I've also started playing w/ some embedded versions of them, vs just the consumer devices.

Stephen Tranovich12:11 PM
Awesome stuff!

Stephen Tranovich12:11 PM
Let's dive into the community questions, shall we?

Nadine12:11 PM
Sure!

Stephen Tranovich12:12 PM
We'll start out with the first question, thrown up by @Thom , even though you touched on the answer in the discussion section already. Can we adapt devices to speak to other devices yet? UI to UI

Nadine12:14 PM
You can. It's a little futzy, but the devices do recognize their wake words / phrases if they are within earshot of one another.

Nadine12:14 PM
I've used Siri to control a Google Home and it's pretty bizarre.

Stephen Tranovich12:15 PM
What are the good use cases and drawbacks for connecting systems in this way?

Nadine12:15 PM
Also you can rope them into an ongoing circle by using something like Eliza, or writing your own skills to keep them answering one another.

Nadine12:15 PM
Hmm. drawbacks /cases...

Nadine12:16 PM
Well one is just entertainment. They aren't that great at context, so it's sorta like having parrots in the room. You could also have a use case of just trying to streamline your commands.

Nadine12:17 PM
So rather than having to double up your programs for diff devices, just have one queen bee. Drawback tho...it could just not work, or stop working depending on if the platform is updated (this happens a lot).

Stephen Tranovich12:17 PM
Hah, I love the parrot analogy.

Nadine12:18 PM
You could also try to trip them off programmatically through notifications sent from a skill, tho again, it's kind of touch and go because notifications are still sorta new, and the big two really want you to use them in certain ways.

Nadine12:19 PM
hmm, i guess other drawbacks are just how many internet microphones do you want in your house? heh.

Stephen Tranovich12:21 PM
That leads well into our next question.

Stephen Tranovich12:21 PM
Our next question has been asked in various ways by multiple members of the community, including @Andrew and @Paul Stoffregen : what options are there for people who want to use their own voice UI but aren't comfortable with putting an internet-connected microphone in their house? Any ideas or tips about doing voice recognition locally, without any internet connectivity?

Nadine12:23 PM
Yeah! You could try something like Pocket Sphinx. It's being developed at Carnegie Mellon and it's meant for doing commands locally.

Nadine12:24 PM
I've tinkered with it, but I'll be honest, it was a bit of a pain to get going in a virtual environment. I might try without one, as I think it's interesting.

Nadine12:25 PM
Other things like Mycroft also work, but I've only seen it as a service ATM. But if it exists as not a service I would give it a shot.

Nadine12:25 PM
Also Pocket Sphinx link: https://github.com/cmusphinx/pocketsphinx
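A local setup like this usually boils down to a loop that feeds recognized phrases into a command dispatcher. Here's a minimal sketch of the dispatch side only; the recognizer itself is stubbed out, and with Pocket Sphinx you would feed phrases from its decoder into `handle_command()` instead (the command names are hypothetical):

```python
# Sketch of the dispatch side of a local, offline voice-command loop.
# The recognizer is stubbed; with Pocket Sphinx, phrases from the
# decoder would be fed into handle_command() instead of this list.

COMMANDS = {
    "turn on the light": lambda: "light on",
    "turn off the light": lambda: "light off",
    "print a fortune": lambda: "printing",
}

def handle_command(phrase):
    """Map a recognized phrase to an action; ignore anything unknown."""
    action = COMMANDS.get(phrase.strip().lower())
    return action() if action else None

# Stand-in for the recognizer's output stream:
for heard in ["Turn on the light", "what time is it"]:
    print(heard, "->", handle_command(heard))
```

Keeping the recognizer and the dispatcher separate like this also makes it easy to swap the speech engine later without touching your device logic.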

Stephen Tranovich12:26 PM
Are those local or still internet connected? Open Source?

Nadine12:26 PM
the pocket sphinx one is open source and not internet connected. Mycroft is still connected AFAIK.

Stephen Tranovich12:28 PM
To build off this, @Thom asks: Is there a voice to text attribute in these devices so that speaking in my UI delivers the message to the IoT device instead of that device listening all the time or even needing to be there?

Nadine12:30 PM
There is if you build your own skills / agents. Because you're controlling the responses, you can also grab that response as text.

Nadine12:31 PM
But for default responses, at least w/ Alexa, they are still not providing a text response, which suuuuuuucks. Google has started to do it w/ their embedded SDK.

Nadine12:31 PM
You can also write programs that will send cards, or display additional text on, say, your phone. So there is a text component, but it's not really straightforward, which consistently bothers me.

Stephen Tranovich12:33 PM
Is this the sort of thing that keeps changing with frequent updates?

Nadine12:33 PM
yup.

Nadine12:34 PM
Both platforms change _a lot_. I'm not even sure if some of my stuff will work in 3 months' time. It's like a weird adventure.

Stephen Tranovich12:34 PM
That is an adventure!

Nadine12:35 PM
So I think in the near future doing more text related stuff w/ these devices will get better, but for now...crap shoot!

Stephen Tranovich12:36 PM
Do these updates seem to be moving each platform in a distinct direction, or at this stage do they feel pretty haphazard? As in, are these changes clear steps to new (and possibly distinct) functionality?

Nadine12:38 PM
The updates honestly feel like they are racing one another.

Nadine12:38 PM
So like, Google Home started beefing up their embedded SDK and Amazon was like "oh shit, we gotta get on that". I think really, they're both vying to be a ubiquitous OS vs just an interface.

Nadine12:39 PM
So it almost feels like that functionality is becoming more general, with the stretch goal of getting their services to work more w/ context.

Nadine12:39 PM
vs just command / control.

Nadine12:40 PM
But the implementations sure do feel haphazard sometimes!

Stephen Tranovich12:41 PM
That race to more functionality sounds generally great for hackers, as it gives us more ways to interface with the platform. Would you say that's true?

Stephen Tranovich12:41 PM
(haphazard implementations aside for the moment :p )

Nadine12:42 PM
Yeah I'd say that's true, because it makes the platform more accessible / flexible. Right now you have to string together a program, or pop stuff through other platforms to do things, but I'd much rather have just something built into their API.

Stephen Tranovich12:44 PM
This ties in well with the question from @Lutetium : How can we use these existing voice-command platforms to control functions of our own hardware hacks?

Nadine12:44 PM
Oh there's a few different ways.

Nadine12:45 PM
So if you're using Alexa, you can obvs write your skills, and use webhooks, and in those webhooks you can make calls to other services, or host them locally on your own server and ping local devices.
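The webhook side is mostly just answering Amazon's POST with a small JSON document in the Alexa Skills Kit response format. Here's a minimal sketch of a handler that speaks a reply and could ping local hardware along the way; `trigger_device` is a hypothetical placeholder for your own device code, and the intent name is made up:

```python
import json

def trigger_device(intent_name):
    # Hypothetical hook: this is where you'd hit your own hardware,
    # e.g. an HTTP request to a Pi or a printer on your LAN.
    return f"ran {intent_name}"

def handle_alexa_request(event):
    """Handle an Alexa Skills Kit request and build a speech response."""
    intent = event["request"].get("intent", {}).get("name", "unknown")
    trigger_device(intent)
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": f"Okay, {intent} done."},
            "shouldEndSession": True,
        },
    }

sample = {"request": {"type": "IntentRequest", "intent": {"name": "PrintFortune"}}}
print(json.dumps(handle_alexa_request(sample)))
```

In practice you'd sit this behind a small web server (Flask, AWS Lambda, etc.) so Amazon can reach it, but the response-building logic is the same.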

Nadine12:46 PM
W/ Google you can do the same, or you can use the embedded SDK and make some custom actions for the device itself.

Nadine12:46 PM
You can take them apart, and use the mics or turn them into different looking devices that fit an aesthetic.

Stephen Tranovich12:48 PM
What suggestions do you have for interfacing these into common electronics boards like arduino, raspberry pi, etc?

Nadine12:50 PM
Well for Arduino, if you're just doing automation, there's a library called fauxmo, which will emulate a WeMo switch, which is handy.

Nadine12:50 PM
you can also just toss a json blob somewhere and have your device ping that endpoint that might have some commands in it.
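That "toss a JSON blob somewhere" pattern can be as simple as the device fetching a small document on a timer and acting on whatever command is in it. A sketch of the device side, with the fetch stubbed out (on real hardware this would be an HTTP GET against your own endpoint, and the `blink` command format here is invented for illustration):

```python
import json

def fetch_blob():
    # Stub for an HTTP GET; on a real device this would be e.g.
    # urllib.request.urlopen("http://yourserver/commands.json").read()
    return '{"command": "blink", "times": 3}'

def run_pending_command():
    """Fetch the command blob and translate it into device actions."""
    blob = json.loads(fetch_blob())
    if blob.get("command") == "blink":
        return ["blink"] * blob.get("times", 1)
    return []

print(run_pending_command())
```

The nice part is the voice skill and the device never talk directly: the skill writes the blob, the device polls it, and either side can be swapped out independently.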

Nadine12:50 PM
hmm.

Nadine12:51 PM
a lot of the embedded SDKs are mostly just bundles of Python libraries (Google Home's is anyways), so you could just interface directly w/ the GPIO pins on a Pi.
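Since the embedded SDK hands you ordinary Python callbacks, wiring a recognized command to a GPIO pin is just a function call. A sketch with the pin writer injected so it runs off-Pi; on a real Pi you'd pass in a wrapper around `RPi.GPIO.output`, and the pin number and commands here are made up:

```python
# Map spoken commands to (pin, level) actions. The pin writer is
# injected so this runs off-Pi; on a real Pi, pass a wrapper around
# RPi.GPIO.output after the usual GPIO.setmode / GPIO.setup calls.

PIN_ACTIONS = {
    "lamp on": (17, 1),
    "lamp off": (17, 0),
}

def on_voice_command(text, write_pin):
    """Callback for a recognized phrase; returns True if it acted."""
    action = PIN_ACTIONS.get(text.strip().lower())
    if action is None:
        return False
    write_pin(*action)
    return True

# Fake writer standing in for RPi.GPIO.output:
log = []
on_voice_command("Lamp on", lambda pin, level: log.append((pin, level)))
print(log)
```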

Stephen Tranovich12:53 PM
Have you used voice UIs to interface with any hardware projects you've been a part of?

Nadine12:54 PM
Yeah, right now I'm setting up Fortune Tasker at an event. It's like an Alexa that prints absurd fortunes. I've used them to control blenders, NeoPixels, and little bots. And currently I've been using them to use neural networks to ID objects.

Stephen Tranovich12:55 PM
That's awesome!

Nadine12:55 PM
I mean, once most things become a call to an endpoint, it's not too bad to start stringing little actions together to do bigger things.

Stephen Tranovich12:56 PM
We only have a few minutes left, but I'd love to hear a tad more about one or all of those projects with the last bit of this time.

Stephen Tranovich12:56 PM
Do you have images and/or documentation of any of these?

Nadine12:57 PM
yeah! I have a vimeo: https://vimeo.com/nlessio that shows some of the things I've been working on.

Nadine12:57 PM
I think the blender w/ seasonal affective disorder is my favourite.

Nadine12:57 PM
It's just odd and random.

Stephen Tranovich12:57 PM
Hah, the SAD blender

Stephen Tranovich12:58 PM
I get it :D

Stephen Tranovich12:58 PM
Perfect, I'm going to get lost in a Vimeo project hole

Stephen Tranovich12:58 PM
but before I do that

Stephen Tranovich12:59 PM
I want to say, thank you so much @Nadine for coming on the chat and sharing your awesome explorations into the world of Hacking Voice UIs with us!!

Nadine12:59 PM
Thanks for having me! ^_^
