Mapping Text to Actions

Rather than a video, I took a few screenshots to show the flow of a command from start to finish, from text input through to an action. In previous logs I've shown the high-level processing: how you actually interact with Stark via voice or text. Here I'm going to elaborate on more of the setup.

When you turn Stark on for the first time, he knows how to do basically nothing. The system is meant to be a framework, so other than a few default commands like "are you there" you really won't get a lot done. Enter the web interface.

This is the commands page of the web interface. On the left-hand side are all the Methods you can run. These are loaded from each enabled Module via a config file bundled with the module, so the list is effectively every action you can perform. Clicking on a method displays the arguments it accepts (optional and required) as well as a quick way to test it.
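To make the "methods come from bundled config files" idea concrete, here is a minimal sketch. The config layout (a `module` name plus a `methods` list with `arguments` and a `required` flag), the `list_methods` helper, and the `Demo.Echo` method are all hypothetical; the real file format isn't shown in this log. It only illustrates building the "Module.Method" list from enabled modules.

```python
import json
from typing import Dict, List, Set

# Hypothetical bundled config files; the real Stark format is not shown here.
CONFIGS = [
    '{"module": "SmartThings", "methods": [{"name": "ToggleSwitch",'
    ' "arguments": [{"name": "name", "required": true}]}]}',
    '{"module": "Demo", "methods": [{"name": "Echo",'
    ' "arguments": [{"name": "text", "required": false}]}]}',
]


def list_methods(configs: List[str], enabled: Set[str]) -> Dict[str, List[str]]:
    """Map 'Module.Method' to its required argument names, for enabled modules only."""
    methods: Dict[str, List[str]] = {}
    for raw in configs:
        cfg = json.loads(raw)
        if cfg["module"] not in enabled:
            continue  # skip modules that are turned off
        for method in cfg["methods"]:
            required = [a["name"] for a in method.get("arguments", []) if a.get("required")]
            methods[f"{cfg['module']}.{method['name']}"] = required
    return methods


print(list_methods(CONFIGS, enabled={"SmartThings"}))
# {'SmartThings.ToggleSwitch': ['name']}
```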

All tests are done by sending a JSON query to the Stark server, which returns a JSON response. Responses have the following format:

{
  "message": "Message returned as text for user",
  "time": 1465842197,
  "arguments": {
    "arg1": "text",
    "arg2": "text"
  },
  "data": {
    "response": "Message again",
    "wave_file_name": "name of generated voice file",
    "long_response": "Longer response text if available",
    "response_arg_1": "",
    "response_arg_2": ""
  },
  "method": "Module.Action",
  "type": "SUCCESS|ERROR|QUESTION|IN_PROGRESS",
  "command_id": 0,
  "originator": {
    "id": 0,
    "name": "module.full.package",
    "callback": true,
    "instance": "instance_name",
    "user": "username"
  }
}

The main thing to notice is that every response carries its status in the type field: SUCCESS, ERROR, QUESTION, or IN_PROGRESS. A QUESTION response means the system is waiting for more input; an IN_PROGRESS response means the action succeeded but kicked off a threaded job that may still be running. The arguments object echoes every argument sent to the server, and the data object holds whatever the method returned. The data object can contain extra JSON elements for programmatic use. For example, the Forecast.io module returns the weather as an object with all the data points (temperature, chance of precipitation, etc.) alongside a written message. The data object also contains the file name of a text-to-speech wave file that is generated and saved before the response is sent, which lets clients poll the server for the response in speech form.
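A client mostly needs to branch on the type field. Here is a minimal sketch of that, using only field names from the response format above; the action labels ("show", "await_input", etc.) and the `plan_client_action` helper are hypothetical client-side choices, not part of Stark.

```python
import json
from typing import Optional, Tuple


def plan_client_action(raw: str) -> Tuple[str, Optional[str]]:
    """Map a raw Stark response to a (client action, wave file name) pair."""
    resp = json.loads(raw)
    status = resp["type"]
    # The TTS wave file is generated before the response is sent,
    # so a client can fetch it by name if it wants spoken output.
    wave = resp.get("data", {}).get("wave_file_name")
    if status == "SUCCESS":
        return ("show", wave)
    if status == "ERROR":
        return ("report_error", wave)
    if status == "QUESTION":
        # The system is waiting for more input from the user.
        return ("await_input", wave)
    if status == "IN_PROGRESS":
        # The action succeeded but started a threaded job that may still be running.
        return ("poll_later", wave)
    raise ValueError(f"unknown response type: {status!r}")


example = json.dumps({
    "message": "example question",
    "type": "QUESTION",
    "method": "Module.Action",
    "data": {"response": "example question", "wave_file_name": "reply.wav"},
})
print(plan_client_action(example))  # ('await_input', 'reply.wav')
```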

Once you confirm that a given method works, the next step is attaching it to an action, or Job. A Job wraps the method in a phrase that a human would actually use to trigger it. Off-the-shelf systems like Alexa have the programmer specify exactly what these phrases are. That makes them very specific, which helps speech recognition, but it also makes them very inflexible.

Stark lets you define your own expressions, using regex syntax for matching. The simplest expressions trigger an action with no arguments, as with the default "Are you awake" command. This very simple Job triggers on the phrases "are you awake" or "are you there". Notice how regular expressions let you define variations in what you say or type. That variation is what allows multiple phrases to trigger the same action, which more closely resembles how a human would actually interact. And if a phrase isn't quite right, you can always go back in and tweak the expression.
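As a rough illustration of a no-argument Job, the sketch below uses a hypothetical pattern for the "are you awake / are you there" phrases; the actual expression lives in the Job editor and may differ. An alternation group is what lets one Job accept several phrasings.

```python
import re

# Hypothetical expression for the default "are you awake" Job;
# the real one is defined in the web interface and may differ.
ARE_YOU_AWAKE = re.compile(r"\bare\s+you\s+(awake|there)\b", re.IGNORECASE)


def triggers(text: str) -> bool:
    """True if the phrase should fire this no-argument Job."""
    return ARE_YOU_AWAKE.search(text) is not None


for phrase in ("Are you there?", "stark, are you awake", "are you hungry"):
    print(phrase, "->", triggers(phrase))
```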

For a more complicated example I'm going to use the SmartThings.ToggleSwitch method. This action turns a given "switch" type device within the SmartThings hub on or off: any device, identified by name, that has a switch characteristic associated with it. Here is an example regular expression you could use to trigger the method:

(turn\s(on\s|off\s)?the\s)(\w+\slights)(\son|\soff)?

Examining this, you'll see several expression groups. The most important is the (\w+\slights) group, which targets any device named "WORD lights". In my home that could be "office lights", "basement lights", etc. Phrases such as "turn on the bookshelf lights" and "turn the bookshelf lights off" both match this regular expression, so both trigger this action.
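Running the expression above through Python's `re` module shows which group captures what for both phrasings. Group 3 is always the device name, while the on/off word lands in group 2 or group 4 depending on word order. The `device_name` helper is just for illustration.

```python
import re
from typing import Optional

# The expression from above; group 3 is (\w+\slights), the device name.
TOGGLE = re.compile(r"(turn\s(on\s|off\s)?the\s)(\w+\slights)(\son|\soff)?")


def device_name(text: str) -> Optional[str]:
    """Return the captured device name, or None if the phrase doesn't match."""
    m = TOGGLE.search(text)
    return m.group(3) if m else None


print(TOGGLE.search("turn on the bookshelf lights").groups())
# ('turn on the ', 'on ', 'bookshelf lights', None)
print(TOGGLE.search("turn the bookshelf lights off").groups())
# ('turn the ', None, 'bookshelf lights', ' off')
```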

So how does the regex group correspond to the argument? This is where the framework gets a little more powerful. For each Job you can add Parameter Mappings. These are rules that match parts of the regular expression and map them to arguments accepted by the Method.

The following types of Parameter Mappings are supported:

Exclude - anything matching this regex is excluded; the remaining text is set as the argument
Include - anything matching this regex is set as the argument
Group - this group number in the regex is mapped to the argument (most useful)
Index - the word at this index is mapped to the argument (least useful)
Defined - the given text is ignored completely and the argument is set to a pre-defined value
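The five mapping types above can be sketched as a single resolver. This is an illustration under stated assumptions, not Stark's code: `apply_mapping` is a hypothetical name, Include takes only the first match, and Index counts words from 0.

```python
import re
from typing import Optional


def apply_mapping(kind: str, spec: str, text: str,
                  job_match: Optional[re.Match] = None) -> Optional[str]:
    """Resolve one Parameter Mapping against the phrase that triggered a Job."""
    if kind == "exclude":
        # Strip everything matching spec; the remainder is the argument.
        return re.sub(spec, "", text).strip()
    if kind == "include":
        # Assumption: the first match of spec is the argument.
        m = re.search(spec, text)
        return m.group(0) if m else None
    if kind == "group":
        # A numbered group from the Job's own regular expression.
        return job_match.group(int(spec)) if job_match else None
    if kind == "index":
        # Assumption: word indexes are 0-based.
        words = text.split()
        i = int(spec)
        return words[i] if 0 <= i < len(words) else None
    if kind == "defined":
        # The phrase is ignored; spec is the fixed value.
        return spec
    raise ValueError(f"unknown mapping type: {kind!r}")


print(apply_mapping("exclude", "transcode it to", "transcode it to /media/movies"))
# /media/movies
```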

For the SmartThings.ToggleSwitch example we need to send the name of the switch as an argument. Here I used a "Group" Parameter Mapping with group index 3, the (\w+\slights) group, and mapped it to the "name" argument.
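Putting the expression and the Group mapping together, a Job can be pictured as a small record that turns a phrase into a method call. The record's field names and the `{"method": ..., "arguments": ...}` request shape below are assumptions for illustration (the request shape mirrors the method and arguments fields in the response format); only Group mappings are handled.

```python
import json
import re
from typing import Optional

# Hypothetical Job record; field names are illustrative, not Stark's schema.
JOB = {
    "expression": r"(turn\s(on\s|off\s)?the\s)(\w+\slights)(\son|\soff)?",
    "method": "SmartThings.ToggleSwitch",
    "mappings": [{"argument": "name", "type": "group", "value": 3}],
}


def build_request(job: dict, text: str) -> Optional[dict]:
    """Turn a phrase into a method call, handling only Group mappings."""
    m = re.search(job["expression"], text)
    if m is None:
        return None  # this Job does not apply to the phrase
    args = {}
    for mapping in job["mappings"]:
        if mapping["type"] == "group":
            args[mapping["argument"]] = m.group(mapping["value"])
    return {"method": job["method"], "arguments": args}


print(json.dumps(build_request(JOB, "turn off the basement lights")))
# {"method": "SmartThings.ToggleSwitch", "arguments": {"name": "basement lights"}}
```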

Here is another example, using the Filesystem.Transcode method. Just from the Parameter Mapping area you can see that this method accepts multiple arguments:

file - this uses the result of a previous command as a variable in the next command, in this case #PREVIOUSE_RESPONSE#_file. At runtime this maps to the "file" variable of the last response, if it exists.
copy_path - where to copy the file. This is an Exclude mapping: it excludes the phrase "transcode it to" and uses whatever is left as the path name.
transcode_ext - the file extension of the transcoded file. This is a Defined parameter, since I always want it to be .mp4.
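The three Transcode mappings can be sketched as follows. Two assumptions are worth stating loudly: the `#PREVIOUSE_RESPONSE#_` token is treated as a prefix followed by a field name, and that field is looked up in the last response's data object. The `resolve` and `transcode_arguments` helpers and the sample paths are hypothetical.

```python
import re
from typing import Optional

PREV = "#PREVIOUSE_RESPONSE#_"  # reference token as written in the Job


def resolve(value: str, previous: Optional[dict]) -> Optional[str]:
    """Replace a previous-response reference with the matching field, if any."""
    if not value.startswith(PREV):
        return value
    key = value[len(PREV):]
    # Assumption: the referenced field lives in the last response's data object.
    return (previous or {}).get("data", {}).get(key)


def transcode_arguments(text: str, previous: Optional[dict]) -> dict:
    return {
        "file": resolve(PREV + "file", previous),                    # previous-response mapping
        "copy_path": re.sub(r"transcode it to", "", text).strip(),  # Exclude mapping
        "transcode_ext": ".mp4",                                     # Defined mapping
    }


last = {"type": "SUCCESS", "data": {"file": "/downloads/episode.mkv"}}
print(transcode_arguments("transcode it to /media/tv", last))
# {'file': '/downloads/episode.mkv', 'copy_path': '/media/tv', 'transcode_ext': '.mp4'}
```

If there is no previous response, or it has no file field, the file argument simply resolves to None.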

You can see how these parameter mappings make Jobs very powerful. It is also possible to map one phrase to multiple Methods: a single phrase such as "get ready for home theater mode" could dim the lights, turn on the TV, start a movie, and so on, as long as you have the right Modules enabled.
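Mapping one phrase to several Methods amounts to running an ordered list of method calls. In this sketch the method names (`Lights.Dim`, `TV.PowerOn`, `Media.PlayMovie`) and the `run_job` helper are hypothetical stand-ins; a method whose Module isn't enabled is reported rather than run.

```python
from typing import Callable, Dict, List


def run_job(methods: List[str], registry: Dict[str, Callable[[], str]]) -> List[str]:
    """Run each Method mapped to one phrase, in order, collecting the results."""
    results = []
    for name in methods:
        handler = registry.get(name)
        if handler is None:
            results.append(f"{name}: module not enabled")
            continue
        results.append(handler())
    return results


# Hypothetical methods standing in for whatever the enabled Modules provide.
registry = {
    "Lights.Dim": lambda: "lights dimmed",
    "TV.PowerOn": lambda: "tv on",
}
home_theater = ["Lights.Dim", "TV.PowerOn", "Media.PlayMovie"]
print(run_job(home_theater, registry))
# ['lights dimmed', 'tv on', 'Media.PlayMovie: module not enabled']
```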

p.s. You may notice the "Visible To" checkbox area toward the bottom of each Job screenshot. This has to do with User Security, which I'll explain in a separate log post.
