Building an Amazon Echo Skill with Python and Lambda

I bit the bullet about a month ago and picked up an Amazon Echo. While having a Bluetooth networked speaker is cool, the fact that Amazon has opened up an API that lets you interface with the voice aspects of the Echo takes the platform from neat to awesome. With that in mind, I wanted to start solving some of the daily tasks that I would normally use my phone for with the Echo's speech recognition. For example, asking Alexa (the Echo) what time my bus will be at the stop at the end of my block is something I’d do almost daily. Simply asking and getting a time is more useful to me, and a little bit easier, than searching for my phone, opening an app, and waiting for it to refresh with my bus times.

Mapping out the task:

Before I attempted to hack together some really poor Python code, I first needed to clearly understand what was needed from every end of the process. While this seems like a simple task, a lot of different things need to happen to make this skill work, so I find it easier to map things out before I get started. Here’s a quick and dirty picture of how the process works. Functionally, I needed to figure out how to connect to the CTA API and parse XML (xml, gross) for the next bus that’s arriving at my stop, then compute how many minutes away that bus is. For future goodness, I’d like my app to output the times for the next three buses, but for today one will do. Before all of that, I need to figure out how to get my code into a Lambda and execute it from an Alexa Skill.
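To make the parsing step concrete, here is a minimal sketch, written for Python 3, of turning a predictions reply into minutes until arrival. The sample XML is hand-written: the tmstmp and prdtm tags are the ones my code reads, while the bustime-response and prd wrapper elements and the sample times are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Hand-written sample shaped like a predictions reply (wrapper tags assumed).
SAMPLE_XML = """<bustime-response>
  <prd>
    <tmstmp>20160301 08:15</tmstmp>
    <prdtm>20160301 08:19</prdtm>
  </prd>
  <prd>
    <tmstmp>20160301 08:15</tmstmp>
    <prdtm>20160301 08:31</prdtm>
  </prd>
</bustime-response>"""

TIME_FORMAT = "%Y%m%d %H:%M"


def minutes_until_buses(xml_text):
    """Return the minutes until each predicted arrival, in document order."""
    root = ET.fromstring(xml_text)
    minutes = []
    for prd in root.iter("prd"):
        generated = datetime.strptime(prd.findtext("tmstmp"), TIME_FORMAT)
        arrival = datetime.strptime(prd.findtext("prdtm"), TIME_FORMAT)
        minutes.append(int((arrival - generated).total_seconds() // 60))
    return minutes


print(minutes_until_buses(SAMPLE_XML))  # [4, 16]
```

Reading both values from the same prd element keeps each prediction paired with its own timestamp.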

Alexa Bus App Workflow

Anatomy of an Echo/Alexa request:

Alexa apps range from very simple to very complex. Some applications, once invoked, prompt the user through a series of requests and answers. For my purposes, I’m looking for a simple response that quickly tells me when my bus will be at the stop down the street. The spoken request needs to invoke the skill and pass it the information needed to return that response. To accomplish this, my app will respond to the following request:

“Alexa, ask bus times for my next bus.”

For your Echo and Amazon’s cloud services to understand what to execute, this request is broken down into several functional parts:

  1. Alexa – The wake word to make your Echo start “listening for more information”
  2. ask – Tells the Echo that we’ll be passing the skill invocation name over next.
  3. bus times – the invocation name of the Alexa Skill that is being developed
  4. for – the connecting word that passes the utterance to the skill directly.
  5. my next bus – the utterance that will map to an intent to be executed in our lambda code
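Once the pieces above have been resolved, the Lambda function receives the request as a JSON event. The sketch below shows a trimmed, hand-written event containing only the fields my handler code reads; the ID values are placeholders, and describe_request is a hypothetical helper, not part of the skill.

```python
# Trimmed, hand-written event; real events carry more fields and real IDs.
sample_event = {
    "session": {
        "new": True,
        "sessionId": "example-session-id",
        "application": {"applicationId": "example-application-id"},
    },
    "request": {
        "type": "IntentRequest",
        "requestId": "example-request-id",
        "intent": {"name": "GetBusTime"},
    },
}


def describe_request(event):
    """Summarize which handler path an event would take."""
    request = event["request"]
    if request["type"] == "IntentRequest":
        return "intent:" + request["intent"]["name"]
    return "request:" + request["type"]


print(describe_request(sample_event))  # intent:GetBusTime
```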

Creating a Skill

First, log in to the Alexa Skills Kit developer console. We’ll be leveraging the “Alexa Skills Kit” to create our new service.

alexa_skills_kit

From here we’ll create a new skill for tracking our bus times, by clicking “Add a New Skill”:

Alexa_Dev_Console

That brings up the Skill Information page:

skill_information

Enter the following options:

  • Skill Type – Custom Interaction Model, because we’re not using a Smart Home Skill API
  • Name – Enter a name to be displayed in the Alexa App
  • Invocation Name – The name that the user will say to interact with the skill. Note that this should follow Amazon’s invocation name guidelines; it can be edited while the skill is in development mode but cannot be changed after the skill has been certified.

Click Next to bring up the Interaction Model page:

interacton_model

Intents

Intents map a user’s voice input to the services that your Alexa skill can address. More simply put, an intent represents an action that fulfills the end user’s request. This mapping uses a JSON structure called an intent schema. Here I define a single intent called GetBusTime, which lines up with a portion of the code in the AWS Lambda function that I’ll get to in a bit. This is a really simple example; it’s entirely possible to include more intents in your intent schema so your app can do many things.

intent_schema
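In text form, a schema for this skill might look roughly like the sketch below, built as a Python dictionary and printed as JSON for pasting into the console. GetBusTime comes from this post; including AMAZON.HelpIntent is an assumption based on the intent names my Lambda code dispatches on, and the exact schema layout may differ in newer versions of the developer console.

```python
import json

# Assumed schema: one custom intent plus Amazon's built-in help intent,
# matching the two intent names the Lambda code dispatches on.
intent_schema = {
    "intents": [
        {"intent": "GetBusTime"},
        {"intent": "AMAZON.HelpIntent"},
    ]
}

print(json.dumps(intent_schema, indent=2))
```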

After identifying which function the Alexa skill should execute in the Lambda code, we need to link it to the spoken requests that will trigger it. To do this we use utterances.

Utterances

The Echo relies on a structured text file of likely phrases, called utterances, to map what a user says to an intent. My app is really simple. In fact, to get it going, I’ve stripped out anything beyond what is needed to trigger my Lambda function. To trigger the Lambda function, my intent looks for any of the following utterances. Each line of the utterances file always begins with the intent name, in my case “GetBusTime”, and ends with what you might expect a user to ask (the utterance).
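As a rough sketch of that file format, the snippet below generates utterance lines by prefixing each phrase with the intent name. Only “my next bus” is taken from the request used in this post; the other phrases are illustrative guesses, and build_utterance_lines is a hypothetical helper.

```python
INTENT_NAME = "GetBusTime"

# Only "my next bus" comes from the request phrase used in this post;
# the other phrasings are illustrative guesses.
phrases = [
    "my next bus",
    "what are my bus times",
    "when is my next bus",
]


def build_utterance_lines(intent_name, phrases):
    """Prefix every phrase with the intent name, one utterance per line."""
    return [intent_name + " " + phrase for phrase in phrases]


print("\n".join(build_utterance_lines(INTENT_NAME, phrases)))
```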

Click Next and we’ll move on to a page that allows us to use an Amazon Resource Name (ARN) to link to AWS Lambda.

Linking to AWS Lambda:

Adding_an_ARN_Endpoint

Before we can add the ARN to our Alexa Skill, we need to create the Lambda function that the ARN references. To do this, log in to your AWS Management Console (separate from the Alexa Developer Console) and click Lambda under the Compute section:

click_lambda

Click Create a Lambda function:

create_new

Filter for alexa and select “alexa-skills-kit-color-expert-python”

filter

Under the Event source type select “Alexa Skills Kit” and select Next

alexa_skills_kit

On the Next Screen we’ll configure our function:

Under Name/Description enter something that identifies the lambda function to you. I’m using bus_python_lambda for my name and left the description blank.

configure_funtion

Under Lambda function handler and role, I’ve entered lambda_function.lambda_handler (the file name, then the name of the handler function in the code below) and created a new basic execution role:

role
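The handler setting for a Python function takes the form file_name.function_name. The sketch below only illustrates how such a string splits into a module and a function; it is not how Lambda actually loads code, and resolve_handler and the stand-in module are hypothetical.

```python
import types


def resolve_handler(handler_setting, modules):
    """Split "module.function" and look the function up in a module table."""
    module_name, function_name = handler_setting.rsplit(".", 1)
    return getattr(modules[module_name], function_name)


# Stand-in for the lambda_function.py file uploaded to Lambda.
fake_module = types.ModuleType("lambda_function")
fake_module.lambda_handler = lambda event, context: "handled"

handler = resolve_handler("lambda_function.lambda_handler",
                          {"lambda_function": fake_module})
print(handler({}, None))  # handled
```

If the function name in the setting does not match a function in the file, the lookup fails, which is why a small typo in this field stops the skill from running.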

On the next page accept the defaults to create a new IAM role and Policy Name and click Allow:

IAM_Role

From here, it’s time to enter our Python (or edit the Python that exists) to work with our Alexa Skill. Before doing so, take some time to look at the sample color application to understand what each function in the example is doing. I’ve simply modified each of these functions to suit my bus tracker application’s purpose. Let’s take a look at the code that I’m using for the app:

My Really Bad Python Code:

Up to this point we’ve created all of the linkages for our Alexa Skill that we can create without writing any code. Below you’ll find my Python Lambda code. A couple of small notes:

  • The majority of the work for the application happens in the get_bus_time function.
  • You’ll need to get an API key from the CTA to make everything work. For right now, my code is statically set to the southbound #22 bus at a stop on Clark and Byron in Chicago, Illinois.
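The hard-coded key, route, and stop mentioned in the notes above could be pulled out into parameters. Here is a minimal sketch for Python 3, assuming the same URL and query parameters that appear in my code; build_predictions_url is a hypothetical helper and CTA_API_KEY is a hypothetical environment variable name.

```python
import os
from urllib.parse import urlencode

BASE_URL = "http://www.ctabustracker.com/bustime/api/v1/getpredictions"


def build_predictions_url(api_key, route="22", stop_id="1820"):
    """Assemble the predictions URL from its query parameters."""
    query = urlencode({"key": api_key, "rt": route, "stpid": stop_id})
    return BASE_URL + "?" + query


# CTA_API_KEY is a hypothetical environment variable name.
api_key = os.environ.get("CTA_API_KEY", "YOURAPIKEYGOESHERE")
print(build_predictions_url(api_key))
```

Keeping the key out of the source makes it easier to share the code without sharing the key.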

 

"""
This sample demonstrates a simple skill built with the Amazon Alexa Skills Kit.
The Intent Schema, Custom Slots, and Sample Utterances for this skill, as well
as testing instructions are located at http://amzn.to/1LzFrj6

For additional samples, visit the Alexa Skills Kit Getting Started guide at
http://amzn.to/1LGWsLG
"""

from __future__ import print_function
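try:
    import urllib2  # Python 2, as used when this was written
except ImportError:
    import urllib.request as urllib2  # Python 3 exposes the same urlopen()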
import xml.etree.ElementTree as etree
from datetime import datetime as dt

def lambda_handler(event, context):
    """ Route the incoming request based on type (LaunchRequest, IntentRequest,
    etc.) The JSON body of the request is provided in the event parameter.
    """
    print("event.session.application.applicationId=" +
          event['session']['application']['applicationId'])

    """
    Uncomment this if statement and populate with your skill's application ID to
    prevent someone else from configuring a skill that sends requests to this
    function.
    """
    # if (event['session']['application']['applicationId'] !=
    #         "amzn1.echo-sdk-ams.app.[unique-value-here]"):
    #     raise ValueError("Invalid Application ID")

    if event['session']['new']:
        on_session_started({'requestId': event['request']['requestId']},
                           event['session'])

    if event['request']['type'] == "LaunchRequest":
        return on_launch(event['request'], event['session'])
    elif event['request']['type'] == "IntentRequest":
        return on_intent(event['request'], event['session'])
    elif event['request']['type'] == "SessionEndedRequest":
        return on_session_ended(event['request'], event['session'])


def on_session_started(session_started_request, session):
    """ Called when the session starts """

    print("on_session_started requestId=" + session_started_request['requestId']
          + ", sessionId=" + session['sessionId'])


def on_launch(launch_request, session):
    """ Called when the user launches the skill without specifying what they
    want
    """

    print("on_launch requestId=" + launch_request['requestId'] +
          ", sessionId=" + session['sessionId'])
    # Dispatch to your skill's launch
    return get_welcome_response()


def on_intent(intent_request, session):
    """ Called when the user specifies an intent for this skill """

    print("on_intent requestId=" + intent_request['requestId'] +
          ", sessionId=" + session['sessionId'])

    intent = intent_request['intent']
    intent_name = intent_request['intent']['name']

    # Dispatch to your skill's intent handlers
    if intent_name == "GetBusTime":
        return get_bus_time(intent, session)
    elif intent_name == "AMAZON.HelpIntent":
        return get_welcome_response()
    else:
        raise ValueError("Invalid intent")


def on_session_ended(session_ended_request, session):
    """ Called when the user ends the session.

    Is not called when the skill returns should_end_session=true
    """
    print("on_session_ended requestId=" + session_ended_request['requestId'] +
          ", sessionId=" + session['sessionId'])
    # add cleanup logic here

# --------------- Functions that control the skill's behavior ------------------


def get_welcome_response():
    """ If we wanted to initialize the session to have some attributes we could
    add those here
    """

    session_attributes = {}
    card_title = "Welcome"
    speech_output = "Welcome to the Alexa Bus Tracker Application. " \
                    "Please ask me for bus times by saying, " \
                    "What are my bus times?"
    # If the user either does not reply to the welcome message or says something
    # that is not understood, they will be prompted again with this text.
    reprompt_text = "Please ask me for bus times by saying, " \
                    "What are my bus times?"
    should_end_session = False
    return build_response(session_attributes, build_speechlet_response(
        card_title, speech_output, reprompt_text, should_end_session))


def get_bus_time(intent, session):
    """ Grabs our bus times and creates a reply for the user
    """

    card_title = intent['name']
    session_attributes = {}
    should_end_session = True

    url="http://www.ctabustracker.com/bustime/api/v1/getpredictions?key=YOURAPIKEYGOESHERE&rt=22&stpid=1820"
    xml_data = urllib2.urlopen(url)

    #parse the example into ElementTree
    tree = etree.parse(xml_data)
    #close connection
    xml_data.close()

    #Find the root element
    rootElem = tree.getroot()

    #create lists to hold timestamps and prediction times
    timestamps = []
    predictiontime = []
    prediction = []

    #iterate over elements in rootElem finding tags with tmstmp and prdtm
    for element in rootElem.iter():
        if element.tag == 'tmstmp':
    #        print element.tag, element.text
            timestamps.append(element.text)
        if element.tag == 'prdtm':
    #        print element.tag, element.text
            predictiontime.append(element.text)

    #print "Print out list data for tmstmp and prdtm"
    #print out our values to make sure Tim isn't stupid
    #for value in timestamps:
    #    print value
    #for value in predictiontime:
    #   print value

    #describe how the XML time data looks when retrieved from the list
    FT = '%Y%m%d %H:%M'
    #pair each prediction with its timestamp so a reply without any
    #predictions (for example an error element) cannot raise an IndexError
    count = 0
    for stamp, predicted in zip(timestamps, predictiontime):
        delta = dt.strptime(predicted, FT) - dt.strptime(stamp, FT)
        prediction.append(int(delta.total_seconds()/60))
        count += 1

    #check your work to make sure that we iterated correctly
    #for value in prediction:
    #    print value
    #print "Count of child xml elements under root (equal to actual buses):"
    #print count

    if count != 0:
        speech_output = "Your next bus is arriving in " + str(prediction[0]) + " minutes"
        reprompt_text = ""
    else:
        speech_output = "Please ask me for bus times by saying, " \
                        "What are my bus times?"
        reprompt_text = "Please ask me for bus times by saying, " \
                        "What are my bus times?"
    return build_response(session_attributes, build_speechlet_response(
        card_title, speech_output, reprompt_text, should_end_session))


# --------------- Helpers that build all of the responses ----------------------


def build_speechlet_response(title, output, reprompt_text, should_end_session):
    return {
        'outputSpeech': {
            'type': 'PlainText',
            'text': output
        },
        'card': {
            'type': 'Simple',
            'title': 'SessionSpeechlet - ' + title,
            'content': 'SessionSpeechlet - ' + output
        },
        'reprompt': {
            'outputSpeech': {
                'type': 'PlainText',
                'text': reprompt_text
            }
        },
        'shouldEndSession': should_end_session
    }


def build_response(session_attributes, speechlet_response):
    return {
        'version': '1.0',
        'sessionAttributes': session_attributes,
        'response': speechlet_response
    }
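For the "next three buses" idea mentioned earlier, the spoken sentence could be built from a list of minutes along these lines. This is a sketch only: describe_arrivals is a hypothetical helper that is not part of the code above, and the fallback wording is my own.

```python
def describe_arrivals(minutes, limit=3):
    """Turn a list of minutes-until-arrival into one spoken sentence."""
    upcoming = minutes[:limit]
    if not upcoming:
        return "I could not find any upcoming buses for your stop."
    if len(upcoming) == 1:
        return "Your next bus is arriving in {} minutes".format(upcoming[0])
    spoken = ", ".join(str(m) for m in upcoming[:-1])
    return "Your next buses are arriving in {} and {} minutes".format(
        spoken, upcoming[-1])


print(describe_arrivals([4, 16, 27, 40]))
# Your next buses are arriving in 4, 16 and 27 minutes
```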

Stitching it All Together:

Save your code in the code portion and click create function.

From there grab the ARN that correlates to your Lambda function in the upper right hand corner of the Lambda console. We’ll need to transfer that to the Alexa Developer Console in the next step.

ARN

 

Back in the Alexa Skill creation window we need to link to the created Amazon Resource Name (ARN) for our lambda function as our endpoint for the Alexa Skill:

ARN_copy_test

 

From here we can move onto testing.

Testing the skill

My skill can take an utterance asking for “my next bus”. Let’s pass that along:

test_utternace

You can see that the Lambda responds as expected! A look at the AWS console lambda execution shows:

lambda_monitoring
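When testing, it can help to check the JSON that the Lambda returns rather than eyeballing it. The sketch below pulls the spoken text out of a response with the same shape that build_response() produces; spoken_text is a hypothetical helper and the sample response is hand-written.

```python
def spoken_text(response):
    """Return the speech text from a skill response, checking its shape."""
    if response.get("version") != "1.0":
        raise ValueError("unexpected response version")
    speech = response["response"]["outputSpeech"]
    if speech["type"] != "PlainText":
        raise ValueError("expected PlainText speech")
    return speech["text"]


# Hand-written response in the shape build_response() produces.
sample_response = {
    "version": "1.0",
    "sessionAttributes": {},
    "response": {
        "outputSpeech": {"type": "PlainText",
                         "text": "Your next bus is arriving in 4 minutes"},
        "shouldEndSession": True,
    },
}

print(spoken_text(sample_response))  # Your next bus is arriving in 4 minutes
```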

Most importantly, on my side of the internet the Echo responds that my bus is arriving in 4 minutes. I plan on digging into some other possibilities with the Echo in the future; for now I just wanted to help people get started with a fun and easy platform. Happy Automating!

 

 

3 Responses

  1. John Wheeler says:

    Hi Tim – This is an excellent post. I’ve been doing a little work with the Zappa folks to make my Python framework, Flask-Ask, even easier deploy to lambda. Since you have experience, I’d love to collaborate.

  2. Rich Elswick says:

    quick comment, if you try to switch the build_speechlet_response() to using SSML, it will break the welcome response. cannot mix SSML and plain text.

  1. July 13, 2017

    […] is mostly based on this work by Tim […]
