Hacking Alexa to Send Arbitrary Texts
I finally jumped on the Alexa bandwagon this week and picked up an Amazon Echo Dot. I got the Dot because it's a lot cheaper than the Echo, and I already have an old speaker lying around that I can use with it.
Just to clarify: the Dot is a little hockey-puck-sized device that sits in your house and listens to you, then plays back audio from various services or responses from Alexa (it can do a few other things, like act as a Bluetooth receiver). Alexa, on the other hand, is the service that powers all of the "intelligence." Developers can create Skills for Alexa that integrate with various other services (the same technology that powers Alexa is also available as Lex, a new service from AWS). Together, they can also connect to smart home devices to improve the experience: for example, saying "Alexa, turn off the lights and lock the doors" is a lot nicer than finding your phone, unlocking it, finding the app, and hitting some buttons.
I don't have any smart home devices, so I've been using Alexa for playing music, asking quick questions (like math or history questions; think Wikipedia excerpts), listening to podcasts, setting timers, and adding items to my shopping and to-do lists.
One thing that was noticeably missing is the ability to send text messages, which I use Siri/OK Google for a lot. Ostensibly, this is because Alexa (and Lex) are getting away from the business of doing arbitrary speech-to-text. There are various reasons for this; this (admittedly biased) post goes into the issue in depth, but I haven't researched it thoroughly myself, so I won't speculate or provide misinformed explanations. It does seem that the folks at Amazon think there are ways to achieve the same functionality, so I might not be completely out of luck.
The setup was actually quite easy. I just:
- forked one of the Alexa skill sample Lambda functions from GitHub, updating it to handle a few different request formats
- created the Alexa Skill and pointed it at the ARN of the Lambda function
- created speech intents and utterances for the action
- granted the Lambda function's IAM role permissions for SNS
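For that last step, the permission the role needs is small. Here is a minimal sketch of what such a policy document could look like, built as a Python dict so it can be dumped to JSON. It is an illustration under assumptions, not necessarily the exact policy used here: `sns:Publish` is the SNS action for publishing a message, and the wildcard resource is there because an SMS sent straight to a phone number has no topic ARN to scope to.

```python
import json

# Hypothetical minimal policy for the Lambda function's execution role.
# It allows publishing through SNS and nothing else.
SNS_PUBLISH_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "sns:Publish",
            # Direct-to-phone SMS has no topic ARN, hence the wildcard.
            "Resource": "*",
        }
    ],
}


def policy_json(policy=SNS_PUBLISH_POLICY):
    """Serialize the policy so it can be pasted into the IAM console."""
    return json.dumps(policy, indent=2)


if __name__ == "__main__":
    print(policy_json())
```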
Getting the invocation name and speech utterances right was the hardest part. The invocation name has to be something unique and distinct so that Alexa picks it up reliably. Unfortunately, it should also be a name that isn't stupid, and we all know how hard it is to name something. Plus there are the speech detection quirks 😫.
My intents definition is simple: just one intent with two "slots," which are effectively variables for the intent.
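To make the shape concrete, here is a rough sketch of a one-intent, two-slot schema, written as a Python dict that mirrors the JSON the skill configuration expects. The intent name `SendText`, the slot names `Recipient` and `Message`, and the custom slot type `MESSAGE_PHRASES` are all made up for illustration; `AMAZON.US_FIRST_NAME` is one of Amazon's built-in slot types.

```python
import json

# Hypothetical schema: one intent, two slots.
INTENT_SCHEMA = {
    "intents": [
        {
            "intent": "SendText",
            "slots": [
                # Who the text goes to (built-in first-name slot type).
                {"name": "Recipient", "type": "AMAZON.US_FIRST_NAME"},
                # What to say (custom slot type, assumed to be defined in the skill).
                {"name": "Message", "type": "MESSAGE_PHRASES"},
            ],
        }
    ]
}


def slot_names(schema, intent_name):
    """Return the slot names declared for one intent, or [] if it is unknown."""
    for intent in schema["intents"]:
        if intent["intent"] == intent_name:
            return [slot["name"] for slot in intent["slots"]]
    return []


if __name__ == "__main__":
    print(json.dumps(INTENT_SCHEMA, indent=2))
```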
My rough draft of utterances provides a number of different phrases that the user can say to trigger the intent.
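Sample utterances are one line each: the intent name, then a phrase with the slots in curly braces. Writing out every variation by hand gets tedious, so here is a small sketch that generates them instead. The intent name, slot names, and phrasings are assumptions for illustration, not the actual list.

```python
from itertools import product

# Hypothetical intent name and phrase fragments.
INTENT = "SendText"
VERBS = ["text", "message", "send a text to"]
JOINERS = ["saying", "that"]


def build_utterances(intent=INTENT, verbs=VERBS, joiners=JOINERS):
    """Combine every verb with every joiner into one utterance line each."""
    return [
        # Doubled braces produce literal {Recipient} / {Message} slot markers.
        f"{intent} {verb} {{Recipient}} {joiner} {{Message}}"
        for verb, joiner in product(verbs, joiners)
    ]


if __name__ == "__main__":
    print("\n".join(build_utterances()))
```

Generating the list this way makes it cheap to add a new phrasing when Alexa keeps mishearing one of them.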
And last, the extra rough code for sending a basic text from an intent:
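A minimal sketch of what such a handler might look like, assuming a Python Lambda rather than whatever language the forked sample used. The intent name `SendText`, the slot names, and the hardcoded `CONTACTS` map are hypothetical; the request and response shapes follow the Alexa Skills Kit JSON format, and `publish(PhoneNumber=..., Message=...)` is the boto3 SNS call for sending an SMS directly to a number.

```python
# Hypothetical contact book; a real skill would look numbers up elsewhere.
CONTACTS = {"alice": "+15555550100"}


def build_response(text):
    """Wrap plain text in the response envelope Alexa expects."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": True,
        },
    }


def slot_value(intent, name):
    """Read a slot's value, tolerating missing slots or unfilled values."""
    return intent.get("slots", {}).get(name, {}).get("value")


def handle_send_text(intent, sns_client, contacts=CONTACTS):
    recipient = slot_value(intent, "Recipient")
    message = slot_value(intent, "Message")
    if not recipient or not message:
        return build_response("Sorry, I didn't catch who or what to send.")
    number = contacts.get(recipient.lower())
    if number is None:
        return build_response(f"I don't have a number for {recipient}.")
    # SNS sends an SMS when given a phone number instead of a topic.
    sns_client.publish(PhoneNumber=number, Message=message)
    return build_response(f"Okay, I texted {recipient}.")


def lambda_handler(event, context, sns_client=None):
    request = event.get("request", {})
    if request.get("type") != "IntentRequest":
        return build_response("Ask me to text someone.")
    intent = request.get("intent", {})
    if intent.get("name") != "SendText":
        return build_response("Sorry, I can't do that yet.")
    if sns_client is None:
        import boto3  # imported lazily so the module loads without the AWS SDK

        sns_client = boto3.client("sns")
    return handle_send_text(intent, sns_client)
```

Passing the SNS client in as an optional argument is just a convenience for trying the handler locally with a stand-in object; Lambda itself only ever supplies `event` and `context`.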
The way to really move forward with this would be to create a service where users can create an account and sync contacts from various services, and provide an authentication link between Alexa and my service. Then, the skill would make a request out to my service to fetch the matching contact for the authenticated session on the request, in order to fetch the phone number. Obviously, this is a much bigger task, but maybe I'll mess around some more and see what I can do.
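As a rough sketch of that lookup, assuming account linking is set up so that Alexa passes an access token in `session.user.accessToken`: the endpoint URL, the `name` query parameter, and the `phone` field in the response are all hypothetical, since the contacts service doesn't exist yet.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical endpoint for the not-yet-built contacts service.
CONTACTS_URL = "https://contacts.example.com/lookup"


def http_fetch(url, token):
    """GET the URL with the linked account's token and parse the JSON body."""
    request = urllib.request.Request(
        url, headers={"Authorization": f"Bearer {token}"}
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return json.loads(response.read().decode("utf-8"))


def lookup_number(event, name, fetch=http_fetch):
    """Return the phone number for a contact, or None if the account isn't linked."""
    token = event.get("session", {}).get("user", {}).get("accessToken")
    if not token:
        return None  # the user hasn't linked an account yet
    query = urllib.parse.urlencode({"name": name})
    return fetch(f"{CONTACTS_URL}?{query}", token).get("phone")
```

The `fetch` parameter is only there so the lookup can be exercised without a live service; the skill would call `lookup_number(event, recipient)` and fall back to an "I don't know that contact" response when it returns `None`.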