ROLE: | VUI Designer |
DATE: | May - June 2018 |
TOOLS: | AWS, ASK, Node.js |
LINKS: | Twitter Web API |
While preparing for a hackathon called "Alexa Meets IoT", I wanted to make a Twitter gadget that I could leave on display for the duration of the event. The concept was simple: point the gadget at the event feed and let everyone see the latest tweets in real time. The primary objectives were to demonstrate the value of voice, surface live data, and inspire others with an interactive project.
As I was getting organized, I came across a new tutorial: "Twitter Web API for ESP8266" by Deb Sahu. The project turns an ESP8266 into a web server that talks to the Twitter API (perfect!). In Deb's version, however, users visit a web page and change the feed through a text input box. In other words, changing the Twitter channel requires manual data entry. To make the UX easier, I wanted to change channels with voice.
I approached Deb about adding a voice UI to the project, and he was keen to learn how to build skills and get some use out of his Alexa device. (Now it was my turn to give him a quick tutorial.) Together we built a second version, Alexa-Tweets, which uses voice to change the crawling tweet stream.
I created the dialog flow in the developer portal and the back-end skill logic in Node.js. As you can see in the diagram, this is the application flow:
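The back end boils down to one intent handler. The snippet below is a minimal sketch, assuming the ASK SDK v2 for Node.js; "ChangeFeedIntent" and the "twitterFeed" slot are illustrative names rather than the exact ones from the published skill.

```javascript
// Minimal sketch of the skill back end, assuming ASK SDK v2 for Node.js.
// "ChangeFeedIntent" and the "twitterFeed" slot are illustrative names.
const Alexa = require('ask-sdk-core');

const ChangeFeedIntentHandler = {
  canHandle(handlerInput) {
    return Alexa.getRequestType(handlerInput.requestEnvelope) === 'IntentRequest'
      && Alexa.getIntentName(handlerInput.requestEnvelope) === 'ChangeFeedIntent';
  },
  handle(handlerInput) {
    // Read the requested feed from the slot defined in the developer portal.
    const requestedFeed = Alexa.getSlotValue(handlerInput.requestEnvelope, 'twitterFeed');

    // The new feed still has to reach the ESP8266 web server;
    // that step is sketched in the ngrok section below.

    return handlerInput.responseBuilder
      .speak(`OK, now following ${requestedFeed}`)
      .getResponse();
  },
};

exports.handler = Alexa.SkillBuilders.custom()
  .addRequestHandlers(ChangeFeedIntentHandler)
  .lambda();
```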
The first challenge we ran into was how to expose the ESP8266 to the cloud. Without this, there was no way for the AWS Lambda function to communicate with the web server running on my ESP8266. At this point we knew the skill itself was working, because we heard the verbal confirmation ("...now following @{yourRequestedFeed}"), but the change wasn't reflected on the dot matrix display.
Deb found ngrok, a tool that provides secure tunnels from private servers to the outside world. It also offers useful extras, such as reserved addresses that you can bake into your code, eliminating the need to edit the code each time you restart the service.
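With the tunnel in place, the Lambda function can reach the ESP8266 over a stable URL. The helper below is a rough sketch, assuming a reserved ngrok hostname and a hypothetical /follow route on the ESP8266 web server; the real hostname, route, and parameter names may differ.

```javascript
const https = require('https');

// Reserved ngrok hostname baked into the code so the tunnel URL survives
// restarts of the ngrok service. Hostname and route are illustrative.
const TICKER_HOST = 'alexa-tweets.ngrok.io';

function updateTicker(feed) {
  return new Promise((resolve, reject) => {
    const path = `/follow?feed=${encodeURIComponent(feed)}`;
    https.get({ host: TICKER_HOST, path }, (res) => {
      res.resume(); // drain the body; only the status code matters here
      res.on('end', () => resolve(res.statusCode));
    }).on('error', reject);
  });
}
```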
The second problem we faced was more complex. As I was thinking about who would be speaking at the upcoming event (or whom attendees would want to follow), I realized that Twitter handles are difficult for Alexa to recognize. Because Twitter names are often made up of more than one word, they're not standard entries in the American English dictionary, and there is no built-in slot type for them as there is for first names (AMAZON.US_FIRST_NAME), for example.
However, I was working with a known set of names: the speaker list and hashtags that belonged to the event (#VoiceFirst, @LizMyers, etc.). So my quick solution was to hard-wire the possibilities into the code. This way, I can say to Alexa: "Follow me", and the phrase maps to "@LizMyers" in the skill code. I can also say things like "follow this event", which maps to "#AlexaMeetsIoT" or "#TalkToMeBerlin", for example.
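A rough sketch of that hard-wired mapping is below; the entries are placeholders standing in for the event's actual speaker list.

```javascript
// Hard-wired mapping from spoken phrases (slot values) to Twitter feeds.
// The entries are illustrative; the real skill listed the event's speakers.
const FEED_MAP = {
  'me': '@LizMyers',
  'this event': '#AlexaMeetsIoT',
  'talk to me berlin': '#TalkToMeBerlin',
  'voice first': '#VoiceFirst',
};

function resolveFeed(spokenValue) {
  const key = (spokenValue || '').toLowerCase().trim();
  // Fall back to the raw utterance if the phrase isn't in the known set.
  return FEED_MAP[key] || spokenValue;
}
```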
AI is not meant to be hard-wired, and admittedly my hacky solution is not optimal. But we got the project working, and I was able to demonstrate effectively how channel surfing with voice is quick and easy. I was trying to communicate a vision, and in terms of that goal, the project succeeded.
My dream of channel surfing with voice doesn't end there. The dot matrix display has always reminded me of a stock ticker, and since this domain has known lists of company names, there are JSON libraries readily available for mapping names to ticker symbols. For our next project, we plan to use voice to build a list of companies, convert their names into stock symbols, and retrieve current price data for the Red Ticker display. I've already begun the dialog design and core voice experience.
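As a first sketch of that idea, a small lookup (or a published company-to-ticker JSON file) could map spoken company names to symbols before any price request is made; the entries below are placeholders, not the final data source.

```javascript
// Planned next step (sketch): map spoken company names to ticker symbols
// for the Red Ticker display. These entries are placeholders.
const SYMBOLS = {
  'amazon': 'AMZN',
  'apple': 'AAPL',
  'twitter': 'TWTR',
};

function toSymbol(companyName) {
  return SYMBOLS[companyName.toLowerCase().trim()] || null;
}
```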
Early Prototype
Text Input UI
Voice UI Architecture
Improving Twitter ASR
Lego Build
Ready for the Next Project