
TECHNOLOGYREVIEW.COM MIT TECHNOLOGY REVIEW VOL. 120 | NO. 5

Voice-based AI devices aren’t just jukeboxes with attitude. They could become the primary way we interact with our machines.

By George Anders Illustrations by Roman Muradov


On August 31, 2012, four Amazon engineers filed the fundamental patent for what ultimately became Alexa, an artificial-intelligence system designed to engage with one of the world’s biggest and most tangled data sets: human speech. The engineers needed just 11 words and a simple diagram to describe how it would work. A male user in a quiet room says: “Please play ‘Let It Be,’ by the Beatles.” A small tabletop machine replies: “No problem, John,” and begins playing the requested song.

From that modest start, voice-based AI for the home has become a big business for Amazon and, increasingly, a strategic battleground with its technology rivals. Google, Apple, Samsung, and Microsoft are each putting thousands of researchers and business specialists to work trying to create irresistible versions of easy-to-use devices that we can talk with. “Until now, all of us have bent to accommodate tech, in terms of typing, tapping, or swiping. Now the new user interfaces are bending to us,” observes Ahmed Bouzid, the chief executive officer of Witlingo, which builds voice-driven apps of all sorts for banks, universities, law firms, and others.

For Amazon, what started out as a platform for a better jukebox has become something bigger: an artificial-intelligence system built upon, and constantly learning from, human data. Its Alexa-powered Echo cylinder and tinier Dot are omnipresent household helpers that can turn off the lights, tell jokes, or let you read the news hands-free. They also collect reams of data about their users, which is being used to improve Alexa and add to its uses.

Tens of millions of Alexa-powered machines have been sold since their market debut in 2014. In the U.S. market for voice-powered AI devices, Amazon is believed to ring up about 70 percent of all unit sales, though competition is heating up. Google Home has sold millions of units as well, and Apple and Microsoft are launching their own versions soon.

The ultimate payoff is the opportunity to control—or at least influence—three important markets: home automation, home entertainment, and shopping. It’s hard to know how many people want to talk to their refrigerators, but patterns of everyday life are changing fast. In the same way that smartphones have changed everything from dating etiquette to pedestrians’ walking speed, voice-based AI is beginning to upend many aspects of home life. Why get up to lock the front door or start your car heater on a bitterly cold day, when Alexa or her kin can instantly sort things out instead?

For now, Amazon isn’t trying to collect revenue from companies making smart thermostats, lightbulbs, and other Alexa-connected devices. Down the road, though, it’s easy to imagine ways that revenue-sharing arrangements or other payments could catch on. The smallest of these three markets, home automation, already accounts for more than $5 billion of spending each year, while retail sales in the U.S. last year totaled $4.9 trillion. Today Amazon makes money on the machines themselves, at prices ranging from $50 for Dots to $230 for the highest-end Echos with video screens, and reaps a second payoff if users end up shopping more heavily at Amazon’s vast online store. (Amazon won’t disclose those traffic numbers, however.)

For Echos to become as pervasive as smartphones, they will need to do many more things. To that end, Amazon is encouraging independent developers to build new services on the platform, just as Apple has long done with app developers. More than 15,000 such “skills,” or apps, have been built so far, and app-building tools have become so easy to snap together that it’s now possible to build a simple skill in about an hour, without much programming knowledge. Among the most popular apps are ride-hailing options from Uber and Lyft. Duds include 48 separate skills that bombard listeners with insults.

Among the most ambitious developers are companies making hardware or selling services that work with Alexa. Capital One, for example, is offering Alexa-based bill payment to its banking customers; Toronto-based Ecobee is one of a number of makers of smart thermostats to rig up Alexa-powered versions that let people raise or lower room temperatures merely by uttering a few words. “Our customers have busy lives,” says Stuart Lombard, chief executive of Ecobee, which now gets roughly 40 percent of overall sales from its Alexa devices, the 10-year-old company’s fastest-growing product line. “They have to fight traffic to get home, and then they have to feed the kids, diaper the baby, and who knows what else. We give them a hands-free way of getting something done while they’re in the midst of other tasks.”

When speech meets AI

What makes voice-based AI so appealing to consumers is its promise to conform to us, to respond to the way we speak—and think—without requiring us to type on a keyboard or screen. That’s also what makes it so technically difficult to build. We aren’t at all orderly when we talk. Instead, we interrupt ourselves. We let thoughts dangle. We use words, nods, and grunts in odd ways, and we assume that we’re making sense even when we aren’t.

Thousands of Amazon staffers are working on this challenge, including some at research hubs in Seattle, Sunnyvale, California, and Cambridge, Massachusetts. Even so, Amazon’s careers page recently offered 1,100 more Alexa jobs spread across a dozen departments, including 215 slots for machine-learning specialists. During a meeting at the company’s Cambridge offices, I asked Alexa’s head scientist, Rohit Prasad, why he needs so many people—and when his research team might be fully built out.

“I’m laughing at every aspect of your question,” Prasad replied.

After a few seconds, having regained his composure, Prasad explained that he’s been working on speech technology for 20 years, with frustratingly slow results for most of that period. In the past five years, however, giant opportunities have opened up. Creating a really effective voice-triggered AI is a complex and still unconquered task (see “AI’s Language Problem,” September/October 2016). But while in the past, speech scientists struggled to determine the exact meaning of sometimes-chaotic utterances on the first try, new approaches to machine learning are making progress by taking a different tack: they work from imperfect matches at the outset, followed by rapid fine-tuning of provisional guesses. The key is working through large swaths of user data and learning from earlier mistakes. The more time Alexa spends with its users, the more data it collects to learn from, and the smarter it gets. With progress comes more opportunity, and the need for more manpower.

“Let me give you an example,” Prasad said. “If you ask Alexa ‘What was Adele’s first album?’ the answer should be ‘19.’ If you then say, ‘Play it,’ Alexa will know enough to start playing that album.” But what if there’s some conversational banter in the middle? What if you first ask Alexa what year the album came out, and how many copies it sold? Finish such an exchange with the cryptic “Play it,” and earlier versions of Alexa would have been stumped. Now the technology can follow that train of thought, at least sometimes, and recognize that “it” still means “19.”
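The idea of carrying “it” across several turns can be illustrated with a toy sketch. What follows is not Amazon’s method, which the article describes as learned from data; it is a hand-written rule that simply remembers the last album mentioned. The names `DialogueState`, `FIRST_ALBUMS`, and `handle` are hypothetical, invented for this illustration.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Toy lookup table, for illustration only (hypothetical data store).
FIRST_ALBUMS = {"adele": "19"}


@dataclass
class DialogueState:
    """Remembers the item the conversation is currently about."""
    focus: Optional[str] = None


def handle(utterance: str, state: DialogueState) -> str:
    """Answer one turn, updating or reusing the remembered focus."""
    text = utterance.lower().strip(" ?.!")

    # A question that names an artist sets the focus to the album found.
    match = re.fullmatch(r"what was (\w+)'s first album", text)
    if match:
        album = FIRST_ALBUMS.get(match.group(1))
        if album is None:
            return "I don't know."
        state.focus = album
        return album

    # "Play it" resolves the pronoun against the remembered focus.
    if text == "play it":
        if state.focus is None:
            return "Play what?"
        return f"Playing {state.focus}"

    # Any other turn (release year, sales) leaves the focus untouched,
    # so a later "Play it" still refers to the same album.
    return "(follow-up answer)"
```

In this sketch, asking about Adele’s first album, then asking an unrelated follow-up, and then saying “Play it” still resolves to “19,” because only a new album question overwrites the stored focus.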

This improvement comes from machine-learning techniques that reexamined thousands of previous exchanges in which Alexa stumbled. The system learns what song users actually did want to hear, and where the earlier parts of the conversation first identified that piece of music. “You need to make some assumptions at the beginning about how people will ask for things,” says James Glass, head of the spoken-language systems group at MIT. “Then you gather data and tune up your models.”

The case for such a machine-learning approach is widely appreciated, Glass says, but making it work requires far more data than university researchers can easily muster. With Alexa’s usage surging, Amazon now has access to an expansive repository of human-computer speech interactions—giving it the sort of edge in fine-tuning its voice technology that Google has long enjoyed in text-based search queries. Outside data helps, too: a massive database of song lyrics loaded into Alexa in 2016, for example, has helped assure that users asking for the song with “drove my Chevy to the levee” will be steered to Don McLean’s “American Pie.”

Rules of Engagement

Alexa understands an increasing list of commands. Here are some of the dozen ways you can tell her to make a room brighter (gray), and a few that won’t work (red).

Alexa, turn up the lights
Alexa, increase the lights* by 50 percent, please
Alexa, rock my world with the lights
Alexa, schalte Lichter ein**
Alexa, set the brightness of the lights to 75
Alexa, make the lights more bright
Alexa, use the lights to dazzle my eyes

* Alexa works with more than 50 smart lighting systems, among them Philips Hue, GE Link Bulb, and Ikea Trådfri.
** A German-language version of Alexa made its debut in 2016. Other non-English versions are expected to follow.

One of the newest projects for Prasad’s group highlights the flexibility of this approach. It involves deciphering the moments when users backtrack on their initial requests. Signaling phrases can vary enormously. Some people say “No, no, no”; others prefer “Cancel that,” and a third bunch tries some variant of “Wait, actually, here’s what I want instead.” Alexa doesn’t need to decode each utterance. Large samples and semi-supervised machine learning enable it to outline a cluster of likely markers for negated speech, and then pick up a coherent new request after the change of course.
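The second half of that process, picking up the new request once a backtracking marker has been spotted, can be sketched in a few lines. This is a simplified illustration under a strong assumption: the marker list is hard-coded here, whereas the article says the real markers are inferred from large samples with semi-supervised learning. `BACKTRACK_MARKERS` and `final_request` are hypothetical names.

```python
# Hypothetical, hand-picked markers; a real system would learn these from data.
BACKTRACK_MARKERS = ["no, no, no", "cancel that", "wait, actually"]


def final_request(utterance: str) -> str:
    """Return the part of the utterance after the last backtracking marker.

    If no marker is present, the whole utterance is returned unchanged
    (apart from trimming surrounding spaces and punctuation).
    """
    lowered = utterance.lower()
    cut = 0
    for marker in BACKTRACK_MARKERS:
        idx = lowered.rfind(marker)  # position of the last occurrence, or -1
        if idx != -1:
            cut = max(cut, idx + len(marker))
    return utterance[cut:].strip(" ,.!")


print(final_request("Play jazz. No, no, no, play the news"))
print(final_request("Set a timer, cancel that, set an alarm for six"))
```

The sketch keeps only whatever follows the last marker, which mirrors the point that the system does not need to decode the abandoned request, only to recognize where the change of course happens.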

In addition to making Alexa a better listener, Amazon’s AI experts are using troves of data to make it a better speaker, fine-tuning the cadences of the machine’s synthetic-female voice, in order to boost sustained usage. Traditional attempts at speech synthesis rely on fusing many snippets of recorded human speech. While this technique can produce a reasonably natural sound, it doesn’t lend itself to whispers, irony, or other modulations an engaging human speaker might use. To sharpen Alexa’s handling of everything from feisty dialogue to calm recital, Amazon’s machine-learning algorithms can take a different approach, training on the eager, anxious—and wise-sounding—voices of professional narrators. It helps that Amazon owns the audiobook publisher Audible.

So much to talk about

Among the most ardent adopters of voice-based AI are people who can’t easily type on phones or tablets. Gavin Kerr, chief executive of Philadelphia’s Inglis, which provides housing and services for people with disabilities, has installed Amazon Echo and Dot devices in eight residents’ homes. He hopes to eventually add them to all 300-some residences once pilot testing is complete. “It’s an incredible boon for residents,” Kerr says. “They can be more comfortable. It gives them independence.”

Kerr works with hundreds of people who have multiple sclerosis or other debilitating conditions. For those who are bedridden or use wheelchairs, a hard-to-reach wall thermostat can be a constant source of torment. “Their bodies have a hard time regulating temperature,” Kerr explains. “A room that’s 72 °F may feel hot one hour and cold the next.” With limited mobility, there’s no easy way to get comfortable, especially if round-the-clock assistance isn’t available.

With a bit of tinkering, Alexa’s software can serve even those with severely restricted speech. Kerr tells of one man in his late 30s who wanted to leave a long-term-care facility and move back into an everyday community. “He told us, ‘I’ll never be able to use Alexa’s commands,’” Kerr recalls. “So we asked him, ‘What can you say?’ Then we reworked the software so he could make Alexa work on his terms. Now he says ‘Mom’ when he wants to turn on the kitchen lights, and ‘John’ when he wants to turn on the bathroom lights.”

Although Inglis provides its Echo users with four hours of training, it’s far more common for new users to grope their way along. Pull an Echo out of the box, and a bit of packaging will highlight especially common applications, such as playing music, setting alarms, or updating shopping lists. Organized users can call up Alexa control panels on their smartphones or laptops to adjust settings, hunt for new apps, or get guidance on what prompts will make an app work best.

In a widely read blog post in June, Microsoft product manager Darren Austin wrote that Alexa’s broader success resides in its ability to alleviate the stresses of an overbooked life. “With the simple action of asking,” Austin wrote, “Alexa relieves the negative emotions of uncertainty and the fear of forgetting.” Users get hooked on bringing all sorts of momentary puzzlements or desires to Alexa, he contended; it’s the companion that’s always ready to engage.

Every week—sometimes more often—Alexa general manager Rob Pulciani scans aggregate data on the most common utterances by Alexa and Dot users. Typically, the top of the list is dominated by requests for music, news, weather, traffic, and games. This past spring, however, a newcomer was rising fast. The trending phrase: “Alexa, help me relax.”

When users make this request, they are steered into a collection of soothing sounds. Birds chirp; distant waves hit the shore; freight trains rumble through the night. Such ambient-noise loops can keep playing for hours if users choose. Pulciani had regarded these apps as minor oddities when they first appeared on the Alexa platform, in 2015. But they have rapidly picked up a big following. Stressed-out adults use the sounds to fall asleep. Parents turn them into lullaby substitutes for cranky infants. Over the next few weeks after his discovery, Pulciani and colleagues fine-tuned Alexa’s internal architecture so new Echo buyers could rapidly discover soothing sounds if they asked for pointers about what new skills to try.


Sustained conversation

In studies, AI platforms by Google, Apple, Microsoft, and Amazon all show different strengths. Google Assistant is the best on wide-ranging search commands. Apple’s Siri and Microsoft’s Cortana have other talents. Alexa does particularly well with shopping commands.

The ultimate triumph for voice-based AI would be to carry on a realistic, multi-minute conversation with users. Such a feat will require huge jumps in machines’ ability to discern human speakers’ intent, even when there isn’t an obvious request. Humans can figure out that a friend who says “I haven’t been to the gym in weeks” probably wants to talk about stress or self-esteem. For AI software, that’s a hard leap. Sudden switches in topic—or oblique allusions—are tough, too.

Eager to strengthen ties with the next generation of AI and speech researchers, a year ago Amazon invited engineering students at a dozen universities worldwide to build voice bots that can sustain a 20-minute conversation. The campus making the most progress by this November’s deadline will win a $500,000 prize. I auditioned a half-dozen of these bots one weekend, moving each time from simple queries to trickier open-ended statements of opinion that invited all kinds of possible replies. We got off to a good start when one bot asked me, “Did you see any recent movies?” “Yes,” I replied, “we saw Hidden Figures.” Rather than mimic newspaper reviews of this poignant film about NASA’s early years, the social bot shot back: “I thought Hidden Figures was very thin on the actual mathematics of it all.” Not my take on the film, but it seemed like a charmingly appropriate thing for an AI program to say. Our conversation stalled out soon afterward, but at least we had that brief, beautiful moment.

Alas, none of the other bots could come close. The most confused one blurted out sentences such as “Do you like curb service?” when I thought we were trying to talk about Internet sites. I said something perhaps a little sharp about the bot’s limitations, only to be asked: “Can you collective bargaining?” A few days later, when I asked Amazon’s Prasad for his take on the social bots, none of their early failings bothered him. “It’s a super-important area,” he told me. “It’s where Alexa could go in terms of being very smart. But this is way harder than playing games like Go or chess. With those games, even though they have a lot of possible moves, you know what the end goal is. With a conversation, you don’t even know what the other person is trying to accomplish.” When Alexa is able to figure that out, we will really be talking.

George Anders has covered Amazon for national publications since the late 1990s. His newest book is You Can Do Anything.

