Say Anything (Within Reason)
By Mark Frauenfelder, Mon Oct 28 00:00:00 GMT 2002

Call it wishful thinking. For centuries, people have dreamed of machines that could converse as easily as people do.

Imagine the value of a technology that could convert your utterances into text, extract the gist of what you are saying, and respond appropriately. Such a device could completely change the way we interact with computers, and could potentially make keyboards, touch-tone pads, and mice obsolete.

High-quality, inexpensive natural language processing could also be very useful for mobile phone users, who could use voice recognition systems to navigate voice portals or get other information without having to resort to the ridiculously tiny keypads that come with wireless devices.

Conversational Claire

I get my first taste of the natural language processing future on a warm Wednesday morning in Menlo Park, California. I'm sitting in front of a Starbucks, calling Sprint PCS with a question about my phone bill. A perky service representative answers, identifying herself as Claire. "Briefly tell me how I can help you today," she says. "I lost my bill and I don't know how much I owe," I tell Claire. "Account information!" she enthuses. She then asks me to key in my zip code for verification and, after a couple of seconds, announces, "OK, got it!"

Claire is remarkably effervescent, especially at this hour of the morning. In fact, she's just as bubbly to every one of the 20 million callers she talks to every month. That's because Claire isn't real. She's a "virtual service representative" designed to direct Sprint PCS subscribers through various account functions. Callers talk to Claire as they would to a real customer support representative, and she does everything her human counterpart is supposed to do except cop an attitude. And if she gets stumped, she'll turn you over to a non-virtual assistant.

A company called Nuance designed Claire's underlying technology. Located in a bland one-story building in a nondescript business park near Stanford University, Nuance spun off from SRI International (formerly the Stanford Research Institute) in 1994 to develop speech recognition, voice authentication, and text-to-speech software for customer support systems.

Claire, who started taking calls for Sprint PCS last November, uses Nuance's "Say Anything" technology, the latest development in the evolution of automated telephone support systems. Say Anything goes a step beyond "directed dialogue," the all-too-familiar voice prompt system that constrains what callers can say by asking for specific answers to questions: "To pay your bill, say 'pay bill.' To request a copy of your statement, say 'statement.'"

Natural Lingo

Nuance's natural language system typically starts out by asking callers, "What can I do for you?" and then listens for certain key words, ignoring everything else. Wally Brill, whose bleached and buzzed pate makes him look more like an electronica pop star than Nuance's director of persona design and production, explains that the system was developed, in part, to cope with the fact that people don't use perfect grammar when they talk. For instance, he says, a customer calling to report a lost credit card could conceivably say something like, "Uh, a UFO landed in our backyard and my wife was abducted by aliens! She had her purse with her and when they let her go she noticed her credit card was missing!" Say Anything will look through its database of previously recorded conversations, discover that the words "card" and "missing" often show up in requests to report a lost or stolen card, and direct the caller to the proper department. "It ignores the stutters, ums and uhs," says John Shae, Nuance's vice president of product marketing and management. Earlier systems, he says, "would get screwed up" trying to process the superfluous utterances.
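The key-word spotting Brill describes can be roughed out in a few lines of Python. This is an illustration under stated assumptions, not Nuance's code: the department names and key-word lists below are hypothetical, hand-picked stand-ins for what Say Anything reportedly learns from recorded calls. Any word that isn't on a list, "uh" included, simply contributes nothing to the score.

```python
import re
from typing import Optional

# Hypothetical key-word lists; a real system would derive these from recorded calls.
KEYWORDS = {
    "lost_or_stolen_card": {"card", "missing", "lost", "stolen"},
    "billing": {"bill", "owe", "balance", "payment"},
}


def route(utterance: str) -> Optional[str]:
    """Return the department whose key words appear most often, or None."""
    words = re.findall(r"[a-z']+", utterance.lower())
    # Every word that is not a key word (including "uh" and "um") scores zero.
    scores = {dept: sum(w in kws for w in words) for dept, kws in KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


caller = ("Uh, a UFO landed in our backyard and my wife was abducted by aliens! "
          "She had her purse with her and when they let her go she noticed "
          "her credit card was missing!")
print(route(caller))                                              # lost_or_stolen_card
print(route("I lost my bill and I don't know how much I owe"))    # billing
print(route("Uh, what's the weather?"))                           # None
```

Note that the word "lost" in the second request doesn't mislead the router, because "bill" and "owe" outvote it. A `None` result is the point where a system like Claire would hand the caller over to a human.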

Before Say Anything can do anything, however, the system must first digitize the caller's speech. It doesn't actually store the words themselves; instead, it converts them into acoustic models made of the phonemes (phonetic bits of speech, such as "ah" or "buh") that make up each word. "The system looks at the energy levels of speech and compares them to the dictionary of energy patterns already loaded into the database," says Shae. "There's a unique energy pattern associated with each word."
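Shae's idea of matching incoming energy levels against stored patterns can be illustrated with a toy template matcher. This is a deliberate simplification, assuming each word is represented by nothing more than a short sequence of frame energies; recognizers of the kind Nuance builds rely on phoneme-based acoustic models rather than a single energy contour. The sketch uses dynamic time warping, a classic template-matching technique chosen here for illustration, so that a word spoken slightly faster or slower can still line up with its stored pattern. The templates are made-up numbers.

```python
def frame_energies(samples, frame_len=4):
    """Mean squared amplitude of each full, non-overlapping frame of samples."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def dtw_distance(a, b):
    """Dynamic time warping distance between two sequences of energies."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of the first i values of a with the first j of b
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],      # reuse b[j-1] (stretch b)
                                    cost[i][j - 1],      # reuse a[i-1] (stretch a)
                                    cost[i - 1][j - 1])  # advance both
    return cost[n][m]


def recognize(energies, templates):
    """Pick the stored word whose energy pattern is closest to the input."""
    return min(templates, key=lambda word: dtw_distance(energies, templates[word]))


# Made-up energy templates for two words.
TEMPLATES = {"yes": [0.1, 0.9, 0.9, 0.2], "no": [0.8, 0.3, 0.1, 0.1]}
print(frame_energies([1, -1, 1, -1, 2, 2, 2, 2]))        # [1.0, 4.0]
print(recognize([0.1, 0.8, 0.9, 0.9, 0.3], TEMPLATES))    # yes
```

The input has five frames against the templates' four; time warping lets the extra frame align without penalizing a caller who drags the word out.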

Peculiar Processing

This particular way of converting speech isn't really anything new. What's different is the way Say Anything processes the speech. To explain how Nuance's system differs from traditional systems, Shae gives the example of an airline reservation system. The standard method, he explains, would be to construct a database of hand-coded grammatical rules that can detect every conceivable variation of the phrase "I want to fly to" followed by the name of a city. With Say Anything, the system developer starts out by letting a large number of test users call the system and make requests, recording what they say. The requests are sorted into categories (itinerary changes, cancellations, frequent flier accounts, and so on) and stored in the database. "There's no structured grammar file," says Shae. "We just throw them into the system." Then, when real users call in, their requests are compared to the recorded requests, and statistical models are used to direct callers to the (hopefully) proper service area.
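The statistical routing Shae describes can be sketched with a naive Bayes classifier, a standard text-classification technique swapped in here for illustration; Nuance hasn't said which statistical models it uses. The sample requests and category names are hypothetical stand-ins for the recorded test calls. As with Say Anything, no grammar is written: the router only counts which words turn up in which category.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


class RequestRouter:
    """Multinomial naive Bayes over words, with add-one (Laplace) smoothing."""

    def __init__(self, examples):
        self.word_counts = defaultdict(Counter)  # category -> word frequencies
        self.class_counts = Counter()            # category -> number of examples
        self.vocab = set()
        for text, category in examples:
            words = tokenize(text)
            self.class_counts[category] += 1
            self.word_counts[category].update(words)
            self.vocab.update(words)
        self.total = sum(self.class_counts.values())

    def log_score(self, words, category):
        counts = self.word_counts[category]
        n = sum(counts.values())
        v = len(self.vocab)
        score = math.log(self.class_counts[category] / self.total)  # prior
        for w in words:
            if w in self.vocab:  # words never heard during testing are ignored
                score += math.log((counts[w] + 1) / (n + v))
        return score

    def route(self, text):
        words = tokenize(text)
        return max(self.class_counts, key=lambda c: self.log_score(words, c))


# Hypothetical recorded test calls, already sorted into categories.
RECORDED = [
    ("I need to change my flight to Tuesday", "itinerary_change"),
    ("can I move my trip to a later date", "itinerary_change"),
    ("please cancel my reservation", "cancellation"),
    ("I want to cancel my flight", "cancellation"),
    ("how many frequent flier miles do I have", "frequent_flier"),
    ("check my miles balance", "frequent_flier"),
]

router = RequestRouter(RECORDED)
print(router.route("Uh, please cancel my flight"))  # cancellation
print(router.route("How many miles are left?"))     # frequent_flier
```

With only six examples the router is fragile; the point of recording "a large number of test users" is that the word counts become reliable enough to route requests nobody anticipated word for word.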

"This is the direction we are headed in," says Dan Hawkins, a managing analyst for voice business at Datamonitor in New York. Hawkins says that Nuance's Say Anything (along with competitor Speechworks' "How May I Help You?" technology) is, in certain cases, a big improvement over directed dialogue systems. Take phone-based betting. Rather than having to go through a large number of steps ("What race would you like to bet on?" "What kind of bet would you like to place?" "Which horses?" "In what order?" "How much would you like to bet?"), you can just tell the system, "$10 on horse three to place, in the second race." The betting system would analyze the sentence, making sure that all the empty "buckets" of information were filled by the request; if not, it would ask the caller for the missing information: "OK, $10 on horse number three, second race. Matinee or evening program?"
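The "buckets" Hawkins mentions are what speech designers often call slots. Below is a minimal slot-filling sketch, assuming a hypothetical bet format with five slots (amount, horse, bet type, race, program) and simple regular expressions in place of a real speech grammar. The system fills whatever the caller supplies in one breath and asks only about what's missing.

```python
import re

NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
                "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10}
ORDINALS = {"first": 1, "second": 2, "third": 3, "fourth": 4, "fifth": 5}

# Hypothetical slots, in the order the system would ask about them.
SLOTS = ["amount", "horse", "bet_type", "race", "program"]
PROMPTS = {
    "amount": "How much would you like to bet?",
    "horse": "Which horse?",
    "bet_type": "Win, place, or show?",
    "race": "Which race?",
    "program": "Matinee or evening program?",
}


def fill_slots(utterance):
    """Fill every slot the caller mentioned; unmentioned slots stay None."""
    text = utterance.lower()
    slots = dict.fromkeys(SLOTS)
    if m := re.search(r"\$(\d+)", text):
        slots["amount"] = int(m.group(1))
    if m := re.search(r"horse (?:number )?(\w+)", text):
        word = m.group(1)
        slots["horse"] = int(word) if word.isdigit() else NUMBER_WORDS.get(word)
    if m := re.search(r"\bto (win|place|show)\b", text):
        slots["bet_type"] = m.group(1)
    if m := re.search(r"(\w+) race", text):
        slots["race"] = ORDINALS.get(m.group(1))
    if m := re.search(r"\b(matinee|evening)\b", text):
        slots["program"] = m.group(1)
    return slots


def next_prompt(slots):
    """Return the question for the first empty slot, or None if the bet is complete."""
    for name in SLOTS:
        if slots[name] is None:
            return PROMPTS[name]
    return None


bet = fill_slots("$10 on horse three to place, in the second race")
print(bet)               # {'amount': 10, 'horse': 3, 'bet_type': 'place', 'race': 2, 'program': None}
print(next_prompt(bet))  # Matinee or evening program?
```

Only one follow-up question is needed here, compared with the five steps of the directed-dialogue version.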

Loaded with Personality

As it turns out, a well-designed user interface can go a long way toward making up for shortcomings in the kind of speech recognition technology that Nuance uses. It can convey the impression that the system understands what a caller is saying, even though it's merely looking for key words it can use to determine what the caller wants. Really, Nuance's system isn't much different from a touch-tone menu of the kind that traps users in voice-mail hell.

The trick is to make users think that they aren't trapped, but are in an open, free-form conversation. Nuance and Speechworks (an MIT Media Lab spin-off) achieve this by combining their statistics-based language processing systems with cleverly constructed user interfaces designed to elicit usable information from callers. "We are faking an open interaction, even though it is pretty closed," says Brill. "We want to make the customer feel they are in charge, rather than being led through the nose by the computer."

To add realism to this useful illusion, the virtual assistants built by Nuance and Speechworks use carefully scripted personas, designed to let callers interact with them in much the same way that they would with a real person. "Speech recognition systems are social actors," says Blade Kotelly, Speechworks' manager of worldwide solutions marketing. "While we never try to pretend a real person is on the other end of the line, we do try to clearly make sure that people are working with a system that is treating them in a certain way, and that people feel that they are being treated well by the system."

Nuance even goes so far as to give its personas "back stories," describing the personas' hobbies, occupation, and interests, so that voice actors can imbue the personas with consistent behavior.

Useful or Useless?

All this makes for entertaining phone conversation. But how useful is it, really? Can a company improve its bottom line by using a virtual assistant to handle its customer support?

Yes, say the experts, but only in certain cases. Natural speech processing is still too young to yield any useful metrics: Sprint PCS is Say Anything's only customer so far, and Speechworks has yet to announce any "How May I Help You?" deployments. But on the basis of "anecdotal feedback," says Hawkins, callers using natural language systems are able to get to the information they need more quickly than callers using directed dialogue systems. He warns, however, that deploying a natural speech system is "a tall order. It requires lots of work for months and months," mainly because it is "very difficult to predict what people are going to say. The variety of ways that people can ask for the same information is remarkable."

Aurica Yen, a senior analyst who covers enhanced productivity applications and services for the Yankee Group, agrees that while natural speech applications are "a good thing," they are limited. "It's not going to apply for everything. There's a limit to the applications that will result in efficient services, and there's some concern that this type of technology could be inappropriately sold to companies that don't need it."

Nuance and Speechworks say they're only interested in using natural language in places where it makes sense to do so. And Hawkins says there are very few places right now where it does make sense. "99 times out of 100, directed dialogue is better. The increased work [required to deploy a natural speech system] is going to add to the cost." However, he says, a natural speech system still costs less than a warehouse full of human support staffers.

Which is why there's a future for natural speech technology. Claire can handle 4,000 calls at once, never gets grumpy, and, more importantly, won't be asking her boss for a raise anytime soon.

Mark Frauenfelder is a writer and illustrator from Los Angeles.