User Research and Design for Voice Applications
July 29, 2019
By Janet M. Six, UXmatters
In this edition of Ask UXmatters, our experts consider how user research and design for voice applications differs from research and design for traditional, graphical user interfaces (GUIs). First, our expert panel discusses the importance of deeply understanding the context in which people would use an application, as well as the behavior of those who would use it.
Our panel of experts also recommends that we accurately understand the problems a voice application can solve, so it is truly helpful rather than just a cute gimmick. The panel also explores how to collect data from users when you’re designing a voice system or training artificial-intelligence (AI) algorithms.
The following experts have contributed answers to this month’s edition of Ask UXmatters:
- Richard Alvarez—UX Practice Manager at Apexon
- Bob Hotard—Lead UX Designer at AT&T
- Gavin Lew—Managing Director at Bold Insight
Q: How is user research for voice applications different from user research for products with a traditional GUI?—from a UXmatters reader
“It’s extremely important to do in-field, contextual research or ethnographic studies when doing user research for voice applications,” advises Bob. “This is probably true for any new user interface or technology, but more so for voice.
“My experience has been that there is too much of a knowledge gap to ask users quantitative survey questions about voice applications. They’re still not common enough to ensure all users understand or can properly respond to questions about a voice user interface.
“For example, I have watched a usability study in which a person who was stuck on the task of how to edit a search field on a mobile phone admitted that he would ‘delete the whole thing and just tap the microphone thingy on my keyboard….’ This same person had responded to a previous survey that he had never used speech-to-text or voice search. Unfortunately, this scenario wasn’t an edge case.
“Age demographics—boomers versus millennials—might influence the knowledge gap for voice user interfaces. You must observe how a person uses a voice app in the real world. This can mean the difference between your designing a good or a bad voice app.
“Some might look at this observation as stating the obvious in regard to researching any user interface. On one hand, that is true. On the other hand, when designing voice apps, it is critical that you actually see how someone speaks to Alexa in their home versus in a lab, rather than asking in a survey, ‘What commands do you use to ask Alexa to [perform a given task]?’ Go where your users are or where they would engage with your application and observe them.”
Leverage Voice as a Solution, Not a Gimmick
“The obvious difference with voice is that, in many cases, you are working without a visual component or a multi-modal user interface,” replies Richard. “A traditional GUI has its own set of design patterns to which we’ve grown accustomed, as users. Menus, buttons, typography, spacing, and of course, our mouse pointer—to name just a few—are the conversation pieces that a visual GUI utilizes. We immediately recognize navigation in the header or know that it’s tucked away under a hamburger menu. Mouse movement on a page and the visual changes of menu items, links, and buttons as we hover over them are a GUI’s response to our interactions, telling us ‘I’m listening.’
“With voice-only user interfaces, no wider view of the content is immediately available. A good voice user interface (VUI) is conversational and guides the user through its responses. So, when we think about a voice assistant, we’re thinking about and planning the end-to-end conversation with users, their situation, and the problem they are trying to solve.
“In many ways, our research for voice and for traditional GUIs is the same. We start with understanding the use case to build empathy and learn as much as we can about the what, why, and how. So, although we strongly believe that voice is a tremendous step forward in our interactions with the digital world, there are still cases where a traditional GUI would make more sense. We don’t want voice skills to be a gimmick, but rather a solution that improves our interaction with the digital world.
“Our research should identify areas where the use of voice can improve users’ interactions and outcomes,” continues Richard. “Consider a user baking a cake in the kitchen and wanting to know how many ounces there are in two cups. The user might put down the mixing bowl, wash his hands, go to his laptop, do a quick search, come back to the kitchen, wash his hands again—you get the picture. In this example, asking the question of a smart speaker using a VUI and getting an instant response, without ever having to stop mixing the batter, is much more convenient. It’s a hands-free user interface and lets the user continue his normal activities. It’s also faster, which could make a big difference when someone is cooking or doing other time-sensitive activities. Having a smart speaker in a kitchen to aid a user in performing such simple question-and-response tasks is also a very natural approach.
“This kitchen example is a fun way of illustrating how voice can improve such simple interactions with computers. At Apexon, we’re looking at voice solutions for our clients in warehousing, manufacturing, and back-office solutions. The same advantages of hands-free interaction, speed, and ease of use apply. However, we can now see the results of improved safety, allowing users to continue performing critical jobs in situations where the use of their hands or visual attention on tasks is not only necessary, but a matter of preventing serious injury. Plus, in many cases, our voice solutions result in greater accuracy by providing confirmation before the user moves on to other tasks, as well as by displaying real-time views of data on command.”
Apply the Wizard-of-Oz Technique
“Early voice-application work in User Experience involved testing call flows for interactive voice response (IVR) systems, where the user received and interpreted a voice prompt, then the user’s response produced another voice prompt,” answers Gavin. “To support the design of such systems, UX researchers needed to separate the voice system from the design. Imagine that a user stutters or accidentally stumbles when speaking. If we are using a computer to interpret the user’s response, it is possible that the user might become confused about whether the system simply did not recognize an utterance or the flow was incorrect.
“By using the Wizard-of-Oz technique (one inspired by the movie of the same name, in which a man hiding behind a curtain pretends to be a wizard), a user researcher can act the role of the voice system by interpreting the user response and providing a voice prompt. This approach allows you to test and refine a flow, without any added confusion about the accuracy of a voice system.
“For flat voice applications, where a call flow might involve a simple stimulus-response, reserve the use of the Wizard-of-Oz technique for cases when voice applications extend the voice interaction and add complexity. Remember this Wizard-of-Oz technique as voice user interfaces advance. It can help you refine a user experience. Now, we are collecting voice inputs by a factor of 100 to train algorithms. This is important to do, although it’s perhaps not a job for UX researchers. But we are finding the need to have UX researchers conduct the research protocols for high-sample studies because they have the rigor necessary in collecting this data.”
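For teams that prototype call flows in software, the separation Gavin describes between the flow and the voice system might be sketched roughly as follows. This is only an illustrative sketch, not a method from any of the panelists: the `CALL_FLOW` states, the prompts, and the names `run_session`, `hear`, `interpret`, and `human_wizard` are all hypothetical. The idea is that the call flow is plain data, while whatever interprets the participant's speech is a swappable function, so a researcher typing at a keyboard can stand in for a speech recognizer.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical call flow for illustration: each state has a prompt and a map
# from an interpreted intent to the next state. Empty "next" ends the call.
CALL_FLOW: Dict[str, dict] = {
    "greeting": {
        "prompt": "Welcome. Would you like your balance, or to pay a bill?",
        "next": {"balance": "balance", "pay": "pay"},
    },
    "balance": {"prompt": "We will text your balance. Goodbye.", "next": {}},
    "pay": {
        "prompt": "How much would you like to pay?",
        "next": {"amount": "confirm"},
    },
    "confirm": {"prompt": "Your payment is scheduled. Goodbye.", "next": {}},
}


def run_session(
    flow: Dict[str, dict],
    start: str,
    hear: Callable[[str], str],
    interpret: Callable[[str, str], Optional[str]],
    max_turns: int = 20,
) -> List[Tuple[str, str]]:
    """Walk the flow and return a log of (state, prompt) pairs.

    `hear` plays a prompt and returns what the participant said.
    `interpret` turns (state, utterance) into an intent label, or None.
    In a Wizard-of-Oz session, `interpret` is a person, not a recognizer.
    """
    log: List[Tuple[str, str]] = []
    state = start
    for _ in range(max_turns):
        node = flow[state]
        log.append((state, node["prompt"]))
        if not node["next"]:
            break  # terminal state: the call is over
        utterance = hear(node["prompt"])
        intent = interpret(state, utterance)
        # An unrecognized intent keeps the same state, so the prompt repeats.
        state = node["next"].get(intent, state)
    return log


def human_wizard(state: str, utterance: str) -> Optional[str]:
    """The researcher behind the curtain types the intent they understood."""
    answer = input(f"[{state}] participant said {utterance!r}; intent? ")
    return answer.strip() or None
```

Because a person does the interpreting, any repeated prompt in the session log points to a problem with the flow or its wording rather than to a recognition error, which is the confusion the technique is meant to remove.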