A voice-user interface (VUI) makes spoken human interaction with computers possible, using speech recognition to understand spoken commands and answer questions, and typically text to speech to play a reply. A voice command device (VCD) is a device controlled with a voice user interface.
Voice user interfaces have been added to automobiles, home automation systems, computer operating systems, home appliances like washing machines and microwave ovens, and television remote controls. They are the primary way of interacting with virtual assistants on smartphones and smart speakers. Older automated attendants (which route phone calls to the correct extension) and interactive voice response systems (which conduct more complicated transactions over the phone) can respond to the pressing of keypad buttons via DTMF tones, but those with a full voice user interface allow callers to speak requests and responses without having to press any buttons.
Newer VCDs are speaker-independent, so they can respond to multiple voices, regardless of accent or dialectal influences. They are also capable of responding to several commands at once, separating vocal messages, and providing appropriate feedback, accurately imitating a natural conversation.
A VUI is the interface to any speech application. Controlling a machine by simply talking to it was science fiction only a short time ago. Until recently, this area was considered to be artificial intelligence. However, advances in technologies like text-to-speech, speech-to-text, Natural Language Processing, and cloud services, in general, contributed to the mass adoption of these types of interfaces. VUIs have become more commonplace, and people are taking advantage of the value that these hands-free, eyes-free interfaces provide in many situations.
VUIs need to respond to input reliably, or they will be rejected and often ridiculed by their users. Designing a good VUI requires interdisciplinary talents of computer science, linguistics and human factors psychology – all of which are skills that are expensive and hard to come by. Even with advanced development tools, constructing an effective VUI requires an in-depth understanding of both the tasks to be performed, as well as the target audience that will use the final system. The closer the VUI matches the user’s mental model of the task, the easier it will be to use with little or no training, resulting in both higher efficiency and higher user satisfaction.
A VUI designed for the general public should emphasize ease of use and provide a lot of help and guidance for first-time callers. In contrast, a VUI designed for a small group of power users (including field service workers), should focus more on productivity and less on help and guidance. Such applications should streamline the call flows, minimize prompts, eliminate unnecessary iterations and allow elaborate “mixed initiative dialogs“, which enable callers to enter several pieces of information in a single utterance and in any order or combination. In short, speech applications have to be carefully crafted for the specific business process that is being automated.
Not all business processes render themselves equally well for speech automation. In general, the more complex the inquiries and transactions are, the more challenging they will be to automate, and the more likely they will be to fail with the general public. In some scenarios, automation is simply not applicable, so live agent assistance is the only option. A legal advice hotline, for example, would be very difficult to automate. On the flip side, speech is perfect for handling quick and routine transactions, like changing the status of a work order, completing a time or expense entry, or transferring funds between accounts.