Voice Input

The Voice Input pattern supports the design of a voice or speech interface. It relies on a solid understanding of the underlying user task and of how to apply Automatic Speech Recognition (ASR), Natural Language Processing (NLP), and speech output to support the user. There are three levels of speech interaction:

  1. ASR with text output
  2. ASR combined with NLP and text output
  3. ASR combined with NLP and spoken text output

All three levels leverage constantly improving ASR engines and address one of the biggest obstacles of mobile interfaces: typing on a soft keyboard is an inefficient input method. Each mobile OS offers a different solution, but all take a similar approach to voice input.
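The three levels can be read as a pipeline in which each level adds a stage. The sketch below is purely illustrative; the `recognize`, `parse`, and `speak` functions are hypothetical placeholders, not a real OS speech API:

```python
# Minimal sketch of the three voice-interaction levels as a pipeline.
# The recognize/parse/speak stages are hypothetical stand-ins,
# not a real ASR, NLP, or text-to-speech engine.

def recognize(audio: str) -> str:
    """Level-1 stage: ASR turns captured speech into raw text."""
    return audio.strip().lower()                  # stand-in for a real ASR engine

def parse(text: str) -> str:
    """Level-2 stage: NLP normalizes recognized text into an intent."""
    return "intent:" + text.replace(" ", "_")     # stand-in for a real NLP step

def speak(text: str) -> str:
    """Level-3 stage: speech output reads the result aloud."""
    return f"[TTS] {text}"                        # stand-in for a TTS engine

def voice_input(audio: str, level: int) -> str:
    """Run the stages that correspond to the chosen interaction level."""
    text = recognize(audio)       # level 1: ASR with text output
    if level >= 2:
        text = parse(text)        # level 2: add NLP
    if level == 3:
        text = speak(text)        # level 3: speak the result aloud
    return text
```

For example, `voice_input("Call Mom", 1)` yields plain recognized text, while level 3 passes the same input through all three stages before speaking it.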


Appearance characteristics for this pattern.

The voice interaction pattern has four basic states (see Figures 1-4):

  1. Standby
  2. Active
  3. Processing
  4. Output
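The four states named in Figures 1-4 (Standby, Active, Processing, Output) can be sketched as a simple state machine. The transition events below are assumptions chosen for illustration, not part of any platform API:

```python
# Sketch of the four voice-input states as a simple state machine.
# State names follow Figures 1-4; the trigger events are illustrative.

TRANSITIONS = {
    ("standby", "tap_mic"): "active",           # user taps the microphone button
    ("active", "end_of_speech"): "processing",  # user stops talking
    ("processing", "result_ready"): "output",   # ASR/NLP result is available
    ("output", "done"): "standby",              # result delivered, return to idle
}

def step(state: str, event: str) -> str:
    """Advance the state machine; unknown events leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)

# Walk one full interaction cycle: standby -> active -> processing -> output -> standby.
state = "standby"
for event in ["tap_mic", "end_of_speech", "result_ready", "done"]:
    state = step(state, event)
```

Keeping unknown events as no-ops mirrors the UI behavior: a tap while processing, for instance, should not knock the interface into an undefined state.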


Common behaviors for this pattern.


Usage guidelines for this pattern.

General voice and NLP guidelines:



Fig 1. Voice Input - Standby



Fig 2. Voice Input - Active



Fig 3. Voice Input - Processing



Fig 4. Voice Input - Output