2 minute read / Aug 25, 2016 /
A Breakthrough in Human Computer Interaction
It’s no secret I believe speech is the next input mechanism. We are in the Voice-to-Text Era. I wrote in late 2014 that speech is the fastest user interface, and the newest speech recognition experiments confirm it.
Andrew Ng, a luminary in the world of machine learning, and his teammates at Baidu, Stanford and the University of Washington have developed Deep Speech 2, a neural-network-based speech recognition system. They tested the speed and accuracy of the system and compared it to people typing on their mobile phones.
The results were clear no matter the language. For English, speech recognition was three times faster than typing, and the error rate was 20.4 percent lower. In Mandarin Chinese, speech was 2.8 times faster, with an error rate 63.4 percent lower than typing.
There are many limitations of speech. It’s not always a convenient input mechanism; speaking to your phone in the middle of a meeting is bound to derail it. And speech also faces the uphill battle of changing societal norms. Saying “send email to Tomer hi comma how is the pilot going question mark” will likely arouse the same feelings of judgemental hostility to geekdom in passersby as the Bluetooth headset five years ago.
Regardless of all these frictions, speech is much faster and far more accurate irrespective of language. This speed advantage will make speech the primary form of input to computers, initially on mobile phones, but ultimately on laptops.
The ramifications are broad. We will redesign offices to contain the persistent murmurs of people speaking to their machines. Natural language understanding, the science of computers deciphering our meaning, will become critically important for major Internet and software companies to master. Users and buyers of software will change their buying criteria to include speech recognition.
Ultimately, we will all be more productive for it. The best tools are the ones we don’t recognize we’re using because they are extensions of our bodies. Learn to eat with a fork, and spearing a morsel on its tines quickly becomes as natural as raising a peach to your lips.
The QWERTY keyboard, a relic of an era when the typewriter needed to slow the typist to prevent hammers from criss-crossing and jamming, will soon be a curiosity. Speech will replace it, and transform human-computer interaction in the process.