3 minute read / Aug 18, 2017 /
The Limiting Factor of Voice and Dictation Adoption
Harry Stebbings published a podcast with David Beisel this week in which they discussed the importance of voice. David says, “Voice is the most natural user interface possible.” I think the biggest challenge for voice is the skepticism and cynicism engendered by a decade or two of poor experiences. It’s no longer the technology.
One of my partners recently switched from an iPhone to an Essential phone and was stunned by the accuracy of the voice interface Google offers. Voice has become far more sophisticated than it was just a few years ago, and we are quickly moving past the stage of toy applications.
I’ve written before about how each of my emails is dictated. This blog post was dictated and edited entirely by voice. If I make a mistake, I can say “insert semicolon after dictated,” and the computer will insert a semicolon after the word “dictated.”
If I’m curious about the cash conversion cycle, I can stop in the middle of a sentence and say “search Google for cash conversion cycle.” My Mac will launch Chrome and issue the query in Google. When I’m done, I say “close window,” which closes Chrome, and “switch to Typora,” my favorite text editor for writing blog posts. Then I can continue speaking and drafting this post.
When I speak the command “publish blog post,” the computer executes a shell script that does four things. First, a program verifies the syntax of the new blog post. Second, a script resizes the images in that post to the optimal size and format for the website. Third, Hugo, the static blog engine I use, compiles the website. Fourth, it calls the Amazon Web Services command line interface to synchronize the freshly compiled local site to S3. All of that happens in less than four seconds, in the background, from a single voice command.
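A minimal sketch of such a publish script, assuming a Hugo site deployed to S3. The checker script, image paths, and bucket name here are illustrative assumptions, not the actual script, but the shape is the same: verify, resize, compile, sync.

```shell
#!/usr/bin/env bash
# Sketch of a four-step publish script for a Hugo blog deployed to S3.
# The checker script, image paths, and bucket name are illustrative.
set -euo pipefail

: "${DRY_RUN:=1}"  # default to dry-run so the sketch is safe to execute

run() {
  # Echo the command in dry-run mode; execute it otherwise.
  if [ "$DRY_RUN" = "1" ]; then
    echo "would run: $*"
  else
    "$@"
  fi
}

run ./scripts/check_post.sh content/post/latest.md            # 1. verify post syntax (hypothetical checker)
run mogrify -resize '1200x>' -quality 85 static/images/*.jpg  # 2. resize images (ImageMagick)
run hugo                                                      # 3. compile the static site
run aws s3 sync public/ s3://example-blog-bucket --delete     # 4. sync the compiled site to S3
```

With `DRY_RUN=0`, the commands run for real; binding the script to a single voice command is then just a matter of the dictation software invoking it.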
If I’m researching a company, I can say “research Looker.” The computer opens four tabs in Chrome - RelateIQ, Crunchbase, Mattermark, and LinkedIn - and issues the query in each, so that once the tabs load, I’m looking at Looker’s page in every one of those services.
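A short script can drive that kind of command. Here is a minimal sketch that prints one search URL per service for a given company name; the URL patterns are assumptions for illustration, and on macOS the results can be piped to `open -a "Google Chrome"` to launch the tabs.

```shell
#!/usr/bin/env bash
# Sketch: print one research URL per service for a company name.
# The URL patterns below are illustrative assumptions.
set -euo pipefail

research_urls() {
  local q="$1"
  echo "https://www.google.com/search?q=${q}+RelateIQ"  # RelateIQ has no public search URL; assumption
  echo "https://www.crunchbase.com/textsearch?q=${q}"
  echo "https://mattermark.com/app/search?q=${q}"       # hypothetical path
  echo "https://www.linkedin.com/search/results/companies/?keywords=${q}"
}

research_urls "Looker"
# On macOS, open each result as a Chrome tab:
# research_urls "Looker" | xargs -n1 open -a "Google Chrome"
```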
Voice assists with small things, too. I can move windows around my desktop by saying “top left” or “bottom right,” and the windows will resize. “Play music” starts Spotify. “Next song” changes the track. If I say “Gmail,” my email will load. “Open” opens the first email. “Reply” begins a reply. “Send to Asana” forwards the email to Asana and then archives the message.
I read about an engineer who configured his voice dictation software so he could write code entirely by speaking to his computer. Imagine writing C++ by speaking it. For someone fluent in the language, it’s far more natural, and up to three times faster than typing. Plus, no more RSI issues.
We’re simply not that far away from incredibly sophisticated applications of voice technology for interacting with computers. We are seeing voice become a common use case within the home with Alexa and Google Home, and in the car, where we dictate text messages saying we’ll be late or ask for navigation instructions.
That familiarity, built through consumer use cases, will dissolve the skepticism users have accumulated over years of disappointing experiences. Voice will come to the office in a very big way and enable far more sophisticated and complex use cases than we have seen in the past, from data input to sophisticated analysis.