Voice is set to be the biggest enterprise tech disruption since the smartphone—if we can overcome the remaining practical difficulties.
Over the decades, computer interaction has slowly become more natural, with computers adapting to users rather than the other way around, in a gradual progression from punch cards to keyboards and mice.
Currently, touch interfaces are the default standard in almost any device that includes a screen, from cameras to cars (and they’re habit-forming! Am I the only one who occasionally tries to swipe TV screens or pinch-to-zoom in on paper images?).
Now it’s time for the inevitable big step: talking to your corporate device. In the movies, at least, we’ve been talking to computers for a long time, from Jarvis in Iron Man to the famous scene in Star Trek IV: The Voyage Home, where Scotty tries to issue instructions to a 1980s computer.
Of course, voice assistants have been available in mobile phones for years: Apple’s Siri came out in 2011. But they have had limited impact on corporate environments so far.
What’s changing is that advances in computing power and machine learning have enabled computers to transcribe speech better than human beings, and then accurately interpret the result, without cumbersome coding. And the new systems can more efficiently update themselves, learning from their mistakes rather than requiring explicit instruction.
The number of voice-enabled devices continues to skyrocket—over 50 million of them are expected to be sold this year and the global voice tech industry is expected to reach US$126.5 billion by 2023.
There are real benefits to voice interfaces for workers, especially compared to fiddly mobile keyboards. A study earlier this year by researchers from Stanford University, the University of Washington, and Baidu USA found that voice input with mobile devices was nearly three times faster than typing, with little difference in the error rate between the two types of input.
One of the biggest benefits of voice interfaces is that they can provide a “universal remote control” to enterprise business systems. Google has already demonstrated that voice can be used to connect computers to people-based systems, for example by setting up an appointment at a hair salon.
These kinds of connections can make even more sense in the corporate world. Every large organization has many different applications, and providing a coherent workflow across them has always required coding, which can be slow and expensive. But if the systems are all voice-enabled, workers can easily switch from one to another without requiring any explicit integration.
In addition, it’s likely that enterprise systems will start talking to each other. It’s very difficult to implement universal standards in computing, and using voice, although obviously inefficient, could be a very pragmatic short-term solution for system integration. We’re already seeing this in the consumer world, with the recent news that Amazon’s Alexa can now “talk to” Microsoft’s Cortana and vice versa.
All these advances mean that corporate digital assistants are going mainstream, providing chat and voice interfaces to a full range of corporate business activities, including easy access to HR information like outstanding vacation days¹.
Voice is also becoming part of innovative service approaches. For example, Workheld provides field management systems to increase the productivity of construction and service processes. The company uses machine learning and text analytics technology to automatically match customer jobs with the most appropriate technician, with hands-free voice interfaces to help workers through maintenance steps.
There are still some problems with voice interfaces. For example, they can be hard (or just embarrassing) to use in a busy office where there’s a lot of background noise. As ever, technology companies are working on solutions: more advanced noise cancellation using AI, picking out your voice in a crowd, reading your lips, or even detecting your words directly through your jaw, without you having to say them out loud (so maybe the future is actually about voiceless voice interfaces?!).
There are also clear dangers. For example, all those microphones provide ample opportunities for nefarious eavesdropping and personal tracking (from George Orwell’s 1984: “there was always the danger of concealed microphones by which your voice might be picked up and recognized”).
And how will corporate security be enforced with voice interfaces when algorithms can create uncannily accurate synthesized speech? There are already problems with corporate “vishing” today—what will happen when the distinctive (but faked) voice of your CEO connects to the system to request an exceptional wire transfer? (Or when your teenage daughter calls you to say she’s in trouble and needs money?!).
Overall, I expect there to be huge leaps in voice adoption in the enterprise this year, primarily through optional voice interaction with chatbot interfaces. You can read more in this Digitalist Magazine Executive Quarterly article I co-authored: “Say What?!”
And then we can all look forward to the next big interface wave in the enterprise: artificial and augmented reality!
This article originally appeared on Timo Elliott’s blog, Digital Business & Business Analytics, and has been republished with permission.
¹consult days off, Recast.AI