It is the year 2266. These are the voyages of the starship Enterprise. Its five-year mission: to explore strange new worlds. To seek out new life and new civilizations. To boldly go where no man has gone before!
If you are over 40, it is very likely that these iconic words brought you in front of the family TV to follow Captain James T. Kirk, Spock and Leonard McCoy on their voyages and adventures in faraway galaxies. A cocktail party fact (CPF): While the original run of the show between 1966 and 1969 had only limited success and barely made it through three seasons, it created a new universe and is now a major landmark for the genre. To this day it attracts thousands of fans to conventions. Two more CPFs? The T in James T. Kirk stands for Tiberius, and Leonard McCoy’s nickname in the German version is not Bones but Pille (Pill).
In addition to finding facts to impress party guests, it is also interesting to watch the show now, almost 50 years later, from a technology perspective and check how visionary some of its predictions have been (or haven’t been). Many of the gadgets are already available today, like the communicator (smartphones) or variants of the phaser weapons. And I personally would love to see breakthroughs in the beaming technology used in the transporter room on the Enterprise.
In this blog, let’s take a closer look at the Universal Translator. Simply put, it allows for instant translation from any language into any other language. With this technology, Captain Kirk was able to communicate in real time with, let’s say, Klingon or Gorn representatives, not only reading a transcript but listening to the voice of his counterpart. Transposed into my personal world, this would mean that my Japanese colleagues and I could talk to each other, each speaking our own language and hearing the translated voice of our counterpart.
So how close are we to this scenario? Natural voice recognition and interaction have reached maturity for the mass market. Digital assistants like SAP CoPilot, Siri or Alexa are seeing more and more adoption and getting smarter all the time. It is in the nature of machine learning that the quality of results tends to keep improving as more data becomes available. So, the more these digital assistants are used, the better their performance will become.
Translation capabilities have made similar leaps over the last decade. Ten years ago, we were joking that the badly translated menus at our vacation destination had been run through Google, creating all kinds of funny word combinations. Today, Google translations are already getting quite close to a good and correct translation. And just as with voice recognition, the underlying machine learning and artificial intelligence capabilities continue to evolve and improve as more data becomes available.
So that leaves the generation of your translated voice. And, perhaps not too surprisingly, in this field as well we are not too far from the version implemented on the Enterprise. For example, you can check out Lyrebird, one company that is active in natural voice generation. While the result is still recognizably not my real voice, it is not that bad either, and I have only delivered the minimum number of sound bites as input. It’s all about the data sample, so if I continued to supply voice samples, my avatar voice would relatively soon be almost “the real thing.”
And once we have that, my speech would be transcribed into text via speech recognition, translated into Japanese and then spoken by my digital voice avatar. How far away from prime time is this? At the rate of innovation that we are currently seeing in these fields, I’d say no more than a couple of years. I am pretty sure that by the time we send our first manned mission to Mars this will be a mature solution, at least for many known languages, which interestingly enough might include Elvish and Klingon.
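For the technically curious, the three-step chain described above (speech to text, text to translated text, translated text to synthesized voice) can be pictured as a small pipeline. The following is only a minimal sketch under loud assumptions: the class `SpeechTranslator`, the three stand-in functions and the one-entry phrasebook are all hypothetical toys invented for illustration. They do not call any real speech, translation or voice service, and "audio" is simply faked as UTF-8 bytes.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SpeechTranslator:
    """Chains three interchangeable stages (all hypothetical placeholders)."""
    recognize: Callable[[bytes], str]    # audio -> text in the speaker's language
    translate: Callable[[str], str]      # source text -> target-language text
    synthesize: Callable[[str], bytes]   # target text -> audio in the speaker's voice

    def run(self, audio: bytes) -> bytes:
        text = self.recognize(audio)
        translated = self.translate(text)
        return self.synthesize(translated)


# Toy stand-ins so the sketch runs without any external service.
def fake_recognize(audio: bytes) -> str:
    # Assumption: the "audio" is just UTF-8 encoded text.
    return audio.decode("utf-8")


PHRASEBOOK = {"good morning": "ohayou gozaimasu"}  # illustrative entry only


def fake_translate(text: str) -> str:
    # Look the phrase up; fall back to the untranslated text if unknown.
    return PHRASEBOOK.get(text.lower(), text)


def fake_synthesize(text: str) -> bytes:
    # Assumption: "speaking" is faked by encoding the text back to bytes.
    return text.encode("utf-8")


pipeline = SpeechTranslator(fake_recognize, fake_translate, fake_synthesize)
print(pipeline.run(b"Good morning").decode("utf-8"))  # prints "ohayou gozaimasu"
```

The point of passing the stages in as plain functions is that each one could later be swapped for a real recognition, translation or voice-generation component without touching the rest of the chain.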