By Stefan Haenisch, Ingo Schulz, Michael Pflanz
Providing learning content in a learner’s native language has always been a major challenge for knowledge transfer in global environments. Despite all technological advances, the process has remained highly manual: slow, cumbersome, and expensive. Once content is available in a source language, translators are hired, typically through external agencies, to translate it manually into the required languages. Then, to ensure that business-specific terminology and context were translated correctly, local experts carry out another intensive quality assurance step, which often takes longer than the translation itself because of resource bottlenecks. Multiply this by lots of content and lots of languages, add that the original source content may change while translation projects are already underway, and you soon run into unsolvable scalability and funding challenges. Companies like SAP have gone as far as possible in providing a lot of content in local languages. Still, we cannot provide full coverage in all languages; instead, we focus on the languages customers demand most, while trying to minimize the time-to-market gap. Not a perfect solution, but for the most part “good enough”.
This need for immediate availability of translated learning content, however, has drastically accelerated with the advent of the digital economy, mainly driven by the following two factors:
- More and more learning is consumed in digital formats as opposed to traditional in-person classroom training. In classroom training, a huge advantage from a language perspective is that the instructor acts as a ‘human bridge’ between the content and the learner. For example, teaching English course content in Germany or France with a German or French instructor can still work well in many cases. If there’s no instructor between content and learner, the need for translated content increases considerably.
- With innovation moving faster than ever and software solutions moving to the cloud with quarterly updates, the original content changes so quickly that there’s hardly any time for translation. If you were to follow traditional processes, the translated version would already be outdated on the day it is released!
The good news is that machine translation has made dramatic advances, too. Most of us have used tools like Google Translate in our personal lives and have been impressed by the results. From a learning perspective, it would be a dream if those engines were good enough to provide instant translations of all learning content into all required languages, at the right quality. All the challenges above would be immediately resolved! But how ready for learning is machine translation technology? What can already be achieved today, and what do we expect to see in the future?
Machine Translation – the new hype
The topic of machine translation has been around for quite a while. It started in the 1950s with rule-based machine translation (RBMT), followed by statistical machine translation (SMT) in the 1990s. SMT was already quite successful and brought machine translation to everyone’s attention, but people often remember only the big failures, because SMT struggles when words have to be reordered. It works well only for certain language pairs, and in many cases the resulting translation isn’t really fluent.
For a few years now, there has been a new kid on the block: neural machine translation (NMT). It has become one of the most successful areas of machine learning, alongside autonomous driving, image recognition, and classification tasks. The new NMT approaches benefit from the availability of powerful and affordable hardware based on graphics processing units (GPUs) and from decades of human-translated content. These are ideal prerequisites for training a machine learning model that tries to simulate the neural network of a human brain.
As NMT produces significantly better results than SMT, it became one of the machine learning darlings: machine translation was a keynote topic at the big developer conferences from Microsoft, Google, and even SAP (check out Jürgen Müller’s keynote from SAPPHIRE NOW 2017). In addition, new gadgets like Google’s new earbuds attracted a lot of attention and publicity: “Real-time translation in up to 40 languages”, “A human dream comes true” … and so a hype was born.
Neural machine translation – a hype with promising results
SAP has built its own NMT engine, SAP NMT, in the context of the SAP Translation Hub. SAP NMT currently supports ten language pairs. It’s available to developers as an alpha translation service in the SAP API Business Hub on the SAP Leonardo Machine Learning Foundation and can also be tested via SAP Translation Hub.
And SAP NMT delivers pretty impressive translation results. In our first internal assessments, which used NMT for key-user and end-user tutorials in three language pairs, we found that
- Human translators preferred NMT translations over SMT translations more than five times as often
- NMT translations were roughly twice as fast to correct during post-editing
- NMT translations required far fewer corrections than SMT translations; for example, more than 60% of English to Chinese translations didn’t require any human post-editing at all
Correlation between CharacTER (Translation Edit Rate on Character Level) and post-editing time
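To make the metric named in the caption more concrete, here is a minimal Python sketch of a character-level edit rate. It is a simplification and not the tooling used in the assessment: the published CharacTER metric also accounts for shifted word sequences, which this sketch omits. The function names, and the choice to normalize by the length of the machine output, are assumptions made for illustration.

```python
def char_edit_distance(hyp: str, ref: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(ref) + 1))  # distance from empty hyp to ref[:j]
    for i, h in enumerate(hyp, start=1):
        curr = [i]  # distance from hyp[:i] to empty ref
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(
                prev[j] + 1,         # delete a character from hyp
                curr[j - 1] + 1,     # insert a character from ref
                prev[j - 1] + cost,  # substitute, or keep a match
            ))
        prev = curr
    return prev[-1]


def simple_char_edit_rate(hyp: str, ref: str) -> float:
    """Character edits needed to turn hyp into ref, per hyp character.

    Simplified: no shift operations. Lower values mean fewer edits.
    """
    return char_edit_distance(hyp, ref) / max(len(hyp), 1)


if __name__ == "__main__":
    machine_output = "Preheat the oven for the Turkey."
    post_edited = "Preheat the oven for the turkey."
    print(simple_char_edit_rate(machine_output, post_edited))
```

A score of 0 means the machine output already matches the post-edited reference, so scores like this can be compared with the time a translator actually spent on each segment.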
In summary, NMT results require far less human post-editing, and the translations are more fluent than with SMT. That is really promising, but you shouldn’t expect the same translation quality as from human translators. Expect some issues to remain, including funny errors. For example, on Amazon, I looked at a cookbook that had been automatically translated into German: one of the recipes translated turkey (the bird) as Turkey (the country). In some cases, NMT output will contain strange translations, and parts of sentences might be missing. But if 70-80% quality fits your translation use case, for example when it supports you in learning about a certain topic, it’s already a brilliant tool today.
Real-Life check with 10,000 openSAP Learners
To put NMT to the ultimate test with external users, SAP Globalization Services teamed up with openSAP and used NMT to translate the English transcripts of the openSAP course Enterprise Machine Learning in a Nutshell in December 2017. openSAP is SAP’s Massive Open Online Course (MOOC) platform with a global audience. For this pilot course, we provided fully machine-translated video subtitles in four languages: German, French, Spanish, and Portuguese. During the course, we collected user feedback to check whether the quality was acceptable and whether the subtitles were helpful in the learning process. The feedback was good: over 80% of learners stated that they would like to see more machine-translated subtitles on openSAP, even if the quality is less than 100%. A large number of learners also volunteered to support our future efforts to improve the quality of NMT translations.
So today, does machine learning solve the learning globalization challenge already?
Well, regarding the dream of getting instant, perfect translations of all learning content out of the box, we’re certainly not quite there yet. However, machine translation already serves interesting use cases where ‘good enough’ rather than perfection is the goal. In addition, combined with some human post-editing, it can provide significant efficiency gains. At SAP, we’re looking closely at how we can get the most out of this today. And considering that today, in such a hybrid scenario, more than 70% of machine-translated sentences no longer need to be touched, we’re very confident that the ‘translation dream’ will come true completely within not too many years.