Over the past few years, I have been evaluating machine translation output extensively. I started doing this out of curiosity, and I ended up doing it professionally. Since this evaluation work was for various NLP scientists and translation agencies, I cannot go into detail, but this is what I learned from the various tests:
- The machine translation output has improved a lot overall. Five years ago, a test document with 1,000 sentences contained more than 700 sentences where a translator risked wasting time by editing the machine-translated text. Today, fewer than 100 sentences are too bad for editing. “Too bad for editing” means that the sentence the translator creates for delivery is too different from the machine-translated text: too many words have to be dropped, added, moved, or edited.
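A criterion like “too many words dropped, added, moved or edited” can be approximated with a word-level edit distance between the MT output and the delivered translation, in the spirit of metrics such as TER. The sketch below is a minimal, hypothetical illustration; the function names and the 0.4 threshold are my own assumptions, not the actual tooling used in these tests.

```python
def word_edit_distance(hyp: list[str], ref: list[str]) -> int:
    """Minimum number of word insertions, deletions, and substitutions
    needed to turn hyp into ref (word-level Levenshtein distance)."""
    d = list(range(len(ref) + 1))  # distances for the empty hypothesis
    for i, h in enumerate(hyp, 1):
        prev, d[0] = d[0], i
        for j, r in enumerate(ref, 1):
            prev, d[j] = d[j], min(
                d[j] + 1,          # delete a word from hyp
                d[j - 1] + 1,      # insert a word from ref
                prev + (h != r),   # keep or substitute
            )
    return d[-1]


def too_bad_for_editing(mt: str, final: str, threshold: float = 0.4) -> bool:
    """Flag a sentence when the delivered translation differs from the MT
    output by more than `threshold` edits per word (illustrative value)."""
    mt_words, final_words = mt.split(), final.split()
    edits = word_edit_distance(mt_words, final_words)
    return edits / max(len(final_words), 1) > threshold
```

A real evaluation pipeline would also account for word moves (which plain Levenshtein counts as a deletion plus an insertion) and for tokenization, but the idea is the same: beyond some edit rate, translating from scratch is cheaper than editing.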
What changed in those 5 years? The test set was almost the same: the same source sentences, but different engines. The Systran, Google, and Microsoft engines became neural MT systems, and I added Amazon and DeepL to the translation set. (A pity I could not add LILT to the test set: I cannot use adaptive MT in my measuring tools. But I’m sure LILT has improved at the same pace as the others. Maybe even more…) The test set also contained high fuzzy matches from translation memories. I never edited those, as I used them to validate the data.
The sentences in the test set were quite long (15 to 45 words, or up to 300 characters), and the syntax of some sentences was quite complex. Five years ago, MT systems clearly could not handle those long sentences well. Today, the quality of machine-translated long sentences is much, much higher.
Another observation: translators were asked to label sentences before editing. We asked them whether they believed the pre-translated text was retrieved from a translation memory or produced by an MT engine. Five years ago, most translators got it right, because “ ”. This year, they could still tell what was MT, but for a very different reason: “ ”.
- There are still big differences between language pairs. For some target languages, I really doubt MT will ever be good enough to be used by professional translators. But for many target languages, it is clear: machine translation is a solid tool that helps translators do more in less time.
- There are still a lot of documents that will never be fit for MT pre-translation. It has just become easier to spot them now.
- Nobody should ever trust a translation produced by a machine. Only a human translator can verify whether the machine did a good or a bad job… and fix it.
So I was wondering: why do we keep using the term “post-editing”? Maybe we should stop using it and just say “editing”, or why not simply “translating”? That is what translators do, after all…