Human vs. Machine (Translation)
… or the story of how Google Translate beat (by a smallish margin) Human Translate, and of how what you pay for is what you get.
We did something naughty, sparked by a highly specialised job we had quoted for being given to someone who quoted 9 times lower. It backfired on the potential client so badly that the end user didn’t understand a word. But that’s another story, for another time.
So, we sent a sample to three translators in three price ranges. Sample 1 was translated at a price roughly equal to what we quoted; Sample 2 went to someone who quoted roughly 9 times lower; and Sample 3 went to someone whose price sat somewhere in the middle of that range.
First and foremost, please keep in mind that we often translate and review very-to-extremely specialised medical/pharma and technical texts, which the sample was not. It was just slightly more complicated than a text for the general public, i.e. it had some tricky terminology, but nothing one cannot find in a dictionary or in online glossaries. Second, bear with us, there will be some tips and tricks.
Sample 1: clearly translated by an English-speaking professional in that specific field, as promised. Not a very cunning linguist yet, but well on their way. This one needed only some minor style and consistency changes, plus congratulations on the excellent terminology work, and therefore doesn’t make it into this article.
Sample 2: so bad (terminology, style, punctuation, typos) it made us cringe and wonder if it was Google-translated. So we sampled a portion of the text and ran it through GT. Lo and behold, it was all human, and its averaged error score was around 110%.
Sample 3: two sentences in, it looked familiar, so we ran the same sample through GT. And the next paragraph. And another one. Go figure, minus some changes, it was Google-translated. However, using the same type of scoring, it had an averaged error score of around 70%.
Let us explain how this specific type of error scoring works. First, you have the error types: we scored translation, accuracy, terminology, style, consistency and punctuation mistakes. Then there are, usually, three severity categories: minor, major and critical. You might get an Excel spreadsheet with predefined formulae for the category totals: every number you enter in the minor category counts as is, gets multiplied by 5 in the major category, and by 10 in the critical one. So, a typo in an otherwise correct term would be 1x, a less-than-ideal translation would be 5x, and a full mistranslation would be 10x. What is generally acceptable is an error score no higher than 5-10%, with, preferably, few, if any, major, critical, or repetitive errors.
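The weighted scoring above can be sketched in a few lines of Python. The severity weights (1x/5x/10x) and the error types come from our description; normalising the weighted points against the sample’s word count is our assumption here, since QA models differ on that detail.

```python
# A minimal sketch of weighted translation-error scoring.
# Severity weights follow the 1x / 5x / 10x scheme described above;
# scoring against the sample's word count is an assumption.

ERROR_WEIGHTS = {"minor": 1, "major": 5, "critical": 10}

def error_score(error_counts: dict, sample_words: int) -> float:
    """Return the weighted error score as a percentage of the sample size.

    error_counts maps a severity ("minor", "major", "critical") to the
    number of errors found; sample_words is the reviewed sample's length.
    """
    weighted = sum(ERROR_WEIGHTS[severity] * count
                   for severity, count in error_counts.items())
    return 100 * weighted / sample_words

# A 100-word sample with 10 minor, 10 major and 5 critical errors
# accumulates 10*1 + 10*5 + 5*10 = 110 weighted points:
print(error_score({"minor": 10, "major": 10, "critical": 5}, 100))  # 110.0
```

This is how a score can exceed 100%: enough major and critical errors, each counting 5 or 10 times, push the weighted total past the size of the sample itself.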
There might be cases when a translator used a less-than-ideal term (i.e. correct, but not in the given context, such as ‘teste de laborator’ instead of ‘analize de laborator’ for ‘lab tests’) repetitively and had very few or no other mistakes. This is when a good reviewer steps in and explains that, despite the, let’s say, 15% error score, the translator actually did an excellent job, and the error was easy to fix with the Find/Replace function.
Now, back to our case study. How could a translation end up with a 110% error score? Well, first of all, the major and critical errors were many. A lot of terms were either mistranslated or not translated at all, there were repeated major grammar and spelling errors that couldn’t possibly have been typos, and the punctuation was all over the place, either copied from the source language, or simply wrong. We’re not saying we’ve never mistranslated anything, because we all do, but at least the reviewers we work with do not have to worry about grammar, spelling and punctuation. And (see meme below)…
How come Dr. Gurgle did a better job? Well, one thing it learned reasonably well is grammar, spelling and punctuation. However, terminology-, style- and consistency-wise, it was not great, and that’s an understatement. What puzzled us, to be honest, is why the translator, given a pretty good basic structure to start from, didn’t bother reviewing at least the terminology. Unless it is actually horrendous, we gloss over subpar style and maybe mark such errors as minor, but when you don’t even bother to check the machine translation for terminology and consistency, and we actually have to retranslate 75% of your work, believe us, we are less inclined to be forgiving.
First, dear potential client, what you pay for is indeed what you get. And dear colleague, if your price is so low that your workload is 10 times what’s humanly possible, we can guarantee your work will be, at best, subpar, even if you might otherwise be good enough in a specific field. Also, dear colleague, at least use a free CAT tool, for Jerome’s and consistency’s sake.
Second, let’s be honest with ourselves: machine translation is here to stay. Neural machine translation works reasonably well whenever the corpora are big and good enough, and it will need a specific type of review, because the errors it currently makes are either so subtle that they call for double or triple reviews, or so bad that they call for retranslation. In our opinion, it won’t be great any time soon in language pairs such as ours, between a major and a minor language, because the demand is not big enough to produce massive corpora. It will, eventually, get better, but not in the very near future.