“Conservas Lazo,” Pre-Editing & The Machine Translation Odyssey

I don’t think I’ll ever forget one of the first times I used Google Translate: I typed the name James Bond into the English>Spanish translator and the result was “Conservas Lazo” (we have to agree that it is a fair translation nonetheless). That was Machine Translation back in the day. We can all agree that it has come a long way since those early-2000s attempts, before the service was officially launched as Google Translate in April 2006. Still, although we are miles away from those first experiments, the system, GMT, more recently renamed GNMT (Google Neural Machine Translation), is not without limitations.

It seems as though the machine cannot make up its “artificially intelligent mind” when it comes to consistency. For instance, we often see the same term translated differently from segment to segment. And if you add an unreliable source such as a non-pre-edited converted file… well, we are in real trouble.

Machine Translation (MT) projects usually involve Post-Editing (PE): the translator works from the MT output, mainly to increase productivity (that is, the number of words one can work on in a day) and, consequently, to get the job done faster. Nowadays, we can rely on MT to accelerate the translation process, but that does not usually mean that the machine alone will do the work. As we have mentioned, we are far from “Conservas Lazo,” but there are still issues that need our attention. One could even say that PE requires more work, in the sense that we have to watch out for the machine’s inconsistencies, and more often than not we end up retranslating rather than post-editing. But fear not: it is still a smoother process that really increases the amount of work we can do in a day.

Not only is it necessary to pay attention to machine-generated inconsistencies; it is also of the utmost importance to be aware of our source content before running the MT. This is where pre-editing comes into play. In a previous post, I mentioned how converted files and copied-and-pasted content can generate unnecessary problems, such as tags appearing in our source when it is uploaded to the CAT tool, if the pre-editing step wasn’t done well. The same goes for projects meant to be done with MT. And tags are not the only issue: there are also incomplete sentences and letters converted into numbers or symbols, which the MT will either fail to pick up or read as something that makes no sense in our context.
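To make this a bit more concrete, here is a minimal sketch, in Python, of the kind of pre-editing pass one might run over a converted source file before feeding it to the MT. Everything here is illustrative: the function name pre_edit, the sample sentence, and the specific clean-up rules are assumptions on my part, not features of any particular CAT tool or MT engine.

```python
import re

def pre_edit(text: str) -> str:
    """Illustrative pre-editing pass applied to source text before MT.

    The rules below are assumptions about typical conversion artifacts,
    not recommendations from any specific CAT tool or MT engine.
    """
    # Strip stray inline markup left over from file conversion (e.g. <b>, </span>).
    text = re.sub(r"</?\w+[^>]*>", "", text)

    # Join hard line breaks inside sentences so the MT sees complete segments:
    # a line that does not end in sentence-final punctuation is merged with the next.
    text = re.sub(r"(?<![.!?:;])\n(?!\n)", " ", text)

    # Collapse repeated spaces introduced by the steps above.
    text = re.sub(r"[ \t]{2,}", " ", text)

    # Flag (rather than silently "fix") suspicious letter-to-number conversions,
    # such as a digit appearing inside what looks like an ordinary word.
    for match in re.finditer(r"\b\w*[0-9]\w*[a-zA-Z]\w*\b", text):
        print(f"Check possible conversion artifact: {match.group(0)!r}")

    return text.strip()


if __name__ == "__main__":
    raw = "The c0ntract shall be\nrenewed <b>annually</b> by both parties."
    print(pre_edit(raw))
```

Note that the sketch only flags the letter-to-number substitutions instead of correcting them automatically; those are exactly the cases where a human decision is needed before the text ever reaches the MT.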

So, the point is: whenever we have a PE project with files that need pre-editing, do not skip the latter. Our clients rely on our responsibility and professionalism when they agree to pay for PE. They do not expect a Tarzanesque (“Me Tarzan, you Jane”) outcome, because that they can produce themselves. We are expected to take every step necessary to guarantee consistency, a fluent and eloquent translation that does not read like machine output, and the preservation of the original context. And to achieve that, we need to start with our source content.

And so comes the friendly reminder: always make sure that the source file is “clean” before uploading it to a CAT tool. You will save yourself, and consequently your clients, a lot of unnecessary trouble, especially when dealing with Machine Translation and Post-Editing projects.