TMX: Translation Memories in XML format

In a previous post I described XML as a standard for exchanging information. There are some specific XML formats which, in turn, have become pretty much standard in the world of translation services. Two of them are TMX, for exchanging translation memories, and XLIFF, a format that helps with the process of translation itself. I’ll talk about XLIFF in a future post.

A TMX file is, like all XML files, a plain text file with properly nested tags. In this case, however, the tags follow a fixed naming scheme that makes it easy to identify particular pieces of data.

The main content of the file is a series of “tu” (translation unit) tags, each of which holds “tuv” (translation unit variant) tags containing the segment in a particular language.

    ...
    <tu>
      <tuv xml:lang="EN-GB">
        <seg>General provisions</seg>
      </tuv>
      <tuv xml:lang="ES-ES">
        <seg>Disposiciones generales</seg>
      </tuv>
    </tu>
    ...

This fragment shows the segment “General provisions” and its Spanish equivalent. Each segment is contained in a “seg” tag, and the xml:lang attribute on the enclosing “tuv” tag labels its language.
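
To make the structure concrete, here is a minimal sketch of how such a file could be read with Python's standard xml.etree.ElementTree module. The sample document, its header values and the function name read_units are assumptions made up for illustration; only the “tu”, “tuv” and “seg” tags and the xml:lang attribute come from the fragment above.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical sample file; the header values are invented for this example.
SAMPLE_TMX = """<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1.0" segtype="sentence"
          o-tmf="example" adminlang="EN-GB" srclang="EN-GB" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="EN-GB"><seg>General provisions</seg></tuv>
      <tuv xml:lang="ES-ES"><seg>Disposiciones generales</seg></tuv>
    </tu>
  </body>
</tmx>
"""


def read_units(tmx_text):
    """Return one {language: segment text} dict per <tu> element."""
    root = ET.fromstring(tmx_text)
    units = []
    for tu in root.iter("tu"):
        unit = {}
        for tuv in tu.findall("tuv"):
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() also collects text nested inside inline markup.
                unit[tuv.get(XML_LANG)] = "".join(seg.itertext())
        units.append(unit)
    return units


units = read_units(SAMPLE_TMX)
print(units)
```

Keying each unit by language code keeps the sketch independent of which language is the source, which suits memories that hold more than two languages per unit.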

One of the reasons we find this format so convenient comes up when training Machine Translation engines. Training requires high-quality bilingual texts, and that quality has both a linguistic and a technical side. The linguistic quality comes from the work of good human translators. The technical quality comes from the fact that the segments are taken from a translation memory, where every source segment is already matched to its target.
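
As an illustration of the technical side, the sketch below turns a TMX document into a list of source/target pairs of the kind an engine could be trained on. It is not our actual pipeline: the function name parallel_pairs, the sample data, and the filtering rules (skip units missing either language, drop exact duplicates) are all assumptions chosen for the example.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical memory with one duplicate unit and one unit lacking a target.
SAMPLE_TMX = """<tmx version="1.4">
  <body>
    <tu>
      <tuv xml:lang="EN-GB"><seg>General provisions</seg></tuv>
      <tuv xml:lang="ES-ES"><seg>Disposiciones generales</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="EN-GB"><seg>General provisions</seg></tuv>
      <tuv xml:lang="ES-ES"><seg>Disposiciones generales</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="EN-GB"><seg>Final provisions</seg></tuv>
    </tu>
  </body>
</tmx>
"""


def parallel_pairs(tmx_text, src_lang, tgt_lang):
    """Collect unique (source, target) pairs where both sides are non-empty."""
    root = ET.fromstring(tmx_text)
    pairs = []
    seen = set()
    for tu in root.iter("tu"):
        texts = {}
        for tuv in tu.findall("tuv"):
            seg = tuv.find("seg")
            if seg is not None:
                texts[tuv.get(XML_LANG)] = "".join(seg.itertext()).strip()
        pair = (texts.get(src_lang, ""), texts.get(tgt_lang, ""))
        # Assumed filter: keep complete pairs only, and keep each pair once.
        if all(pair) and pair not in seen:
            seen.add(pair)
            pairs.append(pair)
    return pairs


pairs = parallel_pairs(SAMPLE_TMX, "EN-GB", "ES-ES")
for source, target in pairs:
    print(source, target, sep="\t")
```

Because each “tu” already groups a source segment with its translation, no separate alignment step is needed; the code only has to read the pairs out.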

All the CAT (Computer-Assisted Translation) tools that I know support the TMX format. Since exporting these memories to TMX is fairly straightforward whichever CAT tool is used, we can make good use of the body of translations accumulated over the years.

If you would like your company to try the wonders of working on a translation project with this technology, contact us to get a free quote.