PDF files can come from numerous sources. They can be created from an MS Word document, a PowerPoint presentation, a scanned image, etc. Since CAT Tools do not accept PDF files, our first option is to do a “side-by-side” translation directly in an MS Word file. If we would rather use a CAT Tool (which we always should, as it serves our needs and the final quality of our work better), we first have to extract the text from the PDF file so that we can open it in the CAT Tool.
There are different ways to extract text from a PDF file, depending on its source. If the PDF comes from an MS Word document, we simply save or convert the PDF into our desired format, in this case an MS Word document, and fix the formatting (spacing, misspellings, etc.). If the PDF comes from another source (let’s say, a scanned document), our approach will be different. If we are a company, we should have a department, or at least someone, who can perform a pre-DTP task (which we’ll explain in more detail in the future) to extract the text. Once the pre-DTP is done, our next step is to pre-edit the file. It is imperative that we do this, especially when dealing with content that involves numbers. Even when our DTP department is reliable, some things may not be converted properly and will need reviewing: very often an “I” will be converted into a “1,” a “c” into an “e,” and so on.
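Part of the pre-editing pass on numbers can be supported with a small script. The following is only a minimal sketch in Python, under a few assumptions: the extracted text has already been saved as plain text, the function name `flag_suspicious_tokens` is hypothetical, and the set of look-alike letters is our own non-exhaustive guess. It flags tokens that mix digits with letters that are often confused with digits, so that a person can check them against the original PDF.

```python
# Letters that are often confused with digits after text extraction.
# This set is an assumption for illustration; extend it for your own files.
LOOKALIKES = "IlOoSB"

# Punctuation trimmed from both ends of a token before it is checked.
EDGE_PUNCTUATION = ".,;:()[]\"'"


def flag_suspicious_tokens(text):
    """Return (line_number, token) pairs that mix digits and look-alike letters."""
    flagged = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for token in line.split():
            core = token.strip(EDGE_PUNCTUATION)
            has_digit = any(ch.isdigit() for ch in core)
            has_lookalike = any(ch in LOOKALIKES for ch in core)
            if has_digit and has_lookalike:
                flagged.append((line_number, core))
    return flagged


sample = "Total: 1I5 units\nInvoice 2024\nPrice 3O.50 USD"
print(flag_suspicious_tokens(sample))
# prints [(1, '1I5'), (3, '3O.50')]
```

This is a heuristic, not a replacement for review. It will also flag legitimate codes such as part numbers, and it cannot catch letter-for-letter errors such as a “c” converted into an “e”; those still need a spell check and a careful read against the source.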