Translating MS Office documents or web pages with Trados or other CAT tools works very well, but many of the documents we receive for translation are in PDF format. What do we do then?
Here are some tips that can be very useful when converting PDF documents.
1. Ask if there are any original files
Suppose a customer has an instruction manual designed entirely in InDesign and wants to distribute it on their website. Only people with InDesign would be able to open that file, so to make it accessible to everyone, the customer publishes it in Portable Document Format (PDF), which can be opened on any computer and any platform (Windows, Mac, Linux).
But while distributing PDF files is easy, extracting translatable content from them is a much bigger challenge. So, first of all, ask the client to send the original files. If these files are not available, there are other alternatives, which we detail below.
2. Dealing with customer expectations
The client must be informed of the challenges of converting PDF files: it may be possible to extract the text, but the formatting can be partly or wholly lost, especially when the document has columns and text boxes. If the client expects to receive the translated document in the same layout, you may need desktop publishing (DTP) services. Will the customer be willing to pay extra for this? Scanned documents with multiple overlapping stamps, tears, coffee stains or handwritten comments are even harder. These PDFs are nearly impossible to convert and will probably require the text to be transcribed first.
3. Choose one or more reliable PDF converters
Generally, translators use third-party tools to convert PDF files into Word, PowerPoint or Excel. Some say these tools work perfectly, while others have tried them and found them useless because the formatting is altered or even lost.
The good news is that SDL Trados Studio 2009 and 2011 can read some PDF files. The bad news is that this support has many limitations: it only works with PDF files whose text is editable (text that can be selected easily, in files generated directly from MS Office or design software). Acrobat Reader and Foxit Reader are also free tools that let you open a PDF document and save it as a text file. But in doing so, a hard line break (an "enter") appears at the end of each line, which causes CAT tools to segment the text incorrectly.
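The line-break problem just described can often be reduced before the text file is opened in a CAT tool. The following Python sketch is only an illustration, not a feature of any of the tools mentioned: the function name `unwrap_lines` and its `dehyphenate` option are hypothetical, and it assumes that paragraphs in the exported text are separated by blank lines, which is not true of every PDF export.

```python
import re


def unwrap_lines(text: str, dehyphenate: bool = False) -> str:
    """Join hard-wrapped lines into paragraphs.

    Assumption: a blank line marks a paragraph boundary; every other
    line break is treated as an artifact of the PDF export.
    """
    # Normalise Windows and old Mac line endings to "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")

    # Split into paragraphs on blank lines (which may contain spaces).
    paragraphs = re.split(r"\n\s*\n", text)

    cleaned = []
    for para in paragraphs:
        if dehyphenate:
            # Heuristic: rejoin words split across a line break
            # ("trans-\nlation" -> "translation"). This also merges
            # genuine hyphenated compounds, so it is off by default.
            para = re.sub(r"(\w)-\n(\w)", r"\1\2", para)
        # Replace the remaining line breaks with single spaces.
        para = " ".join(para.split())
        if para:
            cleaned.append(para)

    # One blank line between paragraphs in the result.
    return "\n\n".join(cleaned)


if __name__ == "__main__":
    sample = "This sentence was broken\nacross two lines.\n\nNew paragraph."
    print(unwrap_lines(sample))
```

The cleaned text can then be saved and imported into the CAT tool, where sentences should no longer be cut at the end of each original line. Headings and list items that are not separated by blank lines will be merged into the surrounding paragraph, so the result still needs a quick visual check.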
The solution is to get an OCR converter, such as Solid Converter PDF or ABBYY FineReader, or to use free online tools such as PDFtoWord.com. ABBYY is a good alternative for simple scanned documents, while Solid works only with PDF files generated from software.
In short, we must be well prepared, and it never hurts to have two or more converters installed since, as Forrest Gump would say, a PDF is like “a box of chocolates, you never know what you’re gonna get.”