How Are Word Counts Calculated for Complex Documents?

A few days ago we presented a first introduction to obtaining a word count for translation projects. Today, we will look at some details that are crucial to understanding this fundamental process in the industry. To do this, we will use practical examples of common quotes that we may face on any given day in our line of work.

A very common case is that of personal documents in need of a certified translation. How many words does a scanned birth certificate have? What about an academic transcript? It is difficult to know a priori, because these documents almost never come in an editable format, that is, in the file type of the software with which they were originally created. Often, they are scans of a printed version. The best way to calculate the word count for these types of documents (which typically have the extensions .pdf or .jpg) is to process them with optical character recognition (OCR) software. This allows us to open the converted output as a Word document, for example, and treat it more or less as if it were originally a .doc file. Of course, OCR programs are not infallible, so we have to adjust the estimate to allow for this margin of error.
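As a rough illustration of the last step, here is a minimal Python sketch that counts the words in text already produced by an OCR program and turns that count into a low/high range for the quote. The function names, the sample text and the 10% margin are assumptions made for this example; the appropriate margin depends on the quality of the scan and of the OCR software.

```python
import re


def count_words(text: str) -> int:
    """Count whitespace-separated tokens that contain a letter or digit.

    Tokens made only of symbols (stray table borders, dashes and similar
    OCR noise) are skipped.
    """
    return sum(1 for token in text.split() if re.search(r"\w", token))


def quote_range(ocr_word_count: int, margin_percent: int = 10) -> tuple[int, int]:
    """Return (low, high) bounds around an OCR-based word count.

    margin_percent is an assumed allowance for OCR errors, not a standard.
    Integer arithmetic is used so the rounding is predictable.
    """
    low = ocr_word_count * (100 - margin_percent) // 100          # round down
    high = -(-ocr_word_count * (100 + margin_percent) // 100)     # round up
    return low, high


# Hypothetical OCR output from a scanned certificate.
ocr_text = """CERTIFICATE OF BIRTH
Name: Jane Doe
Date of birth: 12 March 1990
| | ---"""

words = count_words(ocr_text)
low, high = quote_range(words)
print(f"OCR count: {words} words, quote range: {low} to {high} words")
```

In practice, a quick visual comparison between the scan and the converted text is still needed, since OCR can merge or split words in ways no fixed margin fully captures.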

Another everyday situation is quoting files created with some version of Adobe InDesign. These usually have a commercial purpose or are intended for professional dissemination, and they lend themselves well to the use of translation assistance tools. However, the process we must follow in order to quote them is not immediate. First, the .indd file must be converted to another format that can be edited with CAT tools, such as .idml or .inx. From that point, we can run a word count with the CAT tool that will be used for the translation. This analysis also allows us to determine whether the document contains a high number of repetitions or whether, thanks to an existing translation memory, part of the translation is already complete. Finally, we add the words that were not counted in this process because they are part of uneditable images or charts.
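The breakdown described above can be sketched in a few lines of Python. This is a simplified illustration rather than the way any particular CAT tool works: the class and function names are hypothetical, the segments are invented, and only exact matches are considered, whereas real tools also report partial (fuzzy) matches.

```python
from dataclasses import dataclass


@dataclass
class WordCountReport:
    new_words: int = 0        # words in segments seen for the first time
    repeated_words: int = 0   # words in segments repeated within the document
    tm_match_words: int = 0   # words in segments found in the translation memory
    image_words: int = 0      # words counted by hand in images or charts

    @property
    def total(self) -> int:
        return (self.new_words + self.repeated_words
                + self.tm_match_words + self.image_words)


def normalize(segment: str) -> str:
    """Lowercase and collapse whitespace so identical segments compare equal."""
    return " ".join(segment.lower().split())


def analyze(segments, translation_memory, image_words=0) -> WordCountReport:
    """Classify each segment as a TM match, a repetition or new text."""
    tm_keys = {normalize(s) for s in translation_memory}
    seen = set()
    report = WordCountReport(image_words=image_words)
    for segment in segments:
        key = normalize(segment)
        n = len(segment.split())
        if key in tm_keys:
            report.tm_match_words += n
        elif key in seen:
            report.repeated_words += n
        else:
            report.new_words += n
        seen.add(key)
    return report


# Hypothetical segments extracted from a converted InDesign file.
segments = [
    "Welcome to our catalog.",
    "Contact us today.",
    "Contact us today.",
    "Free shipping on all orders.",
]
memory = ["Welcome to our catalog."]

report = analyze(segments, memory, image_words=8)
print(report, "total:", report.total)
```

Keeping the four categories separate matters for the quote, because repetitions and translation memory matches are often priced differently from new words.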

In the next post, we will discuss a quoting process even more complex than those covered so far. Meanwhile, feel free to contact our sales team to request a free quote, or use the comments section of this post to ask any questions about the quoting process.

To view the Spanish version of this post, go to:

¿Cómo calcular las palabras de documentos complejos?