Renaming Files for Website Translations

Renaming files

Consider the following scenario, which has come up a few times in our company: we need to translate a website with a great many files, all of them called “index.html”, each nested at a different level in its own folder.

I’m told that this is a nightmarish situation to manage with some Management Systems that accept single files for translation, since afterwards it’s very difficult to match each translated file to its source.

One idea that works nicely is to rename each file with its full path in the directory structure. This guarantees a unique name, and at the same time the name itself records where the file belongs in that structure.

For example, say you have two files:

site/main/index.html
site/deeply/nested/folder/index.html

Then we can rename them as:

site_main_index.html
site_deeply_nested_folder_index.html

After translating them, putting the folder separator character (“/” in Unix) back in place of each underscore restores the original name and the correct folder structure.

This particular case works fine because the character used to replace “/” doesn’t appear in the names of the files. For real-life situations I usually choose “__”, which is a very infrequent combination of characters (although it will fail if any folder name ends with “_” or already contains “__”), but any character or combination of characters that doesn’t appear in the names will work.
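Before committing to a separator, it can help to test whether a given path survives the trip there and back. The following is only a sketch, assuming bash; the function name round_trips and the sample paths are made up for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds only if a path is unchanged after
# replacing "/" with the separator and then reversing the replacement.
sep="__"

round_trips() {
    local original=$1
    local flat=${original//\//$sep}    # every "/" becomes the separator
    local restored=${flat//$sep//}     # every separator becomes "/" again
    [ "$restored" = "$original" ]
}

# Illustrative paths: the second has a folder ending in "_",
# the third already contains the separator.
for p in "site/main/index.html" "site/draft_/index.html" "my__site/index.html"; do
    if round_trips "$p"; then
        echo "ok:        $p"
    else
        echo "ambiguous: $p"
    fi
done
```

Any path reported as ambiguous would come back under a different name, so a different separator is needed for that set of files.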

At this point, I can hear the clamor of the masses roaring: “You can’t do that for dozens of files, it will take forever!” Well, I just had to do it for 150 files, and it took me just a few seconds. This is what I used:

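mkdir for_translation
# list every HTML file below the current folder, one path per line
find * -iname '*.html' > /tmp/files.txt
# copy each file, replacing every "/" in its path with "__"
while IFS= read -r x; do cp "$x" for_translation/"${x//\//__}"; done < /tmp/files.txt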

And then, once the files have been translated, the conversion back can be accomplished in a flash with this:

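# run this inside the folder that holds the translated files
ls > /tmp/files.txt
# turn every "__" back into "/", recreate the folder, and move the file into it
while IFS= read -r x; do full_name=${x//__//}; dir=$(dirname "$full_name"); mkdir -p "$dir"; mv "$x" "$full_name"; done < /tmp/files.txt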

Easier done than said!
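To try the whole round trip safely, the two steps can be run against a scratch directory first. What follows is a sketch rather than the commands I actually ran: it assumes bash and a find that supports -print0, the function names flatten and restore are invented for the example, and it passes names between commands null-delimited instead of through a temporary file.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for illustration; the separator is "__" as in the text.

flatten() {   # flatten SRC_DIR DEST_DIR
    local src=$1 dest=$2 x
    mkdir -p "$dest"
    # List HTML files relative to SRC_DIR, separated by null bytes.
    ( cd "$src" && find . -type f -iname '*.html' -print0 ) |
    while IFS= read -r -d '' x; do
        x=${x#./}                               # drop the leading "./"
        cp "$src/$x" "$dest/${x//\//__}"        # "/" becomes "__"
    done
}

restore() {   # restore FLAT_DIR DEST_ROOT
    local flat=$1 root=$2 f name full
    for f in "$flat"/*; do
        [ -f "$f" ] || continue
        name=${f##*/}                           # file name without its folder
        full=${name//__//}                      # "__" becomes "/" again
        mkdir -p "$root/$(dirname "$full")"
        cp "$f" "$root/$full"
    done
}

# Build a small sample tree in a scratch directory and run both steps.
work=$(mktemp -d)
mkdir -p "$work/src/site/main" "$work/src/site/deeply/nested/folder"
echo main   > "$work/src/site/main/index.html"
echo nested > "$work/src/site/deeply/nested/folder/index.html"

flatten "$work/src" "$work/for_translation"
restore "$work/for_translation" "$work/restored"

# The restored tree should be identical to the source tree.
diff -r "$work/src" "$work/restored" && echo "round trip OK"
```

The scratch directory in $work is left in place so the result can be inspected; delete it when done.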

A quick explanation. Writing the list of names to a temporary file and reading it back one line at a time copes with files or folders that contain spaces, commas, or other unruly characters in their names. The double quotation marks (") serve the same purpose. The copy (cp) can be replaced by a move (mv) if you prefer (with the copy we keep the originals, just in case). The crucial bit happens inside the curly brackets ({}): this is bash parameter expansion, where ${x//pattern/replacement} replaces every occurrence of the pattern in the value of x. The Bash manual covers the details under “Shell Parameter Expansion”.
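For anyone who wants to see the curly-bracket part on its own, here is a small sketch, assuming bash. The variable names flat, restored and first_only are just for illustration.

```bash
#!/usr/bin/env bash
x="site/deeply/nested/folder/index.html"

# ${x//pattern/replacement} replaces every match of pattern in $x.
# The "/" being replaced is written "\/" so it is not read as a delimiter.
flat=${x//\//__}

# Going back: the pattern is "__" and the replacement is "/".
restored=${flat//__//}

# With a single slash after the name, only the first match is replaced.
first_only=${x/\//__}

printf '%s\n' "$flat" "$restored" "$first_only"
# prints:
#   site__deeply__nested__folder__index.html
#   site/deeply/nested/folder/index.html
#   site__deeply/nested/folder/index.html
```

None of this touches the files themselves: the expansion only builds a new string, which cp or mv then uses as the target name.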

Unix wizards will find these methods trivial, but I think that they work well, and they are easily available to anyone with access to a Unix system (Linux, and in particular Ubuntu, which seems to be the most popular Linux distribution).