Oliver Sturm's Blog - Translation Memory Tools Tried and Found Wanting

Occasionally I do some translation from English to German for some of our products at DevExpress, and also for marketing content. Over the years, I have completed many such little projects and I find it hard to remember previous decisions — did I decide to translate certain terms at all, and if so, how?

Translation memory tools exist for such purposes, sometimes as part of larger CAT (Computer Aided Translation) suites, and there are quite a few of them out there. Many are the kind of software that I’m not really that interested in these days: maintained for 30 years as a Windows-only application, and that kind of thing. However, I also found several that are reasonably nice-looking web apps, in the cloud and all that, convenient to use especially for the casual translator. I tested several tools, not to any great extent, but enough to find out whether they would be at all usable for my purposes.

Unfortunately my results were not very positive and it seems to me at this point that the makers of such tools don’t have any understanding of the situation I’m in. Perhaps I’m the only person in that particular situation, I don’t know. I decided to summarize my findings so I’ll remember them in the future if I make up my mind to do another round of testing — and perhaps somebody has helpful ideas or comments.

I have two main issues. The first and most important one is that the tools have no consideration for the file formats I’m dealing with. Yes, they all claim to support 68 file formats (or so), but in reality what they mean is only that they can extract translatable strings from these formats, and sometimes generate the formats for download after the translation has taken place. One of the formats I have to deal with frequently is a simple Excel file. In such a file there will usually be one or more columns with an ID or similar, sometimes a comment string. This info describes the origin of the text that is to be translated, perhaps providing some context. The text itself is in one column and I’m usually expected to enter the target text in a different column. Translation of computer software text pieces has worked this way for decades, in my own personal experience — there are obviously various specialized tools for this particular scenario and they all also adhere to the same pattern. Regardless, the CAT suites I tested don’t understand this at all. They simply throw all the text elements in one big bag: the IDs from column one, the text from column two, all listed in a long sequence. Some import in A1, A2, B1, B2… order, others do A1, A2, A3, … B1, B2… instead. In any event, this makes it pretty much impossible to work through the translatable text items because the important context information is not available anymore.

I have not found a single tool that shows me a tabular document in the way it’s intended to be used: as a tabular document (duh!). For some strange reason, the assumption seems to be that tabular documents are basically just containers for long text that could just as well be in a Word document — no idea what the point of that is. Oh well, being negative… perhaps somebody can make use of this, but I can’t. I won’t even mention the idea of storing the translation in a different cell from the source. Or the fact that I sometimes add a comment about a translation, or a question, to the document in a fourth column.
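To make the expectation concrete, here is a minimal sketch of the data model a tabular workflow needs: each row stays one unit, with its context columns attached and a separate target cell. The column names (`ID`, `Comment`, `Source`, `Target`, `Note`) and the sample rows are assumptions for illustration, and CSV stands in for the Excel sheet so that the example needs nothing beyond the Python standard library.

```python
import csv
import io
from dataclasses import dataclass

# Hypothetical sheet, exported as CSV. Column names are assumptions.
SAMPLE = """ID,Comment,Source,Target,Note
btn.save,Toolbar button,Save,,
dlg.title,Title of the export dialog,Export data,,
"""


@dataclass
class Segment:
    row: int        # 1-based data row, header excluded
    context: dict   # all columns except source and target (ID, comment, note)
    source: str
    target: str = ""


def load_segments(csv_text, source_col="Source", target_col="Target"):
    """Read rows as segments, keeping every context column with its row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    segments = []
    for number, row in enumerate(reader, start=1):
        context = {name: value for name, value in row.items()
                   if name not in (source_col, target_col)}
        segments.append(Segment(number, context, row[source_col], row[target_col]))
    return reader.fieldnames, segments


def write_back(fieldnames, segments, source_col="Source", target_col="Target"):
    """Write the segments out again in the original column order."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for seg in segments:
        writer.writerow({**seg.context, source_col: seg.source, target_col: seg.target})
    return out.getvalue()
```

With this shape, the translation goes into `seg.target`, a question for the client goes into `seg.context["Note"]`, and the ID and comment never end up in the list of things to translate.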

Another format I tried is PDF. Obviously the extraction of text from PDF documents is not easy technically, and as expected, the tools that were able to do it at all had loads of invalid bits and pieces included in the segment lists (like text extracted from images and stuff like that). So far, okay. But now: how do I simply click to mark the rows with rubbish in them as “untranslatable, please don’t bother me with this anymore”? Where do I click to see a quick screenshot of the part of the document where a particular text item came from? Well, you guessed it: not possible. Makes me wonder whether the makers of those tools that do support PDF have ever tried translating one. It seems rather impossible to me. These findings lead me to the statement I already made above: the tools don’t show consideration for the file formats they support. They seem to assume that a format is “supported” if text can be extracted from it, pretty much. Technically, file formats are used for different reasons though, and that is not reflected in the tools at all.

The second main issue I found is that it is too hard to get started, in the sense of actually getting some benefit from the tool. Remember, I was interested in these tools because of the translation memory capabilities. Most tools support uploading a translation memory, but only in a specialized file format. Of course I don’t have such a file because I haven’t used these tools in the past. What I have though is a bunch of documents that I have translated previously. I was surprised to find that not a single CAT suite was able to import these documents for me and use them as a basis for a translation memory. Surely this would be a valuable selling point…
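As a rough illustration of how little such an import would need, the sketch below builds a translation memory from source/target pairs taken from previously translated documents, suggests an earlier translation for similar new text, and serializes the memory as TMX. Several things here are assumptions: the function names are made up, the fuzzy matching via `difflib` is a simple stand-in for whatever matching a real CAT suite uses, and TMX is used only because it is a common exchange format for translation memories, which may or may not be the specialized format a given tool asks for.

```python
import difflib
import xml.etree.ElementTree as ET

# ElementTree serializes this qualified name as the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_memory(rows):
    """Collect (source, target) pairs; rows without a translation are skipped."""
    memory = {}
    for source, target in rows:
        if source.strip() and target.strip():
            memory[source.strip()] = target.strip()
    return memory


def suggest(memory, text, cutoff=0.75):
    """Return (earlier source, its translation) for the closest match, or None."""
    matches = difflib.get_close_matches(text, list(memory), n=1, cutoff=cutoff)
    if not matches:
        return None
    return matches[0], memory[matches[0]]


def to_tmx(memory, src_lang="en", tgt_lang="de"):
    """Serialize the memory as a minimal TMX 1.4 document (header values are placeholders)."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "tm-sketch", "creationtoolversion": "0.1",
        "segtype": "sentence", "o-tmf": "csv", "adminlang": "en",
        "srclang": src_lang, "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for source, target in memory.items():
        tu = ET.SubElement(body, "tu")
        for lang, text in ((src_lang, source), (tgt_lang, target)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")
```

Fed with the source and target columns of old spreadsheets, `build_memory` would answer the "did I translate this term before, and how?" question through `suggest`, and `to_tmx` would produce a file to try uploading to a tool that accepts TMX.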