Word or character counts are an essential part of the work of translators and others who bill by volume. This immediately raises the question of how many words a translator can process per day. The answer, of course, depends on the translator’s capabilities and the difficulty of the text. The article “Realistic translation times using human translators” from PACTRANZ provides some good information about word counts. However, it makes the same mistake as many other LSPs: it uses 2500 words/day as a number that fits all languages. That number might fit most Western language pairs (e.g., English<>German, English<>Italian), but it is inaccurate for Asian languages.
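To make the arithmetic behind a daily rate concrete, here is a minimal sketch that turns a word count into working days. The 2500 words/day default is the figure quoted above. The 1500 words/day used in the second call is only a placeholder to show how a lower rate changes the estimate, not a measured figure for any language, and the function name `working_days` is made up for this example.

```python
import math

# The one-size-fits-all rate quoted in the article discussed above.
DEFAULT_WORDS_PER_DAY = 2500


def working_days(word_count, words_per_day=DEFAULT_WORDS_PER_DAY):
    """Return the number of whole working days needed for word_count words."""
    if words_per_day <= 0:
        raise ValueError("words_per_day must be positive")
    # Round up: a partly used day still has to be scheduled.
    return math.ceil(word_count / words_per_day)


# Same job, two different daily rates.
print(working_days(10_000))                       # 4
# 1500 is a hypothetical placeholder rate, not a measured value.
print(working_days(10_000, words_per_day=1_500))  # 7
```

With the same word count and the same deadline, the team working at the lower rate simply cannot finish in time, which is the point of the comparison that follows.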
Comparing languages
Comparing languages is a risky thing to do because different countries mean different cultures, different writing systems, different expectations of the customers, etc. So I dare to say that comparing languages is often like comparing apples to oranges.
If someone makes such a comparison, then I would ask if they know the languages well enough to do so. For example, suppose a project manager at a translation agency gives a German translation team and a Japanese translation team the same number of words and the same deadline. That already shows that he/she did not take into account that Japanese is a much more difficult language to translate into (there are articles online about this).
Why it takes more time to translate into Japanese (all from the articles):
- four different writing systems (Hiragana, Katakana, Kanji, Romaji), two keyboard typing systems
- the Japanese audience doesn’t compromise on quality
- the translation is completely context-based
- translation often requires transliteration (or “transvocalization”)
- honorific system
- several review cycles to achieve high quality
- Japanese has separate rules for two types of adjectives, nouns, and two types of verbs. You also have two exception verbs and two exception adjectives.
- Japanese is an agglutinative language that has more in common with Native American languages than with neighboring Chinese
- preference for the visual over the textual (often calling for re-design of original source materials)
- the level of formality depends on the type of content, the target audience, and the context. The level of formality in the source is often not applicable to the Japanese level of formality.
More technical issues
Both the encoding (Unicode) and the font must be set correctly, or characters will be garbled. Many IT teams do not seem to know that Unicode encoding alone is insufficient: because of Han unification, Asian languages like Chinese and Japanese share the same Unicode code points for Kanji characters. The correct font is needed to render the characters in the form the target language expects, or they appear garbled or in the wrong regional form. Because of this, Japanese translators verifying implemented translations often have to create error reports describing the problem and send the jobs back for fixing. This can lead to delays and missed deadlines.
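A small sketch can show why the encoding alone does not settle how a Kanji character is displayed. It uses U+76F4, a Han character used in both Japanese and Chinese, and prints its code point and Unicode name. The helper `tag_language` is a hypothetical illustration of one common way to pass the missing language information to a browser, the HTML `lang` attribute; it is an assumption of this example, not something the workflow described above prescribes.

```python
import unicodedata
from html import escape


def describe(text):
    """Return (character, code point, Unicode name) for each character."""
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for ch in text
    ]


def tag_language(text, lang):
    """Wrap text in an HTML span declaring its language (hypothetical helper)."""
    return f'<span lang="{escape(lang, quote=True)}">{escape(text)}</span>'


sample = "\u76f4"  # one Han character shared by Japanese and Chinese

# The code point and the UTF-8 bytes are identical whatever the language is.
print(describe(sample))
print(sample.encode("utf-8"))

# The language has to be supplied separately, here through the lang attribute.
print(tag_language(sample, "ja"))
print(tag_language(sample, "zh-Hans"))
```

The text itself carries no hint about whether it is Japanese or Chinese, so the font (or a language declaration that lets the renderer pick a font) has to supply that information.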
Count Anything
Count Anything is a free word-count utility for Windows. As mentioned above, word or character counts are an essential part of the work of translators, writers, and others who bill based on word counts. Several word-count utilities are available, some of them free (MS Word and OpenOffice can provide word counts as well), but this one was written because other tools didn’t count Asian characters and/or didn’t support some formats.
The tool counts the number of words, characters (with and without spaces), Asian characters, and non-Asian words. You can select individual files or entire folders to count and drill down on count results to get details about specific files, character types, and Asian character types. It is also possible to enter the URL of a Web page to get a word count for that page, but this is something we didn’t test. And finally, you can save the reports generated by Count Anything as HTML files or tab-delimited text files or print them.
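The kinds of counts listed above can be sketched in a few lines of Python. This is not how Count Anything is implemented. The code point ranges treated as "Asian characters" and the rule that each Asian character counts as one word are assumptions made for this example, and the names `is_asian` and `count_text` are made up.

```python
import re

# Code point ranges treated as "Asian characters" in this sketch.
# This selection is an assumption, not the tool's actual definition.
ASIAN_RANGES = (
    (0x3040, 0x309F),  # Hiragana
    (0x30A0, 0x30FF),  # Katakana
    (0x3400, 0x4DBF),  # CJK Unified Ideographs Extension A
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
    (0xAC00, 0xD7AF),  # Hangul syllables
)


def is_asian(ch):
    """Return True if the character falls in one of the ranges above."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in ASIAN_RANGES)


def count_text(text):
    """Count characters, Asian characters, and non-Asian words in text."""
    asian_chars = sum(1 for ch in text if is_asian(ch))
    # Blank out Asian characters so they neither count as words
    # nor glue neighbouring non-Asian words together.
    remainder = "".join(" " if is_asian(ch) else ch for ch in text)
    non_asian_words = len(re.findall(r"\w+", remainder))
    return {
        "chars_with_spaces": len(text),
        "chars_without_spaces": sum(1 for ch in text if not ch.isspace()),
        "asian_chars": asian_chars,
        "non_asian_words": non_asian_words,
        # Assumption: each Asian character counts as one word.
        "words": asian_chars + non_asian_words,
    }


print(count_text("Hello 世界 world"))
```

Counting Asian characters separately matters because Japanese and Chinese text has no spaces between words, so a counter that only splits on whitespace would report a whole sentence as a single word.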
Count Anything counts the words and characters in a variety of file formats. While it doesn’t quite count anything, it supports the following file types:
- Microsoft Office: Word (.doc, .rtf), Excel (.xls, .csv), PowerPoint (.ppt)
- OpenOffice: Writer (.odt), Impress (.odp), Calc (.ods)
- Other: HTML, XML, Text, PDF
One disadvantage of this tool is that it does not work with the newer Microsoft Office formats (e.g., .docx, .xlsx). You will need to convert those files to the old formats before you can get your word counts.
Another interesting feature is the Command Line Interface of Count Anything. This could be used to integrate this tool into your own software or a batch program.
Available for: Windows