Hello,
Does anyone know how to calculate the word count of a Thai document?
I second that question. Is it like the word count for Chinese?
According to what I've read, it's not quite the same. I can't find a ratio for how many characters make up a word. It also seems that you have to apply a ratio, except that some words can be separated and thus counted as words on their own, without the ratio.
Good question, danielr. As far as I know, the concept of a "word count" as we Westerners understand it doesn't really exist in languages such as Thai. There are no spaces between words, which makes counting very difficult for software. What you can do is pretranslate the document and count the target. You can also ask a native speaker to give you an approximate word count; you can then compare it with the target word count and perhaps derive a relationship (something like an equation) between word count and character count.
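Assuming the relationship is roughly linear and passes through zero (no characters, no words), that "equation" could be estimated from a few calibration files with a least-squares fit. This is a minimal sketch; the function names and the sample numbers are hypothetical placeholders, not figures from any real project.

```python
# Hypothetical helpers: fit words ~= k * characters from calibration files,
# then use k to estimate the word count of a new document.

def fit_words_per_char(samples):
    """Least-squares slope through the origin.

    samples: list of (character_count, word_count) pairs, e.g. target
    character counts paired with a native speaker's word counts.
    """
    sum_xy = sum(chars * words for chars, words in samples)
    sum_xx = sum(chars * chars for chars, _ in samples)
    if sum_xx == 0:
        raise ValueError("need at least one sample with a non-zero character count")
    return sum_xy / sum_xx


def estimate_words(character_count, words_per_char):
    """Apply the fitted slope to a new document's character count."""
    return character_count * words_per_char


if __name__ == "__main__":
    # Placeholder calibration data (not real project figures).
    samples = [(700, 100), (1400, 200)]
    k = fit_words_per_char(samples)
    print(round(k, 4))                     # 0.1429
    print(round(estimate_words(3500, k)))  # 500
```

Forcing the fit through the origin is a simplifying assumption; with more files, the spread of the per-file ratios would show how stable the relationship really is.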
Again, I don't think there's any software capable of giving you an accurate word count for Thai (or Japanese, for instance). This is another area where machines will never be able to replace the human brain! ;)
Thanks, Nabyl. You are right. There are no spaces, and the concept of a "word" is highly controversial.
The problem this time was that the file was not editable. All I had was an image from the client, and my brain couldn't count its characters either ;)
I understand. In that case, I recommend sending it to a native speaker to get an approximate number. However, if the file is very small and you don't want to lose time (different time zones), comparing the target word count and character count seems to be the best option. Did you manage to get a good-quality converted file?
That is what I did. The native speaker confirmed the word count the following day. The problem is that the file quality wasn't good enough for conversion.
I also tried googling how to calculate a Thai word count, but I couldn't find a good explanation.
Are you recreating an editable version of the file or translating it directly in Word?
The project hasn't started yet, but for 200 words in a plain format, I wanted to translate directly in Word.
Oh, too bad. Then you won't be able to get the exact source character count and target word count to derive a relationship. That would have been useful for future projects. If I handle one of these projects, I'll let you know what "equation" I get. :D
Yes, I'll have to wait for another file to calculate :(
But let's keep in touch if we get a file like that. We may need many tries before getting to a conclusion.
For Thai documents, this kind of word count isn't available; it only applies to Western languages.
Thank you, nabylm, for your points, but I would like to offer a few different ideas in a friendly spirit. :)
You mentioned: "As far as I know, the concept of "Wordcount" for us westeners does not quite exist in languages such as Thai for instance. Indeed, there is no spacing between words, which makes it very delicate to count for a software." Well, this might not be entirely accurate: in my own experience, MS Word can count Chinese characters as easily as it counts English words. The concept of a "word count" still exists in Asian languages like Thai, Chinese, Japanese and Korean; more precisely, we would call it a "character count" to reflect the actual nature of these texts.
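To illustrate how a "character count" can stand in for a word count, here is a minimal sketch that counts each Han ideograph as one unit and every other whitespace-delimited run as one word. This is an assumption-laden approximation, not MS Word's actual algorithm: it only checks the two main CJK Unified Ideographs blocks, and it doesn't treat punctuation specially. The function names are hypothetical.

```python
def is_han(ch):
    """True for CJK Unified Ideographs (U+4E00-U+9FFF) or Extension A (U+3400-U+4DBF)."""
    cp = ord(ch)
    return 0x4E00 <= cp <= 0x9FFF or 0x3400 <= cp <= 0x4DBF


def cjk_aware_count(text):
    """Count each Han character as 1, plus 1 per run of other non-space characters."""
    count = 0
    in_run = False  # are we inside a non-Han, non-space run?
    for ch in text:
        if is_han(ch):
            count += 1
            in_run = False
        elif ch.isspace():
            in_run = False
        elif not in_run:
            count += 1
            in_run = True
    return count


if __name__ == "__main__":
    print(cjk_aware_count("Hello world"))   # 2
    print(cjk_aware_count("你好世界"))       # 4
    print(cjk_aware_count("MS Word 你好"))  # 4
```

Under this sketch, a Thai sentence written without spaces counts as a single word, which shows why a plain space-based rule breaks down for Thai.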
I myself have been working as a translator in the English<>Chinese language pair, and I am naturally keen on checking the ratio between Chinese character count and English word count. I can say with a reasonable level of certainty that 1,000 Chinese characters translate into approximately 600-700 English words, and 1,000 English words translate into about 1,500-1,700 Chinese characters, varying with the nature of the source content and the target writing style. I assume Thai-English should also have some sort of character-to-word ratio like this; perhaps you could get it confirmed by an experienced Thai translator?
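As plain arithmetic, the ranges quoted above turn into a quick estimator. This sketch simply encodes lingotext's rule-of-thumb figures per 1,000 units; the constant and function names are hypothetical, and the real ratio will vary with content and style, as noted.

```python
# Rule-of-thumb ranges from the post above (per 1,000 source units).
ZH_CHARS_TO_EN_WORDS = (600, 700)    # 1,000 Chinese characters -> 600-700 English words
EN_WORDS_TO_ZH_CHARS = (1500, 1700)  # 1,000 English words -> 1,500-1,700 Chinese characters


def estimate_range(source_count, per_thousand):
    """Return (low, high) target estimates for a given source count."""
    low, high = per_thousand
    return (source_count * low / 1000, source_count * high / 1000)


if __name__ == "__main__":
    print(estimate_range(1000, ZH_CHARS_TO_EN_WORDS))  # (600.0, 700.0)
    print(estimate_range(200, EN_WORDS_TO_ZH_CHARS))   # (300.0, 340.0)
```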
Cheers.
Hello lingotext,
Thank you for sharing your ratios and experience.
We are discussing one of the most controversial topics in linguistics here, as there is no agreed definition of what a word is. My Romance-language mind likes to segment words using potential pauses as boundaries, but that is not enough, and it will not work for Chinese, Thai, Japanese, etc.
I am about to receive a Thai translation, and I will share the results when it is ready.
Hi lingotext,
If you say that MS Word can count Chinese characters as well as English words, I believe you and will use it in the future.
I think the only way to get an idea of the word count is to establish a relationship (ratio) between characters and words for each language. In your case, a ratio of approximately 1.5 for English>Chinese could be used as a reference.
Quote:
What you can do is pretranslate the document, and count the target. Also, you can ask a native to give you an approximate Wordcount, this way you can compare it to the target Wordcount, and maybe get a relation (equation like) between wordcount and character count.
However, the ratio for Thai, Japanese, Hindi or Korean may differ (danielr, thanks for sharing what you get from your ongoing project).
In addition, the 1.5 ratio for English>Chinese doesn't take the subject domain (legal, medical, literary, etc.) into account.
One more question: do you think it would be more accurate to quote these projects by character count instead of word count?
Here is what I got from the Thai project. My version of Word reported 464 words in the English source, and the Thai target file had 126 words and 3,282 characters (without spaces). When I double-clicked a string of Thai text, only one specific segment was selected, and it was counted as a single word.
According to what I've read, Thai doesn't work like Chinese or Japanese, because it combines the notions of characters and words.
So that's approximately a 1 to 7 ratio (English words to Thai characters)!
Are you sure?
I won't venture to state a ratio. I can't find a logical proportion.
Here are more numbers to see if we can find a way of calculating it. I have some files with 577 words in English, and the Thai target has 3,314 words and 3,834 characters.
And as a bonus, I had an English text with 300 words, and the Khmer target had 100 words.
It seems we are still within that ratio of 1 (English word) to 7 (Thai characters)...
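A quick check of the arithmetic on the two Thai data points reported in this thread (English source words against Thai target characters, using the 3,282 and 3,834 character figures; the Khmer figures are word counts only, so they are left out). This is a minimal sketch; the names are hypothetical.

```python
# (English source words, Thai target characters) as reported in this thread.
THAI_PROJECTS = [(464, 3282), (577, 3834)]


def chars_per_word(english_words, thai_chars):
    """Thai target characters per English source word for one project."""
    return thai_chars / english_words


def pooled_ratio(projects):
    """Total Thai characters divided by total English words across projects."""
    total_words = sum(words for words, _ in projects)
    total_chars = sum(chars for _, chars in projects)
    return total_chars / total_words


if __name__ == "__main__":
    for words, chars in THAI_PROJECTS:
        print(words, chars, round(chars_per_word(words, chars), 2))
    # 464 3282 7.07
    # 577 3834 6.64
    print(round(pooled_ratio(THAI_PROJECTS), 2))  # 6.84
```

Two data points are too few to call this a stable ratio, but both land between roughly 6.6 and 7.1 Thai characters per English word.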
Yes, the ratio keeps recurring. My concern is the calculation method: it's not very clear what the notion of a word is in Thai, or what the software takes into account.
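Part of the uncertainty is what a tool even counts as a "character" in Thai. Many Thai vowel and tone marks are separate Unicode combining characters (general category Mn), so a count of code points and a count of base characters can differ for the same text. Which convention Word or Memsource actually uses is an assumption this sketch cannot confirm; it only shows the two readings side by side, with hypothetical function names.

```python
import unicodedata


def count_code_points(text):
    """Every non-whitespace Unicode code point, including combining marks."""
    return sum(1 for ch in text if not ch.isspace())


def count_base_characters(text):
    """Non-whitespace code points, skipping combining marks (category Mn)."""
    return sum(
        1 for ch in text
        if not ch.isspace() and unicodedata.category(ch) != "Mn"
    )


if __name__ == "__main__":
    # The greeting sawatdi: 4 consonants plus 2 combining vowel marks.
    greeting = "\u0e2a\u0e27\u0e31\u0e2a\u0e14\u0e35"
    print(count_code_points(greeting))      # 6
    print(count_base_characters(greeting))  # 4
```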
Indeed, the right question would be: does this ratio correctly reflect the amount of work a Thai translator puts into translating one English word? In other words, does producing 7 Thai characters take the same effort as translating one English word into one Spanish word (for example)?
It would be a good idea to ask Memsource's support whether they can explain how they calculate it, just for reference, of course.