
Thread: Question about utf8 files

 
  1. #1
    Contributing User
    Join Date: May 2011 | Posts: 166

    Question about utf8 files

    Hello there

    Does anyone know of a simple way to find out, just by looking at the characters, whether a particular field in a file is still untranslated? Say a field should contain only Chinese, Arabic or Russian words: I'd like to check that this is the case, or whether it still contains only (or any) Western characters.

    I can't think of a way of doing this automatically (and sometimes I need to check files that are thousands of lines long).

    Cheers.
    P.

  2. #2
    Registered User
    Join Date: Jul 2007 | Posts: 12

    Re: Question about utf8 files

    I think an easy way to do this would be to check for any character with a code point of 255 or less (ASCII plus Latin-1). If you also want to rule out Latin-script languages beyond the Western European ones (Polish or Czech, say), you might need to raise the bar a little.
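    For anyone finding this later: a minimal shell sketch of the threshold check above, assuming UTF-8 files and only POSIX sh, printf and grep. It deliberately counts letters only (ASCII A to Z plus U+00C0 to U+00FF), because spaces, digits and punctuation are ASCII as well and turn up in translated Chinese or Arabic text too. That restriction, and the helper name flag_low_codepoints, are assumptions, not part of the suggestion itself.

```shell
#!/bin/sh
# flag_low_codepoints FILE  (hypothetical helper name)
# Prints, with line numbers, every line of a UTF-8 FILE that contains a
# letter with a code point of 255 or less: an ASCII letter (A-Z, a-z) or a
# character in U+00C0..U+00FF (Latin-1 accented letters; that range also
# includes the signs × and ÷).
flag_low_codepoints() {
    # In UTF-8, U+00C0..U+00FF are encoded as the byte 0xC3 (octal 303)
    # followed by one continuation byte. 0xC3 never occurs in the encoding
    # of Chinese, Arabic or Cyrillic text, so matching that byte is enough.
    c3=$(printf '\303')
    # LC_ALL=C makes grep work byte by byte, so [A-Za-z] is ASCII-only and
    # the raw 0xC3 byte in the bracket expression is matched literally.
    LC_ALL=C grep -n "[A-Za-z$c3]" "$1"
}
```

    Run it as flag_low_codepoints strings.xml (hypothetical file name). To raise the bar to Latin Extended-A and -B (up to U+024F), the UTF-8 lead bytes 0xC4 to 0xC9 would have to go into the bracket expression as well; as a side effect that also catches the IPA Extensions up to U+027F.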

    Hugs,
    James

  3. #3
    Contributing User
    Join Date: May 2011 | Posts: 166

    Re: Question about utf8 files

    Hi James

    Thanks for the tip. In the end I settled on this simple solution:

    Code:
    sed 's/<[^>]*>//g' *.xml |grep -o "[a-zA-Z]*"
    That is, it first strips off the XML tags and then "greps" any sequence of letters in the ASCII range. The result is a list of the words that are not in Chinese or Arabic. It wouldn't work well for Western languages, but it did the job for what I needed, and I'm sure it can be improved. Leaving out the "-o" flag makes it malfunction because "[a-zA-Z]*" also matches the empty string: without "-o", every line matches, so grep prints every line of the tag-stripped text.
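    One possible refinement of that pipeline, as a sketch only. It swaps sed | grep for a single POSIX awk pass so each hit keeps its file name and line number, looks only at lines holding one particular field, also strips entities such as &amp; (which would otherwise show up as the word "amp"), and forces LC_ALL=C so that [A-Za-z] can only mean ASCII letters. The helper name untranslated_fields and the XLIFF-style <target> element used as the example field are assumptions; use whatever tag holds your translated text.

```shell
#!/bin/sh
# untranslated_fields TAG FILE...  (hypothetical helper name)
# For every line containing an opening <TAG> or <TAG ...>, strips the XML
# tags and entities and, if any ASCII letter is left, prints FILE:LINE: text.
untranslated_fields() {
    tag=$1
    shift
    # LC_ALL=C: awk works byte by byte, so [A-Za-z] only matches the 52 ASCII
    # letters and never a byte of UTF-8 encoded Chinese, Arabic or Cyrillic.
    LC_ALL=C awk -v tag="$tag" '
        $0 ~ ("<" tag "[ >]") {
            text = $0
            gsub(/<[^>]*>/, "", text)          # drop tags, attributes included
            gsub(/&[A-Za-z0-9#]+;/, "", text)  # drop entities such as &amp;
            if (text ~ /[A-Za-z]/)
                print FILENAME ":" FNR ": " text
        }' "$@"
}
```

    For example, untranslated_fields target *.xlf would print one FILE:LINE: text line for each target that still contains Latin letters. Because it works line by line, fields that span several lines (or CDATA sections) are not handled.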

    Cheers.
    P.
    Last edited by pabloa; 08-02-2011 at 08:04 AM. Reason: grammar

  4. #4
    Registered User
    Join Date: Jul 2007 | Posts: 12

    Re: Question about utf8 files

    Pretty nice one indeed!

Similar Threads

  1. PSD files
    By Maximiliano in forum General English to Spanish Translation
    Replies: 4
    Last Post: 12-14-2016, 10:42 AM
  2. PO Files
    By Maximiliano in forum General English to Spanish Translation
    Replies: 0
    Last Post: 11-29-2016, 08:52 AM
  3. WAV files
    By chris.r in forum Miscellaneous
    Replies: 12
    Last Post: 10-11-2012, 11:02 AM
  4. VDX (Visio) files
    By gentle in forum Trados
    Replies: 0
    Last Post: 05-29-2012, 10:17 AM
  5. Replies: 1
    Last Post: 02-12-2007, 02:16 PM