Hi there

The other day I was wondering how to work with UTF-16 encoded files, since both INI and TTX files use this encoding. Running "grep", "sed", "awk", etc. on them fails because the pattern never matches (in UTF-16 every ASCII character is stored as two bytes, one of them zero).

I asked in the Yahoo forum Linux for Translators (see this other thread: http://www.english-spanish-translato...anslators.html) and the only reply I received was a suggestion to write my own scripts. I didn't much like the idea, because writing a separate script for each command I wanted seemed wasteful.

Then I thought about the following solution:


iconv -f utf16 -t utf8 "$filename" | $comm $args
It produces UTF-8 output, of course, but this can be fixed by adding an extra pipe at the end that converts the result back to UTF-16.
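As a rough illustration of that extra pipe, the round trip could look something like this. The scratch directory, the sample file and the "sed" substitution are made up for the demo; only the "iconv" calls come from the idea above.

```shell
# Scratch directory and sample file are hypothetical, created only for the demo.
tmp=$(mktemp -d)
printf 'Prints\n' | iconv -f UTF-8 -t UTF-16 > "$tmp/in.txt"

# Decode to UTF-8, run the command, then encode the result as UTF-16 again.
iconv -f UTF-16 -t UTF-8 "$tmp/in.txt" \
  | sed 's/Prints/Impresiones/' \
  | iconv -f UTF-8 -t UTF-16 > "$tmp/out.txt"

# Decode the result once more to check what was written.
roundtrip=$(iconv -f UTF-16 -t UTF-8 "$tmp/out.txt")
echo "$roundtrip"
```

The byte order that "iconv" picks for plain "UTF-16" output can differ between systems, so if the program that reads the file is picky, naming UTF-16LE or UTF-16BE explicitly may be safer.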

It relies on the convenient fact that "$0" holds the name of the running script, exactly as it was invoked. I called the script "utf16script". Then in my "bin" directory I created a link to it for each command that made sense to have.
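Only the central pipeline line of the script is shown in this post, so here is a guess at how a complete wrapper could be put together. It is a sketch under assumptions, not the original script: it assumes bash, assumes the file to convert is always the last argument, and builds everything in a scratch directory (with a made-up sample file) instead of "~/bin" so it can be tried safely.

```shell
# Scratch directory so nothing in ~/bin is touched (hypothetical demo setup).
dir=$(mktemp -d)

# Write the wrapper script; the quoted 'EOF' keeps $0, $@ etc. unexpanded here.
cat > "$dir/utf16script" <<'EOF'
#!/bin/bash
# Work out the real command from the name we were invoked as: 16grep -> grep.
name=${0##*/}
case $name in
  16?*) comm=${name#16} ;;
  *) echo "invoke me through a link such as 16grep" >&2; exit 2 ;;
esac

# Assumption: the last argument is the UTF-16 file, the rest are options.
if [ "$#" -lt 1 ]; then
  echo "usage: $name [args...] file" >&2
  exit 2
fi
filename=${!#}
set -- "${@:1:$#-1}"

# Decode the file and hand the UTF-8 text to the real command.
iconv -f utf16 -t utf8 "$filename" | "$comm" "$@"
EOF
chmod +x "$dir/utf16script"

# One link per command, all pointing at the same script.
ln "$dir/utf16script" "$dir/16grep"

# Try it on a small UTF-16 sample file.
printf 'Prints\nOther\n' | iconv -f UTF-8 -t UTF-16 > "$dir/sample.ttx"
result=$("$dir/16grep" Prints "$dir/sample.ttx")
echo "$result"
```

Because the command reads from a pipe, options that need a real file name (for example "sed -i") will not work through a wrapper like this.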

pabloa$ cd ~/bin
pabloa$ for x in grep sed awk cat; do ln utf16script 16$x; done
It works surprisingly well. Plain grep finds nothing, while 16grep finds the match:

pabloa$ grep --color Prints file.xml.ttx 
pabloa$ 16grep --color Prints file.xml.ttx 
  <ut Type="start" Style="external" RightEdge="angle" DisplayText="sentence">&lt;sentence&gt;</ut><Tu Origin="manual" MatchPercent="0"><Tuv Lang="EN-US">Prints</Tuv><Tuv Lang="es_ES">Impresiones</Tuv></Tu><ut Type="end" Style="external" LeftEdge="angle" DisplayText="sentence">&lt;/sentence&gt;</ut>
I hope someone might find this useful.