The other day I was wondering how to work with UTF-16 encoded files, since both INI and TTX files use this encoding. Running "grep", "sed", "awk", etc. on them fails: these tools do not decode UTF-16, where even plain ASCII characters take two bytes each, so the pattern never matches.
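To see the problem for yourself, here is a small sketch. The file name "sample.ttx" and the temporary directory are made up for the illustration, and I use UTF-16LE only so that the byte layout is predictable.

```shell
# Sketch: build a throwaway UTF-16 file (hypothetical name) in a temp dir.
tmp=$(mktemp -d)
printf 'Prints\n' | iconv -f UTF-8 -t UTF-16LE > "$tmp/sample.ttx"

# Each ASCII character is followed by a zero byte, so the bytes of
# "Prints" are never adjacent in the file.
od -c "$tmp/sample.ttx"

# Count matching lines; "|| true" because grep exits non-zero on no match.
plain=$(grep -c Prints "$tmp/sample.ttx" || true)
converted=$(iconv -f UTF-16LE -t UTF-8 "$tmp/sample.ttx" | grep -c Prints)
echo "raw file: $plain, after iconv: $converted"
```

The raw file gives no match, while the same text piped through iconv does.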
I asked in the Yahoo forum Linux for Translators (see this other thread: http://www.english-spanish-translato...anslators.html) and the only reply I received was a suggestion to write my own scripts. I didn't much like the idea, because writing a separate script for each command I wanted seemed wasteful.
Then I thought about the following solution:
iconv -f utf16 -t utf8 "$filename" | $comm $args
It produces UTF-8 output, of course, but this can be improved by adding an extra pipe at the end.
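The extra pipe for getting UTF-16 back out could look something like the sketch below. The file names and the sed substitution are only placeholders for the example; the point is the second iconv at the end of the pipeline.

```shell
# Sketch: round trip UTF-16 -> UTF-8 -> command -> UTF-16.
# in.ttx and out.ttx are hypothetical names in a throwaway directory.
work=$(mktemp -d)
printf 'Prints\n' | iconv -f UTF-8 -t UTF-16 > "$work/in.ttx"

# Decode, edit as ordinary UTF-8 text, then re-encode with one more pipe.
iconv -f UTF-16 -t UTF-8 "$work/in.ttx" \
    | sed 's/Prints/Impresiones/' \
    | iconv -f UTF-8 -t UTF-16 > "$work/out.ttx"

# Decode the result again just to check what was written.
roundtrip=$(iconv -f UTF-16 -t UTF-8 "$work/out.ttx")
echo "$roundtrip"
```

This only makes sense for commands whose output is meant to go back into a file (sed, awk). For grep you normally want the UTF-8 output on the terminal anyway.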
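The script relies on the convenient fact that "$0" holds the name of the running script exactly as it was invoked, so the name it is called by (for example "16grep") tells it which command to run. I called the script "utf16script". Then, in my "bin" directory, I created a hard link to it for each command that made sense to have.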
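The one-liner leaves out how $comm, $args and $filename get their values. A minimal sketch of how the whole script might look follows. This is a reconstruction, not the exact original. It assumes that the file to convert is always the last argument and that the command name is whatever follows the "16" prefix. It also builds everything in a temporary directory so you can try it without touching your own "bin".

```shell
#!/bin/bash
# Sketch only: a throwaway copy of the wrapper in a temp directory.
demo=$(mktemp -d)

# The wrapper itself. Assumptions (not from the original post): the file
# to convert is the LAST argument; everything before it goes to the command.
cat > "$demo/utf16script" <<'EOF'
#!/bin/bash
name=$(basename "$0")     # name the script was invoked as, e.g. 16grep
comm=${name#16}           # strip the "16" prefix -> grep
# Refuse to run under a name without the prefix (it would call itself)
# or without a file argument.
if [ "$comm" = "$name" ] || [ "$#" -lt 1 ]; then
    echo "usage: 16<command> [args...] file" >&2
    exit 2
fi
filename=${!#}            # last positional parameter
args=("${@:1:$#-1}")      # all arguments except the last
iconv -f UTF-16 -t UTF-8 "$filename" | "$comm" "${args[@]}"
EOF
chmod +x "$demo/utf16script"
ln "$demo/utf16script" "$demo/16grep"   # same hard-link trick as in ~/bin

# Try it on a small UTF-16 sample file (hypothetical name).
printf 'Prints\nOther\n' | iconv -f UTF-8 -t UTF-16 > "$demo/sample.ttx"
result=$("$demo/16grep" Prints "$demo/sample.ttx")
echo "$result"
```

Using a bash array for the arguments keeps patterns with spaces intact, which the unquoted $args in the one-liner would split.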
pabloa$ cd ~/bin
pabloa$ for x in grep sed awk cat; do ln utf16script 16$x; done
It works surprisingly well.
pabloa$ grep --color Prints file.xml.ttx
pabloa$ 16grep --color Prints file.xml.ttx
<ut Type="start" Style="external" RightEdge="angle" DisplayText="sentence"><sentence></ut><Tu Origin="manual" MatchPercent="0"><Tuv Lang="EN-US">Prints</Tuv><Tuv Lang="es_ES">Impresiones</Tuv></Tu><ut Type="end" Style="external" LeftEdge="angle" DisplayText="sentence"></sentence></ut>
I hope someone might find this useful.