-
INI files content
Hello
I don't know if anyone has noticed, but the INI files used by Trados have each ASCII character padded with a null byte:
pabloa:~/Development$ hexdump -C file.ini
00000000 ff fe 0d 00 0a 00 5b 00 47 00 65 00 6e 00 65 00 |......[.G.e.n.e.|
00000010 72 00 61 00 6c 00 53 00 47 00 4d 00 4c 00 53 00 |r.a.l.S.G.M.L.S.|
00000020 65 00 74 00 74 00 69 00 6e 00 67 00 73 00 5d 00 |e.t.t.i.n.g.s.].|
00000030 0d 00 0a 00 43 00 61 00 73 00 65 00 53 00 65 00 |....C.a.s.e.S.e.|
00000040 6e 00 73 00 69 00 74 00 69 00 76 00 65 00 3d 00 |n.s.i.t.i.v.e.=.|
00000050 59 00 65 00 73 00 0d 00 0a 00 44 00 54 00 44 00 |Y.e.s.....D.T.D.|
[etc ...]
This makes the files very difficult to manipulate with standard Unix tools. For the time being I've found a way to filter them so I can use "grep" on substrings I might be interested in. First I created a very simple sed script:
pabloa:~/Development$ cat -A cleanup_ini.sed
s/^@//g
Here ^@ is how cat -A displays the null character (byte 0x00). Then I can do things like:
pabloa:~/Development$ sed -f cleanup_ini.sed file.ini | grep Group
Tag1=type:External,Group
Tag2=native_title_lang:External,Group
Tag12=type_of_place:External,Group
Tag50=ddc:External,Group
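For anyone without a script file handy, the same filter can be sketched as a one-liner. This is only a rough sketch, assuming GNU sed (for the `\x00`, `\xff` and `\r` escapes) and ASCII-only content; `sample.ini` and its content are made up as a stand-in for a real Trados file, and iconv is used purely to build that stand-in:

```shell
# Build a small sample with the same layout as the hexdump above:
# ff fe first, then every character followed by a null byte.
# (iconv's UTF-16LE output has exactly that shape for ASCII text.)
printf '\377\376' > sample.ini
printf '[GeneralSGMLSettings]\r\nCaseSensitive=Yes\r\nTag1=type:External,Group\r\n' \
  | iconv -f UTF-8 -t UTF-16LE >> sample.ini

# Drop the leading ff fe, the null padding and the trailing CRs, then search.
# LC_ALL=C makes sed work on raw bytes rather than on locale characters.
LC_ALL=C sed -e '1s/^\xff\xfe//' -e 's/\x00//g' -e 's/\r$//' sample.ini | grep Group
```

Stripping the leading ff fe and the carriage returns as well leaves plain ASCII, so grep should not mistake the stream for binary data.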
With lots of good luck someone might find this useful.
Cheers.
P.
-
Re: INI files content
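I've found out why this is so. The files are encoded in UTF-16, where each character takes at least two bytes; the ff fe at the start of the hexdump is the byte order mark for little-endian UTF-16, and every ASCII character is stored as its usual byte followed by a null byte. The method suggested above works fine as long as there are no characters outside the ASCII range. Otherwise it breaks the encoding. The correct way of dealing with this situation is to convert the file into UTF-8 with the following command: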
Code:
pabloa:~/Development$ iconv -f UTF-16 -t UTF-8 file.ini > file-utf8.ini
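If the UTF-8 copy is only needed for searching, the conversion can be piped straight into grep, and an edited copy can be converted back afterwards. A minimal sketch, assuming an iconv that accepts the encoding names `UTF-16` and `UTF-8`; the file names `sample.ini`, `sample-utf8.ini` and `roundtrip.ini` and the sample content are invented for the demo:

```shell
# Build a UTF-16 sample containing a non-ASCII value
# ("España", with the ñ written as UTF-8 octal escapes).
printf '[General]\r\nName=Espa\303\261a\r\nTag1=type:External,Group\r\n' \
  | iconv -f UTF-8 -t UTF-16 > sample.ini

# Search without an intermediate file; tr strips the CRs
# left over from the CRLF line endings.
iconv -f UTF-16 -t UTF-8 sample.ini | tr -d '\r' | grep Group

# Convert to UTF-8 for editing, then back to UTF-16 when done.
iconv -f UTF-16 -t UTF-8 sample.ini > sample-utf8.ini
iconv -f UTF-8 -t UTF-16 sample-utf8.ini > roundtrip.ini

# With no edits in between, the round trip should reproduce the original bytes.
cmp sample.ini roundtrip.ini && echo "round trip OK"
```

The byte order and byte order mark that iconv writes for plain `UTF-16` vary between implementations, so it is worth comparing the first bytes of a converted file against an original with hexdump before handing it back to Trados.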
Cheers.
P.