Hi there

Regular expressions are an exceedingly powerful tool for processing text. I'll post a few of the regular expressions I use to extract information from a TTX file. They can be wrapped in your favourite programming language (PHP, Python, Perl, even AWK if you dare!).

To start with an easy one, these are the ones I use to get the source and target languages:

Code:
/SourceLanguage="(.*?)"/
/TargetLanguage="(.*?)"/
The question mark turns the preceding quantifier from greedy into lazy, so it stops at the first double quote instead of running on to the last one, as it would by default. AWK hasn't got this feature, so the expressions have to be changed slightly. AWK uses extended regular expressions, so the brackets stay unquoted there; it is tools with basic syntax, such as sed, that need them quoted as \( and \). In AWK the expressions become

Code:
/SourceLanguage="([^"]*)"/
/TargetLanguage="([^"]*)"/
That is, we have to say explicitly we want to match anything except a double quote.
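
For anyone who wants to try the patterns above, here is a rough Python sketch of the difference between the greedy, lazy and "anything except a double quote" forms. The sample line is made up for illustration (only the two attribute names come from the expressions above), and first_group is just a hypothetical helper name, so treat it as a starting point rather than a finished tool.

```python
import re

# Made-up one-line fragment for illustration; a real TTX file will look different.
sample = '<Settings SourceLanguage="EN-GB" TargetLanguage="ES-ES"/>'

# Three ways of writing the same extraction.
LAZY = re.compile(r'SourceLanguage="(.*?)"')      # stops at the first closing quote
GREEDY = re.compile(r'SourceLanguage="(.*)"')     # runs on to the last quote on the line
NEGATED = re.compile(r'SourceLanguage="([^"]*)"')  # portable form, no lazy quantifier needed


def first_group(pattern, text):
    """Return the first captured group, or None if the pattern does not match."""
    match = pattern.search(text)
    return match.group(1) if match else None


source_lazy = first_group(LAZY, sample)
source_greedy = first_group(GREEDY, sample)
source_negated = first_group(NEGATED, sample)
target = first_group(re.compile(r'TargetLanguage="([^"]*)"'), sample)

print("lazy:   ", source_lazy)
print("greedy: ", source_greedy)
print("negated:", source_negated)
print("target: ", target)
```

On this sample the lazy and negated forms should both give the language code alone, while the greedy form swallows everything up to the last double quote on the line.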

Cheers.
P.