All tags in XML or HTML files

Hi there

A nifty way of finding which tags are used in a file (especially useful for XML files, where the tags can be anything) is to use "grep" to extract the tags, sort them (with "sort", what else?) and remove duplicates with "uniq":

Code:

pabloa:~$ grep -ohe "<[^/][^> ]*[ |>]" *.xml | sort | uniq
<city>
<country>
<description
<language>
<metadata
<title>
<topic>
<value>
<?xml
<year>

It's so useful that I'm going to make an alias for it. Here we are using a few very nice features of the "grep" command:

-o: output only the matching bit instead of the whole line
-h: don't output the file name where the pattern was found
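-e: the next argument is the pattern to search for (because "-e" takes the pattern as its argument, it has to be the last flag in the combined "-ohe" group; otherwise grep treats the letters after it as the pattern)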
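
To see why the position of "-e" matters, here is a small sketch that runs the same search with the flags clustered and with the flags written separately. The sample file and the variable names are made up for this demo, and it assumes a grep that supports "-o" and "-h" (GNU and BSD grep both do).

```shell
# Hypothetical sample file, created only for this demo.
tmpdir=$(mktemp -d)
printf '<?xml version="1.0"?>\n<metadata id="1">\n  <title>Demo</title>\n  <year>2010</year>\n</metadata>\n' > "$tmpdir/sample.xml"

# Clustered flags: -e comes last because the pattern is its argument.
clustered=$(grep -ohe "<[^/][^> ]*[ |>]" "$tmpdir"/*.xml)

# The same flags written separately; -e is still directly followed by the pattern.
separate=$(grep -e "<[^/][^> ]*[ |>]" -h -o "$tmpdir"/*.xml)

# Both spellings should produce identical matches.
[ "$clustered" = "$separate" ] && echo "same output"
```

Written as "-eoh" instead, grep would take "oh" as the pattern and the real pattern as a file name.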

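The pattern used ("<[^/][^> ]*[ |>]") can be explained in words like this: "a '<', followed by any character other than '/' (so we skip closing tags), followed by any run of characters that are neither a space nor a '>', up to (and including) a space or a '>'". Note that inside the square brackets the '|' is a literal character, not an "or", so "[ >]" would be enough. Matching stops at the first space, which is why tags with attributes (like "<description") show up without the closing '>'.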
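
For the alias mentioned earlier, a shell function is probably handier than a real alias, since it can take the file names as arguments. This is only a sketch: the name "xmltags" is made up, and the bracket expression is written as "[ >]" because a '|' inside square brackets is just a literal character.

```shell
# xmltags: hypothetical helper name, not part of any standard tool.
# Prints the distinct opening tags found in the files given as arguments.
xmltags() {
    # -o: only the matching part, -h: no file names, -e: the pattern follows
    grep -ohe "<[^/][^> ]*[ >]" "$@" | sort | uniq
}
```

Put it in your ~/.bashrc (or wherever you keep such things) and call it like this: xmltags *.xml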

Improve at your leisure and enjoy at your pleasure!

Cheers.
P.