
Thread: All tags in XML or HTML files

  1. #1
    Contributing User
    Join Date: May 2011

    All tags in XML or HTML files

    Hi there

    A nifty way of finding which tags are being used in a file (especially useful for XML files, where the tags can be anything) is to use "grep" to extract the tags, sort them (with "sort", what else?) and remove duplicates with "uniq":

    pabloa:~$ grep -ohe "<[^/][^> ]*[ >]" *.xml | sort | uniq
    It's so useful that I'm going to create an alias for it. Here we are using a few very nice features of the "grep" command:

    • -o: output only the matching part instead of the whole line
    • -h: don't output the name of the file where the pattern was found
    • -e: the next argument is the pattern to search for (because -e takes an argument, it has to be the last of the three flags when they are combined as "-ohe"; otherwise grep takes the letters after it as the pattern)

    The pattern used ("<[^/][^> ]*[ >]") can be explained in words like this: "a '<', followed by one character other than '/' (so we avoid closing tags), followed by any number of characters that are neither a space nor a '>', up to (and including) a space or a '>'"
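
Since each match includes its closing space or '>', the same tag can show up twice in the list (once as "<item " and once as "<item>"). A minimal sketch of one way to wrap the command in a reusable shell function and strip that last character is below. The function name list_tags is made up for illustration, and the "sed" step is an addition that is not part of the original one-liner.

```shell
# list_tags is a hypothetical name; pass it one or more files.
list_tags() {
    # -o: print only the match; -h: no file names; -e: the pattern follows.
    # The final bracket holds only a space and '>'; a '|' placed inside
    # brackets would be matched as a literal pipe, not read as "or".
    grep -ohe '<[^/][^> ]*[ >]' "$@" |
        sed 's/[ >]$//' |  # drop the trailing space or '>' so "<item " and "<item>" collapse
        sort -u            # sort and remove duplicates, like sort | uniq
}
```

Called as list_tags *.xml, it prints each distinct opening tag once, without the delimiter. Declarations and comments such as "<?xml" or "<!--" are listed too, because they also start with '<' followed by a character other than '/'.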

    Improve at your leisure and enjoy at your pleasure!


  2. #2
    Join Date: Jul 2007

    Re: All tags in XML or HTML files


    Very neat script indeed! Thank you for sharing! XML massaging is unavoidable these days in software localization, so that CAT tools can properly digest the wide variety of structures found in XML files. I will try to share my scripts too.

    Best wishes,
