
Thread: Simple regular expressions

 
  1. #1
    Contributing User
    Join Date
    May 2011
    Posts
    166
    Rep Power
    103

    Simple regular expressions

    Hi there

    People for whom words are their daily bread and butter would find a close acquaintance with regular expressions most useful. (Wow, that was a literary sentence!)

    The name "grep" comes from the ed editor command g/re/p, which stands for "globally search for a regular expression and print". In general, the first parameter we pass to this command is a regular expression. The examples seen so far have all been literal strings. The first slogan for today is:

    • "literal strings are the simplest regular expression, and they match themselves".

    That is, the word "line" will match an "l" followed immediately by an "i", etc.
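
    A minimal sketch of that slogan, using Python's re module as a stand-in for grep (the sample lines are made up for illustration). Like grep, re.search looks for the pattern anywhere in the line, and the match is case-sensitive:

```python
import re

# Hypothetical sample lines, standing in for the lines of a file.
lines = ["first line", "second LINE", "no match here", "outline"]

# The literal pattern "line" matches an "l" followed by "i", "n", "e".
matching = [text for text in lines if re.search("line", text)]

print(matching)
```

    Note that "outline" matches too, because the pattern may appear anywhere in the line.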

    Normal letters, digits and spaces behave very well. But there are a number of characters with special meaning. I'll introduce the two most used, and will leave the rest for other posts.

    • The dot (full stop) character ".": matches any single character (letter, digit, space, punctuation...)
    • The asterisk (star) character "*": matches zero or more occurrences of the regular expression immediately preceding it.

    Some examples:

    • The expression "l.ne" matches lane, lene, line, lone, lune, and also lbne, lcne, l4ne, etc.
    • The expression "line*" matches lin, line, linee, lineee, and so on: "lin" followed by zero or more letters "e".
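
    The two examples can be checked with a short sketch, again in Python's re module rather than grep (the word lists are invented for the test). re.fullmatch is used here so that the whole word has to match the pattern, whereas grep would accept the pattern anywhere in a line:

```python
import re

# The dot stands for exactly one character, whatever it is.
words = ["lane", "line", "l4ne", "lne", "loane"]
dot_matches = [w for w in words if re.fullmatch("l.ne", w)]

# The star applies to the "e" just before it: zero or more of them.
star_words = ["lin", "line", "lineee", "lime"]
star_matches = [w for w in star_words if re.fullmatch("line*", w)]

print(dot_matches)
print(star_matches)
```

    "lne" and "loane" fail the first pattern because the dot needs exactly one character between "l" and "ne".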

    These are toy examples, of course. But there are a few useful things that can be done with these simple rules. For example, grep -o '<.*>' index.html will display the HTML tags (as long as there is only one per line; more on this in future posts). ".*" can be read as "zero or more instances of any character".
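
    To see why the "only one per line" caveat matters, here is a small sketch with Python's re.findall, which behaves like grep -o for this pattern (the two sample strings are assumptions, not taken from a real index.html). The star takes as much as it can, so the match runs from the first "<" to the last ">" on the line:

```python
import re

# Hypothetical lines of HTML: one tag, then two tags on the same line.
one_tag = "<title>"
two_tags = "<b>bold</b> text"

single = re.findall("<.*>", one_tag)
double = re.findall("<.*>", two_tags)

print(single)
print(double)
```

    With two tags on the line, the result is one long match covering both tags and the text between them.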

    Enjoy!

    Cheers.
    P.
    Last edited by pabloa; 10-21-2011 at 10:14 AM.

  2. #2

    Re: Simple regular expressions

    Hello

    Taking it one step at a time: when we want to match any one of a set of characters, we write the set in square brackets. So the notation [abc] means any one of the letters "a", "b" or "c". Within the brackets there are two useful notations: a range of characters is written as the first and last character of the range separated by a dash ("-"), and the "hat" character ("^"), placed right after the opening bracket, negates the content of the square brackets. Some examples:
    - [0-9]: any digit
    - [a-z]: any lowercase letter
    - [A-Z]: any uppercase letter
    - [a-zA-Z0-9]: any alphanumeric character
    - [^0-9]: any character which is not a digit
    - [^&]: any character that is not an ampersand
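
    The list above can be tried out with a quick sketch in Python's re module (the sample string is made up so that it contains one character of each kind). Each bracket expression matches one character at a time, so re.findall returns single characters:

```python
import re

# Hypothetical sample: uppercase, lowercase, digit, space, ampersand.
text = "Ab1 &c2"

digits = re.findall("[0-9]", text)
lowercase = re.findall("[a-z]", text)
uppercase = re.findall("[A-Z]", text)
alphanumeric = re.findall("[a-zA-Z0-9]", text)
not_digits = re.findall("[^0-9]", text)
not_ampersand = re.findall("[^&]", text)

print(digits, lowercase, uppercase)
print(alphanumeric, not_digits, not_ampersand)
```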

    Now it's easier to understand a regular expression used in a previous post that finds HTML entities. These entities always start with an ampersand and end with a semicolon. So this regular expression finds all occurrences of them: '&[^;]*;'. In words, it looks for a single ampersand ("&") followed by any number of characters other than a semicolon, followed by a semicolon.
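
    As a rough check of that expression, here it is applied with Python's re.findall, which plays the role of grep -o (the HTML fragment is invented for the example):

```python
import re

# Hypothetical fragment containing four HTML entities.
html = "Fish &amp; chips &lt;hot&gt; &copy; 2011"

# An ampersand, any run of non-semicolon characters, then a semicolon.
entities = re.findall("&[^;]*;", html)

print(entities)
```

    Because [^;] cannot run past a semicolon, each match stops at the first one, so several entities on the same line are found separately.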

    Cheers.
    P.

  3. #3
    Moderator
    Join Date
    Mar 2012
    Age
    29
    Posts
    982
    Rep Power
    920

    Re: Simple regular expressions

    It's great that the basics of regular expressions are the same in most programming languages and search engines.
