Metacharacter | Match |
---|---|
\ | the escape character - used to find an instance of a metacharacter like a period, brackets, etc. |
. (period) | match any character except newline |
x | match any instance of x |
[^x] | match any character except x |
[x] | match any instance of x in the bracketed range - [abxyz] will match any instance of a, b, x, y, or z |
\| (pipe) | an OR operator - x\|y will match an instance of x or y; inside brackets, as in [x\|y], the pipe is a literal character |
() | used to group sequences of characters or matches |
{} | used to define numeric quantifiers |
{x} | match must occur exactly x times |
{x,} | match must occur at least x times |
{x,y} | match must occur at least x times, but no more than y times |
? | preceding match is optional (occurs zero times or once), same as {0,1} |
* | find 0 or more of preceding match, same as {0,} |
+ | find 1 or more of preceding match, same as {1,} |
^ | match the beginning of the line |
$ | match the end of a line |
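
The rows above can be checked with a few one-liners. This is a minimal sketch using Python's standard `re` module; the helper name `full` and the sample strings are illustrative assumptions, and other regex engines may differ in details.

```python
import re

def full(pattern: str, text: str) -> bool:
    """Return True if the whole of `text` matches `pattern` (hypothetical helper)."""
    return re.fullmatch(pattern, text) is not None

# Escape: a bare "." is any character, "\." is only a literal period.
dot_any = full(r"a.c", "abc")
dot_literal = full(r"a\.c", "abc")

# Negated bracket expression: every character except x.
not_x = re.findall(r"[^x]", "xaxb")

# Alternation inside a group, followed by an optional "s".
alternation = full(r"(cat|dog)s?", "dogs")

# Numeric quantifiers.
exactly_three = full(r"a{3}", "aaa")
at_least_two = full(r"a{2,}", "a")
two_to_three = full(r"a{2,3}", "aaaa")

# Anchors: start and end of the line.
starts_with_foo = re.search(r"^foo", "foobar") is not None
ends_with_bar = re.search(r"bar$", "foobar") is not None

print(dot_any, dot_literal, not_x, alternation)
print(exactly_three, at_least_two, two_to_three, starts_with_foo, ends_with_bar)
```

`re.fullmatch` is used so that a pattern has to cover the entire string; `re.search` would also report partial hits.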

POSIX Class | Match |
---|---|
[:alnum:] | alphabetic and numeric characters |
[:alpha:] | alphabetic characters |
[:blank:] | space and tab |
[:cntrl:] | control characters |
[:digit:] | digits |
[:graph:] | visible characters (printable characters other than space) |
[:lower:] | lowercase alphabetic characters |
[:print:] | any printable characters |
[:punct:] | punctuation characters |
[:space:] | all whitespace characters (includes [:blank:], newline, carriage return) |
[:upper:] | uppercase alphabetic characters |
[:xdigit:] | digits allowed in a hexadecimal number (i.e. 0-9, a-f, A-F) |

Character class | Match |
---|---|
\d | matches a digit, same as [0-9] |
\D | matches a non-digit, same as [^0-9] |
\s | matches a whitespace character (space, tab, newline, etc.) |
\S | matches a non-whitespace character |
\w | matches a word character (letter, digit, or underscore) |
\W | matches a non-word character |
\b | matches a word boundary (NOTE: within a bracket expression such as [\b], matches a backspace) |
\B | matches a non-word-boundary |
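
A short sketch of the shorthand classes, again assuming Python's `re` module; the sample text is made up. One caveat that is specific to Python 3: for `str` patterns, `\d`, `\w` and `\s` are Unicode-aware by default, and the `re.ASCII` flag restricts them to the ASCII ranges listed in the table.

```python
import re

text = "Order 66 shipped to unit_7 on day 3"

# \d+ collects runs of digits; \w+ collects runs of word characters.
digits = re.findall(r"\d+", text)
words = re.findall(r"\w+", text)

# \s+ splits on any run of whitespace (space, tab, newline).
fields = re.split(r"\s+", "a b\tc\nd")

# \b requires a word boundary; \B requires that there is none.
whole_word = re.findall(r"\bcat\b", "cat concat cats")
inside_word = re.findall(r"\Bcat", "cat concat")

# With re.ASCII, \w no longer matches accented letters.
unicode_words = re.findall(r"\w+", "caf\u00e9")
ascii_words = re.findall(r"\w+", "caf\u00e9", re.ASCII)

print(digits, words, fields, whole_word, inside_word, unicode_words, ascii_words)
```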
- \
The backslash escapes the metacharacter that follows it, forcing it to be matched as a literal instead of being treated as a character with special meaning. For example, '\[' matches '[' and '\\' matches '\'.
- .
A dot matches any single character (by default, any character except a newline). For example, 'go.d' matches 'gold' and 'good'.
- { }
{n} ... Match exactly n times
{n,} ... Match at least n times
{n,m} ... Match at least n but not more than m times
- [ ]
A string enclosed in square brackets matches any single character in that string, but no others. For example, '[xyz]' matches only 'x', 'y', or 'z'. A range of characters may be specified by two characters separated by '-'. Note that '[a-z]' matches the lowercase letters, while the reversed range '[z-a]' never matches (many engines reject it as an error).
- [-]
A hyphen within the brackets signifies a range of characters. For example, [b-o] matches any character from b through o.
- |
A vertical bar matches the expression on either side of it. For example, bar|car will match either bar or car.
- *
An asterisk after a character or group matches any number of occurrences of it, including zero. For example, bo* matches b, bo, boo and booo (the asterisk applies only to the o, so a lone b also matches).
- +
A plus sign after a character or group matches one or more occurrences of it. For example, bo+ matches bo, boo and booo, but not b or be.
- \d+
matches all numbers with one or more digits
- \d*
matches zero or more digits, so it also matches the empty string
- \w+
matches all words with one or more characters drawn from a-z, A-Z, 0-9 and the underscore. \w+ will find title, border, width etc. Please note that \w is described here as matching only letters, digits and the underscore (a-z, A-Z, 0-9, _) with an ordinal value lower than 128; Unicode-aware engines also match accented letters.
- [a-zA-Z\xA1-\xFF]+
matches all words with one or more characters containing a-z, A-Z and characters with an ordinal value from 161 to 255 (e.g. ä or Ü). If you want to find words with numbers, then add 0-9 to the expression: [0-9a-zA-Z\xA1-\xFF]+
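
The quantifier and bracket rules in this list are easy to verify by hand. The sketch below assumes Python's `re` module; `matches` is a hypothetical helper that keeps only the candidates a pattern matches in full, and the candidate strings are illustrative.

```python
import re

def matches(pattern: str, candidates: list[str]) -> list[str]:
    """Return the candidates that `pattern` matches in full (hypothetical helper)."""
    return [s for s in candidates if re.fullmatch(pattern, s)]

candidates = ["b", "bo", "boo", "be"]

# "*" allows zero occurrences of the preceding "o", so a lone "b" matches.
star = matches(r"bo*", candidates)

# "+" needs at least one "o", so "b" no longer matches but "bo" does.
plus = matches(r"bo+", candidates)

# The dot stands for exactly one character.
dot = matches(r"go.d", ["gold", "good", "god"])

# A range inside brackets: only letters from b through o.
in_range = re.findall(r"[b-o]", "apple")

# Alternation between two whole expressions.
either = matches(r"bar|car", ["bar", "car", "far"])

# \w includes the underscore as well as letters and digits.
underscore_ok = re.fullmatch(r"\w+", "snake_case") is not None

print(star, plus, dot, in_range, either, underscore_ok)
```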
Typical examples
- (bo*)
will find "b", "bo", "boo", and the "bo" in "bot"
- (bx+)
will find "bxxxxxxxx", "bxx" and "bx", but not "b" or "be"
- (\d+)
will find all numbers
- (\d+ visitors)
will find "3 visitors" or "243234 visitors" or "2763816 visitors"
- (\d+ of \d+ messages)
will find "2 of 1200 messages" or "1 of 10 messages"
- (\d+ of \d+ messages)
will filter everything from the last occurrence of "2 of 1200 messages" or "1 of 10 messages" to the end of the page
- (MyText.{0,20})
will find "MyText" and up to 20 characters after "MyText"
- (\d\d.\d\d.\d\d\d\d)
will find date-strings with format 99.99.9999 or 99-99-9999 (the unescaped dot in the regex matches any character)
- (\d\d\.\d\d\.\d\d\d\d)
will find date-strings with format 99.99.9999 only (the escaped dot matches a literal period)
- (([_a-zA-Z\d\-\.]+@[_a-zA-Z\d\-]+(\.[_a-zA-Z\d\-]+)+))
will find most common e-mail addresses
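
Several of these patterns can be tried against a sample string. This is a sketch assuming Python's `re` module; the `page` text and the address in it are invented for illustration. The outer capturing parentheses are dropped and the inner group of the e-mail pattern is made non-capturing with `(?:...)`, because `re.findall` returns the groups instead of the whole match when a pattern contains capturing groups.

```python
import re

# Invented sample text for illustration only.
page = (
    "Today: 243234 visitors, 2 of 1200 messages. "
    "Updated 05.11.2023 and 05-11-2023. Contact: jane.doe@example.co.uk"
)

visitors = re.findall(r"\d+ visitors", page)
messages = re.findall(r"\d+ of \d+ messages", page)

# Unescaped dots accept any separator; escaped dots accept only a period.
loose_dates = re.findall(r"\d\d.\d\d.\d\d\d\d", page)
strict_dates = re.findall(r"\d\d\.\d\d\.\d\d\d\d", page)

# {0,20} takes up to 20 characters, fewer if the text ends sooner.
snippet = re.search(r"MyText.{0,20}", "MyText is short").group()

# Same character classes as the e-mail pattern above, non-capturing group.
email_re = re.compile(r"[_a-zA-Z\d\-\.]+@[_a-zA-Z\d\-]+(?:\.[_a-zA-Z\d\-]+)+")
emails = email_re.findall(page)

print(visitors, messages, loose_dates, strict_dates, snippet, emails)
```

The e-mail pattern is deliberately simple: it accepts the usual name@domain.tld shapes but does not attempt to cover every address that the mail standards allow.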