D Murli: Validation Expressions

Metacharacter	Match
\	the escape character - used to find an instance of a metacharacter like a period, brackets, etc.
. (period)	match any character except newline
x	match any instance of x
[^x]	match any character except x
[x]	match any instance of x in the bracketed range - [abxyz] will match any instance of a, b, x, y, or z
| (pipe)	an OR operator - (x|y) will match an instance of x or y
()	used to group sequences of characters or matches
{}	used to define numeric quantifiers
{x}	match must occur exactly x times
{x,}	match must occur at least x times
{x,y}	match must occur at least x times, but no more than y times
?	preceding match is optional or one only, same as {0,1}
*	find 0 or more of preceding match, same as {0,}
+	find 1 or more of preceding match, same as {1,}
^	match the beginning of the line
$	match the end of a line
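
The quantifier shorthands and anchors in the table can be checked with a short sketch. Python's `re` module is an assumption here, since the table does not name a regex engine, and other engines may differ in details.

```python
import re

# Each shorthand quantifier paired with the {x,y} form the table says it equals.
PAIRS = [("a?", "a{0,1}"), ("a*", "a{0,}"), ("a+", "a{1,}")]
SAMPLES = ["", "a", "aa", "aaa", "b"]

def same_behaviour(short, long):
    """Return True if both patterns accept exactly the same sample strings."""
    return all(
        bool(re.fullmatch(short, s)) == bool(re.fullmatch(long, s))
        for s in SAMPLES
    )

shorthands_agree = all(same_behaviour(s, l) for s, l in PAIRS)

# ^ and $ pin the match to the start and end of the input.
whole_line = re.search(r"^abc$", "abc") is not None      # anchored match succeeds
inside_line = re.search(r"^abc$", "xabcx") is not None   # anchors reject extra text
```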

POSIX Class	Match
[:alnum:]	alphabetic and numeric characters
[:alpha:]	alphabetic characters
[:blank:]	space and tab
[:cntrl:]	control characters
[:digit:]	digits
[:graph:]	visible characters (excludes spaces and control characters)
[:lower:]	lowercase alphabetic characters
[:print:]	any printable characters
[:punct:]	punctuation characters
[:space:]	all whitespace characters (includes [:blank:], newline, carriage return)
[:upper:]	uppercase alphabetic characters
[:xdigit:]	digits allowed in a hexadecimal number (i.e. 0-9, a-f, A-F)
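
Not every engine understands POSIX class names. Python's `re` module, for instance, does not support them, so a hand-written equivalent is needed. The sketch below is one possible approach: the `POSIX_ASCII` mapping and the `posix_class` helper are hypothetical names invented for illustration, and the mapping covers ASCII only.

```python
import re

# ASCII equivalents for some POSIX classes, written as bracket-expression bodies.
# This mapping is an illustrative assumption, not part of any library.
POSIX_ASCII = {
    "alnum": "A-Za-z0-9",
    "alpha": "A-Za-z",
    "blank": " \\t",
    "digit": "0-9",
    "lower": "a-z",
    "upper": "A-Z",
    "xdigit": "0-9A-Fa-f",
}

def posix_class(name):
    """Compile a pattern matching one or more characters of the named class."""
    return re.compile("[" + POSIX_ASCII[name] + "]+")

hex_ok = posix_class("xdigit").fullmatch("DEADbeef42") is not None
hex_bad = posix_class("xdigit").fullmatch("xyz") is not None
upper_runs = posix_class("upper").findall("ABC def GHI")
```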

Character class	Match
\d	matches a digit, same as [0-9]
\D	matches a non-digit, same as [^0-9]
\s	matches a whitespace character (space, tab, newline, etc.)
\S	matches a non-whitespace character
\w	matches a word character (letter, digit, or underscore)
\W	matches a non-word character
\b	matches a word-boundary (NOTE: within a class, matches a backspace)
\B	matches a non-word boundary
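
A brief sketch of these shorthand classes in use, assuming Python's `re` module; the sample strings are made up for illustration.

```python
import re

text = "Order 66 shipped to dock_7 at 09:30"

digits = re.findall(r"\d+", text)   # runs of digits
words = re.findall(r"\w+", text)    # runs of letters, digits and underscores
gaps = re.findall(r"\s", text)      # individual whitespace characters

# \b only matches at the edge of a word, so the "cat" inside
# "concatenate" is skipped.
whole_word = re.findall(r"\bcat\b", "cat concatenate cat.")
```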

\
The backslash escapes any character and can therefore be used to force characters to be matched as literals instead of being treated as characters with special meaning. For example, '\[' matches '[' and '\\' matches '\'.
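
As a rough illustration of escaping, assuming Python's `re` module: the first two searches escape by hand, and `re.escape` adds the backslashes automatically when the literal text comes from somewhere else. The sample strings are invented.

```python
import re

# An unescaped '[' starts a character class, so it is escaped to match literally.
bracket = re.search(r"\[", "a[0]")
# A literal backslash is written as two backslashes in the pattern.
backslash = re.search(r"\\", "C:\\temp")

# Without escaping, '+' and '?' act as quantifiers and the literal text is not found.
literal = "1+1=2?"
escaped_hit = re.search(re.escape(literal), "is 1+1=2? yes") is not None
raw_hit = re.search(literal, "is 1+1=2? yes") is not None
```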
.
A dot matches any single character except a newline. For example, 'go.d' matches 'gold' and 'good'.
{ }
{n} ... Match exactly n times
{n,} ... Match at least n times
{n,m} ... Match at least n but not more than m times
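
The three counting forms can be compared side by side. This is a minimal sketch assuming Python's `re` module; `accepts` is a helper name made up for the example.

```python
import re

def accepts(pattern, candidates):
    """Return the candidates that the pattern matches in full."""
    return [c for c in candidates if re.fullmatch(pattern, c)]

runs = ["a", "aa", "aaa", "aaaa", "aaaaa"]

exactly_three = accepts(r"a{3}", runs)     # {n}: exactly n times
at_least_three = accepts(r"a{3,}", runs)   # {n,}: n times or more
two_to_four = accepts(r"a{2,4}", runs)     # {n,m}: between n and m times
```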
[ ]
A string enclosed in square brackets matches any single character in that string, but no others. For example, '[xyz]' matches only 'x', 'y', or 'z'. A range of characters may be specified by two characters separated by '-'. Note that '[a-z]' matches lowercase alphabetic characters, while the reversed range '[z-a]' never matches (many engines reject it as an error).
[-]
A hyphen within the brackets signifies a range of characters. For example, [b-o] matches any character from b through o.
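
A small sketch of bracket sets and ranges, assuming Python's `re` module. In that engine a reversed range such as `[z-a]` is rejected when the pattern is compiled; other engines may behave differently.

```python
import re

# A set matches one character at a time, so findall returns single characters.
only_xyz = re.findall(r"[xyz]", "xylophone zebra")
b_to_o = re.findall(r"[b-o]", "abcopq")

# Python's re refuses to compile a range whose end comes before its start.
try:
    re.compile(r"[z-a]")
    reversed_range_ok = True
except re.error:
    reversed_range_ok = False
```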
|
A vertical bar matches either expression on either side of the vertical bar. For example, bar|car will match either bar or car.
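
The alternation can be sketched as follows, assuming Python's `re` module. The second and third patterns show why grouping matters: the bar splits the whole expression unless parentheses limit its scope.

```python
import re

either = re.findall(r"bar|car", "a bar, a car, a jar")

# Grouping restricts the alternation to the first letter only.
grouped = re.findall(r"(?:b|c)ar", "a bar, a car, a jar")

# Without the group, the pattern means "b" OR "car".
ungrouped = re.findall(r"b|car", "bar car")
```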
*
An asterisk after a character or group matches any number of occurrences of it, including zero. For example, bo* matches b, bo, boo and booo.
+
A plus sign after a character or group matches one or more occurrences of it. For example, bo+ matches bo, boo and booo, but not b or be.
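
A quick check of how `*` and `+` behave, assuming Python's `re` module. The quantifier applies to the single preceding element (here the letter o), and a group is needed to repeat a longer string.

```python
import re

candidates = ["b", "bo", "boo", "booo", "be"]

# bo* : a "b" followed by zero or more "o"
star = [c for c in candidates if re.fullmatch(r"bo*", c)]
# bo+ : a "b" followed by one or more "o"
plus = [c for c in candidates if re.fullmatch(r"bo+", c)]

# Parentheses make the quantifier repeat the whole string "bo".
repeated_group = re.fullmatch(r"(bo)+", "bobo") is not None
not_a_repeat = re.fullmatch(r"(bo)+", "boo") is not None
```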
\d+
matches all numbers with one or more digits
\d*
matches any run of zero or more digits (so it also matches the empty string)
\w+
matches all words with one or more characters from a-z, A-Z, 0-9 and the underscore. \w+ will find title, border, width etc. Please note that \w matches only letters, digits and the underscore (a-z, A-Z, 0-9, _) with an ordinal value lower than 128.
[a-zA-Z\xA1-\xFF]+
matches all words with one or more characters from a-z, A-Z and characters with an ordinal value from 161 to 255 (e.g. ä or Ü). If you want to find words with numbers, then add 0-9 to the expression: [0-9a-zA-Z\xA1-\xFF]+
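
The difference between the two word patterns shows up on text with accented letters. This sketch assumes Python's `re` module, where `\w` is Unicode-aware by default, so the `re.ASCII` flag is used to get the ASCII-only behaviour described above. The sample sentence is made up.

```python
import re

text = "Größe und width=20 für Ü-Wagen"

# ASCII-only \w splits words at every accented letter.
ascii_words = re.findall(r"\w+", text, flags=re.ASCII)

# The explicit range keeps accented letters but leaves out digits.
latin1_words = re.findall(r"[a-zA-Z\xA1-\xFF]+", text)

# Adding 0-9 brings the digits back in.
latin1_with_digits = re.findall(r"[0-9a-zA-Z\xA1-\xFF]+", text)
```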

Typical examples

(bo*)
will find "b", "bo", "boo", and the "bo" in "bot"
(bx+)
will find "bx", "bxx", "bxxxxxxxx", but not "b" or "be"
(\d+)
will find all numbers
(\d+ visitors)
will find "3 visitors" or "243234 visitors" or "2763816 visitors"
(\d+ of \d+ messages)
will find "2 of 1200 messages" or "1 of 10 messages"
(\d+ of \d+ messages)
will filter everything from the last occurrence of "2 of 1200 messages" or "1 of 10 messages" to the end of the page
(MyText.{0,20})
will find "MyText" and up to 20 characters after "MyText" on the same line
(\d\d.\d\d.\d\d\d\d)
will find date-strings with format 99.99.9999 or 99-99-9999 (the dot in the regex matches any character)
(\d\d\.\d\d\.\d\d\d\d)
will find date-strings with format 99.99.9999
(([_a-zA-Z\d\-\.]+@[_a-zA-Z\d\-]+(\.[_a-zA-Z\d\-]+)+))
will find most e-mail addresses in common formats
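
Several of the patterns above can be tried against a sample string. This is a sketch assuming Python's `re` module; the `page` text, including the address and dates in it, is invented for the example.

```python
import re

page = (
    "Today: 243234 visitors. Showing 2 of 1200 messages. "
    "Updated 08.10.2010 and 08-10-2010. Contact: jane.doe@example.com"
)

visitors = re.findall(r"(\d+ visitors)", page)
messages = re.findall(r"(\d+ of \d+ messages)", page)

# The unescaped dot accepts any separator; the escaped dot accepts only '.'.
loose_dates = re.findall(r"(\d\d.\d\d.\d\d\d\d)", page)
strict_dates = re.findall(r"(\d\d\.\d\d\.\d\d\d\d)", page)

# search + group(0) returns the full address rather than the inner groups.
email = re.search(r"[_a-zA-Z\d\-\.]+@[_a-zA-Z\d\-]+(\.[_a-zA-Z\d\-]+)+", page)
address = email.group(0) if email else None
```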

Friday, October 8, 2010