A regular expression is a pattern that specifies a list of characters. In this section, we will look at how those characters are specified :

Start and End ( ^ $ )

A caret (^) at the beginning of a regular expression indicates that the string being searched must start with this pattern.

The pattern ^foo can be found in "food", but not in "barfood".

A dollar sign ($) at the end of a regular expression indicates that the string being searched must end with this pattern.

The pattern foo$ can be found in "curfoo", but not in "food".

Number of Occurrences ( ? + * {} )

The following symbols affect the number of occurrences of the preceding character: ?, +, *, and {}.

A questionmark (?) indicates that the preceding character should appear zero or one times in the pattern.

The pattern foo? can be found in "food" and "fod", but not "faod".

A plus sign (+) indicates that the preceding character should appear one or more times in the pattern.

The pattern fo+ can be found in "fod", "food" and "foood", but not "fd".

A asterisk (*) indicates that the preceding character should appear zero or more times in the pattern.

The pattern fo*d can be found in "fd", "fod" and "food".

Curly brackets with one parameter ( {n} ) indicate that the preceding character should appear exactly n times in the pattern.

The pattern fo{3}d can be found in "foood" , but not "food" or "fooood".

Curly brackets with two parameters ( {n1,n2} ) indicate that the preceding character should appear between n1 and n2 times in the pattern.

The pattern fo{2,4}d can be found in "food","foood" and "fooood", but not "fod" or "foooood".

Curly brackets with one parameter and an empty second paramenter ( {n,} ) indicate that the preceding character should appear at least n times in the pattern.

The pattern fo{2,}d can be found in "food" and "foooood", but not "fod".

Common Characters ( . \d \D \w \W \s \S )

A period ( . ) represents any character except a newline.

The pattern fo.d can be found in "food", "foad", "fo9d", and "fo*d".

Backslash-d ( \d ) represents any digit. It is the equivalent of [0-9].

The pattern fo\dd can be found in "fo1d", "fo4d" and "fo0d", but not in "food" or "fodd".

Backslash-D ( \D ) represents any character except a digit. It is the equivalent of [^0-9].

The pattern fo\Dd can be found in "food" and "foad", but not in "fo4d".

Backslash-w ( \w ) represents any word character (letters, digits, and the underscore (_) ).

The pattern fo\wd can be found in "food", "fo_d" and "fo4d", but not in "fo*d".

Backslash-W ( \W ) represents any character except a word character.

The pattern fo\Wd can be found in "fo*d", "fo@d" and "fo.d", but not in "food".

Backslash-s ( \s) represents any whitespace character (e.g, space, tab, newline, etc.).

The pattern fo\sd can be found in "fo d", but not in "food".

Backslash-S ( \S ) represents any character except a whitespace character.

The pattern fo\Sd can be found in "fo*d", "food" and "fo4d", but not in "fo d".

Grouping ( [] )

Square brackets ( [] ) are used to group options.

The pattern f[aeiou]d can be found in "fad" and "fed", but not in "food", "faed" or "fd".

The pattern f[aeiou]{2}d can be found in "faed" and "feod", but not in "fod", "fed" or "fd".

Negation ( ^ )

When used after the first character of the regular expression, the caret ( ^ ) is used for negation.

The pattern f[^aeiou]d can be found in "fqd" and "f4d", but not in "fad" or "fed".

Subpatterns ( () )

Parentheses ( () ) are used to capture subpatterns.

The pattern f(oo)?d can be found in "food" and "fd", but not in "fod".

Alternatives ( )

The pipe ( ) is used to create optional patterns.

The pattern foo$^bar can be found in "foo" and "bar", but not "foobar".

Escape Character ( \ )

The backslash ( \ ) is used to escape special characters.

The pattern fo\.d can be found in "fo.d", but not in "food" or "fo4d".

Backreferences

Backreferences are special wildcards that refer back to a subpattern within a pattern. They can be used to make sure that two subpatterns match. The first subpattern in a pattern is referenced as \1, the second is referenced as \2, and so on.

A more practical example has to do matching the delimiter in social security numbers. Examine the following regular expression.

^\d{3}([\- ]?)\d{2}([\- ]?)\d{4}$

Within the caret (^) and dollar sign ($), which are used to specify the beginning and end of the pattern, there are three sequences of digits, optionally separated by a hyphen or a space. This pattern will be matched in all of following strings (and more).

•123-45-6789
•123 45 6789
•123456789
•123-45 6789
•123 45-6789
•123-456789