
Regular Expressions

Regular expressions are patterns used to match character combinations in strings. For example, to search for all occurrences of 'the' in a string, you create a pattern consisting of 'the' and use it to search the string. Regular expression patterns are constructed using literal notation (/abc/) or the RegExp constructor function (re = new RegExp("abc")). These patterns are used with the regular expression methods exec and test, and with the String methods match, replace, search, and split.

Constructing Regular Expressions

In JavaScript, a regular expression is an object that contains the pattern used to search for a match in a string. This section describes the regular expression syntax and how to write a pattern.

The Regular Expression Syntax

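You construct a regular expression in one of two ways: using literal notation, as in re = /abc/, or calling the RegExp constructor function, as in re = new RegExp("abc"). The regular expression object is explained in detail in Regular Expression Object.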
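
As a minimal sketch of the two forms (the variable names are illustrative and do not come from this guide), the following builds the same pattern both ways. It also shows one practical difference: inside a string passed to the constructor, a backslash must itself be escaped.

```javascript
// Literal notation: the pattern is written between slashes.
const literalRe = /ab+c/;

// Constructor notation: the pattern is passed as a string.
const constructedRe = new RegExp("ab+c");

// Both objects hold the same pattern text.
console.log(literalRe.source);     // ab+c
console.log(constructedRe.source); // ab+c

// In a string, "\\d" is needed to produce the pattern \d.
const digitsRe = new RegExp("\\d+");
console.log(digitsRe.test("room 42")); // true
```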

The examples used in the remainder of this section are shown in literal form.

Writing a Regular Expression Pattern

A regular expression pattern is composed of simple characters, such as /abc/, or a combination of simple and special characters, such as /ab*c/ or /Chapter (\d+)\.\d*/. The last example includes parentheses, which act as a memory device: the match made with this part of the pattern is remembered for later use.

Using Simple Patterns

Simple patterns are constructed of characters for which you want to find a direct match. For example, the pattern /abc/ matches character combinations in strings only when exactly the characters 'abc' occur together and in that order. Such a match would succeed in the strings "Hi, do you know your abc's?" and "The latest airplane designs evolved from slabcraft." In both cases the match is with the substring 'abc'. There is no match in the string "Grab crab" because it does not contain the substring 'abc'.
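
The three strings above can be checked directly. This is a small sketch using the test method; the names simpleRe, samples, and results are illustrative.

```javascript
const simpleRe = /abc/;

const samples = [
  "Hi, do you know your abc's?",
  "The latest airplane designs evolved from slabcraft.",
  "Grab crab"
];

// test returns true only when 'abc' occurs together and in that order.
const results = samples.map(function (s) {
  return simpleRe.test(s);
});

console.log(results); // true, true, false
```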

Using Special Characters

When the search for a match requires something more than a direct match, such as finding one or more b's, or finding a whitespace, the pattern includes special characters. For example, the pattern /ab*c/ matches any character combination in which a single 'a' is followed by zero or more 'b's (* means zero or more of the preceding character) and then immediately followed by 'c'. In the string "cbbabbbbcdebc," the pattern matches the substring 'abbbbc'.

Special Characters Used in Regular Expressions provides a complete list and description of the special characters that can be used in regular expressions.
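
A short sketch of the /ab*c/ example, using exec so that the matched substring and its position can be inspected (variable names are illustrative):

```javascript
const starRe = /ab*c/;

// 'a', then zero or more 'b's, then 'c'.
const found = starRe.exec("cbbabbbbcdebc");
console.log(found[0]);    // abbbbc
console.log(found.index); // 3

// Zero 'b's is also acceptable, so 'ac' matches.
const zeroBs = starRe.test("ac");
console.log(zeroBs); // true
```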

Using Parentheses

Parentheses around any part of the regular expression pattern cause that part of the matched substring to be remembered. Once remembered, the substring can be recalled for other use.

For example, the pattern /Chapter (\d+)\.\d*/ illustrates additional escaped and special characters and indicates that part of the pattern should be remembered. It matches precisely the characters 'Chapter ' followed by one or more numeric characters (\d means any numeric character and + means one or more times), followed by a decimal point (which is itself a special character; preceding the decimal point with \ means the pattern must look for the literal character '.'), followed by any numeric character zero or more times (\d means numeric character, * means zero or more times). In addition, parentheses are used to remember the first matched numeric characters.

This pattern is found in "Open Chapter 4.3, paragraph 6" and '4' is remembered. The pattern is not found in "Chapters 3 and 4."

How you use parenthesized substring matches is described in Using Parenthesized Substring Matches.
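
The following sketch runs the pattern /Chapter (\d+)\.\d*/ against the two strings discussed above. The remembered substring is read from element [1] of the array returned by exec; the variable names are illustrative.

```javascript
const chapterRe = /Chapter (\d+)\.\d*/;

const hit = chapterRe.exec("Open Chapter 4.3, paragraph 6");
console.log(hit[0]); // Chapter 4.3
console.log(hit[1]); // 4 (the remembered digits)

// 'Chapters' has an 's' where the pattern requires a space.
const miss = chapterRe.exec("Chapters 3 and 4.");
console.log(miss); // null
```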

Working With Regular Expressions

Regular expressions are used with the regular expression methods test and exec and with the String methods match, replace, search, and split.

Method    Description
exec      A regular expression method that executes a search for a match in a string. It returns an array of useful information.
test      A regular expression method that tests for a match in a string. It returns true or false.
match     A String method that executes a search for a match in a string. It returns an array of useful information.
search    A String method that tests for a match in a string. It returns the index of the match, or -1 if the search fails.
replace   A String method that executes a search for a match in a string, and replaces the matched substring with a replacement substring.
split     A String method that uses a regular expression or a fixed string to break a string into an array of substrings.

When you want to know whether a pattern is found in a string, use the test or search method; for more information (but slower execution) use the exec or match methods. If you use exec or match and the match succeeds, these methods return an array and update properties of the regular expression object and the global regular expression object, RegExp.

For information about the returned array and its properties, see Working With Arrays and Regular Expressions.

For information about the global RegExp object and its properties, see The RegExp Object.
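
To make the differences between the six methods concrete, here is a brief sketch that applies each one to a sample string. The strings and variable names are illustrative.

```javascript
const sampleRe = /db+d/;
const text = "cdbbdbsbz";

const tested = sampleRe.test(text);         // true
const position = text.search(sampleRe);     // 1 (index of the match)
const missing = text.search(/xyz/);         // -1 (no match)
const viaExec = sampleRe.exec(text)[0];     // "dbbd"
const viaMatch = text.match(sampleRe)[0];   // "dbbd"
const replaced = text.replace(sampleRe, "X"); // "cXbsbz"
const pieces = "a , b,c".split(/\s*,\s*/);  // ["a", "b", "c"]

console.log(tested, position, missing, viaExec, viaMatch, replaced, pieces);
```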

In the following example, the script uses the exec method to find a match in a string.

<SCRIPT>
// The g flag is set so that exec also records lastIndex on myRe.
myRe = /db+d/g;
myArray = myRe.exec("cdbbdbsbz");
</SCRIPT>
The match succeeds and returns the following array and updates the following properties:
Object    Property/Index   Description                                        Example
myArray                    all array elements                                 dbbd
          index            the zero-based index of the match in the string    1
          input            the original string                                cdbbdbsbz
          [0]              the last matched characters                        dbbd
myRe      lastIndex        the index at which to start the next match         5
          source           the text of the pattern                            db+d
RegExp    lastMatch        the last matched characters                        dbbd
          leftContext      the string up to the most recent match             c
          rightContext     the string past the most recent match              bsbz
If the match fails, the exec method returns null (which converts to Boolean false).
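
The array and regular expression properties listed above can be read back as in the sketch below. It assumes the g flag is set, because current engines update lastIndex only for global (or sticky) patterns; it also shows the null result for a failed match. Variable names other than myRe and myArray are illustrative.

```javascript
const myRe = /db+d/g; // g flag assumed so that lastIndex is updated
const myArray = myRe.exec("cdbbdbsbz");

console.log(myArray[0]);     // dbbd
console.log(myArray.index);  // 1
console.log(myArray.input);  // cdbbdbsbz
console.log(myRe.lastIndex); // 5
console.log(myRe.source);    // db+d

// A failed match returns null, which converts to Boolean false.
const failed = /xyz/.exec("cdbbdbsbz");
if (!failed) {
  console.log("no match");
}
```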

Using Parenthesized Substring Matches

Including parentheses in a regular expression pattern causes the corresponding submatch to be remembered. For example, /a(b)c/ matches the characters 'abc' and remembers 'b'. To recall these parenthesized substring matches, use the global RegExp properties $1, ..., $9 or the Array elements [1], ..., [n].

The number of possible parenthesized substrings is unlimited. The RegExp object holds up to the last nine and the returned array holds all that were found. The following examples illustrate how to use parenthesized substring matches.

Example 1. The following script uses the replace method to switch the words in the string. For the replacement text, the script uses the values of the $1 and $2 properties of the global RegExp object. Note that the RegExp object name is not prepended to these properties when they are passed as the second argument to the replace method.

<SCRIPT>
re = /(\w+)\s(\w+)/;
str = "John Smith";
newstr=str.replace(re, "$2, $1");
document.write(newstr)
</SCRIPT>
This prints "Smith, John".

Example 2. In the following example, the Change event handler passes the text field to the getInfo function, and the exec method is called with the field's value as its argument. Navigator also sets RegExp.input when the Change event fires, so exec can be called there without an argument; passing the string explicitly avoids depending on that behavior. Note that RegExp must be prepended to its $ properties (since they appear outside the context of a regular expression).

<HTML>
<SCRIPT>
function getInfo(field){
   re = /(\w+)\s(\d+)/;
   // exec returns null when the text is not a name followed by an age
   if (re.exec(field.value))
      window.alert(RegExp.$1 + ", your age is " + RegExp.$2);
}
</SCRIPT>
Enter your first name and your age, and then press Enter.
<FORM>
<INPUT TYPE="TEXT" NAME="NameAge" onChange="getInfo(this);">
</FORM>
</HTML>
Example 3. The following example also reads a name and an age from a text field. Instead of using RegExp.$1 and RegExp.$2, this example stores the array returned by exec and uses a[1] and a[2].
<HTML>
<SCRIPT>
function getInfo(field){
   re = /(\w+)\s(\d+)/;
   a = re.exec(field.value);
   // a is null when the text is not a name followed by an age
   if (a)
      window.alert(a[1] + ", your age is " + a[2]);
}
</SCRIPT>
Enter your first name and your age, and then press Enter.
<FORM>
<INPUT TYPE="TEXT" NAME="NameAge" onChange="getInfo(this);">
</FORM>
</HTML>
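
The array form can also be exercised without a form or an alert. The sketch below wraps the same pattern in a hypothetical helper, describeAge (not part of this guide), which returns null when the input does not match. It also illustrates that the returned array is not limited to nine remembered substrings.

```javascript
// Hypothetical helper: builds the message, or returns null on a failed match.
function describeAge(input) {
  const a = /(\w+)\s(\d+)/.exec(input);
  if (a === null) {
    return null;
  }
  return a[1] + ", your age is " + a[2];
}

console.log(describeAge("Maria 34"));       // Maria, your age is 34
console.log(describeAge("no digits here")); // null

// Ten parenthesized parts: the array holds all of them at [1]..[10].
const many = /(a)(b)(c)(d)(e)(f)(g)(h)(i)(j)/.exec("abcdefghij");
console.log(many.length); // 11
console.log(many[10]);    // j
```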

Executing a Global Search and Ignoring Case

Regular expressions have two optional flags that allow for global and case-insensitive searching. To indicate a global search, use the g flag. To indicate a case-insensitive search, use the i flag. These flags can be used separately or together in either order, and are included as part of the regular expression.

To include a flag with the regular expression, use this syntax:

re = /pattern/[g|i|gi]
re = new RegExp("pattern", ["g"|"i"|"gi"])
Note that the flags, i and g, are an integral part of a regular expression. They cannot be added or removed later.

For example, re = /\w+\s/g creates a regular expression that looks for one or more word characters followed by a white space character, and it looks for this combination throughout the string.

<SCRIPT>
re = /\w+\s/g;
str = "fee fi fo fum";
myArray = str.match(re);
document.write(myArray);
</SCRIPT>
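This writes "fee ,fi ,fo ". Each matched substring includes its trailing space, and 'fum' is not matched because no white space follows it.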
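
The effect of each flag can be compared side by side. This is a small sketch with illustrative variable names; the last line shows the flags passed to the constructor as a string.

```javascript
const str = "fee fi fo fum";

// Without g, match stops at the first occurrence.
const first = str.match(/\w+\s/);
console.log(first[0]); // "fee "

// With g, match returns every occurrence.
const all = str.match(/\w+\s/g);
console.log(all.length); // 3

// Without i the search is case sensitive; with i it is not.
const strict = /FEE/.test(str);  // false
const relaxed = /FEE/i.test(str); // true

// Both flags together, passed to the constructor as a string.
const both = "Fee fee FEE".match(new RegExp("fee", "gi"));
console.log(both.length); // 3
```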

A Complete Example

The following example illustrates the formation of regular expressions and the use of string.split() and string.replace().

It cleans a roughly-formatted input string containing names (first name first) separated by blanks, tabs and exactly one semicolon.

Finally, it reverses the name order (last name first) and sorts the list.

<SCRIPT LANGUAGE="JavaScript1.2"> 

/********* 
* The name string contains multiple spaces and tabs, 
* and may have multiple spaces between first and last names. 
*********/ 
names = new String ( "Harry  Trump  ;Fred Barney; Helen   Rigby ;\
                      Bill Abel ;Chris Hand ") 

document.write ("---------- Original String" + "<BR>" + "<BR>") 
document.write (names + "<BR>" + "<BR>") 

/********* 
* Prepare two regular expression patterns and array storage. 
* Split the string into array elements. 
*********/ 
// pattern: possible white space then semicolon then possible white space
pattern = /\s*;\s*/
// break the string into pieces separated by the pattern above
// and store the pieces in an array called nameList
nameList = names.split (pattern)
// new pattern: one or more characters then spaces then characters 
// use parentheses to "memorize" portions of the pattern 
// the memorized portions are referred to later 
pattern = /(\w+)\s+(\w+)/
// new array for holding names being processed
bySurnameList = new Array;  

/********* 
* Display the name array and populate the new array 
* with comma-separated names, last first. 
* 
* The replace method removes anything matching the pattern 
* and replaces it by the memorized string - 2nd memorized portion 
* followed by comma space followed by 1st memorized portion. 
* 
* The variables $1 and $2 refer to the portions 
* memorized while matching the pattern. 
*********/ 
document.write ("---------- After Split by Regular Expression" + "<BR>") 
for ( i = 0; i < nameList.length; i++) { 
 document.write (nameList[i] + "<BR>") 
 bySurnameList[i] = nameList[i].replace (pattern, "$2, $1") 
} 

/********* 
* Display the new array. 
*********/ 
document.write ("---------- Names Reversed" + "<BR>") 
for ( i = 0; i < bySurnameList.length; i++) { 
 document.write (bySurnameList[i] + "<BR>") 
} 

/********* 
* Sort by last name, then display the sorted array. 
*********/ 
bySurnameList.sort() 
document.write ("---------- Sorted" + "<BR>") 
for ( i = 0; i < bySurnameList.length; i++) { 
 document.write (bySurnameList[i] + "<BR>") 
} 

document.write ("---------- End" + "<BR>") 

</SCRIPT>
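
The same split, replace, and sort steps can be condensed into a single function that returns the sorted array instead of writing to the document. This is a sketch only: sortBySurname is a hypothetical name, and the call to trim is an addition that removes the trailing blank left on the last name.

```javascript
// Hypothetical helper: returns the names as "Last, First", sorted.
function sortBySurname(names) {
  return names
    .trim()                       // assumption: drop leading/trailing blanks
    .split(/\s*;\s*/)             // split on a semicolon and surrounding white space
    .map(function (name) {
      // swap the two remembered words
      return name.replace(/(\w+)\s+(\w+)/, "$2, $1");
    })
    .sort();
}

const sorted = sortBySurname(
  "Harry  Trump  ;Fred Barney; Helen   Rigby ; Bill Abel ;Chris Hand "
);
console.log(sorted.join("; "));
// Abel, Bill; Barney, Fred; Hand, Chris; Rigby, Helen; Trump, Harry
```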

Special Characters Used in Regular Expressions

The following list describes the special characters that can be used in regular expressions.
\ for characters that are usually treated literally, indicates that the next character is special and not to be interpreted literally. For example, /b/ matches the character 'b'. Placing a backslash in front of b, as in /\b/, makes the character special: it then matches a word boundary.

-or-

for characters that are usually treated specially, indicates that the next character is not special and should be interpreted literally. For example, * is a special character that means zero or more of the preceding character should be matched; /a*/ matches zero or more a's. To match * literally, precede it with a backslash; /a\*/ matches 'a*'.

^ matches beginning of input or line, e.g. /^A/ matches only the first 'A' in "An A+ for Kelly."
$ matches end of input or line, e.g. /t$/ matches only the last 't' in "A cat in the hat".
* matches the preceding character zero or more times, e.g. /bo*/ matches 'boooo' in "The ghost screamed boooo."
+ matches the preceding character one or more times (equivalent to {1,}), e.g. /a+/ matches the 'a' in "candy" and all the a's in "caaaaaaandy."
? matches the preceding character zero or one time, e.g. /e?le?/ matches the 'el' in "angel" and the 'le' in "angle."
. (the decimal point) matches any single character except new line, e.g. /.n/ matches 'an' and 'on' in "an apple is on the tree."
(x) matches 'x' and remembers the match, e.g. /(foo)/ matches and remembers 'foo' in "foo bar." The matched substring can be recalled from the result Array elements [1], ..., [n], or the global RegExp properties $1, ..., $9.
x|y matches either 'x' or 'y', e.g. /green|red/ matches 'green' in "green apple" and 'red' in "red apple."
{x} where x is a non-negative integer. Matches exactly x times, e.g. /a{2}/ doesn't match the 'a' in "candy," matches all of the a's in "caandy," and the first two a's in "caaaaaaandy."
{x,} where x is a non-negative integer. Matches at least x times, e.g. /a{2,}/ doesn't match the 'a' in "candy" and matches all of the a's in "caandy" and in "caaaaaaandy."
{x,y} where x and y are non-negative integers. Matches at least x and at most y times, e.g. /a{1,3}/ matches the 'a' in "candy," the first two a's in "caandy," and the first three a's in "caaaaaaandy."
[xyz] a character set. Matches any one of the enclosed characters, e.g. /[abc]/ matches the 'b' in "brisket" and the 'c' in "chop."
[^xyz] a negative character set. Matches any character that is not enclosed in the brackets, e.g. /[^abc]/ matches 'r' in "brisket" and 'h' in "chop."
\b matches a word boundary, such as the position between a word character and a space, e.g. /\bn\w/ matches the 'no' in "noonday", and /\wy\b/ matches the 'ly' in "possibly yesterday."
\B matches a non-word boundary, e.g. /\w\Bn/ matches 'on' in "noonday", and /y\B\w/ matches 'ye' in "possibly yesterday."
\d matches a digit character (equivalent to [0-9]), e.g. /\d/ or /[0-9]/ matches '2' in "B2 is the suite number."
\D matches any non-digit character (equivalent to [^0-9]), e.g. /\D/ or /[^0-9]/ matches 'B' in "B2 is the suite number."
\f matches a form-feed.
\n matches a linefeed.
\r matches a carriage return.
\s matches any white space including space, tab, form feed, line feed (equivalent to [ \f\n\r\t\v]), e.g. /\s\w*/ matches ' bar' in "foo bar."
\S matches any non-white space (equivalent to [^ \f\n\r\t\v]), e.g. /\S\w*/ matches 'foo' in "foo bar."
\t matches a tab.
\v matches a vertical tab.
\w matches any word character including the underscore (equivalent to [A-Za-z0-9_]), e.g. /\w/ matches 'a' in "apple," '5' in "$5.28," and '3' in "3D."
\W matches any non-word character (equivalent to [^A-Za-z0-9_]), e.g. /\W/ or /[^A-Za-z0-9_]/ matches '%' in "50%."
\# where # is a positive integer. A back-reference to the last substring matching the # parenthetical in the regular expression (counting left parentheses), e.g. /apple(,)\sorange\1/ matches 'apple, orange,' in "apple, orange, cherry, peach." A more complete example follows this table.
Note: if the number of left parentheses is less than the number specified in \#, the \# is taken as an octal escape as described in the next row.
/x/ where x is an octal, hexadecimal, or decimal escape value. Allows you to embed ASCII codes into regular expressions.
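
Several of the entries above can be verified with a short table-driven sketch. Each row holds a pattern, an input string, and the substring that exec is expected to return first; the array name checks and the loop are illustrative.

```javascript
// [pattern, input, expected first match]
const checks = [
  [/^A/, "An A+ for Kelly.", "A"],
  [/t$/, "A cat in the hat", "t"],
  [/bo*/, "The ghost screamed boooo.", "boooo"],
  [/e?le?/, "angel", "el"],
  [/e?le?/, "angle", "le"],
  [/a{2}/, "caaaaaaandy", "aa"],
  [/a{1,3}/, "caaaaaaandy", "aaa"],
  [/\bn\w/, "noonday", "no"],
  [/\w\Bn/, "noonday", "on"],
  [/\s\w*/, "foo bar", " bar"],
  [/\W/, "50%", "%"],
  [/apple(,)\sorange\1/, "apple, orange, cherry, peach.", "apple, orange,"]
];

// Collect the rows whose first match differs from the expected text.
const failures = checks.filter(function (row) {
  const result = row[0].exec(row[1]);
  return result === null || result[0] !== row[2];
});

console.log(failures.length); // 0
```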

Example Using Special Characters

In the following example, a user enters a phone number. When the user presses Enter, the script checks the validity of the number. If the number is valid (matches the character sequence specified by the regular expression), the script posts a window thanking the user and confirming the number. If the number is invalid, the script posts a window telling the user that the phone number isn't valid.

The regular expression looks for zero or one open parenthesis \(?, followed by three digits \d{3}, followed by zero or one close parenthesis \)?, followed by one dash, forward slash, or decimal point, which is remembered when found ([-\/\.]), followed by three digits \d{3}, followed by the remembered match of a dash, forward slash, or decimal point \1, followed by four digits \d{4}.

The Change event, activated when the user presses Enter, calls the testInfo function with the text field, and the function passes the field's value to the exec method. Navigator also sets the value of RegExp.input when the Change event fires; passing the string explicitly avoids depending on that behavior.

<HTML>
<SCRIPT LANGUAGE = "JavaScript1.2">
re = /\(?\d{3}\)?([-\/\.])\d{3}\1\d{4}/
function testInfo(field) { 
   OK = re.exec(field.value) 
   if (!OK) 
      window.alert (field.value + " isn't a phone number with area code!") 
   else 
      window.alert ("Thanks, your phone number is " + OK[0]) 
}
</SCRIPT>
Enter your phone number (with area code) and then press Enter.
<FORM> <INPUT TYPE="TEXT" NAME="Phone" onChange="testInfo(this);"> </FORM>
</HTML>
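
The role of the back-reference \1 is easiest to see outside the form. The sketch below applies the same pattern through a hypothetical helper, checkPhone (not part of this guide), which returns the matched text or null. Because the pattern is not anchored with ^ and $, it accepts a valid number anywhere inside a longer string.

```javascript
const phoneRe = /\(?\d{3}\)?([-\/\.])\d{3}\1\d{4}/;

// Hypothetical helper: returns the matched number, or null if none is found.
function checkPhone(input) {
  const ok = phoneRe.exec(input);
  return ok ? ok[0] : null;
}

console.log(checkPhone("(415)-555-1212")); // (415)-555-1212
console.log(checkPhone("415.555.1212"));   // 415.555.1212

// The second separator must repeat the first one, so this fails.
console.log(checkPhone("415-555.1212"));   // null

// No separators at all also fails.
console.log(checkPhone("4155551212"));     // null
```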




Copyright © 1997 Netscape Communications Corporation