re = /ab+c/
Literal notation compiles the regular expression once, when the script is first loaded. When the regular expression will remain constant, use literal notation for better performance.
re = new RegExp("ab+c")
Using the constructor function compiles the regular expression at run time. Use the constructor function when you know the pattern will change, or when you do not know the pattern in advance and obtain it from another source, such as user input. Once you have defined a regular expression that is used throughout the script, you can use the compile method to compile it for efficient reuse. The regular expression object is explained in detail in Regular Expression Object.
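The difference between the two notations can be sketched as follows. This is a minimal illustration, not part of the original examples: the variable names and the sample string are assumptions, and userText is a hypothetical stand-in for a pattern obtained from another source.

```javascript
// Literal notation: the pattern is fixed in the source text.
const fixed = /ab+c/;

// Constructor notation: the pattern is built at run time from a string.
// userText is a hypothetical value standing in for outside input.
const userText = "ab+c";
const dynamic = new RegExp(userText);

console.log(fixed.test("xabbbcx"));           // true
console.log(dynamic.test("xabbbcx"));         // true
console.log(fixed.source === dynamic.source); // true
```

Both objects describe the same pattern; only the moment at which the pattern text becomes known differs.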
The examples used in the remainder of this section are shown in literal form.
Special Characters Used in Regular Expressions provides a complete list and description of the special characters that can be used in regular expressions.
For example, the pattern /Chapter (\d+)\.\d*/ illustrates additional escaped and special characters and indicates that part of the pattern should be remembered. It matches precisely the characters 'Chapter ' followed by one or more numeric characters (\d means any numeric character and + means one or more times), followed by a decimal point (which in itself is a special character; preceding the decimal point with \ means the pattern must look for the literal character '.'), followed by any numeric character zero or more times (\d means numeric character, * means zero or more times). In addition, parentheses are used to remember the first matched numeric characters.
This pattern is found in "Open Chapter 4.3, paragraph 6" and '4' is remembered. The pattern is not found in "Chapters 3 and 4."
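The two outcomes described above can be checked with a short sketch. It assumes the pattern is written with a single space after the word Chapter; the variable names are illustrative only.

```javascript
// Pattern: the word "Chapter", a space, digits (remembered), a literal
// decimal point, then zero or more digits.
const chapterRe = /Chapter (\d+)\.\d*/;

const found = chapterRe.exec("Open Chapter 4.3, paragraph 6");
console.log(found[0]); // "Chapter 4.3" (the whole match)
console.log(found[1]); // "4" (the remembered digits)

// "Chapters" has an 's' where the pattern requires a space, so there is no match.
const missing = chapterRe.exec("Chapters 3 and 4.");
console.log(missing);  // null
```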
How you use parenthesized substring matches is described in Using Parenthesized Substring Matches.
Method | Description |
exec | A regular expression method that executes a search for a match in a string. It returns an array of useful information, or null if there is no match. |
test | A regular expression method that tests for a match in a string. It returns true or false. |
match | A String method that executes a search for a match in a string. It returns an array of useful information, or null if there is no match. |
search | A String method that tests for a match in a string. It returns the index of the match, or -1 if the search fails. |
replace | A String method that executes a search for a match in a string, and replaces the matched substring with a replacement substring. |
split | A String method that uses a regular expression or a fixed string to break a string into an array of substrings. |
For information about the returned array and its properties, see Working With Arrays and Regular Expressions.
For information about the global RegExp object and its properties, see The RegExp Object.
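As a rough side-by-side of the six methods, the sketch below applies each of them to a small sample. The pattern and the sample strings are assumptions chosen for illustration.

```javascript
const re = /b+/;
const text = "cabbage";

const execResult = re.exec(text);            // array ["bb"], with index 2
const testResult = re.test(text);            // true
const matchResult = text.match(re);          // array ["bb"]
const searchResult = text.search(re);        // 2, the position of the match
const replaceResult = text.replace(re, "B"); // "caBage"
const splitResult = "a1b22c".split(/\d+/);   // ["a", "b", "c"]

console.log(execResult[0], execResult.index);
console.log(testResult, matchResult[0], searchResult);
console.log(replaceResult, splitResult);
```

Note that exec and match return null when nothing matches, so a script would normally check the result before indexing into it.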
In the following example, the script uses the exec method to find a match in a string.
<SCRIPT>
myRe=/db+d/; myArray = myRe.exec("cdbbdbsbz");
</SCRIPT>
The match succeeds and returns the following array and updates the following properties:
Object | Property/Index | Description | Example |
myArray | | all array elements | dbbd |
myArray | index | the zero-based index of the match in the string | 1 |
myArray | input | the original string | cdbbdbsbz |
myArray | [0] | the last matched characters | dbbd |
myRe | lastIndex | the index at which to start the next match | 5 |
myRe | source | the text of the pattern | db+d |
RegExp | lastMatch | the last matched characters | dbbd |
RegExp | leftContext | the string up to the most recent match | c |
RegExp | rightContext | the string past the most recent match | bsbz |
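The values in the table can be reproduced with a sketch along these lines. Two points are assumptions rather than part of the original example: the g flag is added because current engines only advance lastIndex for global (or sticky) patterns, and the left and right context are computed from the returned array instead of being read from the global RegExp object.

```javascript
// g is added here (assumption) so that lastIndex is updated after the match.
const myRe = /db+d/g;
const myArray = myRe.exec("cdbbdbsbz");

console.log(myArray[0]);     // "dbbd"
console.log(myArray.index);  // 1
console.log(myArray.input);  // "cdbbdbsbz"
console.log(myRe.lastIndex); // 5
console.log(myRe.source);    // "db+d"

// Equivalents of leftContext and rightContext, derived from the result.
const leftContext = myArray.input.slice(0, myArray.index);
const rightContext = myArray.input.slice(myRe.lastIndex);
console.log(leftContext);    // "c"
console.log(rightContext);   // "bsbz"
```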
The number of possible parenthesized substrings is unlimited. The RegExp object holds up to the last nine and the returned array holds all that were found. The following examples illustrate how to use parenthesized substring matches.
Example 1. The following script uses the replace method to switch the words in the string. For the replacement text, the script uses the values of the $1 and $2 properties of the global RegExp object. Note that the RegExp object name is not prepended to these properties when they are passed as the second argument to the replace method.
<SCRIPT>
re = /(\w+)\s(\w+)/;
str = "John Smith";
newstr = str.replace(re, "$2, $1");
document.write(newstr)
</SCRIPT>
This prints "Smith, John".
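The same substitution can be tried outside a browser, together with a look at the array returned by exec. This is a sketch under assumptions: console.log replaces document.write, and the ten-group pattern is a made-up example used only to show that the returned array is not limited to nine remembered substrings.

```javascript
const re = /(\w+)\s(\w+)/;
const str = "John Smith";

// $1 and $2 in the replacement text refer to the remembered substrings.
const swapped = str.replace(re, "$2, $1");
console.log(swapped);  // "Smith, John"

// The array returned by exec holds the same remembered substrings.
const parts = re.exec(str);
console.log(parts[1]); // "John"
console.log(parts[2]); // "Smith"

// Hypothetical pattern with ten groups: the array keeps all of them.
const tenGroups = /(a)(b)(c)(d)(e)(f)(g)(h)(i)(j)/.exec("abcdefghij");
console.log(tenGroups.length); // 11 (whole match plus ten groups)
console.log(tenGroups[10]);    // "j"
```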
Example 2. In the following example, RegExp.input is set by the Change event. In the getInfo function, the exec method uses the value of RegExp.input as its argument. Note that RegExp must be prepended to its $ properties (since they appear outside the context of a regular expression).
<HTML>
<SCRIPT>
function getInfo(){
   re = /(\w+)\s(\d+)/;
   re.exec();
   window.alert(RegExp.$1 + ", your age is " + RegExp.$2);
}
</SCRIPT>
Enter your first name and your age, and then press Enter.
<FORM> <INPUT TYPE="TEXT" NAME="NameAge" onChange="getInfo(this);"> </FORM>
</HTML>
Example 3. The following example is similar to Example 2. Instead of using RegExp.$1 and RegExp.$2, this example creates an array and uses a[1] and a[2].
<HTML>
<SCRIPT>
function getInfo(){
   re = /(\w+)\s(\d+)/;
   a = re.exec();
   window.alert(a[1] + ", your age is " + a[2]);
}
</SCRIPT>
Enter your first name and your age, and then press Enter.
<FORM> <INPUT TYPE="TEXT" NAME="NameAge" onChange="getInfo(this);"> </FORM>
</HTML>
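The array-based approach of Example 3 can also be written so that the text is passed in explicitly and does not depend on RegExp.input. The sketch below is one possible variant, not the original script: the function name describePerson and the sample inputs are hypothetical, and the function returns the message instead of calling window.alert.

```javascript
// describePerson is a hypothetical helper: it receives the text as an
// argument and returns the message, or null when the text does not match.
function describePerson(text) {
  const re = /(\w+)\s(\d+)/;
  const a = re.exec(text);
  if (a === null) {
    return null;
  }
  return a[1] + ", your age is " + a[2];
}

console.log(describePerson("Fred 42"));        // "Fred, your age is 42"
console.log(describePerson("no digits here")); // null
```

Checking for null before reading a[1] avoids an error when the user types something that does not fit the pattern.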
To include a flag with the regular expression, use this syntax, where g requests a global search and i requests a case-insensitive search:
re = /pattern/[g|i|gi]
re = new RegExp("pattern", ["g"|"i"|"gi"])
Note that the flags, i and g, are an integral part of a regular expression. They cannot be added or removed later.
For example, re = /\w+\s/g creates a regular expression that looks for one or more word characters followed by a white space character, and it looks for this combination throughout the string.
<SCRIPT>
re = /\w+\s/g;
str = "fee fi fo fum";
myArray = str.match(re);
document.write(myArray);
</SCRIPT>
This writes "fee ,fi ,fo".
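The effect of each flag can be compared in a short sketch. The variable names and the upper-case pattern /FEE/ are assumptions added for illustration.

```javascript
const str = "fee fi fo fum";

// With g, match returns every occurrence.
const all = str.match(/\w+\s/g);
console.log(all);          // ["fee ", "fi ", "fo "]

// Without g, match stops at the first occurrence.
const firstOnly = str.match(/\w+\s/);
console.log(firstOnly[0]); // "fee "

// With i, letter case is ignored; without it, the test fails.
const caseInsensitive = /FEE/i.test(str);
const caseSensitive = /FEE/.test(str);
console.log(caseInsensitive, caseSensitive); // true false
```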
The following script cleans a roughly formatted input string containing names (first name first) separated by blanks, tabs, and exactly one semicolon.
Finally, it reverses the name order (last name first) and sorts the list.
<SCRIPT LANGUAGE="JavaScript1.2">
/*********
 * The name string contains multiple spaces and tabs,
 * and may have multiple spaces between first and last names.
 *********/
names = new String ( "Harry Trump ;Fred Barney; Helen Rigby ;\
 Bill Abel ;Chris Hand ")
document.write ("---------- Original String" + "<BR>" + "<BR>")
document.write (names + "<BR>" + "<BR>")
/*********
 * Prepare two regular expression patterns and array storage.
 * Split the string into array elements.
 *********/
// pattern: possible white space then semicolon then possible white space
pattern = /\s*;\s*/
// break the string into pieces separated by the pattern above
// and store the pieces in an array called nameList
nameList = names.split (pattern)
// new pattern: one or more characters then spaces then characters
// use parentheses to "memorize" portions of the pattern
// the memorized portions are referred to later
pattern = /(\w+)\s+(\w+)/
// new array for holding names being processed
bySurnameList = new Array;
/*********
 * Display the name array and populate the new array
 * with comma-separated names, last first.
 *
 * The replace method removes anything matching the pattern
 * and replaces it by the memorized string - 2nd memorized portion
 * followed by comma space followed by 1st memorized portion.
 *
 * The variables $1 and $2 refer to the portions
 * memorized while matching the pattern.
 *********/
document.write ("---------- After Split by Regular Expression" + "<BR>")
for ( i = 0; i < nameList.length; i++) {
   document.write (nameList[i] + "<BR>")
   bySurnameList[i] = nameList[i].replace (pattern, "$2, $1")
}
/*********
 * Display the new array.
 *********/
document.write ("---------- Names Reversed" + "<BR>")
for ( i = 0; i < bySurnameList.length; i++) {
   document.write (bySurnameList[i] + "<BR>")
}
/*********
 * Sort by last name, then display the sorted array.
 *********/
bySurnameList.sort()
document.write ("---------- Sorted" + "<BR>")
for ( i = 0; i < bySurnameList.length; i++) {
   document.write (bySurnameList[i] + "<BR>")
}
document.write ("---------- End" + "<BR>")
</SCRIPT>
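The core of the script above (split, reverse, sort) can be condensed into a version that runs without a browser. This is a sketch, not a replacement for the original: the sample string with its stray spaces and tabs is hypothetical data, console.log stands in for document.write, and the call to trim is an addition so that the last name does not keep trailing blanks.

```javascript
// Hypothetical input with irregular spaces and tabs around the names.
const names = "Harry  Trump \t;Fred Barney;  Helen   Rigby ;\tBill Abel ;Chris Hand  ";

// Split on a semicolon with optional white space on either side.
// trim() is an addition in this sketch (see the note above).
const nameList = names.trim().split(/\s*;\s*/);

// Reverse each name: 2nd remembered portion, comma, space, 1st portion.
const bySurnameList = nameList.map(function (name) {
  return name.replace(/(\w+)\s+(\w+)/, "$2, $1");
});

// Sort by last name and display the result, one name per line.
bySurnameList.sort();
console.log(bySurnameList.join("\n"));
```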
Character | Meaning |
\ | For characters that are usually treated literally, indicates that the next character is special and not to be interpreted literally. For example, /b/ matches the character 'b'. By placing a backslash in front of b, e.g. /\b/, the character becomes special and means match a word boundary. -or- For characters that are usually treated specially, indicates that the next character is not special and should be interpreted literally. For example, * is a special character that means zero or more occurrences of the preceding character should be matched, e.g. /a*/ means match zero or more a's. To match * literally, precede it with a backslash, e.g. /a\*/ matches 'a*'. |
^ | matches beginning of input or line, e.g. /^A/ matches only the first 'A' in "An A+ for Kelly." |
$ | matches end of input or line, e.g. /t$/ matches only the last 't' in "A cat in the hat". |
* | matches the preceding character zero or more times, e.g. /bo*/ matches 'boooo' in "The ghost screamed boooo." |
+ | matches the preceding character one or more times (equivalent to {1,}), e.g. /a+/ matches the 'a' in "candy" and all the a's in "caaaaaaandy." |
? | matches the preceding character zero or one time, e.g. /e?le?/ matches the 'el' in "angel" and the 'le' in "angle." |
. | (the decimal point) matches any single character except new line, e.g. /.n/ matches 'an' and 'on' in "an apple is on the tree." |
(x) | matches 'x' and remembers the match, e.g. /(foo)/ matches and remembers 'foo' in "foo bar." The matched substring can be recalled from the result Array elements [1], ..., [n], or the global RegExp properties $1, ..., $9. |
x|y | matches either 'x' or 'y', e.g. /green|red/ matches 'green' in "green apple" and 'red' in "red apple." |
{x} | where x is a non-negative integer. Matches exactly x times, e.g. /a{2}/ doesn't match the 'a' in "candy," matches all of the a's in "caandy," and the first two a's in "caaaaaaandy." |
{x,} | where x is a non-negative integer. Matches at least x times, e.g. /a{2,}/ doesn't match the 'a' in "candy" and matches all of the a's in "caandy" and in "caaaaaaandy." |
{x,y} | where x and y are non-negative integers. Matches at least x and at most y times, e.g. /a{1,3}/ matches the 'a' in "candy," the first two a's in "caandy," and the first three a's in "caaaaaaandy." |
[xyz] | a character set. Matches any one of the enclosed characters, e.g. [abc] matches the 'b' in "brisket" and the 'c' in "chop." |
[^xyz] | a negative character set. Matches anything that is not enclosed in the brackets, e.g. [^abc] matches 'r' in "brisket" and 'h' in "chop." |
\b | matches a word boundary, such as the position between a word character and a space, e.g. /\bn\w/ matches the 'no' in "noonday", and /\wy\b/ matches the 'ly' in "possibly yesterday." |
\B | matches a non-word boundary, e.g. /\w\Bn/ matches 'on' in "noonday", and /y\B\w/ matches 'ye' in "possibly yesterday." |
\d | matches a digit character. Equivalent to [0-9], e.g. /\d/ or /[0-9]/ matches '2' in "B2 is the suite number." |
\D | matches any non-digit character. Equivalent to [^0-9], e.g. /\D/ or /[^0-9]/ matches 'B' in "B2 is the suite number." |
\f | matches a form-feed. |
\n | matches a linefeed. |
\r | matches a carriage return. |
\s | matches any single white space character, including space, tab, form feed, and line feed. Equivalent to [ \f\n\r\t\v], e.g. /\s\w*/ matches ' bar' in "foo bar." |
\S | matches any single non-white space character. Equivalent to [^ \f\n\r\t\v], e.g. /\S\w*/ matches 'foo' in "foo bar." |
\t | matches a tab. |
\v | matches a vertical tab. |
\w | matches any word character including the underscore. Equivalent to [A-Za-z0-9_], e.g. /\w/ matches 'a' in "apple," '5' in "$5.28," and '3' in "3D." |
\W | matches any non-word character. Equivalent to [^A-Za-z0-9_], e.g. /\W/ or /[^A-Za-z0-9_]/ matches '%' in "50%." |
/\#/ | where # is a positive integer. A back-reference to the last substring matching the # parenthetical in the regular expression (counting left parentheses), e.g. /apple(,)\sorange\1/ matches 'apple, orange,' in "apple, orange, cherry, peach." A more complete example follows this table. Note: if the number of left parentheses is less than the number specified in \#, the \# is taken as an octal escape as described in the next row. |
/x/ | where x is an octal, hexadecimal, or decimal escape value. Allows you to embed ASCII codes into regular expressions. |
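Several of the table entries can be verified with a small sketch. The helper firstMatch is hypothetical and introduced only for this illustration; it returns the text of the first match, or null when there is none.

```javascript
// firstMatch is a hypothetical helper: text of the first match, or null.
function firstMatch(re, str) {
  const m = re.exec(str);
  return m === null ? null : m[0];
}

console.log(firstMatch(/e?le?/, "angel"));         // "el"
console.log(firstMatch(/e?le?/, "angle"));         // "le"
console.log(firstMatch(/a{1,3}/, "caaaaaaandy"));  // "aaa"
console.log(firstMatch(/a{2}/, "candy"));          // null
console.log(firstMatch(/[^abc]/, "brisket"));      // "r"
console.log(firstMatch(/\bn\w/, "noonday"));       // "no"
console.log(firstMatch(/\s\w*/, "foo bar"));       // " bar"
console.log(firstMatch(/\W/, "50%."));             // "%"
// Back-reference: \1 must repeat the comma remembered by (,).
console.log(firstMatch(/apple(,)\sorange\1/, "apple, orange, cherry, peach.")); // "apple, orange,"
```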
The following example checks a phone number entered by the user. The regular expression looks for zero or one open parenthesis \(?, followed by three digits \d{3}, followed by zero or one close parenthesis \)?, followed by one dash, forward slash, or decimal point, which is remembered when found ([-\/\.]), followed by three digits \d{3}, followed by the remembered match of a dash, forward slash, or decimal point \1, followed by four digits \d{4}.
The Change event, activated when the user presses Enter, sets the value of RegExp.input.
<HTML>
<SCRIPT LANGUAGE = "JavaScript1.2">
re = /\(?\d{3}\)?([-\/\.])\d{3}\1\d{4}/
function testInfo() {
   OK = re.exec()
   if (!OK)
      window.alert (RegExp.input + " isn't a phone number with area code!")
   else
      window.alert ("Thanks, your phone number is " + OK[0])
}
</SCRIPT>
Enter your phone number (with area code) and then press Enter.
<FORM> <INPUT TYPE="TEXT" NAME="Phone" onChange="testInfo(this);"> </FORM>
</HTML>
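The role of the back-reference \1 in this pattern can be seen by testing a few inputs directly. The sketch below is an adaptation under assumptions: the function checkPhone is hypothetical, it takes the text as an argument instead of relying on RegExp.input, and it returns the message instead of calling window.alert. The sample numbers are made up.

```javascript
const phoneRe = /\(?\d{3}\)?([-\/\.])\d{3}\1\d{4}/;

// checkPhone is a hypothetical helper that returns the message text.
function checkPhone(text) {
  const ok = phoneRe.exec(text);
  if (ok === null) {
    return text + " isn't a phone number with area code!";
  }
  return "Thanks, your phone number is " + ok[0];
}

console.log(checkPhone("(415)-555-1212")); // accepted: both separators are '-'
console.log(checkPhone("415.555.1212"));   // accepted: both separators are '.'
console.log(checkPhone("415-555.1212"));   // rejected: separators differ
```

Because \1 repeats whatever the parentheses remembered, the second separator must be the same character as the first.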