This chapter defines JavaScript's lexical grammar by specifying how input characters may be composed into white space, comments, and tokens.
JavaScript programs are written using ASCII, the American Standard Code for Information Interchange (defined by ANSI standard X3.4).
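The translation of an ASCII character stream into a sequence of JavaScript tokens uses two lexical translations, which are applied in turn: input line recognition, which divides the character stream into input characters and line terminators, and tokenization, which reduces the result to a sequence of input elements (white space, comments, and tokens).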
In these lexical translations JavaScript chooses the longest possible translation at each step, even if the result does not ultimately make a correct JavaScript program, while another lexical translation would.
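The longest-match rule can be illustrated with a minimal sketch. The helper names `longestOperatorAt` and `splitOperators` are hypothetical and exist only for this example; the operator spellings are the ones listed in the Operator production of this chapter. The sketch handles operator characters only and is not a full tokenizer.

```javascript
// Operator spellings from the Operator production in this chapter.
const OPERATORS = [
  ">>>=", "<<=", ">>=", ">>>", "==", "<=", ">=", "!=", "&&", "||", "++", "--",
  "+=", "-=", "*=", "/=", "&=", "|=", "^=", "%=", "<<", ">>",
  "=", ">", "<", "!", "~", "?", ":", ".", "+", "-", "*", "/", "&", "|", "^", "%",
];

// Return the longest operator that starts at position `pos`, or null if none does.
function longestOperatorAt(text, pos) {
  let best = null;
  for (const op of OPERATORS) {
    if (text.startsWith(op, pos) && (best === null || op.length > best.length)) {
      best = op;
    }
  }
  return best;
}

// Split a string made up only of operator characters into operator tokens,
// always taking the longest possible operator at each step.
function splitOperators(text) {
  const tokens = [];
  let pos = 0;
  while (pos < text.length) {
    const op = longestOperatorAt(text, pos);
    if (op === null) throw new Error("not an operator character at index " + pos);
    tokens.push(op);
    pos += op.length;
  }
  return tokens;
}

console.log(splitOperators("---"));  // [ '--', '-' ]
console.log(splitOperators(">>>=")); // [ '>>>=' ]
```

Under this rule the characters `---` always become `--` followed by `-`, never `-` followed by `--`, regardless of which split would yield a correct program.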
JavaScript divides the sequence of input characters into lines by recognizing line terminators. This definition of lines determines the line numbers produced by a JavaScript compiler or other system component. It also specifies the termination of a single-line comment.
Lines are terminated by the ASCII characters CR, or LF, or CR LF. A CR immediately followed by LF is counted as one line terminator, not two.
RawInputCharacter:
    LineTerminator
    InputCharacter
LineTerminator:
    the ASCII LF character, also known as "newline"
    the ASCII CR character, also known as "return"
    the ASCII CR character followed by the ASCII LF character
InputCharacter:
    Any ASCII character, but not CR and not LF
The result of this step is a sequence of line terminators and characters, which are the input for the second step in the tokenization process.
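As a rough illustration of input line recognition, the following sketch splits source text into lines. The function name `splitLines` is an assumption made for this example, and a regular expression stands in for the grammar above.

```javascript
// Split source text into lines, treating CR, LF, and CR LF each as a single
// line terminator. The alternation tries CR LF first, so a CR immediately
// followed by LF is consumed as one terminator rather than two.
function splitLines(source) {
  return source.split(/\r\n|\r|\n/);
}

const lines = splitLines("a\rb\nc\r\nd");
console.log(lines.length); // 4
```

Note that LF followed by CR (the reverse order) is two terminators under the rule above, so `splitLines("a\n\rb")` yields an empty line between `a` and `b`.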
The input characters and line terminators that result from input line recognition are reduced to a sequence of input elements. The input elements that are not white space or comments are JavaScript tokens.
This process is specified by the following grammar:
InputElements:
    InputElement_opt
    InputElements InputElement
InputElement:
    WhiteSpace
    Comment
    Token
WhiteSpace:
    the ASCII SP character, also known as "space"
    the ASCII HT character, also known as "horizontal tab"
    the ASCII FF character, also known as "form feed"
    LineTerminator
Token:
    Keyword
    Identifier
    Literal
    Separator
    Operator
White space and comments can serve to separate tokens that, if adjacent, might be tokenized in another manner. For example, the characters - and = in the input can form the operator token -= only if there is no intervening white space or comment.
White space is defined as the ASCII space, horizontal tab, and form feed characters, as well as line terminators.
JavaScript has two kinds of comments:
A traditional C-style comment: all the text from /* to */ is ignored:
/* text */
A single-line C++-style comment: all the text from // to the end of the line is ignored:
// text
These comments are formally specified by the following lexical grammar:
Comment:
    TraditionalComment
    SingleLineComment
TraditionalComment:
    /* CommentText_opt */
CommentText:
    CommentCharacter
    CommentText CommentCharacter
CommentCharacter:
    NotStarSlash
    / NotStar
    * NotSlash
    LineTerminator
NotStar:
    InputCharacter, but not *
NotSlash:
    InputCharacter, but not /
NotStarSlash:
    InputCharacter, but not * and not /
SingleLineComment:
    // CharactersInLine_opt LineTerminator
CharactersInLine:
    InputCharacter
    CharactersInLine InputCharacter
The grammar implies all of the following properties:
As a result, these are legal comments:
/* this comment // ends here: */
// This // just /* fine */ as far as JavaScript // is concerned
But this causes a compile-time warning:
/* this comment /* causes a compile-time warning */
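The fact that comments do not nest can be seen in a small sketch of a comment scanner. The function name `skipComment` and its return convention are assumptions made for this example. As a simplification, the sketch stops a single-line comment just before its line terminator and accepts one that runs to the end of the input.

```javascript
// Return the index just past the comment that starts at `start`.
// Returns -1 if no comment starts there or a traditional comment is unterminated.
function skipComment(source, start) {
  if (source.startsWith("/*", start)) {
    // A traditional comment ends at the first */ after the opening /*,
    // so a second /* inside the comment has no effect.
    const close = source.indexOf("*/", start + 2);
    return close === -1 ? -1 : close + 2;
  }
  if (source.startsWith("//", start)) {
    // A single-line comment runs to the end of the line; /* and */ inside it
    // are ordinary text.
    const match = /\r|\n/.exec(source.slice(start));
    return match === null ? source.length : start + match.index;
  }
  return -1;
}

const nested = "/* a /* b */ c */";
const end = skipComment(nested, 0);
console.log(end);               // 12
console.log(nested.slice(end)); // " c */"
```

The text left over after the first `*/` is no longer part of any comment, which is why an attempt at nesting goes wrong.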
The following sequences of ASCII letters are reserved for use as keywords, and are not legal identifiers:
Keyword: one of
The above list includes all keywords used currently and reserved for future use. The following table lists keywords used in JavaScript version 1.1:
new return this typeof var void while with
While true and false might appear to be keywords, they are technically Boolean literals; while null might appear to be a keyword, it is technically an object literal.
An identifier is an unlimited-length sequence of ASCII letters and digits, the first of which must be a letter. The letters include uppercase and lowercase ASCII letters (a-z and A-Z) and the ASCII underscore (_) and dollar sign ($). The digits include the ASCII digits 0-9.
Identifier: IdentifierChars, but not a Keyword or BooleanLiteral or NullLiteral
IdentifierChars:
    JavaScriptLetter
    IdentifierChars JavaScriptLetterOrDigit
JavaScriptLetter:
    any uppercase or lowercase ASCII letter (a-z, A-Z)
    _
    $
JavaScriptLetterOrDigit:
    JavaScriptLetter
    any digit (0-9)
Examples of legal identifiers are
Number_hits temp99 _name $6million
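The identifier rules can be approximated with a regular expression, as in the sketch below. The names `IDENTIFIER_CHARS` and `isIdentifier` are hypothetical. Because the full Keyword list is not reproduced here, the keyword set is passed in as a parameter, and the set used in the example contains only the keywords that appear in the table above.

```javascript
// IdentifierChars: a letter, _ or $, followed by any number of letters, digits, _ or $.
const IDENTIFIER_CHARS = /^[A-Za-z_$][A-Za-z0-9_$]*$/;

// Check a word against the Identifier production. `keywords` is a Set of
// reserved words supplied by the caller.
function isIdentifier(word, keywords) {
  if (!IDENTIFIER_CHARS.test(word)) return false;
  // BooleanLiteral and NullLiteral spellings are excluded as well.
  if (word === "true" || word === "false" || word === "null") return false;
  return !keywords.has(word);
}

// A partial keyword set, for illustration only.
const someKeywords = new Set(["new", "return", "this", "typeof", "var", "void", "while", "with"]);

console.log(isIdentifier("$6million", someKeywords)); // true
console.log(isIdentifier("99temp", someKeywords));    // false
console.log(isIdentifier("while", someKeywords));     // false
console.log(isIdentifier("null", someKeywords));      // false
```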
A literal is the source code representation of a value of a primitive type:
Literal: IntegerLiteral FloatingPointLiteral BooleanLiteral StringLiteral NullLiteral
Integer literals may be expressed in decimal (base 10), hexadecimal (base 16), or octal (base 8):
IntegerLiteral: DecimalLiteral HexLiteral OctalLiteral
A decimal literal consists of a lone 0, or a digit from 1 to 9 followed by zero or more digits from 0 to 9, and represents a nonnegative integer:
DecimalLiteral:
    0
    NonZeroDigit Digits_opt
Digits:
    Digit
    Digits Digit
Digit:
    0
    NonZeroDigit
NonZeroDigit: one of 1 2 3 4 5 6 7 8 9
A hexadecimal literal consists of a leading 0x or 0X followed by one or more hexadecimal digits and can represent a nonnegative integer. Hexadecimal digits with values 10 through 15 are represented by the letters a through f or A through F, respectively; each letter used as a hexadecimal digit may be uppercase or lowercase.
HexLiteral:
    0x HexDigit
    0X HexDigit
    HexLiteral HexDigit
HexDigit: one of 0 1 2 3 4 5 6 7 8 9 a b c d e f A B C D E F
An octal literal consists of a digit 0 followed by one or more of the digits 0 through 7 and can represent a nonnegative integer.
OctalLiteral:
    0 OctalDigit
    OctalLiteral OctalDigit
OctalDigit: one of 0 1 2 3 4 5 6 7
The largest hexadecimal and octal literals are 0xffffffff and 037777777777, respectively, which equal 4294967295. A compile-time error occurs for any integer literal of greater value.
Examples of integer literals are:
0 1996 0372 0xDeadBeef 0x00FF88FF
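To make the three notations concrete, here is a minimal sketch that classifies an integer literal and computes its value. The function name `integerLiteralValue` and the choice of exception types are assumptions made for this example; the range check mirrors the limit of 4294967295 stated above.

```javascript
// Largest integer literal value given in this chapter (0xffffffff).
const MAX_INTEGER_LITERAL = 4294967295;

// Return the numeric value of an integer literal written in decimal,
// hexadecimal, or octal notation.
function integerLiteralValue(text) {
  let value;
  if (/^0[xX][0-9a-fA-F]+$/.test(text)) {
    value = parseInt(text.slice(2), 16); // HexLiteral
  } else if (/^0[0-7]+$/.test(text)) {
    value = parseInt(text.slice(1), 8);  // OctalLiteral
  } else if (/^(?:0|[1-9][0-9]*)$/.test(text)) {
    value = parseInt(text, 10);          // DecimalLiteral
  } else {
    throw new SyntaxError("not an integer literal: " + text);
  }
  if (value > MAX_INTEGER_LITERAL) {
    throw new RangeError("integer literal too large: " + text);
  }
  return value;
}

console.log(integerLiteralValue("1996"));       // 1996
console.log(integerLiteralValue("0372"));       // 250
console.log(integerLiteralValue("0xDeadBeef")); // 3735928559
```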
A floating-point literal has the following parts: a whole-number part, a decimal point, a fractional part, and an exponent. The exponent, if present, is indicated by a letter e or E followed by an optionally signed integer.
At least one digit, in either the whole-number part or the fractional part, and either a decimal point or an exponent, are required. All other parts are optional.
FloatingPointLiteral:
    Digits . Digits_opt ExponentPart_opt
    . Digits ExponentPart_opt
    Digits ExponentPart
ExponentPart:
    ExponentIndicator SignedInteger
ExponentIndicator: one of e E
SignedInteger:
    Sign_opt Digits
Sign: one of + -
The largest positive finite floating-point literal is 1.79769313486231570e+308. The smallest positive finite floating-point literal is 4.94065645841246544e-324.
A compile-time error occurs if a nonzero floating-point literal is too large, so that, on rounded conversion to its internal representation, it becomes an IEEE 754 infinity (see footnote 1). A JavaScript program can represent infinities without producing a compile-time error by using the predefined constants Number.POSITIVE_INFINITY and Number.NEGATIVE_INFINITY.
A compile-time error occurs if a nonzero floating-point literal is too small, so that, on rounded conversion to its internal representation, it becomes a zero. A compile-time error does not occur if a nonzero floating-point literal has a small value that, on rounded conversion to its internal representation, becomes a nonzero denormalized number (see footnote 2).
Examples of floating-point literals are:
2. .3 0.0 3.14 1e-9
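The three alternatives of the FloatingPointLiteral production can be written as one regular expression, as in the sketch below. The names `FLOATING_POINT_LITERAL` and `isFloatingPointLiteral` are hypothetical. The sketch checks the spelling only and does not perform the range checks described above.

```javascript
// One alternative per line of the FloatingPointLiteral production:
//   Digits . Digits(opt) ExponentPart(opt)
//   . Digits ExponentPart(opt)
//   Digits ExponentPart
const FLOATING_POINT_LITERAL =
  /^(?:[0-9]+\.[0-9]*(?:[eE][+-]?[0-9]+)?|\.[0-9]+(?:[eE][+-]?[0-9]+)?|[0-9]+[eE][+-]?[0-9]+)$/;

function isFloatingPointLiteral(text) {
  return FLOATING_POINT_LITERAL.test(text);
}

console.log(isFloatingPointLiteral("2."));   // true
console.log(isFloatingPointLiteral("1e-9")); // true
console.log(isFloatingPointLiteral("42"));   // false: no decimal point or exponent
console.log(isFloatingPointLiteral("."));    // false: no digit
```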
The boolean type has two values, represented by the literals true and false.
BooleanLiteral: true false
A string literal is zero or more characters, enclosed in single (') or double (") quotes.
StringLiteral:
    " StringCharactersDQ_opt "
    ' StringCharactersSQ_opt '
StringCharactersDQ:
    StringCharacterDQ
    StringCharactersDQ StringCharacterDQ
StringCharactersSQ:
    StringCharacterSQ
    StringCharactersSQ StringCharacterSQ
StringCharacterDQ:
    InputCharacter, but not " or \
    EscapeSequence
StringCharacterSQ:
    InputCharacter, but not ' or \
    EscapeSequence
The escape sequences are described in section 2.7.5 Escape Sequences for String Literals.
It is a compile-time error for a line terminator to appear after the opening " and before the closing ". A long string literal can be broken up into shorter pieces and written as an expression using the string concatenation operator +.
The following are examples of string literals:
"" // The empty string
"\"" // A string containing " alone
'This is a string' // A string containing 16 characters
"This is a " + // Actually a string-valued expression
"two-line string" // containing two string literals
The string escape sequences allow for the representation of some nongraphic characters as well as the single quote, double quote, and backslash characters in string literals.
EscapeSequence:
    \b (backspace BS)
    \t (horizontal tab HT)
    \n (linefeed LF)
    \f (form feed FF)
    \r (carriage return CR)
    \" (double quote ")
    \' (single quote ')
    \\ (backslash \)
    OctalEscape
    HexEscape
OctalEscape:
    \ OctalDigit
    \ OctalDigit OctalDigit
    \ ZeroToThree OctalDigit OctalDigit
OctalDigit: one of 0 1 2 3 4 5 6 7
ZeroToThree: one of 0 1 2 3
HexEscape:
    \x HexDigit HexDigit
HexDigit: one of 0 1 2 3 4 5 6 7 8 9 a b c d e f A B C D E F
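As an illustration of how these escape sequences map to characters, the sketch below decodes the body of a string literal (the text between the quotes). The names `SIMPLE_ESCAPES` and `decodeEscapes` are hypothetical. Leaving an unrecognized backslash sequence unchanged is a simplification chosen for this example, not behavior defined by this chapter.

```javascript
// Single-character escapes and the characters they stand for.
const SIMPLE_ESCAPES = {
  b: "\b", t: "\t", n: "\n", f: "\f", r: "\r",
  '"': '"', "'": "'", "\\": "\\",
};

// Replace each escape sequence in `body` with the character it denotes.
// The octal alternative tries the three-digit form (first digit 0 to 3)
// before the one- or two-digit form, so the longest valid escape is used.
function decodeEscapes(body) {
  return body.replace(
    /\\(?:([btnfr"'\\])|x([0-9a-fA-F]{2})|([0-3][0-7]{2}|[0-7]{1,2}))/g,
    (match, simple, hex, octal) => {
      if (simple !== undefined) return SIMPLE_ESCAPES[simple];
      if (hex !== undefined) return String.fromCharCode(parseInt(hex, 16));
      return String.fromCharCode(parseInt(octal, 8));
    }
  );
}

// In the calls below, "\\" in the JavaScript source denotes one backslash
// character in the literal body being decoded.
console.log(decodeEscapes("\\x41\\101") === "AA"); // true
console.log(decodeEscapes("a\\tb") === "a\tb");    // true
```

An octal escape such as `\477` is read as `\47` followed by the digit 7, because a three-digit octal escape must begin with a digit from 0 to 3.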
The null object reference is denoted by the literal null.
NullLiteral: null
The following characters are used in JavaScript as separators (punctuators):
Separator: one of
( ) { } [ ] ; ,
The following tokens are used in JavaScript as operators. Note that dot (.) is an operator in JavaScript, whereas it is a separator in Java.
Operator: one of
= > < ! ~ ? : .
== <= >= != && || ++ --
+ - * / & | ^ % << >> >>>
+= -= *= /= &= |= ^= %= <<= >>= >>>=
1. JavaScript 1.1 as implemented in Navigator 3.0 fails to report this error.
2. JavaScript 1.1 as implemented in Navigator 3.0 fails to report this error.