Lexical Structure

This chapter defines JavaScript's lexical grammar by specifying how input characters may be composed into white space, comments, and tokens.

2.1 Character Set

JavaScript programs are written using ASCII, the American Standard Code for Information Interchange (defined by ANSI standard X3.4).

2.2 Lexical Translations

The translation of an ASCII character stream into a sequence of JavaScript tokens uses the following two lexical translations, which are applied in turn:

  1. A translation of the ASCII character stream into a stream of input characters and line terminators.
  2. A translation of the stream of input characters and line terminators into a sequence of JavaScript input elements which, after white space and comments are discarded, comprise the tokens that are the terminal symbols of the syntactic grammar for JavaScript.

In these lexical translations JavaScript chooses the longest possible translation at each step, even if the result does not ultimately make a correct JavaScript program, while another lexical translation would.
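
The longest-match rule can be illustrated with a minimal sketch. The operator table and the function name tokenizeOperators are assumptions made for this example, not part of the specification, and the sketch handles only single-letter identifiers and a handful of operators.

```javascript
// Hypothetical subset of the operator table, longest entries first, so the
// first entry that matches at a position is also the longest match there.
const OPERATORS = ["--", "++", "-=", "+=", "-", "+", "="];

// Split a string of single-letter identifiers and operators into tokens,
// always taking the longest operator that matches at the current position.
function tokenizeOperators(source) {
  const tokens = [];
  let i = 0;
  while (i < source.length) {
    const ch = source[i];
    if (/[A-Za-z_$]/.test(ch)) {
      tokens.push(ch);
      i += 1;
      continue;
    }
    const op = OPERATORS.find((candidate) => source.startsWith(candidate, i));
    if (op === undefined) {
      throw new Error("unexpected character at index " + i);
    }
    tokens.push(op);
    i += op.length;
  }
  return tokens;
}

console.log(tokenizeOperators("a--b")); // [ 'a', '--', 'b' ]
```

Here a--b is split into a, --, b, which cannot form a valid expression, even though the split a, -, -, b could.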

2.3 Line Terminators

JavaScript divides the sequence of input characters into lines by recognizing line terminators. This definition of lines determines the line numbers produced by a JavaScript compiler or other system component. It also specifies the termination of a single-line comment.

Lines are terminated by the ASCII characters CR, or LF, or CR LF. A CR immediately followed by LF is counted as one line terminator, not two.

RawInputCharacter:
          LineTerminator
          InputCharacter

LineTerminator:
          the ASCII LF character, also known as "newline"
          the ASCII CR character, also known as "return"
          the ASCII CR character followed by the ASCII LF character

InputCharacter:
          Any ASCII character, but not CR and not LF

The result of this step is a sequence of line terminators and characters, which are the input for the second step in the tokenization process.
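
As a rough sketch of line recognition, the three terminators can be expressed as one regular expression. The function name splitLines is an assumption made for this example; listing CR LF first in the alternation is what makes the pair count as a single terminator.

```javascript
// Split text into lines using the three terminators from the grammar:
// CR LF (tried first so the pair counts once), a lone CR, and a lone LF.
function splitLines(text) {
  return text.split(/\r\n|\r|\n/);
}

const sample = "one\r\ntwo\rthree\nfour";
console.log(splitLines(sample).length); // 4
```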

2.4 Input Elements and Tokens

The input characters and line terminators that result from input line recognition are reduced to a sequence of input elements. The input elements that are not white space or comments are JavaScript tokens.

This process is specified by the following grammar:

InputElements:
          InputElementopt 
          InputElements InputElement

InputElement:
          WhiteSpace
          Comment
          Token

WhiteSpace:
          the ASCII SP character, also known as "space"
          the ASCII HT character, also known as "horizontal tab"
          the ASCII FF character, also known as "form feed"
          LineTerminator

Token:
          Keyword
          Identifier
          Literal
          Separator
          Operator

White space and comments can serve to separate tokens that, if adjacent, might be tokenized in another manner. For example, the characters - and = in the input can form the operator token -= only if there is no intervening white space or comment.

2.4.1 White Space

White space is defined as the ASCII space, horizontal tab, and form feed characters, as well as line terminators.

2.4.2 Comments

JavaScript has two kinds of comments:

A traditional C-style comment: all the text from /* to */ is ignored:

/* text */ 

A single-line C++-style comment: all the text from // to the end of the line is ignored:

// text 

These comments are formally specified by the following lexical grammar:

Comment:
          TraditionalComment
          SingleLineComment

TraditionalComment:
          /* CommentTextopt */

CommentText:
          CommentCharacter
          CommentText CommentCharacter

CommentCharacter:
          NotStarSlash
          / NotStar
          * NotSlash
          LineTerminator

NotStar:
          InputCharacter, but not *

NotSlash:
          InputCharacter, but not /

NotStarSlash:
          InputCharacter, but not * and not /

SingleLineComment:
          // CharactersInLineopt LineTerminator

CharactersInLine:
          InputCharacter
          CharactersInLine InputCharacter

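The grammar implies that comments do not nest, that /* and */ have no special meaning inside a comment that begins with //, and that // has no special meaning inside a comment that begins with /*.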

As a result, these are legal comments:

/* this comment // ends here: */
// This // just /* fine */ as far as JavaScript // is concerned

But this causes a compile-time warning:

/* this comment /* causes a compile-time warning */
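
The behavior of the two comment forms can be sketched with a small comment stripper. The function name stripComments is an assumption made for this example, and the sketch assumes the input contains no string literals, which a real scanner would have to recognize first. Unlike the grammar, it leaves the line terminator that ends a single-line comment in place so that line numbers are preserved.

```javascript
// Remove comments from source text.
// Assumption: the input contains no string literals.
function stripComments(source) {
  let out = "";
  let i = 0;
  while (i < source.length) {
    if (source.startsWith("/*", i)) {
      // A traditional comment ends at the first "*/", so comments do not nest.
      const end = source.indexOf("*/", i + 2);
      if (end === -1) throw new Error("unterminated comment");
      i = end + 2;
    } else if (source.startsWith("//", i)) {
      // Skip to the end of the line; the terminator itself is kept.
      while (i < source.length && source[i] !== "\r" && source[i] !== "\n") {
        i += 1;
      }
    } else {
      out += source[i];
      i += 1;
    }
  }
  return out;
}

console.log(stripComments("x = 1; /* one // still one */ y = 2;"));
console.log(stripComments("/* outer /* inner */ left over */"));
```

The second call shows why nesting fails: the comment closes at the first */, and the text after it is treated as ordinary input.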

2.5 Keywords

The following sequences of ASCII letters are reserved for use as keywords, and are not legal identifiers:

Keyword: one of

    abstract boolean break byte case catch char class
    const continue default delete do double else extends
    final finally float for function goto if implements
    import in instanceof int interface long native new
    package private protected public return short static super
    switch synchronized this throw throws transient try typeof
    var void volatile while with

The list above includes both the keywords currently in use and those reserved for future use. The keywords used in JavaScript version 1.1 are:

    break continue delete else for function if in
    new return this typeof var void while with

While true and false might appear to be keywords, they are technically Boolean literals; likewise, while null might appear to be a keyword, it is technically the null literal.

2.6 Identifiers

An identifier is an unlimited-length sequence of ASCII letters and digits, the first of which must be a letter. The letters include uppercase and lowercase ASCII letters (a-z and A-Z) and the ASCII underscore (_) and dollar sign ($). The digits include the ASCII digits 0-9.

Identifier:
          IdentifierChars, but not a Keyword or BooleanLiteral or NullLiteral

IdentifierChars:
          JavaScriptLetter
          IdentifierChars JavaScriptLetterOrDigit

JavaScriptLetter:
          any uppercase or lowercase ASCII letter (a-z, A-Z)
          _
          $

JavaScriptLetterOrDigit:
          JavaScriptLetter
          any digit (0-9)

Examples of legal identifiers are

          Number_hits
          temp99
          _name
          $6million
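
The identifier grammar translates almost directly into a regular expression plus a reserved-word check. This is a sketch only: the names RESERVED and isIdentifier are assumptions made for this example, and the set lists just the JavaScript 1.1 keywords and the three literal words rather than the full reserved list.

```javascript
// Keywords used in JavaScript 1.1, plus the literals true, false, and null.
// The full reserved list is longer; this subset keeps the sketch short.
const RESERVED = new Set([
  "break", "continue", "delete", "else", "for", "function", "if", "in",
  "new", "return", "this", "typeof", "var", "void", "while", "with",
  "true", "false", "null",
]);

// IdentifierChars: a JavaScriptLetter followed by letters or digits,
// excluding anything that is a keyword or a Boolean or null literal.
function isIdentifier(text) {
  return /^[A-Za-z_$][A-Za-z0-9_$]*$/.test(text) && !RESERVED.has(text);
}

console.log(isIdentifier("$6million")); // true
console.log(isIdentifier("9lives"));    // false
console.log(isIdentifier("while"));     // false
```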

2.7 Literals

A literal is the source code representation of a value of a primitive type:

Literal:
          IntegerLiteral
          FloatingPointLiteral
          BooleanLiteral
          StringLiteral
          NullLiteral

2.7.1 Integer Literals

Integer literals may be expressed in decimal (base 10), hexadecimal (base 16), or octal (base 8):

IntegerLiteral:
          DecimalLiteral 
          HexLiteral 
          OctalLiteral 

A decimal literal consists of a lone 0, or a digit from 1 to 9 followed by zero or more digits from 0 to 9, and represents a nonnegative integer:

DecimalLiteral:
          0
          NonZeroDigit Digitsopt

Digits:
          Digit
          Digits Digit

Digit:
          0
          NonZeroDigit

NonZeroDigit: one of
          1 2 3 4 5 6 7 8 9

A hexadecimal literal consists of a leading 0x or 0X followed by one or more hexadecimal digits and can represent a nonnegative integer. Hexadecimal digits with values 10 through 15 are represented by the letters a through f or A through F, respectively; each letter used as a hexadecimal digit may be uppercase or lowercase.

HexLiteral:
          0x HexDigit
          0X HexDigit
          HexLiteral HexDigit

HexDigit: one of
          0 1 2 3 4 5 6 7 8 9 a b c d e f A B C D E F

An octal literal consists of a digit 0 followed by one or more of the digits 0 through 7 and can represent a nonnegative integer.

OctalLiteral:
          0 OctalDigit
          OctalLiteral OctalDigit

OctalDigit: one of
          0 1 2 3 4 5 6 7

The largest hexadecimal and octal literals are 0xffffffff and 037777777777, respectively; both equal 4294967295 (2^32 - 1). A compile-time error occurs for any integer literal of greater value.

Examples of integer literals:

          0
          1996
          0372
          0xDeadBeef
          0x00FF88FF
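
The following sketch computes the value of each example from its text. The function name integerLiteralValue is an assumption made for this example. The literals are passed as strings because many current engines reject a literal such as 0372 in strict-mode code, and the sketch does not enforce the upper limit described above.

```javascript
// Convert the text of an integer literal to its numeric value.
// Returns null when the text matches none of the three productions.
function integerLiteralValue(text) {
  if (/^0[xX][0-9a-fA-F]+$/.test(text)) return parseInt(text.slice(2), 16);
  if (/^0[0-7]+$/.test(text)) return parseInt(text.slice(1), 8);
  if (/^(0|[1-9][0-9]*)$/.test(text)) return parseInt(text, 10);
  return null;
}

console.log(integerLiteralValue("0372"));         // 250
console.log(integerLiteralValue("0xDeadBeef"));   // 3735928559
console.log(integerLiteralValue("037777777777")); // 4294967295
```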

2.7.2 Floating-Point Literals

A floating-point literal has the following parts: a whole-number part, a decimal point, a fractional part, and an exponent. The exponent, if present, is indicated by a letter e or E followed by an optionally signed integer.

At least one digit, in either the whole number or the fraction part, and either a decimal point or an exponent, are required. All other parts are optional.

FloatingPointLiteral:
          Digits . Digitsopt ExponentPartopt 
          . Digits ExponentPartopt 
          Digits ExponentPart

ExponentPart:
          ExponentIndicator SignedInteger

ExponentIndicator: one of
          e E

SignedInteger:
          Signopt Digits

Sign: one of
          + -

The largest positive finite floating-point literal is 1.79769313486231570e+308. The smallest positive finite floating-point literal is 4.94065645841246544e-324.

A compile-time error occurs if a nonzero floating-point literal is too large, so that, on rounded conversion to its internal representation, it becomes an IEEE 754 infinity (see footnote 1). A JavaScript program can represent infinities without producing a compile-time error by using the predefined constants Number.POSITIVE_INFINITY and Number.NEGATIVE_INFINITY.

A compile-time error occurs if a nonzero floating-point literal is too small, so that, on rounded conversion to its internal representation, it becomes a zero. A compile-time error does not occur if a nonzero floating-point literal has a small value that, on rounded conversion to its internal representation, becomes a nonzero denormalized number (see footnote 2).

Examples of floating-point literals are:

          2.
          .3
          0.0
          3.14
          1e-9
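
The three productions of FloatingPointLiteral can be sketched as one regular expression with three alternatives. The names EXPONENT, FLOATING_POINT_LITERAL, and isFloatingPointLiteral are assumptions made for this example. The last two lines check the limits quoted above against the Number.MAX_VALUE and Number.MIN_VALUE constants of current engines.

```javascript
// ExponentPart: ExponentIndicator followed by an optionally signed integer.
const EXPONENT = "(?:[eE][+-]?[0-9]+)";

const FLOATING_POINT_LITERAL = new RegExp(
  "^(?:[0-9]+\\.[0-9]*" + EXPONENT + "?" + // Digits . Digitsopt ExponentPartopt
  "|\\.[0-9]+" + EXPONENT + "?" +          // . Digits ExponentPartopt
  "|[0-9]+" + EXPONENT + ")$"              // Digits ExponentPart
);

function isFloatingPointLiteral(text) {
  return FLOATING_POINT_LITERAL.test(text);
}

console.log(isFloatingPointLiteral("2."));   // true
console.log(isFloatingPointLiteral("1e-9")); // true
console.log(isFloatingPointLiteral("42"));   // false: an integer literal

console.log(Number.MAX_VALUE === 1.79769313486231570e+308); // true
console.log(Number.MIN_VALUE === 4.94065645841246544e-324); // true
```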

2.7.3 Boolean Literals

The boolean type has two values, represented by the literals true and false.

BooleanLiteral:
          true 
          false

2.7.4 String Literals

A string literal is zero or more characters enclosed in single (') or double (") quotes.

StringLiteral:
          " StringCharactersDQopt " 
          ' StringCharactersSQopt '

StringCharactersDQ:
          StringCharacterDQ
          StringCharactersDQ StringCharacterDQ

StringCharactersSQ:
          StringCharacterSQ
          StringCharactersSQ StringCharacterSQ

StringCharacterDQ:
          InputCharacter, but not " or \
          EscapeSequence

StringCharacterSQ:
          InputCharacter, but not ' or \
          EscapeSequence

The escape sequences are described in section 2.7.5 Escape Sequences for String Literals.

It is a compile-time error for a line terminator to appear between the opening quote of a string literal and its matching closing quote. A long string literal can be broken up into shorter pieces and written as an expression using the string concatenation operator +.

Examples of string literals:

""                              // The empty string
"\""                            // A string containing " alone
'This is a string'              // A string containing 16 characters

"This is a " +                  // Actually a string-valued expression
        "two-line string"       // containing two string literals

2.7.5 Escape Sequences for String Literals

The string escape sequences allow for the representation of some nongraphic characters as well as the single quote, double quote, and backslash characters in string literals.

EscapeSequence:
          \b (backspace BS)
          \t (horizontal tab HT)
          \n (linefeed LF)
          \f (form feed FF)
          \r (carriage return CR)
          \" (double quote ")
          \' (single quote ')
          \\ (backslash \)
          OctalEscape
          HexEscape

OctalEscape:
          \ OctalDigit
          \ OctalDigit OctalDigit
          \ ZeroToThree OctalDigit OctalDigit

OctalDigit: one of
          0 1 2 3 4 5 6 7

ZeroToThree: one of
          0 1 2 3

HexEscape:
          \x HexDigit HexDigit

HexDigit: one of
          0 1 2 3 4 5 6 7 8 9 a b c d e f A B C D E F
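
A decoder for these escape sequences might look like the following sketch. The names SIMPLE_ESCAPES and decodeEscapes are assumptions made for this example. The function takes the text between the quotes and treats any escape not listed in the grammar as an error, which is one possible policy rather than something the grammar mandates.

```javascript
// Map from the character after the backslash to the character it denotes.
const SIMPLE_ESCAPES = new Map([
  ["b", "\b"], ["t", "\t"], ["n", "\n"], ["f", "\f"], ["r", "\r"],
  ['"', '"'], ["'", "'"], ["\\", "\\"],
]);

// Decode the body of a string literal (the text between the quotes).
// The alternatives are tried in order: hex escape, octal escape (three
// digits only when the first is 0 to 3), then a single-character escape.
function decodeEscapes(body) {
  return body.replace(
    /\\(?:x([0-9a-fA-F]{2})|([0-3][0-7]{2}|[0-7]{1,2})|(.))/g,
    (match, hex, octal, simple) => {
      if (hex !== undefined) return String.fromCharCode(parseInt(hex, 16));
      if (octal !== undefined) return String.fromCharCode(parseInt(octal, 8));
      if (SIMPLE_ESCAPES.has(simple)) return SIMPLE_ESCAPES.get(simple);
      throw new Error("unknown escape sequence: " + match);
    }
  );
}

console.log(decodeEscapes("\\x41\\102")); // AB
```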

2.7.6 The Null Literal

The null object reference is denoted by the literal null.

NullLiteral:
          null

2.8 Separators

The following characters are used in JavaScript as separators (punctuators):

Separator: one of

          ( ) { } [ ] ; ,

2.9 Operators

The following tokens are used in JavaScript as operators. Note that dot (.) is an operator in JavaScript, whereas it is a separator in Java.

Operator: one of

          = > < ! ~ ? : .
          == <= >= != && || ++ --
          + - * / & | ^ % << >> >>>
          += -= *= /= &= |= ^= %= <<= >>= >>>=


Footnotes

1 JavaScript 1.1 as implemented in Navigator 3.0 fails to report this error.

2 JavaScript 1.1 as implemented in Navigator 3.0 fails to report this error.

