3.1 Lexical conventions
Blanks
The following characters are treated as blanks: space, horizontal
tabulation, carriage return, newline (line feed) and form feed. Blanks
are ignored, except that they separate adjacent identifiers, literals
and keywords that would otherwise run together into a single
identifier, literal or keyword.
Comments
Comments are introduced by the two characters (* with no intervening
blanks, and terminated by the two characters *), also with no
intervening blanks. Comments are treated as blank characters.
Comments do not occur inside string or character literals: a (*
appearing in a literal is part of the literal. Comments may be nested:
each (* inside a comment opens a new level that must be closed by its
own *).
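The rules for blanks and nested comments can be illustrated with a small scanner that skips both and returns the position of the next meaningful character. This is a sketch only: the function name `skip_blanks_and_comments` is hypothetical, and it assumes that a comment body is scanned character by character with a depth counter. It does not treat string literals inside comments specially, since the text above does not say whether they should be.

```python
# Blank characters listed above: space, tab, carriage return, newline, form feed.
BLANKS = " \t\r\n\f"

def skip_blanks_and_comments(src: str, pos: int = 0) -> int:
    """Return the index of the first character at or after `pos` that is
    neither a blank nor inside a (possibly nested) comment."""
    while pos < len(src):
        if src[pos] in BLANKS:
            pos += 1
        elif src.startswith("(*", pos):
            # Track the nesting depth: every "(*" needs its own "*)".
            depth, pos = 1, pos + 2
            while depth > 0:
                if pos >= len(src):
                    raise ValueError("unterminated comment")
                if src.startswith("(*", pos):
                    depth += 1
                    pos += 2
                elif src.startswith("*)", pos):
                    depth -= 1
                    pos += 2
                else:
                    pos += 1
        else:
            break
    return pos
```

For example, in `"  (* a (* b *) c *) x"` the scanner stops at `x`, because the inner comment closes only the inner level.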
Identifiers
ident ::= letter { letter | 0...9 | _ | ' }
letter ::= A ... Z | a ... z
Identifiers are sequences of letters, digits, _ (the underscore
character) and ' (the single quote), starting with a letter. The
letters include at least the 52 lowercase and uppercase letters of the
ASCII set. The current implementation also recognizes as letters all
accented characters of the ISO 8859-1 ("ISO Latin 1") set. All
characters in an identifier are significant, and the current
implementation places no limit on the length of an identifier.
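The identifier rule translates directly into a regular expression. In the sketch below, the function name `is_identifier` is hypothetical, and "accented characters from ISO 8859-1" is assumed to mean the Latin-1 range U+00C0 to U+00FF minus the multiplication sign (U+00D7) and the division sign (U+00F7). The sketch checks only the lexical shape and does not reject reserved keywords.

```python
import re

# ASCII letters plus the assumed Latin-1 letter ranges
# (U+00C0-U+00D6, U+00D8-U+00F6, U+00F8-U+00FF).
_LETTER = "A-Za-z\u00c0-\u00d6\u00d8-\u00f6\u00f8-\u00ff"

# A letter, then any mix of letters, digits, underscores and single quotes.
_IDENT_RE = re.compile(f"[{_LETTER}][{_LETTER}0-9_']*")

def is_identifier(text: str) -> bool:
    """True if `text` has the lexical shape of an identifier."""
    return _IDENT_RE.fullmatch(text) is not None
```

Under these assumptions `x'` and `a_b1` are identifiers, while `_x` and `1x` are not, because an identifier must start with a letter.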
Integer literals
integer-literal ::= [ - ] { 0...9 }+
                  | [ - ] ( 0x | 0X ) { 0...9 | A...F | a...f }+
                  | [ - ] ( 0o | 0O ) { 0...7 }+
                  | [ - ] ( 0b | 0B ) { 0...1 }+
An integer literal is a sequence of one or more digits, optionally
preceded by a minus sign. By default, integer literals are in decimal
(radix 10). The following prefixes select a different radix:
Prefix     Radix
0x, 0X     hexadecimal (radix 16)
0o, 0O     octal (radix 8)
0b, 0B     binary (radix 2)
(The initial 0 is the digit zero; the O for octal is the letter O.)
The interpretation of integer literals that fall outside the range of
representable integer values is undefined.
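As an illustration of the grammar and the prefix table, the following sketch validates a literal and computes its value. The name `parse_int_literal` is hypothetical. Because the behaviour on out-of-range literals is undefined, the sketch makes no attempt to model overflow: it simply returns Python's arbitrary-precision integer.

```python
import re

# One alternative per production of integer-literal; the minus sign is
# optional and the radix prefix letter may be upper- or lowercase.
_INT_RE = re.compile(r"-?(?:0[xX][0-9A-Fa-f]+|0[oO][0-7]+|0[bB][01]+|[0-9]+)")

# Second character of the prefix (lowercased) -> radix.
_RADIX = {"x": 16, "o": 8, "b": 2}

def parse_int_literal(text: str) -> int:
    """Return the value of an integer literal, or raise ValueError."""
    if _INT_RE.fullmatch(text) is None:
        raise ValueError(f"not an integer literal: {text!r}")
    negative = text.startswith("-")
    body = text[1:] if negative else text
    base = 10
    if len(body) > 2 and body[0] == "0" and body[1].lower() in _RADIX:
        base = _RADIX[body[1].lower()]
        body = body[2:]  # drop the prefix, keep the digits
    value = int(body, base)
    return -value if negative else value
```

Note that `017` is read as decimal 17: without one of the listed prefixes, a leading zero does not select octal.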
String literals
string-literal ::= " { string-character } "
string-character ::= regular-char
                   | \ ( \ | " | n | t | b | r )
                   | \ ( 0...9 ) ( 0...9 ) ( 0...9 )
String literals are delimited by " (double quote) characters. The two
double quotes enclose a sequence of either characters different from "
and \, or escape sequences from the table below:
Sequence   Character denoted
\\         backslash (\)
\"         double quote (")
\n         newline (LF)
\r         carriage return (CR)
\t         horizontal tabulation (TAB)
\b         backspace (BS)
\ddd       the character with ASCII code ddd, in decimal
The current implementation places no restrictions on the length of
string literals.
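The escape table can be turned into a small decoder that maps the source text of a literal, including its quotes, to the string it denotes. The function name `decode_string_literal` is hypothetical. Two points are assumptions rather than statements of the text: a regular-char is taken to be any character other than " and \, and a \ddd code above 255 is rejected, with codes up to 255 mapped to the character of the same code point.

```python
# Escapes of the form "\c": maps the character after the backslash.
_SIMPLE_ESCAPES = {"\\": "\\", '"': '"', "n": "\n", "r": "\r", "t": "\t", "b": "\b"}
_DIGITS = "0123456789"

def decode_string_literal(src: str) -> str:
    """Return the string denoted by a string literal, or raise ValueError."""
    if len(src) < 2 or src[0] != '"' or src[-1] != '"':
        raise ValueError("a string literal must be enclosed in double quotes")
    body, out, i = src[1:-1], [], 0
    while i < len(body):
        c = body[i]
        if c == '"':
            # A bare double quote would have ended the literal early.
            raise ValueError(f"unescaped double quote at offset {i}")
        if c != "\\":
            out.append(c)  # regular-char
            i += 1
            continue
        nxt = body[i + 1:i + 2]
        digits = body[i + 1:i + 4]
        if nxt and nxt in _SIMPLE_ESCAPES:
            out.append(_SIMPLE_ESCAPES[nxt])
            i += 2
        elif len(digits) == 3 and all(d in _DIGITS for d in digits):
            code = int(digits)  # \ddd is decimal, not octal
            if code > 255:
                raise ValueError(f"character code {code} out of range")
            out.append(chr(code))
            i += 4
        else:
            raise ValueError(f"invalid escape sequence at offset {i}")
    return "".join(out)
```

Because \ddd is decimal, the literal "\065\066" denotes the two-character string AB.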
Infix symbols
infix-symbol ::= < { operator-char }
               | > { operator-char }
               | < >
               | no-inf-no-sup { operator-char }
operator-char ::= < | > | no-inf-no-sup
no-inf-no-sup ::= ! | # | $ | % | & | * | + | - | . | / | = | ? | @ | ^ | | | ~
Sequences of "operator characters", such as <=> or !!, are read as a
single token of the infix-symbol class. These symbols are parsed as
infix operators inside expressions, but otherwise behave much like
identifiers.
Warning: because the concrete syntax of types often places several <
and > characters in sequence, the only infix symbols made up solely of
< and > are <, > and <>.
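The interaction between the longest-match reading of operator characters and the restriction on < and > can be sketched as a function that splits a run of operator characters into infix symbols. The name `split_operators` is hypothetical, and the text does not say how the lexer splits a longer run of only < and >; this sketch assumes it repeatedly takes the longest allowed symbol (<> when possible, otherwise a single character). Keywords such as -> are returned as plain token text, not classified.

```python
# Characters allowed in operator-char: < and > plus no-inf-no-sup.
OPERATOR_CHARS = set("<>!#$%&*+-./=?@^|~")

def split_operators(run: str) -> list[str]:
    """Split a run of operator characters into infix-symbol tokens."""
    if not run or any(c not in OPERATOR_CHARS for c in run):
        raise ValueError(f"not a run of operator characters: {run!r}")
    if any(c not in "<>" for c in run):
        # Longest match: the whole run is read as one infix symbol.
        return [run]
    # Only < and >: the allowed symbols are <, > and <> (assumed split rule).
    tokens, i = [], 0
    while i < len(run):
        if run.startswith("<>", i):
            tokens.append("<>")
            i += 2
        else:
            tokens.append(run[i])
            i += 1
    return tokens
```

Under this assumption <=> stays a single symbol, while >> is read as two > symbols.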
Keywords
The identifiers below are reserved as keywords and cannot be used
otherwise:
and do else end external
false if in init let
loc open primitive reply spawn
then to true type val
where with
The following character sequences are also keywords:
-> . |
Ambiguities
Lexical ambiguities are resolved by the "longest match" rule: when a
character sequence can be decomposed into tokens in several different
ways, the decomposition retained is the one with the longest first
token.
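The longest-match rule, together with the keyword lists, can be seen in a toy tokenizer. This is a simplified sketch with a hypothetical `tokenize` function: it covers blanks, ASCII-only identifiers and keywords, unsigned decimal integers and runs of operator characters, and leaves out comments, string literals, negative and prefixed integer literals, punctuation such as parentheses, and the special treatment of < and >. It shows why `letin` is one identifier rather than `let` followed by `in`, and why -> is one keyword rather than - followed by >.

```python
import re

KEYWORDS = frozenset(
    "and do else end external false if in init let loc open primitive "
    "reply spawn then to true type val where with".split()
)
SYMBOL_KEYWORDS = frozenset({"->", ".", "|"})

# Each alternative starts with a different class of character, so at most
# one can match at a given position; each is greedy, so the longest token
# of that class is taken.
_TOKEN_RE = re.compile(
    r"(?P<blank>[ \t\r\n\f]+)"
    r"|(?P<word>[A-Za-z][A-Za-z0-9_']*)"
    r"|(?P<int>[0-9]+)"
    r"|(?P<op>[<>!#$%&*+\-./=?@^|~]+)"
)

def tokenize(src: str) -> list[tuple[str, str]]:
    """Return (kind, text) pairs, where kind is keyword, ident, int or infix."""
    tokens, pos = [], 0
    while pos < len(src):
        m = _TOKEN_RE.match(src, pos)
        if m is None:
            raise ValueError(f"unexpected character {src[pos]!r} at {pos}")
        pos = m.end()
        text = m.group()
        if m.lastgroup == "word":
            kind = "keyword" if text in KEYWORDS else "ident"
        elif m.lastgroup == "int":
            kind = "int"
        elif m.lastgroup == "op":
            kind = "keyword" if text in SYMBOL_KEYWORDS else "infix"
        else:
            continue  # blanks separate tokens but produce none
        tokens.append((kind, text))
    return tokens
```

For instance, `tokenize("a||b")` reads `||` as a single infix symbol rather than two `|` keywords, since the longer first token wins.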