java.lang.Object
java.io.StreamTokenizer
public class StreamTokenizer
extends Object
static int
TT_EOF
static int
TT_EOL
static int
TT_NUMBER
static int
TT_WORD
double
nval
int
ttype
StreamTokenizer(InputStream is)
StreamTokenizer(Reader r)
This method initializes a new StreamTokenizer to read characters from a Reader and parse them.
void
commentChar(int ch)
void
eolIsSignificant(boolean flag)
int
lineno()
void
lowerCaseMode(boolean flag)
int
nextToken()
void
ordinaryChar(int ch)
void
ordinaryChars(int low, int hi)
void
parseNumbers()
void
pushBack()
Puts the current token back into the StreamTokenizer so nextToken will return the same value on the next call.
void
quoteChar(int ch)
void
resetSyntax()
void
slashSlashComments(boolean flag)
void
slashStarComments(boolean flag)
String
toString()
This method returns the current token value as a String in the form "Token[x], line n", where 'n' is the current line number and 'x' is determined as follows.
void
whitespaceChars(int low, int hi)
void
wordChars(int low, int hi)
public static final int TT_EOF
A constant indicating that the end of the stream has been read.
- Field Value:
- -1
public static final int TT_EOL
A constant indicating that the end of the line has been read.
- Field Value:
- 10
public static final int TT_NUMBER
A constant indicating that a number token has been read.
- Field Value:
- -2
public static final int TT_WORD
A constant indicating that a word token has been read.
- Field Value:
- -3
public int ttype
Contains the type of the token read by the most recent call to nextToken. The rules are as follows:
- For a token consisting of a single ordinary character, this is the value of that character.
- For a quoted string, this is the value of the quote character
- For a word, this is TT_WORD
- For a number, this is TT_NUMBER
- For the end of the line, this is TT_EOL
- For the end of the stream, this is TT_EOF
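The rules above can be sketched as a switch over ttype. This is a minimal illustration, not part of the API: the class TokenTypeDemo and its describe method are hypothetical names, and the sketch assumes the default syntax table, in which '"' and '\'' are the quote characters.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of java.io.
class TokenTypeDemo {
    // Labels each token according to the value found in ttype.
    static List<String> describe(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                switch (st.ttype) {
                    case StreamTokenizer.TT_WORD:
                        out.add("WORD:" + st.sval);
                        break;
                    case StreamTokenizer.TT_NUMBER:
                        out.add("NUMBER:" + st.nval);
                        break;
                    case '"':
                    case '\'':
                        // For a quoted string, ttype is the quote character.
                        out.add("QUOTED:" + st.sval);
                        break;
                    default:
                        // An ordinary character is its own token type.
                        out.add("CHAR:" + (char) st.ttype);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }
}
```

For example, TokenTypeDemo.describe("x = 42 \"hi\"") yields a word, the ordinary character '=', a number and a quoted string, in that order.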
public StreamTokenizer(InputStream is)
Deprecated. Since JDK 1.1.
This constructor reads bytes from an InputStream and tokenizes them. For details on how the tokenizer operates by default, see StreamTokenizer(Reader).
- Parameters:
is
- The InputStream to read from
public StreamTokenizer(Reader r)
This constructor initializes a new StreamTokenizer to read characters from a Reader and parse them. The char values have their high bits masked so that each value is treated as a character in the range 0x0000 to 0x00FF. This constructor sets up the parsing table to parse the stream in the following manner:
- The values 'A' through 'Z', 'a' through 'z' and 0xA0 through 0xFF are initialized as alphabetic
- The values 0x00 through 0x20 are initialized as whitespace
- The values '\'' and '"' are initialized as quote characters
- '/' is a comment character
- Numbers will be parsed
- EOL is not treated as significant
- C (/* */) and C++ (//) comments are not recognized
- Parameters:
r
- The Reader to read chars from
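Two consequences of the default table listed above are easy to miss: '_' is not alphabetic, and a single '/' starts a comment that runs to the end of the line. The following sketch, using the hypothetical class DefaultSyntaxDemo, collects only the word tokens so that both effects are visible.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of java.io.
class DefaultSyntaxDemo {
    // Returns the word tokens produced by an unmodified tokenizer.
    static List<String> words(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    out.add(st.sval);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }
}
```

With the input "foo_bar 3.5 / ignored\nnext", the words returned are "foo", "bar" and "next": the underscore splits the identifier, 3.5 is a number rather than a word, and everything after '/' on the first line is skipped.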
public void commentChar(int ch)
This method sets the comment attribute on the specified character. Other attributes for the character are cleared.
- Parameters:
ch
- The character to set the comment attribute for, passed as an int
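As an illustration, the sketch below marks '#' as a comment character, which is a common choice for configuration-style input. The class name HashCommentDemo is hypothetical, and the input format is an assumption made only for this example.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of java.io.
class HashCommentDemo {
    // Returns the word tokens, skipping everything from '#' to end of line.
    static List<String> words(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        st.commentChar('#');
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    out.add(st.sval);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }
}
```

The end of line terminates the comment, so tokens on the following line are read normally.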
public void eolIsSignificant(boolean flag)
This method sets a flag that indicates whether or not the end of line sequence terminates the current token and is itself returned as a token. This defaults to false.
- Parameters:
flag
- true if EOL is significant, false otherwise
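A typical use of this flag is line-oriented parsing. The sketch below counts the word tokens on each line by watching for TT_EOL. The class name EolDemo is hypothetical, and the example assumes lines are terminated by '\n'.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of java.io.
class EolDemo {
    // Returns the number of word tokens found on each line of the text.
    static List<Integer> wordsPerLine(String text) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(text));
        st.eolIsSignificant(true); // end of line is now reported as TT_EOL
        List<Integer> counts = new ArrayList<>();
        int current = 0;
        try {
            int type;
            while ((type = st.nextToken()) != StreamTokenizer.TT_EOF) {
                if (type == StreamTokenizer.TT_EOL) {
                    counts.add(current);
                    current = 0;
                } else if (type == StreamTokenizer.TT_WORD) {
                    current++;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (current > 0) {
            counts.add(current); // last line had no trailing newline
        }
        return counts;
    }
}
```

Without the call to eolIsSignificant(true), the newline would be treated as ordinary whitespace and the loop would never see TT_EOL.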
public int lineno()
This method returns the current line number. Note that if the pushBack() method is called, it has no effect on the line number returned by this method.
- Returns:
- The current line number
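The line number is mostly useful for diagnostics. The sketch below pairs each word with the value of lineno() at the time the word was returned. The class name LineNumberDemo is hypothetical, and the example assumes that numbering starts at 1 and that lines end in '\n'; other implementations may count differently.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of java.io.
class LineNumberDemo {
    // Returns entries of the form "word@line" for every word token.
    static List<String> wordLines(String text) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(text));
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    out.add(st.sval + "@" + st.lineno());
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }
}
```

Blank lines still advance the counter, so a word that follows an empty line reports a line number two higher than the previous word.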
public void lowerCaseMode(boolean flag)
This method sets a flag that indicates whether or not alphabetic tokens that are returned should be converted to lower case.
- Parameters:
flag
- true to convert to lower case, false otherwise
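The following sketch shows the effect of lower case mode on word tokens and, for contrast, on a quoted string, which is left unchanged. The class name LowerCaseDemo is hypothetical.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of java.io.
class LowerCaseDemo {
    // Returns sval for every word and double-quoted string token.
    static List<String> values(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        st.lowerCaseMode(true);
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD || st.ttype == '"') {
                    out.add(st.sval);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }
}
```

Given the input Hello WORLD "Quoted", the two words come back as "hello" and "world", while the quoted string keeps its original case.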
public int nextToken() throws IOException
This method reads the next token from the stream. It sets the ttype variable to the appropriate token type and returns it. It can also set sval or nval as described below. The parsing strategy is as follows:
- Skip any whitespace characters.
- If a numeric character is encountered, attempt to parse a numeric value. A leading '-' character indicates a number only if it is followed by another non-'-' numeric character. The numeric token is terminated by either the first non-numeric character encountered, or the second occurrence of '-' or '.'. The token type returned is TT_NUMBER and nval is set to the value parsed.
- If an alphabetic character is parsed, all subsequent characters are read until the first character that is neither alphabetic nor numeric is encountered. The token type returned is TT_WORD and the value parsed is stored in sval. If lower case mode is set, the token stored in sval is converted to lower case. The end of line sequence terminates a word only if EOL significance has been turned on. The start of a comment also terminates a word. Any character with a non-alphabetic and non-numeric attribute (such as white space, a quote, or a comment) is treated as non-alphabetic and terminates the word.
- If a comment character is parsed, then all remaining characters on the current line are skipped and another token is parsed. Any EOL or EOF encountered is not discarded, but rather terminates the comment.
- If a quote character is parsed, then all characters up to the second occurrence of the same quote character are parsed into a String. This String is stored as sval, but is not converted to lower case, even if lower case mode is enabled. The token type returned is the value of the quote character encountered. Any escape sequences (\b (backspace), \t (HTAB), \n (linefeed), \f (form feed), \r (carriage return), \" (double quote), \' (single quote), \\ (backslash), \XXX (octal escape)) are converted to the appropriate char values. Invalid escape sequences are left in untranslated. Unicode escapes such as '\u0000' are not recognized.
- If the C++ comment sequence "//" is encountered, and the parser is configured to handle that sequence, then the remainder of the line is skipped and another token is read exactly as if a character with the comment attribute was encountered.
- If the C comment sequence "/*" is encountered, and the parser is configured to handle that sequence, then all characters up to and including the comment terminator sequence are discarded and another token is parsed.
- If none of the cases above apply, then the character is an ordinary character that is parsed as a token by itself. The char encountered is returned as the token type.
- Returns:
- The token type
- Throws:
IOException
- If an I/O error occurs
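Three of the rules above interact in ways that are easier to see in a run: escape translation inside quotes, a '-' that starts a number, and a '-' that stands alone. The sketch below is an illustration under the default syntax table; the class name NextTokenDemo and its labels are hypothetical.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of java.io.
class NextTokenDemo {
    // Renders every token as a short labelled string.
    static List<String> dump(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        List<String> out = new ArrayList<>();
        try {
            int type;
            while ((type = st.nextToken()) != StreamTokenizer.TT_EOF) {
                if (type == StreamTokenizer.TT_NUMBER) {
                    out.add("NUM:" + st.nval);
                } else if (type == StreamTokenizer.TT_WORD) {
                    out.add("WORD:" + st.sval);
                } else if (type == '"' || type == '\'') {
                    out.add("STR:" + st.sval); // escapes already translated
                } else {
                    out.add("CHR:" + (char) type);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }
}
```

For the character sequence "tab\there" -12.5 - x (with a literal backslash followed by t inside the quotes), the string token contains a real tab character, -12.5 is a single number, and the isolated '-' is returned as an ordinary character.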
public void ordinaryChar(int ch)
This method makes the specified character an ordinary character. This means that none of the attributes (whitespace, alphabetic, numeric, quote, or comment) will be set on this character. This character will parse as its own token.
- Parameters:
ch
- The character to make ordinary, passed as an int
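A frequent reason to call this method is that '/' is a comment character by default, which hides the rest of the line in arithmetic input. The sketch below compares the two behaviours; the class name OrdinaryCharDemo and the boolean parameter are hypothetical.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of java.io.
class OrdinaryCharDemo {
    // Tokenizes the input, optionally treating '/' as an ordinary character.
    static List<String> tokens(String input, boolean slashIsOrdinary) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        if (slashIsOrdinary) {
            st.ordinaryChar('/'); // clears the default comment attribute
        }
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_NUMBER) {
                    out.add(String.valueOf(st.nval));
                } else if (st.ttype == StreamTokenizer.TT_WORD) {
                    out.add(st.sval);
                } else {
                    out.add(String.valueOf((char) st.ttype));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }
}
```

With the input "8 / 2", the default table yields only the number 8.0, because the rest of the line is a comment. After ordinaryChar('/'), the same input yields 8.0, '/' and 2.0.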
public void ordinaryChars(int low, int hi)
This method makes all the characters in the specified range, range terminators included, ordinary. This means that none of the attributes (whitespace, alphabetic, numeric, quote, or comment) will be set on any of the characters in the range. Each character in this range will parse as its own token.
- Parameters:
low
- The low end of the range of values to make ordinary
hi
- The high end of the range of values to make ordinary
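One use of the range form is to strip the numeric attribute from the digits before giving them another role. The sketch below clears '0' through '9' and then marks them alphabetic, so that a token such as "2nd" is read as one word. The class name DigitsAsWordsDemo is hypothetical. Note that '.' and '-' keep their numeric attribute in this sketch.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of java.io.
class DigitsAsWordsDemo {
    // Returns word tokens, with digits treated as alphabetic characters.
    static List<String> words(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        st.ordinaryChars('0', '9'); // remove the numeric attribute first
        st.wordChars('0', '9');     // then make the digits alphabetic
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    out.add(st.sval);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }
}
```

Clearing the range first is a deliberate choice: it avoids relying on how a given implementation resolves a character that carries both the numeric and the alphabetic attribute.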
public void parseNumbers()
This method sets the numeric attribute on the characters '0' - '9' and the characters '.' and '-'. When this method is used, the result of giving other attributes (whitespace, quote, or comment) to the numeric characters may vary depending on the implementation. For example, if parseNumbers() and then whitespaceChars('1', '1') are called, this implementation reads "121" as 2, while some other implementation will read it as 21.
public void pushBack()
Puts the current token back into the StreamTokenizer so nextToken will return the same value on the next call. May cause the lineno method to return an incorrect value if lineno is called before the next call to nextToken.
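pushBack gives the parser one token of lookahead. The sketch below uses it to find names that are immediately followed by '(', putting the lookahead token back whenever it turns out not to be a parenthesis. The class name CallNameDemo and the notion of a "call name" are hypothetical, chosen only for this illustration.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of java.io.
class CallNameDemo {
    // Returns every word that is directly followed by a '(' token.
    static List<String> callNames(String src) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(src));
        List<String> calls = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype != StreamTokenizer.TT_WORD) {
                    continue;
                }
                String name = st.sval; // save before looking ahead
                if (st.nextToken() == '(') {
                    calls.add(name);
                } else {
                    st.pushBack(); // let the main loop see this token again
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return calls;
    }
}
```

The word is copied into a local variable before the lookahead because the next call to nextToken may overwrite sval.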
public void quoteChar(int ch)
This method sets the quote attribute on the specified character. Other attributes for the character are cleared.
- Parameters:
ch
- The character to set the quote attribute for, passed as an int.
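Any character can be made a quote delimiter. The sketch below uses '|' for that purpose; the class name QuoteCharDemo and the choice of delimiter are assumptions made for the example.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of java.io.
class QuoteCharDemo {
    // Tokenizes the input with '|' acting as an additional quote character.
    static List<String> tokens(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        st.quoteChar('|');
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == '|') {
                    // ttype holds the quote character, sval the quoted text.
                    out.add("QUOTED:" + st.sval);
                } else if (st.ttype == StreamTokenizer.TT_WORD) {
                    out.add(st.sval);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }
}
```

The text between the two '|' characters is returned as a single token, including the space it contains.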
public void resetSyntax()
This method removes all attributes (whitespace, alphabetic, numeric, quote, and comment) from all characters. It is equivalent to calling ordinaryChars(0x00, 0xFF).
- See Also:
ordinaryChars(int,int)
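resetSyntax is the usual starting point for building a syntax table from scratch. The sketch below splits a colon-separated record into fields by making every printable ASCII character except ':' alphabetic and making ':' whitespace. The class name ColonFieldsDemo and the record format are hypothetical.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of java.io.
class ColonFieldsDemo {
    // Splits a record such as "a:b:c" into its fields.
    static List<String> fields(String record) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(record));
        st.resetSyntax();              // no attributes on any character
        st.wordChars('!', '9');        // printable ASCII below ':'
        st.wordChars(';', '~');        // printable ASCII above ':'
        st.whitespaceChars(':', ':');  // the separator
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    out.add(st.sval);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }
}
```

Because parseNumbers() is not called again after the reset, digits are plain word characters here, and '/' no longer starts a comment.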
public void slashSlashComments(boolean flag)
This method sets a flag that indicates whether or not "C++" language style comments ("//" comments through EOL) are handled by the parser. If this is true, commented out sequences are skipped and ignored by the parser. This defaults to false.
- Parameters:
flag
- true to recognize and handle "C++" style comments, false otherwise
public void slashStarComments(boolean flag)
This method sets a flag that indicates whether or not "C" language style comments (with nesting not allowed) are handled by the parser. If this is true, commented out sequences are skipped and ignored by the parser. This defaults to false.
- Parameters:
flag
- true to recognize and handle "C" style comments, false otherwise
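The two comment flags are often enabled together, along with ordinaryChar('/') so that a lone '/' is reported as an operator rather than starting a comment. The sketch below shows that combination; the class name CommentFlagsDemo is hypothetical.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of java.io.
class CommentFlagsDemo {
    // Tokenizes source-like text, skipping "//" and "/* */" comments.
    static List<String> tokens(String src) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(src));
        st.ordinaryChar('/');        // a single '/' is an ordinary token
        st.slashSlashComments(true); // skip "//" through end of line
        st.slashStarComments(true);  // skip "/*" through "*/"
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    out.add(st.sval);
                } else if (st.ttype == StreamTokenizer.TT_NUMBER) {
                    out.add(String.valueOf(st.nval));
                } else {
                    out.add(String.valueOf((char) st.ttype));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }
}
```

For the input "x = a / b; // ratio\n/* block\n comment */ y", the tokens are x, =, a, /, b, ;, y: both comments disappear while the division operator survives.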
public String toString()
This method returns the current token value as a String in the form "Token[x], line n", where 'n' is the current line number and 'x' is determined as follows.
- If no token has been read, then 'x' is "NOTHING" and 'n' is 0
- If ttype is TT_EOF, then 'x' is "EOF"
- If ttype is TT_EOL, then 'x' is "EOL"
- If ttype is TT_WORD, then 'x' is sval
- If ttype is TT_NUMBER, then 'x' is "n=strnval" where 'strnval' is String.valueOf(nval).
- If ttype is a quote character, then 'x' is sval
- For all other cases, 'x' is ttype
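toString is convenient for tracing a tokenizer while debugging. The sketch below records the string form after each token; the class name ToStringDemo is hypothetical. Only the word and number cases are exercised here, since the exact rendering of other token types and of the line number may vary between implementations.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of java.io.
class ToStringDemo {
    // Returns the toString() form of the tokenizer after each token.
    static List<String> trace(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                out.add(st.toString());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }
}
```

For the input "hello 42", the first entry begins with "Token[hello], line " and the second with "Token[n=42.0], line ".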
public void whitespaceChars(int low, int hi)
This method sets the whitespace attribute for all characters in the specified range, range terminators included.
- Parameters:
low
- The low end of the range of values to set the whitespace attribute for
hi
- The high end of the range of values to set the whitespace attribute for
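Marking a separator as whitespace is a simple way to read delimited values. The sketch below treats ',' as whitespace and adds up the numbers in a comma-separated list; the class name CommaSumDemo is hypothetical.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical helper class, not part of java.io.
class CommaSumDemo {
    // Sums all number tokens in a comma-separated list.
    static double sum(String csv) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(csv));
        st.whitespaceChars(',', ','); // commas now only separate tokens
        double total = 0;
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_NUMBER) {
                    total += st.nval;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }
}
```

A side effect of this approach is that empty fields vanish: consecutive commas are skipped like any other run of whitespace.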
public void wordChars(int low, int hi)
This method sets the alphabetic attribute for all characters in the specified range, range terminators included.
- Parameters:
low
- The low end of the range of values to set the alphabetic attribute for
hi
- The high end of the range of values to set the alphabetic attribute for
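A common use of wordChars is to extend the default alphabet, for instance so that identifiers may contain underscores. The sketch below adds '_' to the alphabetic set; the class name IdentifierDemo is hypothetical.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of java.io.
class IdentifierDemo {
    // Returns word tokens, with '_' accepted inside words.
    static List<String> words(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        st.wordChars('_', '_'); // a one-character range
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    out.add(st.sval);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }
}
```

With this change "foo_bar baz" produces the two words "foo_bar" and "baz", where the default table would split the first into "foo" and "bar" around an ordinary '_' token.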