how does the JLS deal with the >> ambiguity?
Eric Blake
ebb9@byu.net
Thu Sep 15 15:25:00 GMT 2005
According to Adam Megacz on 9/14/2005 11:14 PM:
> This probably isn't the perfect mailing list for this, but it's close ;)
>
> In C++ there's a known ambiguity where you have to put a space between
> two right-angle brackets in a template instantiation:
>
>   Foo<Bar,Baz<Bop> >
>
> Otherwise the ">>" is treated as the "shift-right" token by the lexer.
>
> I was wondering how JLS 1.5 handles this, since Javac seems to do just
> fine when the space isn't there. The JLS itself isn't sufficiently
> formal at the lexical level to clarify this -- it just says that '>'
> "is a token" and '>>' "is a token".
The Java 1.5 spec took great pains to ensure that the grammar stays
unambiguous even with >> (and >>>) as single tokens, and equally great
pains to ensure that the user never has to insert a space between nested
generic type closers. The parser never needs to decide whether to treat >>
as '>>' or as '>' '>'. Back when I was actively developing jikes, I was
able to produce an LALR(1) grammar that accurately parses every possible
Java 1.5 construct. (Unfortunately, my time and attention were drawn away
from jikes before I implemented semantic analysis of those constructs, so
the latest jikes still cannot compile 1.5 code, even though it can parse
it.)

Java also has an advantage over C++ in that the contents of <foo<bar>> are
more constrained: an arithmetic expression never appears inside a generic
type name, so the presence of an arithmetic operator automatically means
the >> token is a shift operator rather than a double generic name closer.
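
To make the two roles of >> concrete, here is a small Java 1.5-style
example. The class and method names are made up for illustration; the
point is only that the same character sequence closes nested type
arguments in one place and shifts an int in another, with no space needed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical example class, not taken from the JLS or from jikes.
class ShiftVersusGenerics {
    // ">>" closes two nested type-argument lists here; no space is needed.
    static Map<String, List<Integer>> table =
        new HashMap<String, List<Integer>>();

    // ">>>" closes three nested type-argument lists.
    static List<List<List<String>>> nested =
        new ArrayList<List<List<String>>>();

    static int quarter(int x) {
        // ">>" sits between two arithmetic operands, so it is a signed shift.
        return x >> 2;
    }

    static int unsignedHalf(int x) {
        // ">>>" between operands is the unsigned shift operator.
        return x >>> 1;
    }

    public static void main(String[] args) {
        table.put("k", new ArrayList<Integer>());
        table.get("k").add(quarter(64));
        System.out.println(table + " " + unsignedHalf(-2));
    }
}
```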
> Are we to assume that this implies that any JLS 1.5-compliant parser
> with a separate lexer (lexerless parsers don't have this problem in
> the first place) is REQUIRED to implement some form of token-ambiguity
> and decide whether or not to merge the tokens during parsing?
Nope. Every LALR(1) grammar is unambiguous, and Java can be parsed with
an LALR(1) grammar (unlike C++). It is entirely possible to parse Java
without making any token-splitting decisions.
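
As a rough illustration of the idea, here is a toy hand-written recognizer
for nested type names. It is NOT the jikes LALR(1) grammar, just a
recursive-descent sketch under my own assumptions: the lexer always emits
>> and >>> as single tokens (maximal munch), and the parser treats such a
token as one closer that ends two or three argument lists at once, in the
same spirit as a grammar production that ends in '>>'. The class name
NestedCloserParser and the "<eof>" marker are invented for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Toy recognizer for: Type ::= IDENT [ '<' Type { ',' Type } closer ]
class NestedCloserParser {
    private final List<String> tokens = new ArrayList<String>();
    private int pos = 0;

    // Lexer: greedy, so ">>" and ">>>" are always single tokens.
    NestedCloserParser(String src) {
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (Character.isJavaIdentifierStart(c)) {
                int j = i + 1;
                while (j < src.length()
                        && Character.isJavaIdentifierPart(src.charAt(j))) j++;
                tokens.add(src.substring(i, j));
                i = j;
            } else if (c == '>') {
                int j = i + 1;
                while (j < src.length() && j - i < 3
                        && src.charAt(j) == '>') j++;
                tokens.add(src.substring(i, j));
                i = j;
            } else if (c == '<' || c == ',') {
                tokens.add(String.valueOf(c));
                i++;
            } else {
                throw new IllegalArgumentException("unexpected character: " + c);
            }
        }
        tokens.add("<eof>"); // end marker; can never be an identifier
    }

    // Returns how many ENCLOSING argument lists the final token also closed.
    private int parseType() {
        String name = tokens.get(pos);
        if (!Character.isJavaIdentifierStart(name.charAt(0))) {
            throw new IllegalArgumentException("expected a type name, found " + name);
        }
        pos++;
        if (!tokens.get(pos).equals("<")) return 0; // plain name, no arguments
        pos++;
        while (true) {
            int closed = parseType();
            // The argument ended in ">>" or ">>>", which closed this list too.
            if (closed > 0) return closed - 1;
            String t = tokens.get(pos++);
            if (t.equals(",")) continue;
            if (t.charAt(0) != '>') {
                throw new IllegalArgumentException("expected ',' or a closer, found " + t);
            }
            // ">" closes this list only; ">>" closes one more; ">>>" two more.
            return t.length() - 1;
        }
    }

    static boolean accepts(String src) {
        try {
            NestedCloserParser p = new NestedCloserParser(src);
            return p.parseType() == 0 && p.tokens.get(p.pos).equals("<eof>");
        } catch (IllegalArgumentException e) {
            return false;
        }
    }
}
```

With this sketch, accepts("Foo<Bar,Baz<Bop>>") and
accepts("Foo<Bar,Baz<Bop> >") both succeed, while accepts("A<B>>") is
rejected because the >> token closes one list more than was opened. No
token is ever split or re-lexed along the way.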
--
Life is short - so eat dessert first!
Eric Blake ebb9@byu.net