1466 – Spec claims maximal munch technique always works: not for "1..3"

D issues are now tracked on GitHub. This Bugzilla instance remains as a read-only archive.
Issue 1466 - Spec claims maximal munch technique always works: not for "1..3"
Status: RESOLVED FIXED
Alias: None
Product: D
Classification: Unclassified
Component: dlang.org
Version: D1 (retired)
Hardware: All
OS: All
Importance: P3 minor
Assignee: No Owner
URL: http://digitalmars.com/d/1.0/lex.html
Keywords: spec
Depends on:
Blocks: 3104
Reported: 2007年09月01日 09:35 UTC by Matti Niemenmaa
Modified: 2014年02月16日 15:24 UTC

Description Matti Niemenmaa 2007年09月01日 09:35:26 UTC
A snippet from http://digitalmars.com/d/1.0/lex.html:
"The source text is split into tokens using the maximal munch technique, i.e., the lexical analyzer tries to make the longest token it can."
Relevant parts of the grammar:
Token:
	FloatLiteral
	..
FloatLiteral:
	Float
Float:
	DecimalFloat
DecimalFloat:
	DecimalDigits .
	. Decimal
DecimalDigits:
	DecimalDigit
DecimalDigit:
	NonZeroDigit
Decimal:
	NonZeroDigit
Based on the above, if a lexer encounters "1..3", for instance in a slice: "foo[1..3]", it should, using the maximal munch technique, make the longest possible token from "1..3": this is the Float "1.". Next, it should come up with the Float ".3".
Of course, this isn't currently happening, and would be problematic if it did. But, according to the grammar, that's what should happen, unless I'm missing something.
Either some exception needs to be made, or the "DecimalDigits ." possibility needs to be removed from the grammar and the compiler.
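To make the report concrete, here is a toy maximal-munch lexer in Python (a sketch over a tiny hypothetical token set, not DMD's actual lexer) whose FLOAT rule includes the spec's "DecimalDigits ." form, reproducing the mis-lexing of "1..3":

```python
import re

# A minimal maximal-munch lexer over a toy subset of D's token grammar.
# FLOAT deliberately includes the "DecimalDigits ." trailing-dot form
# from the spec, which is exactly what makes "1..3" go wrong.
TOKEN_RES = [
    ("FLOAT",  r"[0-9]+\.[0-9]*|\.[0-9]+"),  # "1.", "1.5", ".3"
    ("INT",    r"[0-9]+"),
    ("DOTDOT", r"\.\."),
    ("DOT",    r"\."),
]

def lex(src):
    """Return (kind, text) pairs, always taking the longest match."""
    tokens, pos = [], 0
    while pos < len(src):
        # Maximal munch: try every rule, keep the longest match.
        best = max(
            ((kind, m.group()) for kind, pat in TOKEN_RES
             if (m := re.match(pat, src[pos:]))),
            key=lambda kt: len(kt[1]),
        )
        tokens.append(best)
        pos += len(best[1])
    return tokens

# "1..3" lexes as FLOAT "1." then FLOAT ".3" -- never INT ".." INT.
print(lex("1..3"))
```

With these rules a slice expression like `foo[1..3]` can never produce the `..` token, which is the contradiction the report describes.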
Comment 1 Nazo Humei 2007年09月01日 12:15:17 UTC
Reply to d-bugmail@puremagic.com,
> Either some exception needs to be made or remove the "DecimalDigits ."
> possibility from the grammar and the compiler.
or make it "DecimalDigits . [^.]" where the ^ production is non-consuming.
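The suggested non-consuming check can be sketched with a regex negative lookahead (a toy illustration, not the DMD implementation; the `FLOAT_FIXED` name is made up):

```python
import re

# Sketch of the suggested fix: a float with a trailing dot only matches
# when the dot is NOT followed by another dot. The (?!\.) lookahead is
# non-consuming: it inspects the next character without eating it.
FLOAT_FIXED = r"[0-9]+\.(?!\.)[0-9]*|\.[0-9]+"

def first_token(src):
    """The FLOAT lexeme at the start of src, or None if there is none."""
    m = re.match(FLOAT_FIXED, src)
    return m.group() if m else None

print(first_token("1.5"))   # still a float
print(first_token("1..3"))  # no float match: "1" stays an integer
```

Because the lookahead fails on the second dot, maximal munch falls back to lexing "1" as an integer, leaving ".." available as its own token.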
Comment 2 Nazo Humei 2007年09月02日 17:30:15 UTC
Reply to d-bugmail@puremagic.com,
> "The source text is split into tokens using the maximal munch
> technique, i.e., the lexical analyzer tries to make the longest token
> it can."
> 
another case:
actual:
!isGood -> ! isGood
MaxMunch:
!isGood -> !is Good
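The "!is" case can be reproduced with the same style of toy lexer, under the hypothetical assumption that "!is" is a single lexeme (which, as the following comments discuss, it is not in the actual compiler):

```python
import re

# If "!is" were a single lexeme, maximal munch would split "!isGood"
# as "!is" + "Good" instead of "!" + "isGood". Toy rules, longest wins.
RULES = [
    ("NOT_IS", r"!is"),
    ("NOT",    r"!"),
    ("IDENT",  r"[A-Za-z_][A-Za-z0-9_]*"),
]

def lex(src):
    """Return (kind, text) pairs, always taking the longest match."""
    tokens, pos = [], 0
    while pos < len(src):
        best = max(
            ((kind, m.group()) for kind, pat in RULES
             if (m := re.match(pat, src[pos:]))),
            key=lambda kt: len(kt[1]),
        )
        tokens.append(best)
        pos += len(best[1])
    return tokens

# Maximal munch picks the longer "!is", mangling the identifier:
print(lex("!isGood"))
```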
Comment 3 Christopher Nicholson-Sauls 2007年09月02日 17:45:15 UTC
BCS wrote:
> Reply to d-bugmail@puremagic.com,
> 
> 
>> "The source text is split into tokens using the maximal munch
>> technique, i.e., the lexical analyzer tries to make the longest token
>> it can."
>>
> 
> another case:
> 
> actual
> !isGood -> ! isGood
> MaxMunch
> !isGood -> !is Good
> 
> 
I might be wrong, but my guess is that 'is' is always treated as its own entity, so that '!is' is really ('!' 'is'). It's not a bad practice, when one has keyword-operators, to do this to avoid MM screwing up users' identifiers. But, as I haven't taken any trips through the DMD frontend source, I might be completely off.
-- Chris Nicholson-Sauls
Comment 4 Nazo Humei 2007年09月02日 18:30:16 UTC
Reply to Chris Nicholson-Sauls,
> I might be wrong, but my guess is that 'is' is always treated as its
> own entity, so that '!is' is really ('!' 'is'). Its not a bad
That's how I spotted it in the first place.
> practice when one has keyword-operators to do this, to avoid MM
> screwing up user's identifiers. But, as I haven't taken any trips
> through the DMD frontend source, I might be completely off.
> 
For that to work the lexer has to keep track of whitespace. :-b
Comment 5 Jascha Wetzel 2007年09月03日 06:08:58 UTC
(In reply to comment #0)
> Either some exception needs to be made or remove the "DecimalDigits ."
> possibility from the grammar and the compiler.
(In reply to comment #1)
> or make it "DecimalDigits . [^.]" where the ^ production is non consuming.
It is possible to parse D using a maximal munch lexer - see the seatd grammar for an example. It's a matter of exactly which lexemes you choose. In this particular case, the float lexemes need to be split, such that floats with a trailing dot are not matched by a single lexeme.
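The lexeme split described here can be sketched as follows (a toy Python illustration, not the seatd grammar itself): the lexer simply has no trailing-dot float form, so "1." becomes INT + DOT at the lexical level, to be recombined by a context-free parser rule if needed.

```python
import re

# Sketch of the lexeme split: drop the "DecimalDigits ." form from the
# lexer entirely. Floats with a trailing dot are no longer one lexeme.
RULES = [
    ("FLOAT",  r"[0-9]+\.[0-9]+|\.[0-9]+"),  # no trailing-dot form
    ("INT",    r"[0-9]+"),
    ("DOTDOT", r"\.\."),
    ("DOT",    r"\."),
]

def lex(src):
    """Return (kind, text) pairs, always taking the longest match."""
    tokens, pos = [], 0
    while pos < len(src):
        best = max(
            ((kind, m.group()) for kind, pat in RULES
             if (m := re.match(pat, src[pos:]))),
            key=lambda kt: len(kt[1]),
        )
        tokens.append(best)
        pos += len(best[1])
    return tokens

print(lex("1..3"))  # INT DOTDOT INT -- the slice lexes correctly
print(lex("1.5"))   # a plain float is still a single token
```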
Comment 6 Nazo Humei 2007年09月03日 09:50:18 UTC
Reply to Jascha,
> BCS wrote:
> 
>> For that to work the lexer has to keep track of whitespace. :-b
>> 
> you can also match "(!is)[^_a-zA-Z0-9]", advancing the input only for
> the submatch. or use a single-character lookahead.
> 
That's what I'm hoping to do sooner or later. I already do something like that for ".." vs ".".
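A single-character lookahead for the "." family of tokens might look like this hand-written sketch (the `lex_dot` helper is hypothetical, not BCS's actual code):

```python
# Hand-written single-character lookahead for tokens starting with '.':
# peek at the following characters before committing to a token kind.
def lex_dot(src, pos):
    """Classify the token beginning with '.' at pos; return (kind, text)."""
    if src.startswith("...", pos):
        return ("DOTDOTDOT", "...")      # D's variadic "..." token
    if src.startswith("..", pos):
        return ("DOTDOT", "..")          # slice/range operator
    end = pos + 1
    while end < len(src) and src[end].isdigit():
        end += 1
    if end > pos + 1:
        return ("FLOAT", src[pos:end])   # ".5"-style float literal
    return ("DOT", ".")

print(lex_dot("..", 0))   # DOTDOT
print(lex_dot(".5x", 0))  # FLOAT ".5"
print(lex_dot(".x", 0))   # plain DOT
```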
Comment 7 Matti Niemenmaa 2007年09月09日 12:26:03 UTC
Here's some example code underlining the issue:
class Foo {
	static int opSlice(double a, double b) {
		return 0;
	}
}
void main() {
	// works
	assert (Foo[0. .. 1] == 0);
	// thinks it's [0 ... 1], no maximal munch taking place
	assert (Foo[0... 1] == 0);
}
Comment 8 Nazo Humei 2007年09月09日 16:50:26 UTC
Reply to Jascha,
> d-bugmail@puremagic.com wrote:
> 
>> // thinks it's [0 ... 1], no maximal munch taking place
>> assert (Foo[0... 1] == 0);
>> }
> this *is* maximal munch taking place. because of the ".." lexeme,
> float literals are not lexemes. they are context free production rules
> consisting of multiple lexemes. therefore "0." consists of two lexemes
> and "..." wins the max munch over ".".
> 
But is it the correct way to do it? (Not whether it's doing what the spec says, but whether it's doing what it should be designed to do.)
Comment 9 Walter Bright 2010年11月09日 19:43:04 UTC
http://www.dsource.org/projects/phobos/changeset/2148 

