I am experimenting with LL(1) parsers that do not use a separate lexer.
I have the following grammar:
S = (('<' S '>' | 'a'+) ' '?)+
The notation uses ' quotes to indicate terminals and familiar regex-like syntax for union, grouping, repetition, optional parts, etc. It could also be rewritten more like BNF as follows:
S = F OWS Q
Q = F OWS Q
Q =
F = '<' S '>'
F = 'a' G
G = 'a' G
G =
OWS = ' '
OWS =
This grammar is not LL(1): because the spaces separating the sequences of 'a's are optional, a single token of lookahead cannot tell whether the next 'a' continues the current sequence or starts a new one.
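Concretely (if I have computed First and Follow correctly), the collision is in the cell for G on lookahead 'a': because OWS is nullable, the 'a' that starts the next F ends up in Follow(G).
First('a' G) = { 'a' }
Follow(G)    = Follow(F) = { 'a', ' ', '<', '>', EOF }
T[G]['a'] gets  G = 'a' G   because 'a' is in First('a' G)
and also        G =         because the empty production is nullable and 'a' is in Follow(G)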
The grammar is intended to parse strings like the following lines:
aaa aaa aaa
a a <a a <a>a aa>
a<a>a<aaaaa><a<<<a>>>>
That is, consecutive runs of 'a' separated by optional spaces and/or enclosed in matching angle brackets.
If I add a lexer that greedily tokenizes sequences of 'a' into a single TK_IDENTIFIER token, the grammar becomes LL(1):
S = (('<' S '>' | TK_IDENTIFIER) ' '?)+
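Just to be explicit about what I mean by the lexer, here is a minimal sketch (the names TOKEN_RE, tokenize and TK_IDENTIFIER are mine, not from any particular library):

import re

# Greedy lexer: r/a+/ collapses a run of 'a's into one TK_IDENTIFIER token,
# while '<', '>' and ' ' remain single-character tokens.
TOKEN_RE = re.compile(r"a+|<|>| ")

def tokenize(text):
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if not m:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        lexeme = m.group()
        kind = "TK_IDENTIFIER" if lexeme[0] == "a" else lexeme
        tokens.append((kind, lexeme))
        pos = m.end()
    return tokens

# tokenize("a<aa>") == [('TK_IDENTIFIER', 'a'), ('<', '<'),
#                       ('TK_IDENTIFIER', 'aa'), ('>', '>')]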
However, I have found another way.
The rule for building the LL(1) parse table is as follows:
T[A][c] contains the production A = w if and only if c is in First(w), or w is nullable and c is in Follow(A).
Having two productions in the same cell of the table is illegal. However, if I say that a non-empty production is chosen in favor of an empty one, I can produce a table that works.
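In code, the modification is nothing more than a tie-break in the table-building loop. This is only a sketch of what I am doing; the helpers first_of, follow and nullable are assumed to exist (standard First/Follow computation), productions are tuples of symbols, and the empty production is ():

def build_table(grammar, first_of, follow, nullable):
    # grammar:     dict nonterminal -> list of productions (tuples of symbols)
    # first_of(w): terminals in First(w), epsilon excluded
    # follow[A]:   Follow set of nonterminal A
    # nullable(w): True if w derives the empty string
    table = {}
    for A, productions in grammar.items():
        for w in productions:
            lookaheads = set(first_of(w))
            if nullable(w):
                lookaheads |= follow[A]
            for c in lookaheads:
                old = table.get((A, c))
                if old is None:
                    table[(A, c)] = w
                elif old == () or w == ():
                    # First/Follow collision between an empty and a non-empty
                    # production: keep the non-empty one (the modified rule).
                    table[(A, c)] = old if w == () else w
                else:
                    raise ValueError(f"genuine LL(1) conflict in T[{A}][{c!r}]")
    return table

With the grammar above, I believe the only cell this tie-break touches is T[G]['a'], which ends up holding G = 'a' G.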
What is the effect of modifying this rule, in terms of the languages that can be parsed? In terms of the meaning of the grammar?
Is it correct to say that this modification is a method of resolving ambiguity? A specific case of ambiguity, maybe?
Is there an obvious defect in this approach? Are there any pathologically disgusting grammars you can think of that would produce unexpected results?
Does modifying this rule have the same effect as using a lexer defined purely by regular expressions (e.g. r/a+/ in this case)?
Consider an even simpler example:
S = ('a'+)+
-- or --
S = T U
U = T U
U =
T = 'a' V
V = 'a' V
V =
and with a tokenizer:
S = (TK_IDENTIFIER)+
-- or --
S = TK_IDENTIFIER U
U = TK_IDENTIFIER U
U =
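For this smaller pair the correspondence seems clearest. In the untokenized version the only collision I can find is again between the recursive production and the empty one (assuming I have the sets right):

First('a' V) = { 'a' }
Follow(V)    = Follow(T) = { 'a', EOF }
T[V]['a'] gets both  V = 'a' V  and  V =

Preferring the non-empty production keeps V = 'a' V in that cell, so V consumes 'a's greedily, which looks to me like exactly what the r/a+/ tokenizer does.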