-
Notifications
You must be signed in to change notification settings - Fork 250
feat: Allow nested interpolated strings in lexer #4055
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Possibly another approach is to instead produce InterpolationStart and InterpolationEnd tokens, and then avoid any token nesting from the lexer.
...which would be how we currently produce parentheses (but not how we produce strings — those return a Literal::String, rather than StringStart and StringEnd)
aljazerzen
commented
Jan 9, 2024
I tried this implementation strategy when rewriting from Pest to chumsky, but my bottom line was the same: the parser expects a flat stream of tokens and not a tree.
This PR is a good effort, but I don't know how to continue from here. The from_nested might work, but even if it does, having nested token streams is quite uncommon.
I like your idea of InterpolationStart and InterpolationEnd much more!
So the following:
a = f"hello {"Mr. " + name}, I'm {age} years old"
... would be lexed as:
Ident: "a"
Ctrl: =
StringQualifier: "f"
String: "hello "
InterpolationStart <--- this means {
String: "Mr. "
Ctrl: +
Ident: "name"
InterpolationEnd
String: ", I'm "
InterpolationStart
Ident: age
InterpolationEnd
String: " years old"
StringQualifier: ""
... or maybe like this:
Ident: "a"
Ctrl: =
InterpolationStart: "f" <--- this means f"
InterpolationString: "hello "
String: "Mr. "
Ctrl: +
Ident: "name"
InterpolationString: ", I'm "
Ident: age
InterpolationString: " years old"
InterpolationEnd
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you aware that this prevents interpolations from containing }?
Following is not supported:
f"result: { my_func_that_takes_a_tuple {a=1} }"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. I was aware of the issue with colons at least — python also has this problem but doesn't use colons so much.
With closing brackets, that's indeed a more restrictive case, and I agree with your broader point about it being confusing if we don't allow that, given how prominently brackets are used.
One difficulty is that at the lexing stage we don't know about matching opening & closing brackets, so we can't do the thing of "it's the end of the interpolated string if it's the final closing bracket".
We could require escape characters, which seems reasonable if a bit ugly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a test for the following:
s"LOWER_CASE( {"contents"} )"
and:
s"LOWERCASE( { f"Hello, {"Aljaz"} is my name." } )"
vanillajonathan
commented
Jan 9, 2024
Meh, are nested string interpolation even something that is useful?
Sounds like something that needlessly complicates the parser and that nobody would ever actually use or care about.
max-sixty
commented
Jan 9, 2024
Meh, are nested string interpolation even something that is useful?
Sounds like something that needlessly complicates the parser and that nobody would ever actually use or care about.
This is a good question!
It depends what form of nesting you're asking about. From the linked issue, this seems helpful, and arguably confusing that we don't allow it:
from invoices
derive x = s"{foo - bar} + 5" # currently fails
OTOH, having s & f strings inside other s & f strings I agree isn't that helpful.
aljazerzen
commented
Jan 9, 2024
OTOH, having s & f strings inside other s & f strings I agree isn't that helpful.
I do agree, although there are cases where it would be useful:
... f"my items are { std.array.join this.items "," }"
I worry about language consistency around interpolation:
- if we support only idents, this is not ergonomic, but a reasonable limitation,
- if we support all expressions, this is very ergonomic, but hard to pull off,
- if we support all expressions except string literals and tuples, it gets 90% of ergonomics, but is also quite arbitrary rule that is hard to remember.
I'm not against this incremental improvement, but I would like to have a long-term plan on what we plan on supporting.
aljazerzen
commented
Jan 11, 2024
Ok, I'm impressed with Pablo's work - Python is able to parse this:
>>> f"My dict: { {'a': 1} }"
"My dict: {'a': 1}"
aljazerzen
commented
Jan 11, 2024
I've thought this was hard, not I think this is harder. :D
max-sixty
commented
Jan 12, 2024
Python is able to parse this:
But not this!
f"{lambda x: x+2}"
Taking some of PRQL#4055
bce69f9 to
b649696
Compare
An 80% complete attempt to allow having an interpolated string contain expressions — #3279
Supersedes #3280
Currently it successfully lexes
s"{hello + 5}"into a nested expression of tokens.But I'm not sure how to then parse those nested tokens — chumsky possibly doesn't support this sort of thing, or only a very basic version with https://docs.rs/chumsky/latest/chumsky/stream/struct.Stream.html#method.from_nested. I tried a lot of things and couldn't make progress (this was much harder than I thought it would be...)
It also required refactoring the existing lexer into a function that outputs individual tokens. It uses some
rewindparsers to know when we've reached the end of the parent.Any ideas for making this work? Have I overcomplicated it?
FYI most of the existing tests pass, because I tried to have the proposed interpolation go back to its string form. There are some corner case failures because of the way that spans work.