Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

feat: Allow nested interpolated strings in lexer #4055

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
max-sixty wants to merge 5 commits into PRQL:main
base: main
Choose a base branch
Loading
from max-sixty:interpolate-lexer

Conversation

@max-sixty
Copy link
Member

@max-sixty max-sixty commented Jan 6, 2024

An 80% complete attempt to allow having an interpolated string contain expressions — #3279

Supersedes #3280

Currently it successfully lexes s"{hello + 5}" into a nested expression of tokens.

But I'm not sure how to then parse those nested tokens — chumsky possibly doesn't support this sort of thing, or only a very basic version with https://docs.rs/chumsky/latest/chumsky/stream/struct.Stream.html#method.from_nested. I tried a lot of things and couldn't make progress (this was much harder than I thought it would be...)

It also required refactoring the existing lexer into a function that outputs individual tokens. It uses some rewind parsers to know when we've reached the end of the parent.

Any ideas for making this work? Have I overcomplicated it?


FYI most of the existing tests pass, because I tried to have the proposed interpolation go back to its string form. There are some corner case failures because of the way that spans work.

Copy link
Member Author

max-sixty commented Jan 6, 2024
edited
Loading

Possibly another approach is to instead produce InterpolationStart and InterpolationEnd tokens, and then avoid any token nesting from the lexer.

...which would be how we currently produce parentheses (but not how we produce strings — those return a Literal::String, rather than StringStart and StringEnd)

Copy link
Member

I tried this implementation strategy when rewriting from Pest to chumsky, but my bottom line was the same: the parser expects a flat stream of tokens and not a tree.

This PR is a good effort, but I don't know how to continue from here. The from_nested might work, but even if it does, having nested token streams is quite uncommon.

I like your idea of InterpolationStart and InterpolationEnd much more!

So the following:

a = f"hello {"Mr. " + name}, I'm {age} years old"

... would be lexed as:

Ident: "a"
Ctrl: =
StringQualifier: "f"
String: "hello "
InterpolationStart <--- this means {
String: "Mr. "
Ctrl: +
Ident: "name"
InterpolationEnd
String: ", I'm "
InterpolationStart
Ident: age
InterpolationEnd
String: " years old"
StringQualifier: ""

... or maybe like this:

Ident: "a"
Ctrl: =
InterpolationStart: "f" <--- this means f"
InterpolationString: "hello "
String: "Mr. "
Ctrl: +
Ident: "name"
InterpolationString: ", I'm "
Ident: age
InterpolationString: " years old"
InterpolationEnd
max-sixty reacted with thumbs up emoji

// TODO: decide how we want to handle colons in interpolated expressions
// We use rewinds to look ahead and ensure we don't have a closing
// bracket (or colon), before forwarding that to the lexer.
filter(|c| *c != '}' && *c != ':')
Copy link
Member

@aljazerzen aljazerzen Jan 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you aware that this prevents interpolations from containing }?

Following is not supported:

f"result: { my_func_that_takes_a_tuple {a=1} }"

Copy link
Member Author

@max-sixty max-sixty Jan 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I was aware of the issue with colons at least — python also has this problem but doesn't use colons so much.

With closing brackets, that's indeed a more restrictive case, and I agree with your broader point about it being confusing if we don't allow that, given how prominently brackets are used.

One difficulty is that at the lexing stage we don't know about matching opening & closing brackets, so we can't do the thing of "it's the end of the interpolated string if it's the final closing bracket".

We could require escape characters, which seems reasonable if a bit ugly.

),
[],
)
"###);
Copy link
Member

@aljazerzen aljazerzen Jan 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a test for the following:

s"LOWER_CASE( {"contents"} )"

and:

s"LOWERCASE( { f"Hello, {"Aljaz"} is my name." } )"

Copy link
Collaborator

Meh, are nested string interpolation even something that is useful?
Sounds like something that needlessly complicates the parser and that nobody would ever actually use or care about.

Copy link
Member Author

Meh, are nested string interpolation even something that is useful?
Sounds like something that needlessly complicates the parser and that nobody would ever actually use or care about.

This is a good question!

It depends what form of nesting you're asking about. From the linked issue, this seems helpful, and arguably confusing that we don't allow it:

from invoices
derive x = s"{foo - bar} + 5" # currently fails

OTOH, having s & f strings inside other s & f strings I agree isn't that helpful.

Copy link
Member

OTOH, having s & f strings inside other s & f strings I agree isn't that helpful.

I do agree, although there are cases where it would be useful:

...
f"my items are { std.array.join this.items "," }"

I worry about language consistency around interpolation:

  • if we support only idents, this is not ergonomic, but a reasonable limitation,
  • if we support all expressions, this is very ergonomic, but hard to pull off,
  • if we support all expressions except string literals and tuples, it gets 90% of ergonomics, but is also quite arbitrary rule that is hard to remember.

I'm not against this incremental improvement, but I would like to have a long-term plan on what we plan on supporting.

Copy link
Member

Ok, I'm impressed with Pablo's work - Python is able to parse this:

>>> f"My dict: { {'a': 1} }"
"My dict: {'a': 1}"
max-sixty reacted with eyes emoji

Copy link
Member

I've thought this was hard, not I think this is harder. :D

Copy link
Member Author

Python is able to parse this:

But not this!

f"{lambda x: x+2}"

max-sixty added a commit to max-sixty/prql that referenced this pull request Jan 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Reviewers

@aljazerzen aljazerzen aljazerzen left review comments

Assignees

No one assigned

Labels

None yet

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

AltStyle によって変換されたページ (->オリジナル) /