Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Feat: support Template Strings, eg .sql(t"SELECT {my_col} * 2 from {my_table} JOIN {other_table}") #370

NickCrews started this conversation in Ideas
Discussion options

Update: I am working on a prototype at https://github.com/NickCrews/duckdb-python/tree/t-strings

Python 3.14 ships with a new exciting feature for SQL libraries: t-strings. I think we should use them here to make our APIs even friendlier

I have been working on a similar PR in ibis. See that for background info.

This feature would allow turning (from @RobinL 's blog, tagging him in in case he has opinions :))

input_data = duckdb.sql("SELECT * FROM read_parquet('...')")
step_1 = duckdb.sql("SELECT ... FROM input_data JOIN ...")
step_2 = duckdb.sql("SELECT ... FROM step_1")
final = duckdb.sql("SELECT ... FROM step_2;")

into

t = duckdb.sql("SELECT * FROM read_parquet('...')")
t = duckdb.sql(t"SELECT ... FROM {t} JOIN ...")
t = duckdb.sql(t"SELECT ... FROM {t};")

which allows for IDE automated refactorings to automatically work, linters like ruff to catch undefined variables, and for users to easily comment lines on and off, or reorder them, etc with things still working.

Or it would allow for auto paramaterization:

duckdb.sql(t"INSERT INTO people ({untrusted_data}, ...")`

which is executed the same as

duckdb.execute(t"INSERT INTO people (1,ドル ...", params={1:untrusted_data})

What I would really love is something like mix and match column expressions and raw string names:

t = duckdb.sql("SELECT * FROM read_parquet('...')")
t = duckdb.sql(t"SELECT {duckdb.ColumnExpression('b').isnull()}, other_col FROM {t}")

Questions:

  • would you want to backport t-strings, or just support python 3.14+ which has t-strings built in?
  • what APIs of this package would support the templating? eg anywhere that SQL is expected?
  • What sorts of objects can get included as values into a template? My initial proposal: Add a Protocol for def __duckdb_sql__(self) -> str. The builtin objects like Relations, ColumnExpressions, etc implement this, eg DuckdbPyRelation just returns self.alias. Then when evaluating a Template string, if an object has this method, we use that, otherwise we assume that it should be included as a parameter. Thus, duckdb.sql(t"SELECT ... FROM {t};") gets turned into the same as duckdb.sql("SELECT ... FROM {t.alias};")Does this make sense?
You must be logged in to vote

Replies: 2 comments 2 replies

Comment options

I'm not sure I fully understand t strings, but am I right in saying they provide a mechanism of avoiding replacement scans?

i.e. DuckDB could get the actual interpolated Python objects directly, instead of having to recover meaning indirectly from SQL identifiers via replacement-scan/name-resolution logic?

If so, this would be a big win, I've mentioned a couple of times before that my team have had a lot of problems with replacements scans and avoid them.

duckdb/duckdb#17033 (reply in thread)
#205 (comment)

I also agree with the benefits re linters

You must be logged in to vote
1 reply
Comment options

DuckDB could get the actual interpolated Python objects directly,

Yes, that is the exact idea. Consider:

my_id=5
mytable = duckdb.sql("SELECT * from penguins")
query = t"select * from {mytable} where id = {my_id}`

The tstring is just syntactic sugar, you could get the same query by writing

Template(
 "select * from ",
 Interpolation(mytable, 'mytable', None, ''),
 " where id = ",
 Interpolation(5, 'my_id', None, ''),
)

Then, pseudocode implementation for how to deal with these:

def process(
 query: Statement | str | Interpolation,
) -> tuple[str, dict[str, Any]]:
 final_params = {}
 if isinstance(query, str):
 final_sql = raw
 else:
 // Its a Template
 final_sql = ""
 for item in raw:
 if isinstance(item, str):
 // eg "select * from "
 final_sql += item
 elif hasattr(item.value, "__duckdb_sql__"):
 // eg Interpolation(mytable, 'mytable', None, '')
 sql, params = item.value.__duckdb_sql__()
 final_sql += sql
 final_params.update(params) // I haven't fully thought this through. Do we need to avoid param naming conflicts??
 else:
 // eg Interpolation(5, 'my_id', None, '')
 param_id = str(uuid.UUID()) // or something like this
 final_sql += f"${param_id}"
 final_params[param_id] = item.value
 return final_sql, final_params
Comment options

Would be pretty cool indeed!
I implemented something somewhat similar in my current project:
(IntoDict is a Mapping[K, V] or an Iterable[tuple[K, V]])

https://github.com/OutSquareCapital/pql/blob/7e936f0a9c14f8c9ecd122674ff81e9801c3dbc1/src/pql/sql/_creation.py#L55

Basically allow to do this afterward:

import polars as pl
import pql
def _get_qry() -> str:
 return """
 SELECT * FROM df
 WHERE a > 1
 ORDER BY b
 """
def _get_df() -> pl.DataFrame:
 return pl.DataFrame({"a": [1, 2, 3], "b": [3, 6, 1]})
def main() -> None:
 rel = pql.from_query(_get_qry(), df=_get_df())
 print(rel)
if __name__ == "__main__":
 main()

output:

┌───────┬───────┐
│ a │ b │
│ int64 │ int64 │
├───────┼───────┤
│ 3 │ 1 │
│ 2 │ 6 │
└───────┴───────┘

This is nice, because I can directly refer the dataframe I want to pass to the query in the parameter of the function from_query, rather than a dummy variable df that often need to be renamed to _df to avoid linter issues, and who is less "intuitively" linked to the actual query (+ I save one line of assignement).
I admit that the eval usage is a bit hacky and could surely be replaced by something else, but this seems like a low cost improvement to the existing replacement scan API.

You must be logged in to vote
1 reply
Comment options

Using t-strings I think you could maybe make this even friendlier, with no need for this feature in duckdb?

import polars as pl
import pql
from string.templatelib import Template
def _get_qry(df: pl.DataFrame) -> Template:
 return t"""
 SELECT * FROM {df}
 WHERE a > 1
 ORDER BY b
 """
def _get_df() -> pl.DataFrame:
 return pl.DataFrame({"a": [1, 2, 3], "b": [3, 6, 1]})
def main() -> None:
 rel = pql.from_template(_get_qry(_get_df()))
 print(rel)
if __name__ == "__main__":
 main()
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Category
Ideas
Labels
None yet

AltStyle によって変換されたページ (->オリジナル) /