-
Notifications
You must be signed in to change notification settings - Fork 91
-
Update: I am working on a prototype at https://github.com/NickCrews/duckdb-python/tree/t-strings
Python 3.14 ships with a new exciting feature for SQL libraries: t-strings. I think we should use them here to make our APIs even friendlier
I have been working on a similar PR in ibis. See that for background info.
This feature would allow turning (from @RobinL 's blog, tagging him in in case he has opinions :))
input_data = duckdb.sql("SELECT * FROM read_parquet('...')") step_1 = duckdb.sql("SELECT ... FROM input_data JOIN ...") step_2 = duckdb.sql("SELECT ... FROM step_1") final = duckdb.sql("SELECT ... FROM step_2;")
into
t = duckdb.sql("SELECT * FROM read_parquet('...')") t = duckdb.sql(t"SELECT ... FROM {t} JOIN ...") t = duckdb.sql(t"SELECT ... FROM {t};")
which allows for IDE automated refactorings to automatically work, linters like ruff to catch undefined variables, and for users to easily comment lines on and off, or reorder them, etc with things still working.
Or it would allow for auto paramaterization:
duckdb.sql(t"INSERT INTO people ({untrusted_data}, ...")`
which is executed the same as
duckdb.execute(t"INSERT INTO people (1,ドル ...", params={1:untrusted_data})
What I would really love is something like mix and match column expressions and raw string names:
t = duckdb.sql("SELECT * FROM read_parquet('...')") t = duckdb.sql(t"SELECT {duckdb.ColumnExpression('b').isnull()}, other_col FROM {t}")
Questions:
- would you want to backport t-strings, or just support python 3.14+ which has t-strings built in?
- what APIs of this package would support the templating? eg anywhere that SQL is expected?
- What sorts of objects can get included as values into a template? My initial proposal: Add a Protocol for
def __duckdb_sql__(self) -> str. The builtin objects like Relations, ColumnExpressions, etc implement this, eg DuckdbPyRelation just returnsself.alias. Then when evaluating a Template string, if an object has this method, we use that, otherwise we assume that it should be included as a parameter. Thus,duckdb.sql(t"SELECT ... FROM {t};")gets turned into the same asduckdb.sql("SELECT ... FROM {t.alias};")Does this make sense?
Beta Was this translation helpful? Give feedback.
All reactions
-
❤️ 1
Replies: 2 comments 2 replies
-
I'm not sure I fully understand t strings, but am I right in saying they provide a mechanism of avoiding replacement scans?
i.e. DuckDB could get the actual interpolated Python objects directly, instead of having to recover meaning indirectly from SQL identifiers via replacement-scan/name-resolution logic?
If so, this would be a big win, I've mentioned a couple of times before that my team have had a lot of problems with replacements scans and avoid them.
duckdb/duckdb#17033 (reply in thread)
#205 (comment)
I also agree with the benefits re linters
Beta Was this translation helpful? Give feedback.
All reactions
-
DuckDB could get the actual interpolated Python objects directly,
Yes, that is the exact idea. Consider:
my_id=5 mytable = duckdb.sql("SELECT * from penguins") query = t"select * from {mytable} where id = {my_id}`
The tstring is just syntactic sugar, you could get the same query by writing
Template( "select * from ", Interpolation(mytable, 'mytable', None, ''), " where id = ", Interpolation(5, 'my_id', None, ''), )
Then, pseudocode implementation for how to deal with these:
def process( query: Statement | str | Interpolation, ) -> tuple[str, dict[str, Any]]: final_params = {} if isinstance(query, str): final_sql = raw else: // Its a Template final_sql = "" for item in raw: if isinstance(item, str): // eg "select * from " final_sql += item elif hasattr(item.value, "__duckdb_sql__"): // eg Interpolation(mytable, 'mytable', None, '') sql, params = item.value.__duckdb_sql__() final_sql += sql final_params.update(params) // I haven't fully thought this through. Do we need to avoid param naming conflicts?? else: // eg Interpolation(5, 'my_id', None, '') param_id = str(uuid.UUID()) // or something like this final_sql += f"${param_id}" final_params[param_id] = item.value return final_sql, final_params
Beta Was this translation helpful? Give feedback.
All reactions
-
Would be pretty cool indeed!
I implemented something somewhat similar in my current project:
(IntoDict is a Mapping[K, V] or an Iterable[tuple[K, V]])
Basically allow to do this afterward:
import polars as pl import pql def _get_qry() -> str: return """ SELECT * FROM df WHERE a > 1 ORDER BY b """ def _get_df() -> pl.DataFrame: return pl.DataFrame({"a": [1, 2, 3], "b": [3, 6, 1]}) def main() -> None: rel = pql.from_query(_get_qry(), df=_get_df()) print(rel) if __name__ == "__main__": main()
output:
┌───────┬───────┐ │ a │ b │ │ int64 │ int64 │ ├───────┼───────┤ │ 3 │ 1 │ │ 2 │ 6 │ └───────┴───────┘
This is nice, because I can directly refer the dataframe I want to pass to the query in the parameter of the function from_query, rather than a dummy variable df that often need to be renamed to _df to avoid linter issues, and who is less "intuitively" linked to the actual query (+ I save one line of assignement).
I admit that the eval usage is a bit hacky and could surely be replaced by something else, but this seems like a low cost improvement to the existing replacement scan API.
Beta Was this translation helpful? Give feedback.
All reactions
-
Using t-strings I think you could maybe make this even friendlier, with no need for this feature in duckdb?
import polars as pl import pql from string.templatelib import Template def _get_qry(df: pl.DataFrame) -> Template: return t""" SELECT * FROM {df} WHERE a > 1 ORDER BY b """ def _get_df() -> pl.DataFrame: return pl.DataFrame({"a": [1, 2, 3], "b": [3, 6, 1]}) def main() -> None: rel = pql.from_template(_get_qry(_get_df())) print(rel) if __name__ == "__main__": main()
Beta Was this translation helpful? Give feedback.