-
Notifications
You must be signed in to change notification settings - Fork 91
Support for SQL identifier composition to avoid f-strings and security linter false positives (B608) #481
-
Context
While developing a QGIS data provider plugin using DuckDB (qduckdb), we build dynamic SQL queries involving:
- table/schema names
- column names
- arbitrary WHERE clauses (
subsetString, a QGIS API concept representing a free-form SQL filter entered by the user)
What already works
DuckDB supports parameterized queries for scalar values via execute(), similar to psycopg2:
# psycopg2 cur.execute(""" INSERT INTO some_table (id, created_at, last_name) VALUES (%s, %s, %s); """, (10, datetime.date(2020, 11, 18), "O'Reilly")) # DuckDB equivalent con.execute( "INSERT INTO some_table (id, created_at, last_name) VALUES (?, ?, ?);", (10, datetime.date(2020, 11, 18), "O'Reilly") )
We already use this wherever possible in our plugin.
What is currently impossible
SQL identifiers (table names, column names) and arbitrary WHERE fragments cannot be passed as ? parameters. We are therefore forced to use f-strings or format:
# _from_clause can be '"schema"."table"' or '(SELECT ... FROM ...)' self._con.sql( f"SELECT count(*) FROM {self._from_clause} WHERE {self.subsetString()}" ) # _column_geom is a geometry column name self._con.sql( f"SELECT st_geometrytype({self._column_geom}) FROM {self._from_clause}" )
These f-strings are flagged as potential SQL injection vulnerabilities by static analysis tools (Bandit, rule B608), even though they are false positives: the values come from our own URI parsing or from the QGIS API, not from direct user input.
Wrapping ? in quotes does not work either, DuckDB treats them as string literals, not placeholders:
# Does NOT work, DuckDB executes: SELECT st_geometrytype('?') FROM '?' con.execute( "SELECT st_geometrytype('?') FROM '?'", [self._column_geom, self._from_clause] )
Reference: how psycopg solves this
psycopg (v3) addresses the SQL identifier case with psycopg.sql:
from psycopg import sql query = sql.SQL("SELECT * FROM {}").format(sql.Identifier(table_name))
DuckDB has no equivalent for this case as far as we know.
Feature request
Would it be feasible to expose a utility similar to psycopg.sql.Identifier in DuckDB's Python API?
Or is there already a recommended approach to safely compose SQL identifiers that we may have missed in the documentation?
This would allow plugin developers to write linter-clean code without resorting to # nosec suppression comments.
Thank you in advance for your responses.
Beta Was this translation helpful? Give feedback.
All reactions
Replies: 1 comment
-
yeah, you've read it right, there's no sql.Identifier-style helper in duckdb (checked on 1.4.4). ? only binds values, so SELECT ? FROM t with ['name'] returns the literal string 'name', not the column. no quote_ident/quote_literal scalar func either.
what works today is quoting it yourself, same rule psycopg uses: wrap in double quotes and double any embedded quote.
def ident(n): return '"' + n.replace('"', '""') + '"' con.sql(f'SELECT {ident(col)} FROM {ident(tbl)}')
that's injection-safe, I threw name" FROM t; DROP TABLE t; -- at it and it just became one quoted identifier that errors as column-not-found, table left untouched. for table refs the relational API (con.table("schema.tbl").filter(...)) avoids string SQL too, though note con.table() parses qualified names so it's not a general escaper for arbitrary identifiers. for B608 specifically, isolating that one f-string in an ident() helper + a # nosec is the usual pragmatic move.
Beta Was this translation helpful? Give feedback.