-
Notifications
You must be signed in to change notification settings - Fork 91
-
Context:
I'm currently using python + multiple threads to concurrently process many isolated read-only queries.
The parent then gathers responses and writes back to duckdb. All of this happens in the context of a single transaction.
Each worker thread uses a db.cursor() as explained in the documentation.
A new requirement has been introduced that makes the main process write to duckdb before the workers start, but after the transaction has started.
Issue:
I'm observing the unexpected (for me) behavior that the worker threads have no visibility over the uncommitted data introduced before .cursor() was invoked.
Here's a short demo without all the threading abstracted:
#!/usr/bin/env python import duckdb con = duckdb.connect() con.execute("CREATE TABLE t(i INTEGER)") con.begin() con.execute("INSERT INTO t VALUES (1)") parent_count = con.execute("SELECT COUNT(*) FROM t").fetchone()[0] cursor = con.cursor() cursor_count_before_commit = cursor.execute("SELECT COUNT(*) FROM t").fetchone()[0] con.commit() cursor_count_after_commit = cursor.execute("SELECT COUNT(*) FROM t").fetchone()[0] print(f"DuckDB version: {duckdb.__version__}") print(f"Parent connection count inside transaction: {parent_count}") print(f"cursor() count before parent commit: {cursor_count_before_commit}") print(f"cursor() count after parent commit: {cursor_count_after_commit}") assert parent_count == 1 assert cursor_count_before_commit == 0 assert cursor_count_after_commit == 1
I'm unsure if this is expected behavior.
Either the cursor shares the full context to the db connection, in which case it should be able to read the uncommited data.
Or the cursor does not share is state with the parent, and behaves more as a separate connection. If this is the case then the documentation could be more explicit about this.
Beta Was this translation helpful? Give feedback.