-
Notifications
You must be signed in to change notification settings - Fork 91
duckdb and postgres connection management and thread safety in a multithreaded, long lived process #319
-
I'm trying to optimise DuckDB + Postgres connection handling in a long-lived, multithreaded application, but, after reading the docs, I'm not quite sure what the best way is to put all the moving parts together, so I'm looking for some advice.
Preface
I only ever want to perform reads, through DuckDB into postgres, e.g. conn.sql("SELECT * FROM postgres_scan(...);").fetchall()
Postgres side
The two methods to connect to postgres:
ATTACHpostgrespostgres_scan
From what I can tell, ATTACHing a postgres DB will establish a connection and keep it open, making it not a suitable method for this application, since it would not allow concurrent reads because they'd all share the same postgres connection.
I'm assuming (couldn't find anything about this in the docs, and a cursory look into the source also didn't yield a result) that postgres_scan creates an ad-hoc connection, so it feels like that's what I should use.
DuckDB side
According to this, the connection is inherently not thread safe. It does mention .cursor, but also says this about the cursor:
[...] does not open a new connection. Therefore, all cursors created from one connection cannot run queries at the same time.
This seems to imply that within the same DuckDBPyRelation, effectively no multithreading is possible or safe.
However, there's also this guide, which does show using threads and .cursor for concurrent reads / writes, leaving me a bit confused.
Baring this guide, I would have concluded that what I actually want is some sort of connection pool, with one DuckDBPyRelation per thread, but now I'm not sure what to think.
I would appreciate any insights in how this is supposed to be handled, or corrections if I'm missing something really obvious :)
Beta Was this translation helpful? Give feedback.