-
Notifications
You must be signed in to change notification settings - Fork 91
-
I'm not sure if there's an issue here somewhere or me just misunderstanding how to do this properly, I'm generating a python relation where one of the columns is created via gen_random_uuid(), e.g.:
import duckdb db = duckdb.connect() my_relation = db.sql("select gen_random_uuid() as id, 'hello' as some_col")
Just printing this relation out gives me a consistent UUID each time:
>>> my_relation
┌──────────────────────────────────────┬──────────┐
│ id │ some_col │
│ uuid │ varchar │
├──────────────────────────────────────┼──────────┤
│ 72aaf191-4c9b-4693-a023-a4c71512534b │ hello │
└──────────────────────────────────────┴──────────┘
>>> my_relation
┌──────────────────────────────────────┬──────────┐
│ id │ some_col │
│ uuid │ varchar │
├──────────────────────────────────────┼──────────┤
│ 72aaf191-4c9b-4693-a023-a4c71512534b │ hello │
└──────────────────────────────────────┴──────────┘
However, if I interact with the relation in any other way (select from it, or call df() on it), the UUID gets regenerated each time:
>>> db.sql("select * from my_relation")
┌──────────────────────────────────────┬──────────┐
│ id │ some_col │
│ uuid │ varchar │
├──────────────────────────────────────┼──────────┤
│ 29d7d850-ac7e-4ac6-bfc3-0a6cda2c617c │ hello │
└──────────────────────────────────────┴──────────┘
>>> db.sql("select * from my_relation")
┌──────────────────────────────────────┬──────────┐
│ id │ some_col │
│ uuid │ varchar │
├──────────────────────────────────────┼──────────┤
│ b042e725-1e43-4fac-b6f1-f105e11add60 │ hello │
└──────────────────────────────────────┴──────────┘
I assume this has to do with the python relational API's lazy evaluation? If I create a table (like with to_table("some_name")) and then select from that table the problem goes away, so it seems like the two possible ways to get around this are:
- Immediately turn the relation into a dataframe, and then back into a relation by selecting from it
my_relation_df = my_relation.df() my_relation = db.sql("select * from my_relation_df")
- Convert the relation to a table and then get it as a relation or just use the table name in later queries
db.sql("select * from my_relation").to_table("temp_relation_table") my_relation = db.table("temp_relation_table")
But I'm curious if I'm missing something obvious/is there a better or conventional way of dealing with this? This problem is coming up because I have multiple queries later on working with my_relation, and combining all those results into a final relation that actually gets written as a table, but each later query is incorrectly working with a different set of UUIDs. Is it considered bad practice to create temporary relations like this instead of always making them actual tables and doing any processing on the tables themselves?
Beta Was this translation helpful? Give feedback.