Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Using gen_random_uuid in python relational API #263

Unanswered
WarmCyan asked this question in Q&A
Discussion options

I'm not sure if there's an issue here somewhere or me just misunderstanding how to do this properly, I'm generating a python relation where one of the columns is created via gen_random_uuid(), e.g.:

import duckdb
db = duckdb.connect()
my_relation = db.sql("select gen_random_uuid() as id, 'hello' as some_col")

Just printing this relation out gives me a consistent UUID each time:

>>> my_relation
┌──────────────────────────────────────┬──────────┐
│ id │ some_col │
│ uuid │ varchar │
├──────────────────────────────────────┼──────────┤
│ 72aaf191-4c9b-4693-a023-a4c71512534b │ hello │
└──────────────────────────────────────┴──────────┘
>>> my_relation
┌──────────────────────────────────────┬──────────┐
│ id │ some_col │
│ uuid │ varchar │
├──────────────────────────────────────┼──────────┤
│ 72aaf191-4c9b-4693-a023-a4c71512534b │ hello │
└──────────────────────────────────────┴──────────┘

However, if I interact with the relation in any other way (select from it, or call df() on it), the UUID gets regenerated each time:

>>> db.sql("select * from my_relation")
┌──────────────────────────────────────┬──────────┐
│ id │ some_col │
│ uuid │ varchar │
├──────────────────────────────────────┼──────────┤
│ 29d7d850-ac7e-4ac6-bfc3-0a6cda2c617c │ hello │
└──────────────────────────────────────┴──────────┘
>>> db.sql("select * from my_relation")
┌──────────────────────────────────────┬──────────┐
│ id │ some_col │
│ uuid │ varchar │
├──────────────────────────────────────┼──────────┤
│ b042e725-1e43-4fac-b6f1-f105e11add60 │ hello │
└──────────────────────────────────────┴──────────┘

I assume this has to do with the python relational API's lazy evaluation? If I create a table (like with to_table("some_name")) and then select from that table the problem goes away, so it seems like the two possible ways to get around this are:

  1. Immediately turn the relation into a dataframe, and then back into a relation by selecting from it
my_relation_df = my_relation.df()
my_relation = db.sql("select * from my_relation_df")
  1. Convert the relation to a table and then get it as a relation or just use the table name in later queries
db.sql("select * from my_relation").to_table("temp_relation_table")
my_relation = db.table("temp_relation_table")

But I'm curious if I'm missing something obvious/is there a better or conventional way of dealing with this? This problem is coming up because I have multiple queries later on working with my_relation, and combining all those results into a final relation that actually gets written as a table, but each later query is incorrectly working with a different set of UUIDs. Is it considered bad practice to create temporary relations like this instead of always making them actual tables and doing any processing on the tables themselves?

You must be logged in to vote

Replies: 0 comments

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Category
Q&A
Labels
None yet
1 participant

AltStyle によって変換されたページ (->オリジナル) /