The following code generates 100000 rows with random values for the Id column using uuid_generate_v4(). However, the nested selects always choose the same row, so all the inserted rows have the same values for those columns. The goal is to create a table with 100k rows whose values are taken at random from the other sample tables. Each of the sample tables has only two columns (Id and the column from which the values are taken). How can this be achieved?

insert into "Tag" (
 "Id", "Time", "Account", "Name", "Value", "RollUpTableId"
)
select
 uuid_generate_v4(),
 current_timestamp,
 (select "Account" from "AccountSamples" OFFSET floor(random()*358) LIMIT 1),
 (select "Name" from "TagNameSamples" OFFSET floor(random()*19) LIMIT 1),
 (select "Value" from "TagValueSamples" OFFSET floor(random()*26) LIMIT 1),
 uuid_generate_v4()
from generate_series(1, 100000);

I've also tried with select "Account" from "AccountSamples" where "Id" = (trunc(random() * 358)::integer)

asked Sep 17, 2020 at 20:04

2 Answers

Likely, Postgres is optimizing the subqueries and does not re-execute them for each row.

I would recommend enumerating randomly in subqueries, then joining:

select uuid_generate_v4(), a."Account", tns."Name", tvs."Value"
from (
 select "Account", row_number() over(order by random()) rn from "AccountSamples"
) a
inner join (
 select "Name", row_number() over(order by random()) rn from "TagNameSamples"
) tns on tns.rn = a.rn
inner join (
 select "Value", row_number() over(order by random()) rn from "TagValueSamples"
) tvs on tvs.rn = a.rn
where a.rn <= 10

This is not the exact same logic as in your original query, because a given row may only be selected once - but I think that's a reasonable approximation.

If some of your tables may have fewer than 10 rows, then generate_series() and left joins are safer:

select uuid_generate_v4(), a."Account", tns."Name", tvs."Value"
from generate_series(1, 10) x(rn)
left join (
 select "Account", row_number() over(order by random()) rn from "AccountSamples"
) a on a.rn = x.rn
left join (
 select "Name", row_number() over(order by random()) rn from "TagNameSamples"
) tns on tns.rn = x.rn
left join (
 select "Value", row_number() over(order by random()) rn from "TagValueSamples"
) tvs on tvs.rn = x.rn
answered Sep 17, 2020 at 20:07

7 Comments

The first option only created up to 19 rows. The second, after changing the call to generate_series(1, 1000), created the required number of rows but filled them mostly with nulls, with only a few values here and there.
Here is an example of what was generated: i.imgur.com/UiYYV1k.png
@user33276346: it looks like you have null values in the original tables. You might want to filter them out with where clauses in the subqueries.
No null in any of the tables. They have 358, 19 and 26 values, verified. I'm not putting those numbers in the second query you provided, though, should I?
By the way, what is x(rn) doing there?

I'm sharing, for anyone who might deal with something similar, an answer by pehrs from Reddit that solves the problem.

In your first solution, the problem is that the subselect does not have any external dependency, so it will be optimized to call random only once. You can fix this by adding an external dependency. Try something like this:

select uuid_generate_v4(),
 current_timestamp,
 (select "Account" from "AccountSamples" WHERE gen=gen OFFSET floor(random()*358) LIMIT 1),
 (select "Name" from "TagNameSamples" WHERE gen=gen OFFSET floor(random()*19) LIMIT 1),
 (select "Value" from "TagValueSamples" WHERE gen=gen OFFSET floor(random()*26) LIMIT 1),
 uuid_generate_v4()
from generate_series(1, 100000) gen;
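
One way to check whether a subselect runs once or once per row is to compare the query plans. The sketch below is only an illustration, not part of the original answer: it replaces the sample tables with generate_series() so it can run on its own, and the alias s is made up for the example. In PostgreSQL, a scalar subquery with no outer reference is normally planned as an InitPlan (evaluated once), while one that references the outer row is planned as a SubPlan (evaluated per row). The exact plan text varies between versions.

```sql
-- No reference to the outer query: expect an InitPlan in the output,
-- meaning random() inside the subselect is only evaluated once.
explain
select (select s
        from generate_series(1, 5) s
        offset floor(random() * 5) limit 1)
from generate_series(1, 3) gen;

-- The reference to the outer column gen makes the subselect correlated:
-- expect a SubPlan, which is re-evaluated for every outer row.
explain
select (select s
        from generate_series(1, 5) s
        where gen = gen
        offset floor(random() * 5) limit 1)
from generate_series(1, 3) gen;
```

Dropping the explain keyword from both statements should show the same difference in the data: the first repeats one value on every row, while the second can vary from row to row.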

By the way, the typical way to pick randomly from a table without having to precalculate the size of the table is something like

SELECT foo FROM bar ORDER BY random() LIMIT 1

It is not overly performant, but it is simple and well understood.
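
As a rough sketch, the same idiom can be combined with the external dependency from above so that a fresh random row is picked for each generated row. This assumes the "AccountSamples" table from the question; the small series size is only there to keep the example quick, since each outer row triggers a full sort of the sample table.

```sql
-- Pick a random "Account" per generated row without hard-coding the table size.
-- "where gen = gen" only exists to reference the outer row, so that the
-- subselect is re-evaluated for each value of gen.
select gen,
       (select "Account"
        from "AccountSamples"
        where gen = gen
        order by random()
        limit 1) as "Account"
from generate_series(1, 10) gen;
```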

answered Sep 18, 2020 at 13:38
