I'm using a query like:
SELECT * FROM items ORDER BY RANDOM()
All is well if the number of rows is low. In my tests however, I would like to have something reproducible to verify. This is why I'm seeding the random number generator:
SELECT setseed(0.123);
SELECT * FROM items ORDER BY RANDOM();
This works well, and the order looks the same on each execution. However, it's not completely reproducible. In some runs of the test I get the expected order and result; in other runs of the same test, I don't. Why is that?
3 Answers
You might have better luck with a subquery:
SELECT setseed(0.123);
SELECT *
FROM (SELECT i.*, RANDOM() as rand
FROM items i
) i
ORDER BY rand;
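The reason is that RANDOM() is called while the rows are being sorted, so the value each row receives depends on the order in which the rows are processed. That order is not guaranteed to be the same from run to run, and a change for one row affects every row after it.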
This isn't 100% guaranteed, because the subquery could still not be processed in order (although it should be on a single processor system). But you can further rectify this by using a hash rather than a random value. So:
order by md5(item_id || '0.123')
The item_id is assumed to be different on each row. The '0.123' is added so that you can easily change the ordering.
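Written out as a full query, the hash-based ordering might look like the sketch below. It assumes item_id is unique and never NULL. The explicit cast to text is an addition here, so that the concatenation works whatever the column's type is. No setseed() call is needed, because md5() depends only on its input.

```sql
-- Sketch: repeatable pseudo-random order without RANDOM().
-- Assumes items.item_id is unique and NOT NULL; '0.123' acts as a salt.
SELECT *
FROM items
ORDER BY md5(item_id::text || '0.123');

-- Changing the salt gives a different, but equally repeatable, order.
SELECT *
FROM items
ORDER BY md5(item_id::text || '0.456');
```

Because each row's sort key is computed from that row alone, the order in which rows are fetched no longer matters.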
The problem is linked to the fact that rows are first fetched in an unspecified order (if no ORDER BY clause is specified), and only then is the RANDOM() function called for each row. This means that the unspecified order will impact the row order after the ORDER BY RANDOM() is applied.
Example, using the same seed in both cases:
case 1
SELECT * FROM items
returns
item_1
item_2
item_3
item_4
SELECT * FROM items ORDER BY RANDOM();
may return
item_3
item_4
item_1
item_2
case 2
SELECT * FROM items
returns
item_4
item_3
item_2
item_1
SELECT * FROM items ORDER BY RANDOM();
may return
item_2
item_1
item_4
item_3
The solution is then to put the rows in a deterministic order (for example, by a unique key) before ordering them by RANDOM(). For a given seed, the end result is then deterministic.
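One way to sketch this in PostgreSQL is to sort by a unique key in a subquery and apply RANDOM() in the outer query. The sketch assumes that item_id is a unique column (the name is borrowed from the other answers, not from the question) and that both statements run in the same session.

```sql
-- Seed the generator for this session.
SELECT setseed(0.123);

-- The inner ORDER BY fixes the order in which rows reach RANDOM(),
-- so each row is paired with the same random value on every run.
SELECT *
FROM (
    SELECT *
    FROM items
    ORDER BY item_id   -- assumed unique
) AS ordered_items
ORDER BY RANDOM();
```

If item_id is not unique, ties in the inner sort can still come back in a different order, so the inner ORDER BY should list enough columns to identify each row.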
You seem to want a repeatable random sort.
setseed() is the correct approach; however, you need to set it within the query, so that it applies to all further invocations of random().
Here is one solution using union all:
select item_id
from (
select setseed(0.5), null item_id
union all
select null, item_id from items
offset 1
) s
order by random()
This demonstrates how to proceed with a table that has only one column. You can extend this for more columns by adding more null columns to the first subquery (and accordingly listing the corresponding columns in the other union all member and in the outer query).
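For instance, with a second column the extension described above might look like the following sketch. The column name is hypothetical and only stands in for whatever other columns the table has.

```sql
-- Sketch: same pattern extended to two columns.
-- "name" is a hypothetical column used for illustration.
select item_id, name
from (
    -- first member: seeds the generator, placeholders for the data columns
    select setseed(0.5), null item_id, null name
    union all
    -- second member: the actual rows, with a placeholder for setseed()
    select null, item_id, name from items
    offset 1   -- skip the seeding row
) s
order by random()
```

Each additional column needs one more null in the first member and a matching entry in both the second member and the outer select list.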