I'm using a query like:

SELECT * FROM items ORDER BY RANDOM()

All is well if the number of rows is low. In my tests however, I would like to have something reproducible to verify. This is why I'm seeding the random number generator:

SELECT setseed(0.123);
SELECT * FROM items ORDER BY RANDOM();

It's nice and working well. It looks like the order is the same on each execution, except that it's not completely reproducible. In some cases, the test succeeds and I get the expected order and result. In other executions of the same test, I don't. Why is that?

GMB
asked Jun 30, 2020 at 17:35

3 Answers

You might have better luck with a subquery:

SELECT setseed(0.123);
SELECT *
FROM (SELECT i.*, RANDOM() AS rand
      FROM items i
     ) i
ORDER BY rand;

The reason is that the function RANDOM() is called many times during the sorting. Some sorting algorithms are non-deterministic -- and that affects downstream rows.

This isn't 100% guaranteed, because the subquery could still not be processed in order (although it should be on a single processor system). But you can further rectify this by using a hash rather than a random value. So:

order by md5(item_id || '0.123')

The item_id is assumed to be different on each row. The '0.123' is added as a salt so you can easily change the ordering.
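To illustrate why a hash-based sort key does not depend on the order in which rows arrive, here is a minimal Python sketch. It is a model of the idea, not PostgreSQL itself: the function name `hash_order` and the sample ids are made up for illustration, and string comparison in Python may not match the database's collation.

```python
import hashlib

def hash_order(item_ids, salt):
    """Model of ORDER BY md5(item_id || salt): the key depends only on the row itself."""
    def key(item_id):
        # Concatenate id and salt, then hash, mirroring md5(item_id || '0.123').
        return hashlib.md5(f"{item_id}{salt}".encode()).hexdigest()
    return sorted(item_ids, key=key)

ids = [1, 2, 3, 4, 5]                                # hypothetical item_id values
order_1 = hash_order(ids, "0.123")
order_2 = hash_order(list(reversed(ids)), "0.123")   # same rows, different input order
```

Both calls produce the same ordering, because no state is shared between rows: each key is computed from the id and the salt alone.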

answered Jun 30, 2020 at 17:43

The problem is linked to the fact that rows are first fetched in an unspecified order (if no ORDER BY clause is specified), and only then is the RANDOM() function called for each row. This means that the unspecified order will impact the row order after the ORDER BY RANDOM() is applied.

Example, using the same seed in both cases:

case 1

SELECT * FROM items
returns
item_1
item_2
item_3
item_4
SELECT * FROM items ORDER BY RANDOM();
may return
item_3
item_4
item_1
item_2

case 2

SELECT * FROM items
returns
item_4
item_3
item_2
item_1
SELECT * FROM items ORDER BY RANDOM();
may return
item_2
item_1
item_4
item_3

The solution is then to put the rows in a deterministic order (for example, sorted by a unique key in a subquery) before ordering them by RANDOM(). For a given seed, the end result is then deterministic.
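The effect can be reproduced outside the database. The following is a minimal Python sketch, not PostgreSQL itself: it assumes one random value is drawn per row in the order the rows are scanned, and the helper name `order_by_random` is made up for illustration.

```python
import random

def order_by_random(rows, seed):
    """Model of ORDER BY RANDOM(): one draw per row in scan order, then sort by the draw."""
    rng = random.Random(seed)                       # stands in for setseed()
    keyed = [(rng.random(), row) for row in rows]   # values are handed out in scan order
    return [row for _, row in sorted(keyed)]

scan_a = ["item_1", "item_2", "item_3", "item_4"]
scan_b = ["item_4", "item_3", "item_2", "item_1"]   # same rows, different scan order

# Same seed, different scan order: each row receives a different random value.
shuffled_a = order_by_random(scan_a, seed=0.123)
shuffled_b = order_by_random(scan_b, seed=0.123)

# Sorting by a unique key first removes the dependence on scan order.
stable_a = order_by_random(sorted(scan_a), seed=0.123)
stable_b = order_by_random(sorted(scan_b), seed=0.123)
```

In this model `shuffled_a` and `shuffled_b` differ even though the seed is identical, while `stable_a` and `stable_b` are equal.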

answered Jun 30, 2020 at 17:35

You seem to want a repeatable random sort.

setseed() is the correct approach; however, you need to set it within the query, so that it applies to all further invocations of random().

Here is one solution using union all:

select item_id
from (
 select setseed(0.5), null item_id
 union all
 select null, item_id from items
 offset 1
) s
order by random()

This demonstrates how to proceed with a table that has only one column. You can extend this for more columns by adding more null columns to the first subquery (and accordingly listing the corresponding columns in the other union all member and in the outer query).

answered Jun 30, 2020 at 18:11
