Populate a table column randomly with values from a list in PostgreSQL

Question 1

I want to randomly populate a column in my table with the following values: Available, On Hold, Partial Downpayment, Sold/Unavailable, Token

I have tried using the following command but it populates the entire column with the first value it gets.

update mytable
set sold_status = (select (array['Available', 'On Hold', 'Partial Downpayment', 'Sold/Unavailable', 'Token Recieved'])[floor(random() * 5 + 1)])

I know how to achieve this with a programming language, but I would prefer a PostgreSQL query to achive this.

Question 2

While PostgreSQL tracks the volatility of simple expressions, it does not track the volatility of sub-selects which do not contain references to the outer query. You could fix this by introducing an artificial reference to the outer query, but in your case you don't need a sub-select in the first place, so just get rid of it.

update mytable
set sold_status = (array[
 'Available', 'On Hold', 'Partial Downpayment', 'Sold/Unavailable', 'Token Recieved'
 ]) [floor(random() * 5 + 1)];

Question 3

Postgres thinks it is doing you a favor by running random() only once -- forgetting that random() is a volatile function.

You can fix this using case:

update mytable
 set sold_status = (case (random()*5)::int when 0 then 'Available' when 1 then 'On Hold' when 2 then 'Partial Downpayment' when 3 then 'Sold/Unavailable' when 4 then 'Token Recieved' end);

Here is a db<>fiddle.

Question 4

Since the header of this page is "Populate a table column ..." I am putting here my method to create random values in column from a list:

CREATE TABLE IF NOT EXISTS cars(
 id SERIAL PRIMARY KEY,
 brand VARCHAR(30)
);
INSERT INTO cars (
 brand
)
SELECT
 (array['BMW','AUDI','MERCEDES-BENZ'])[floor(random() * 3 + 1)]
FROM generate_series(1, 10);

It will create 10 rows with random values from brand's list.

Question 5

The accpeted answer is incorrect!

You can see this here (all of the code below is in the fiddle here):

CREATE TABLE tab AS
SELECT
 id,
 NULL AS rand_x5,
 NULL AS sold_status
FROM 
 generate_series(1, 10000) AS t(id);

and then run:

WITH r AS
(
 SELECT j, (RANDOM() * 5)::INT AS x
 FROM GENERATE_SERIES(1, 10000) AS j
)
UPDATE tab
SET 
 rand_x5 = r.x, 
 sold_status =
 CASE r.x -- note different brackets (ranges - inclusive/exclusive boundaries)
 WHEN 0 THEN 'Available' -- only happens from [0 - 0.5) 
 WHEN 1 THEN 'On Hold' -- happens from [0.5 - 1.5) &c.
 WHEN 2 THEN 'Partial'
 WHEN 3 THEN 'Sold'
 WHEN 4 THEN 'Token'
 END 
FROM r
WHERE tab.id = r.j;

and then run the aggregate query:

SELECT
 rand_x5, sold_status, COUNT(rand_x5), COUNT(sold_status)
FROM tab
GROUP by rand_x5, sold_status
ORDER BY rand_x5 DESC NULLS first;

Result:

rand_x5 sold_status count count
 5 null 1047 0
 4 Token 2062 2062
 3 Sold 1995 1995
 2 Partial 1912 1912
 1 On Hold 2006 2006
 0 Available 978 978
SELECT 6 -- << 6 instead of 5

Now, there are two major problems with this:

there are 6 possible values (0 - 5), not 5 in the output of (RANDOM() * 5)::INT. Since you haven't dealt with the 5 in the CASE expression, sold_status becomes NULL
the first value (0) and the last (5) are skewed, in that they'll only have 1/2 of the number of values of the others (1 - 4) if you generate a large number of them - since they will only be generated over a range of 0.5 whereas the others are generated over a range of 1

The correct code for this should be FLOOR((RANDOM() * 5)) - see bottom of fiddle. The result when this is run is:

rand_x5 sold_status count count
0 Available 1938 1938
1 On Hold 2048 2048
2 Partial 1983 1983
3 Sold 2001 2001
4 Token 2030 2030

No NULLs and all values are (within reason) equally balanced! The CASTing to INT in (RANDOM() * 5)::INT is the same as what the ROUND() function does and which leads to the errors. FLOOR always rounds down and this issue doesn't occur.

jjanes jjanes 44.8k5 gold badges38 silver badges48 bronze badges · Accepted Answer · 2021-05-24 17:15:05Z

While PostgreSQL tracks the volatility of simple expressions, it does not track the volatility of sub-selects which do not contain references to the outer query. You could fix this by introducing an artificial reference to the outer query, but in your case you don't need a sub-select in the first place, so just get rid of it.

update mytable
set sold_status = (array[
 'Available', 'On Hold', 'Partial Downpayment', 'Sold/Unavailable', 'Token Recieved'
 ]) [floor(random() * 5 + 1)];

CollectivesTM on Stack Overflow

Populate a table column randomly with values from a list in PostgreSQL

4 Answers 4

Comments

Comments

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

CollectivesTM on Stack Overflow

4 Answers 4

Comments

Comments

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related