I want to randomly populate a column in my table with the following values: Available, On Hold, Partial Downpayment, Sold/Unavailable, Token
I have tried using the following command but it populates the entire column with the first value it gets.
update mytable
set sold_status = (select (array['Available', 'On Hold', 'Partial Downpayment', 'Sold/Unavailable', 'Token Recieved'])[floor(random() * 5 + 1)])
I know how to achieve this with a programming language, but I would prefer a PostgreSQL query to achive this.
4 Answers 4
While PostgreSQL tracks the volatility of simple expressions, it does not track the volatility of sub-selects which do not contain references to the outer query. You could fix this by introducing an artificial reference to the outer query, but in your case you don't need a sub-select in the first place, so just get rid of it.
update mytable
set sold_status = (array[
'Available', 'On Hold', 'Partial Downpayment', 'Sold/Unavailable', 'Token Recieved'
]) [floor(random() * 5 + 1)];
Comments
Postgres thinks it is doing you a favor by running random()
only once -- forgetting that random()
is a volatile function.
You can fix this using case
:
update mytable
set sold_status = (case (random()*5)::int when 0 then 'Available' when 1 then 'On Hold' when 2 then 'Partial Downpayment' when 3 then 'Sold/Unavailable' when 4 then 'Token Recieved' end);
Here is a db<>fiddle.
Comments
Since the header of this page is "Populate a table column ..."
I am putting here my method to create random values in column from a list:
CREATE TABLE IF NOT EXISTS cars(
id SERIAL PRIMARY KEY,
brand VARCHAR(30)
);
INSERT INTO cars (
brand
)
SELECT
(array['BMW','AUDI','MERCEDES-BENZ'])[floor(random() * 3 + 1)]
FROM generate_series(1, 10);
It will create 10 rows with random values from brand's list.
Comments
The accpeted answer is incorrect!
You can see this here (all of the code below is in the fiddle here):
CREATE TABLE tab AS
SELECT
id,
NULL AS rand_x5,
NULL AS sold_status
FROM
generate_series(1, 10000) AS t(id);
and then run:
WITH r AS
(
SELECT j, (RANDOM() * 5)::INT AS x
FROM GENERATE_SERIES(1, 10000) AS j
)
UPDATE tab
SET
rand_x5 = r.x,
sold_status =
CASE r.x -- note different brackets (ranges - inclusive/exclusive boundaries)
WHEN 0 THEN 'Available' -- only happens from [0 - 0.5)
WHEN 1 THEN 'On Hold' -- happens from [0.5 - 1.5) &c.
WHEN 2 THEN 'Partial'
WHEN 3 THEN 'Sold'
WHEN 4 THEN 'Token'
END
FROM r
WHERE tab.id = r.j;
and then run the aggregate query:
SELECT
rand_x5, sold_status, COUNT(rand_x5), COUNT(sold_status)
FROM tab
GROUP by rand_x5, sold_status
ORDER BY rand_x5 DESC NULLS first;
Result:
rand_x5 sold_status count count
5 null 1047 0
4 Token 2062 2062
3 Sold 1995 1995
2 Partial 1912 1912
1 On Hold 2006 2006
0 Available 978 978
SELECT 6 -- << 6 instead of 5
Now, there are two major problems with this:
there are 6 possible values (0 - 5), not 5 in the output of
(RANDOM() * 5)::INT
. Since you haven't dealt with the 5 in theCASE
expression,sold_status
becomesNULL
the first value (0) and the last (5) are skewed, in that they'll only have 1/2 of the number of values of the others (1 - 4) if you generate a large number of them - since they will only be generated over a range of 0.5 whereas the others are generated over a range of 1
The correct code for this should be FLOOR((RANDOM() * 5))
- see bottom of fiddle. The result when this is run is:
rand_x5 sold_status count count
0 Available 1938 1938
1 On Hold 2048 2048
2 Partial 1983 1983
3 Sold 2001 2001
4 Token 2030 2030
No NULL
s and all values are (within reason) equally balanced! The CAST
ing to INT
in (RANDOM() * 5)::INT
is the same as what the ROUND() function does and which leads to the errors. FLOOR always rounds down and this issue doesn't occur.
Comments
Explore related questions
See similar questions with these tags.