I am stuck on the following task (PostgreSQL 9.3). Let's say we have the following table1 (which has 10k rows):
table1:
id
754
800
330
4
59
My goal is to create another table2 with two columns (source, target), where the values of both columns are a random selection of table1.id values. For example:
table2:
source | target
754 | 59
4 | 4
59 | 330
This is what I've done:
CREATE TABLE table2
(
id serial NOT NULL,
source integer,
target integer,
distance double precision
);
-- Select 300 random table1.id values and insert them into table2.source
INSERT INTO table2(source)
SELECT id FROM table1 ORDER BY RANDOM() LIMIT 300;
-- Select 300 random table1.id values and update table2.target
UPDATE table2 SET target = i.id
FROM (SELECT id FROM table1 ORDER BY RANDOM() LIMIT 300) i;
I got the following result:
source | target
754 | 59
330 | 59
800 | 59
Unfortunately, all the table2.target
values are the same. How can I update table2.target
with different random values (as in the example)? Or maybe UPDATE
is not the right way to do this?
1 Answer
I suggest a "data-modifying CTE":
WITH cte AS (
SELECT *, row_number() OVER () AS rn
FROM (
SELECT id
    FROM table1
ORDER BY random()
LIMIT 600 -- 2 x 300
) sub
)
INSERT INTO table2(source, target)
SELECT c1.id, c2.id
FROM cte c1
JOIN cte c2 ON c2.rn = c1.rn + 300;
In the CTE:
- Select 600 random rows (to produce 300 new rows).
- Add a row number in the outer SELECT.
Then pair two values in a self-join with an offset of 300.
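To see the pairing mechanics in isolation, here is a minimal sketch with made-up numbers (6 rows, offset 3, using generate_series in place of your table):

```sql
-- Toy version: shuffle 6 ids, then pair rows 1-3 with rows 4-6
WITH cte AS (
    SELECT id, row_number() OVER () AS rn
    FROM (
        SELECT g AS id
        FROM generate_series(1, 6) g
        ORDER BY random()
        LIMIT 6          -- 2 x 3
    ) sub
)
SELECT c1.id AS source, c2.id AS target
FROM cte c1
JOIN cte c2 ON c2.rn = c1.rn + 3;  -- offset = half the sample size
```

Each half of the shuffled sample supplies one column, so source and target are drawn independently at random.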
To get random rows from a huge table cheaply, consider:
Best way to select random rows PostgreSQL
Hello Erwin! Thanks a lot for your answer. Your solution works great. I learned from it. – Theo, Jul 3, 2014 at 15:39
Another idea would be to add
random() AS rnd
in the subquery, add
LAG(id) OVER (ORDER BY rnd) AS lagid
in the CTE, and then use
INSERT ... SELECT id, lagid FROM cte WHERE rn % 2 = 0;
– ypercubeᵀᴹ, Aug 9, 2014 at 18:35
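Spelled out, that LAG variant might look like the following (an untested sketch; it reuses the table1/table2 names from the question and assumes a 600-row sample as in the accepted answer):

```sql
-- Pair each even-numbered row with its predecessor via LAG
WITH cte AS (
    SELECT id,
           row_number() OVER (ORDER BY rnd) AS rn,
           lag(id)      OVER (ORDER BY rnd) AS lagid
    FROM (
        SELECT id, random() AS rnd
        FROM table1
        ORDER BY rnd
        LIMIT 600            -- 2 x 300
    ) sub
)
INSERT INTO table2(source, target)
SELECT id, lagid
FROM cte
WHERE rn % 2 = 0;            -- every 2nd row carries the previous id in lagid
```

Here consecutive rows of the shuffled sample form the pairs, so no self-join is needed.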