How to set a value if certain columns are the same across multiple rows

Question 1

I am doing a data migration and I end up with a temporary table as follows:

curid cuid rtid cd dd rm
10 4 4 2016年01月02日 2016年07月02日 
16 4 4 2016年06月12日 2016年12月12日 Remarks Jun 12
18 5 3 2016年07月18日 2017年07月31日 
8 5 3 2015年06月21日 2016年06月30日 Add some test
11 6 4 2017年01月01日 2017年07月01日 
9 7 3 2017年01月01日 2018年01月31日

I need to split the data into two tables. Based on that same data, it should look like this:

Table A
 id curid cuid rtid
 1 10 4 4 
 2 18 5 3 
 3 11 6 4 
 4 9 7 3

That's one row per distinct (cuid, rtid) plus a curid value picked from each set of duplicates. id is just a sequential number.

Table B
 id curid cd dd rm
 1 10 2016年01月02日 2016年07月02日 
 2 10 2016年06月12日 2016年12月12日 Remarks Jun 12
 3 18 2016年07月18日 2017年07月31日 
 4 18 2015年06月21日 2016年06月30日 Add some test
 5 11 2017年01月01日 2017年07月01日 
 6 9 2017年01月01日 2018年01月31日

The actual curid is irrelevant as long as the records in Table B match the associated record in Table A (so we could even use a temp sequence or something to set the curid).

Question 2

I've edited the question to make it clearer. I still think a window function should solve this problem but I can't figure out how to set and then match the curid for both tables.

Question 3

Your test setup

(Best provided this way in your question - hint!)

CREATE TEMP TABLE tmp (
 curid int
, cuid int
, rtid int
, cd date
, dd date
, rm text);
INSERT INTO tmp VALUES
 (10, 4, 4, '2016-01-02', '2016-07-02', NULL) 
 ,(16, 4, 4, '2016-06-12', '2016-12-12', 'Remarks Jun 12')
 ,(18, 5, 3, '2016-07-18', '2017-07-31', NULL) 
 ,(8 , 5, 3, '2015-06-21', '2016-06-30', 'Add some test')
 ,(11, 6, 4, '2017-01-01', '2017-07-01', NULL)
 ,(9 , 7, 3, '2017-01-01', '2018-01-31', NULL);

Solution

Create target tables if they don't exist:

CREATE TEMP TABLE a (
 id serial
 , curid int -- UNIQUE?
 , cuid int
 , rtid int
);
CREATE TEMP TABLE b (
 id serial
 , curid int
 , cd date
 , dd date
 , rm text
);

Use DISTINCT ON for table A:

INSERT INTO a (curid, cuid, rtid)
SELECT DISTINCT ON (cuid, rtid)
 curid, cuid, rtid
FROM tmp
ORDER BY cuid, rtid, curid -- pick smallest curid per group
RETURNING *;

id | curid | cuid | rtid
-: | ----: | ---: | ---:
 1 | 10 | 4 | 4
 2 | 8 | 5 | 3
 3 | 11 | 6 | 4
 4 | 9 | 7 | 3

Detailed explanation here:

Select first row in each GROUP BY group?

Use a simple window function for table B:

INSERT INTO b (curid, cd, dd, rm)
SELECT min(curid) OVER (PARTITION BY cuid, rtid), cd, dd, rm
FROM tmp
ORDER BY cuid, rtid -- optional
RETURNING *;

id | curid | cd | dd | rm 
-: | ----: | :--------- | :--------- | :-------------
 1 | 10 | 2016年01月02日 | 2016年07月02日 | null 
 2 | 10 | 2016年06月12日 | 2016年12月12日 | Remarks Jun 12
 3 | 8 | 2016年07月18日 | 2017年07月31日 | null 
 4 | 8 | 2015年06月21日 | 2016年06月30日 | Add some test 
 5 | 11 | 2017年01月01日 | 2017年07月01日 | null 
 6 | 9 | 2017年01月01日 | 2018年01月31日 | null

curid is guaranteed to match since we picked the smallest per group in both queries.

dbfiddle here

Question 4

@Lee: Consider the update. Next time, rather start a new question than changing the nature of the existing one after answers have been given - even if you realize you asked the wrong question, that's irrelevant for the general public.

Question 5

@ErwinBrandstetter that you so much for teaching how to better ask questions and for teaching me how to be a better developer. This is a great solution!

Question 6

I think what you're looking for is dense_rank()

create table tmp (curid int, cuid int, rtid int) ;
insert into temp
values
 (176,231,23),
 (192,231,23),
 (145,231,23),
 (122,231,23),
 (122,231,24),
 (122,231,25) ;
select dense_rank() over (order by cuid, rtid) as curid,
 cuid, 
 rtid
from temp ;

Question 7

With clarification, what I'd do in Microsoft SQL is,

SELECT cuid, rtid, IDENTITY(int, 1, 1) AS curid,
COUNT(*) AS how_many
INTO another_table
FROM original_table 
GROUP BY cuid, rtid
ORDER BY cuid, rtid

That creates a new table [another_table] of every combination of (cuid, rtid), with a row id in the new table.

Now, how much of that works in PostgreSQL?

score 3 · Accepted Answer · 2017-07-04 16:06:17Z

Your test setup

(Best provided this way in your question - hint!)

CREATE TEMP TABLE tmp (
 curid int
, cuid int
, rtid int
, cd date
, dd date
, rm text);
INSERT INTO tmp VALUES
 (10, 4, 4, '2016-01-02', '2016-07-02', NULL) 
 ,(16, 4, 4, '2016-06-12', '2016-12-12', 'Remarks Jun 12')
 ,(18, 5, 3, '2016-07-18', '2017-07-31', NULL) 
 ,(8 , 5, 3, '2015-06-21', '2016-06-30', 'Add some test')
 ,(11, 6, 4, '2017-01-01', '2017-07-01', NULL)
 ,(9 , 7, 3, '2017-01-01', '2018-01-31', NULL);

Solution

Create target tables if they don't exist:

CREATE TEMP TABLE a (
 id serial
 , curid int -- UNIQUE?
 , cuid int
 , rtid int
);
CREATE TEMP TABLE b (
 id serial
 , curid int
 , cd date
 , dd date
 , rm text
);

Use DISTINCT ON for table A:

INSERT INTO a (curid, cuid, rtid)
SELECT DISTINCT ON (cuid, rtid)
 curid, cuid, rtid
FROM tmp
ORDER BY cuid, rtid, curid -- pick smallest curid per group
RETURNING *;

id | curid | cuid | rtid
-: | ----: | ---: | ---:
 1 | 10 | 4 | 4
 2 | 8 | 5 | 3
 3 | 11 | 6 | 4
 4 | 9 | 7 | 3

Detailed explanation here:

Select first row in each GROUP BY group?

Use a simple window function for table B:

INSERT INTO b (curid, cd, dd, rm)
SELECT min(curid) OVER (PARTITION BY cuid, rtid), cd, dd, rm
FROM tmp
ORDER BY cuid, rtid -- optional
RETURNING *;

id | curid | cd | dd | rm 
-: | ----: | :--------- | :--------- | :-------------
 1 | 10 | 2016年01月02日 | 2016年07月02日 | null 
 2 | 10 | 2016年06月12日 | 2016年12月12日 | Remarks Jun 12
 3 | 8 | 2016年07月18日 | 2017年07月31日 | null 
 4 | 8 | 2015年06月21日 | 2016年06月30日 | Add some test 
 5 | 11 | 2017年01月01日 | 2017年07月01日 | null 
 6 | 9 | 2017年01月01日 | 2018年01月31日 | null

curid is guaranteed to match since we picked the smallest per group in both queries.

dbfiddle here

@Lee: Consider the update. Next time, rather start a new question than changing the nature of the existing one after answers have been given - even if you realize you asked the wrong question, that's irrelevant for the general public.
@ErwinBrandstetter that you so much for teaching how to better ask questions and for teaching me how to be a better developer. This is a great solution!

Stack Exchange Network

How to set a value if certain columns are the same across multiple rows

3 Answers 3

Your test setup

Solution

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

How to set a value if certain columns are the same across multiple rows

3 Answers 3

Your test setup

Solution

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions