
I have a single table (table_1) with the schema and data below:

row_id  identifier  col1  col2  col3  col4  status
1       A           1     2     3     4
2       A           2     3     4     5
3       B           1     2     3     4
4       B           2     3     4     6
5       C           1     2     3     4

For each row with identifier = A, I want to find the rows with identifier != A that match on col1, col2, col3 and col4. The A row's status column should then be set to the comma-separated list of matching identifiers ('B,C'), or to a fixed value (foo) when no other row matches.

Note that the status value must list the identifier(s) that matched. The expected result is:

row_id  identifier  col1  col2  col3  col4  status
1       A           1     2     3     4     B,C
2       A           2     3     4     5     foo
3       B           1     2     3     4
4       B           2     3     4     6
5       C           1     2     3     4

I tried a single SELECT statement with a separate subquery for each column to match against 'A', but I had to name the other identifiers explicitly, and the query resulted in errors.

Corrected row_id to show the primary key.

asked Sep 25, 2017 at 0:03

2 Answers


You can do it with a correlated subquery on the same table (effectively a self-join) and the aggregate function string_agg, which concatenates the identifiers of the other rows whose col1 to col4 values match the current row, as below.

The first and last statements initialize the status column to NULL and set it to foo where no match was found, respectively:

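-- Reset any previous status values.
UPDATE table_1 t1 SET status = NULL;

-- For each row, collect the identifiers of the other rows with the same
-- col1..col4 values; the subquery yields NULL when nothing matches.
-- Note: this touches every row, not only those with identifier = 'A'.
UPDATE table_1 t1
SET status = (
    SELECT string_agg(t2.identifier, ',')
    FROM table_1 t2
    WHERE t2.identifier != t1.identifier
      AND t1.col1 = t2.col1
      AND t1.col2 = t2.col2
      AND t1.col3 = t2.col3
      AND t1.col4 = t2.col4
);

-- Rows left without a match get the fallback value.
UPDATE table_1 t1 SET status = 'foo' WHERE status IS NULL;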
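
If only the rows with identifier = 'A' should be touched, as in the expected output of the question, the three statements can probably be folded into one. The following is a minimal sketch under a few assumptions: it runs against an in-memory SQLite database through Python's sqlite3 module so that it is self-contained, it uses SQLite's group_concat as a stand-in for string_agg, and the helper names build_table and update_status are made up for illustration. COALESCE supplies foo when the subquery finds no match.

```python
import sqlite3


def build_table(conn):
    # Sample data copied from the question; status starts out as NULL.
    conn.execute(
        "CREATE TABLE table_1 (row_id INTEGER PRIMARY KEY, identifier TEXT, "
        "col1 INT, col2 INT, col3 INT, col4 INT, status TEXT)"
    )
    conn.executemany(
        "INSERT INTO table_1 (row_id, identifier, col1, col2, col3, col4) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        [
            (1, "A", 1, 2, 3, 4),
            (2, "A", 2, 3, 4, 5),
            (3, "B", 1, 2, 3, 4),
            (4, "B", 2, 3, 4, 6),
            (5, "C", 1, 2, 3, 4),
        ],
    )


def update_status(conn, target="A", fallback="foo"):
    # Single statement: aggregate the other identifiers that share col1..col4,
    # fall back to a fixed value when there are none, and restrict the update
    # to the target identifier (assumption: only those rows should change).
    conn.execute(
        """
        UPDATE table_1
        SET status = COALESCE(
            (SELECT group_concat(t2.identifier, ',')
             FROM table_1 AS t2
             WHERE t2.identifier != table_1.identifier
               AND t2.col1 = table_1.col1
               AND t2.col2 = table_1.col2
               AND t2.col3 = table_1.col3
               AND t2.col4 = table_1.col4),
            ?)
        WHERE identifier = ?
        """,
        (fallback, target),
    )


conn = sqlite3.connect(":memory:")
build_table(conn)
update_status(conn)
rows = conn.execute(
    "SELECT row_id, identifier, status FROM table_1 ORDER BY row_id"
).fetchall()
for row in rows:
    print(row)
```

The order of identifiers inside the aggregated string is not guaranteed here; in PostgreSQL, string_agg accepts an ORDER BY inside the call if a stable order matters.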

answered Sep 25, 2017 at 1:39

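update test_table t1
inner join (
    -- Keep only the 'A' rows and fall back to 'foo' when nothing else matched.
    select *,
           (case
                when status_value is null then 'foo'
                else status_value
            end) as to_update_status
    from (
        -- One row per (col1, col2, col3, col4) group; group_concat skips NULLs,
        -- so 'A' itself is left out of the list.
        -- Note: select * with this group by is rejected when ONLY_FULL_GROUP_BY
        -- is enabled; with it disabled, the identifier picked per group is not
        -- deterministic.
        select *,
               group_concat(case
                                when identifier = 'A' then null
                                else identifier
                            end) as status_value
        from test_table
        group by col1, col2, col3, col4
    ) as grouped
    where identifier = 'A'
) t2 on t1.row_id = t2.row_id
    and t1.identifier = t2.identifier
    and t1.col1 = t2.col1
    and t1.col2 = t2.col2
    and t1.col3 = t2.col3
    and t1.col4 = t2.col4
set t1.status = t2.to_update_status;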

In the above query, test_table is the name of the table.

Consider changing the structure of your table. It should have a unique identifier. (From the name, the column row_id should be unique, but it is not in the data you provided, so I had to put 5 additional checks in the join query.) If each row has a unique identifier, you could select just row_id and to_update_status in the subquery and join on row_id alone.

You should also have an index on the identifier column and a multi-column index on col1, col2, col3, col4.
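
As a rough illustration of the two indexes, here is a small sketch that creates them on an empty copy of the table. It assumes an in-memory SQLite database driven from Python's sqlite3 module purely so that it can run standalone, and the index names idx_identifier and idx_cols are invented for the example. The CREATE INDEX statements themselves should carry over to MySQL largely unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test_table (row_id INTEGER PRIMARY KEY, identifier TEXT, "
    "col1 INT, col2 INT, col3 INT, col4 INT, status TEXT)"
)

# Single-column index for filters such as identifier = 'A'.
conn.execute("CREATE INDEX idx_identifier ON test_table (identifier)")

# Multi-column index covering the four columns used for matching and grouping.
conn.execute("CREATE INDEX idx_cols ON test_table (col1, col2, col3, col4)")

# List the indexes that now exist on the table.
index_names = sorted(
    name
    for (name,) in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'index' AND tbl_name = 'test_table'"
    )
)
print(index_names)
```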

answered Sep 25, 2017 at 1:41
