SQLite - Remove duplicates across multiple columns

Question 1

I have the following table:

C1 | C2 | C3
-------------
A | X | 1
A | Y | 2
B | X | 3
B | Y | 4

I want to deduplicate across columns 1 and 2, and select the max from column 3. In case, since rows 1 and 2 both have "A" in column 1, those are a duplicate. Since rows 3 and 4 both have "B" in column 1, those are a duplicate. Since rows 1 and 3 both have "X" in column 2, those are a duplicate. Finally, since rows 2 and 4 both have "Y" in column 2, those are a duplicate. Hence the first four rows would be considered a duplicate, and the result should return row 4, since that contains the maximum in column 3. I cannot figure out how to deduplicate across multiple columns. Any advice would be appreciated.

Question 2

Specify your MySQL version.

Question 3

Can you provide more sample data and expected output?

Question 4

WITH RECURSIVE cte AS 
( SELECT c1, c2, c3, 1 level
 FROM test
UNION ALL
 SELECT cte.c1, t2.c2, GREATEST(cte.c3, t2.c3), level+1
 FROM cte
 JOIN test t1 ON cte.c2 = t1.c2
 JOIN test t2 ON t1.c1 = t2.c1
 WHERE t2.c3 != cte.c3
 AND level < ( SELECT MAX(c3)
 FROM test )
UNION ALL
 SELECT t2.c1, cte.c2, GREATEST(cte.c3, t2.c3), level+1
 FROM cte
 JOIN test t1 ON cte.c1 = t1.c1
 JOIN test t2 ON t1.c2 = t2.c2
 WHERE t2.c3 != cte.c3
 AND level < ( SELECT MAX(c3)
 FROM test )
)
SELECT cte.c1, cte.c2, MAX(cte.c3) c3
FROM cte
JOIN test ON (cte.c1, cte.c2) = (test.c1, test.c2)
GROUP BY c1, c2
HAVING c3 = MAX(test.c3);

fiddle

Question 5

it doesn't work if you will add 5th line: C | X | 5

Question 6

@Nikita Test new variant. Be aware - it is extremely expensive! Test it on not more than 8-10 records, or search a way to decrease max nested level (from extensive level < ( SELECT MAX(c3) FROM test )). PS. I'd recommend to create SP instead of a query...

Question 7

Thank you @Akina.

Akina Akina 20.8k2 gold badges20 silver badges22 bronze badges · Accepted Answer · 2020-02-12 18:07:40Z

WITH RECURSIVE cte AS 
( SELECT c1, c2, c3, 1 level
 FROM test
UNION ALL
 SELECT cte.c1, t2.c2, GREATEST(cte.c3, t2.c3), level+1
 FROM cte
 JOIN test t1 ON cte.c2 = t1.c2
 JOIN test t2 ON t1.c1 = t2.c1
 WHERE t2.c3 != cte.c3
 AND level < ( SELECT MAX(c3)
 FROM test )
UNION ALL
 SELECT t2.c1, cte.c2, GREATEST(cte.c3, t2.c3), level+1
 FROM cte
 JOIN test t1 ON cte.c1 = t1.c1
 JOIN test t2 ON t1.c2 = t2.c2
 WHERE t2.c3 != cte.c3
 AND level < ( SELECT MAX(c3)
 FROM test )
)
SELECT cte.c1, cte.c2, MAX(cte.c3) c3
FROM cte
JOIN test ON (cte.c1, cte.c2) = (test.c1, test.c2)
GROUP BY c1, c2
HAVING c3 = MAX(test.c3);

fiddle

@Nikita Test new variant. Be aware - it is extremely expensive! Test it on not more than 8-10 records, or search a way to decrease max nested level (from extensive level < ( SELECT MAX(c3) FROM test )). PS. I'd recommend to create SP instead of a query...

Stack Exchange Network

SQLite - Remove duplicates across multiple columns

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

SQLite - Remove duplicates across multiple columns

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions