I have the following table:
C1 | C2 | C3
-------------
A | X | 1
A | Y | 2
B | X | 3
B | Y | 4
I want to deduplicate across columns 1 and 2, and select the max from column 3. In case, since rows 1 and 2 both have "A" in column 1, those are a duplicate. Since rows 3 and 4 both have "B" in column 1, those are a duplicate. Since rows 1 and 3 both have "X" in column 2, those are a duplicate. Finally, since rows 2 and 4 both have "Y" in column 2, those are a duplicate. Hence the first four rows would be considered a duplicate, and the result should return row 4, since that contains the maximum in column 3. I cannot figure out how to deduplicate across multiple columns. Any advice would be appreciated.
-
1Specify your MySQL version.Akina– Akina2020年02月12日 18:34:09 +00:00Commented Feb 12, 2020 at 18:34
-
Can you provide more sample data and expected output?Colin 't Hart– Colin 't Hart2020年02月12日 19:17:01 +00:00Commented Feb 12, 2020 at 19:17
1 Answer 1
WITH RECURSIVE cte AS
( SELECT c1, c2, c3, 1 level
FROM test
UNION ALL
SELECT cte.c1, t2.c2, GREATEST(cte.c3, t2.c3), level+1
FROM cte
JOIN test t1 ON cte.c2 = t1.c2
JOIN test t2 ON t1.c1 = t2.c1
WHERE t2.c3 != cte.c3
AND level < ( SELECT MAX(c3)
FROM test )
UNION ALL
SELECT t2.c1, cte.c2, GREATEST(cte.c3, t2.c3), level+1
FROM cte
JOIN test t1 ON cte.c1 = t1.c1
JOIN test t2 ON t1.c2 = t2.c2
WHERE t2.c3 != cte.c3
AND level < ( SELECT MAX(c3)
FROM test )
)
SELECT cte.c1, cte.c2, MAX(cte.c3) c3
FROM cte
JOIN test ON (cte.c1, cte.c2) = (test.c1, test.c2)
GROUP BY c1, c2
HAVING c3 = MAX(test.c3);
-
it doesn't work if you will add 5th line: C | X | 5NikitaSerbskiy– NikitaSerbskiy2020年02月12日 18:25:57 +00:00Commented Feb 12, 2020 at 18:25
-
@Nikita Test new variant. Be aware - it is extremely expensive! Test it on not more than 8-10 records, or search a way to decrease max nested level (from extensive
level < ( SELECT MAX(c3) FROM test )
). PS. I'd recommend to create SP instead of a query...Akina– Akina2020年02月13日 06:01:32 +00:00Commented Feb 13, 2020 at 6:01 -