Given two result sets, each consisting of key/value pairs (coming from CTEs), I want to join and group them by key, aggregate their values, and return two different things:
a) those keys where the aggregated list of values in the first result set exactly matches the aggregated list of values in the second result set
b) those keys where the aggregated list of values in the first result set matches that of the second result set independent of order
I know of `string_agg()`, but it seems I can only use it in the SELECT list, and it's inefficient anyway. Is there something more efficient?
Set 1
|key | value |
|-----|-------|
| 1 | 1 |
| 1 | 2 |
| 3 | 4 |
| 2 | 5 |
| 2 | 7 |
| 1 | 3 |
Set 2
|key | value |
|-----|-------|
| 1 | 1 |
| 1 | 2 |
| 1 | 3 |
| 2 | 7 |
| 2 | 5 |
| 4 | 6 |
Desired result:
a) key 1
(1,2,3 = 1,2,3)
b) key 1 and key 2
(5,7 = 7,5)
2 Answers
For b) you can use an intersection to find the tuples that appear in both result sets, and then aggregate on top of that:
with rs1 (key, value) as ( values (1,1),(1,2),(1,3),(2,5),(2,7),(3,4))
, rs2 (key, value) as ( values (1,1),(1,2),(1,3),(2,5),(2,7),(4,6))
select key, array_agg(value)
from (
select key, value from rs1
intersect
select key, value from rs2
) t
group by key;
1 {3,2,1}
2 {7,5}
If you are dealing with bags instead of sets, you can use `intersect all` to preserve duplicates:
with rs1 (key, value) as ( values (1,1),(1,2),(1,1),(4,6))
, rs2 (key, value) as ( values (1,1),(1,2),(1,1),(5,7))
select key, array_agg(value)
from (
select key, value from rs1
intersect all
select key, value from rs2
) t
group by key;
1 {2,1,1}
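Note that the intersection only returns the values both sides have in common for a key; on its own it does not tell you whether the two lists are identical. Here is a minimal sketch of one way to keep only the keys whose sets match completely, assuming sets (no duplicates) and no null keys. The `diff` CTE is a name made up for this example: it collects the tuples that appear on one side only, and every key that shows up there is excluded.

```sql
with rs1 (key, value) as ( values (1,1),(1,2),(1,3),(2,5),(2,7),(3,4))
   , rs2 (key, value) as ( values (1,1),(1,2),(1,3),(2,7),(2,5),(4,6))
   , diff as (   -- hypothetical helper: tuples present on one side only
        (select key, value from rs1 except select key, value from rs2)
        union all
        (select key, value from rs2 except select key, value from rs1)
     )
select distinct r.key
from rs1 r
where not exists (select 1 from diff d where d.key = r.key)
order by r.key;
-- returns keys 1 and 2 for the sample data
```

For bags, `except all` would be the counterpart to `intersect all`.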
a) does not really make sense as stated, since there is no order to take into consideration. We can create one by adding an ordering number `n` for each key:
with rs1 (key, n, value) as ( values (1,1,1),(1,2,2),(1,3,3),(2,1,5),(2,2,7),(3,1,4))
, rs2 (key, n, value) as ( values (1,1,1),(1,3,2),(1,2,3),(2,2,5),(2,1,7),(4,1,6))
select key, array_agg(value)
from (
select key, n, value from rs1
intersect
select key, n, value from rs2
) t
group by key;
1 {1}
Another possible interpretation is that `n` is a total ordering (i.e. not within each key). The same solution can be used to deal with that.
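As a rough sketch of that second interpretation, assume `n` is a global position across all keys. The sample data and the names `p1`, `p2` and `pos` are made up for illustration. One way to reuse the query is to turn the global number into a per-key position with `row_number()` first, so that interleaved rows of other keys do not shift the comparison:

```sql
with rs1 (key, n, value) as ( values (1,1,1),(1,2,2),(3,3,4),(2,4,5),(2,5,7),(1,6,3))
   , rs2 (key, n, value) as ( values (1,1,1),(1,2,2),(1,3,3),(2,4,7),(2,5,5),(4,6,6))
   , p1 as (   -- per-key position derived from the global ordering n
        select key, row_number() over (partition by key order by n) as pos, value
        from rs1
     )
   , p2 as (
        select key, row_number() over (partition by key order by n) as pos, value
        from rs2
     )
select key, array_agg(value order by pos)
from (
    select key, pos, value from p1
    intersect
    select key, pos, value from p2
) t
group by key;
-- 1 {1,2,3}
```

Key 2 drops out here because 5 and 7 sit at different positions in the two inputs.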
You need to define the order of rows somehow. (A given set has no natural order.) I added an ordering column `ord`. You might achieve that in your CTEs with `row_number()`, or you actually have additional columns establishing order.
Also assuming no duplicates on `(key, value)` and no null values. Else you have to define how to deal with those.
WITH t1 (ord, key, value) AS (
VALUES (1,1,1),(2,1,2),(3,3,4),(4,2,5),(5,2,7),(6,1,3)
)
, t2 (ord, key, value) AS (
VALUES (1,1,1),(2,1,2),(3,1,3),(4,2,7),(5,2,5),(6,4,6)
)
SELECT key, a1.ord_arr1, a2.ord_arr2
, a1.ord_arr1 = a2.ord_arr2 AS match_arr -- match in given order
, a1.sort_arr1 = a2.sort_arr2 AS match_set -- match after ordering
FROM (
SELECT key
, array_agg(value ORDER BY ord) AS ord_arr1
, array_agg(value ORDER BY value) AS sort_arr1
FROM t1 -- ordered input!
GROUP BY 1
) a1
JOIN (
SELECT key
, array_agg(value ORDER BY ord) AS ord_arr2
, array_agg(value ORDER BY value) AS sort_arr2
FROM t2 -- ordered input!
GROUP BY 1
) a2 USING (key)
ORDER BY match_arr DESC;
| key | ord_arr1 | ord_arr2 | match_arr | match_set |
|-----|----------|----------|-----------|-----------|
| 1 | {1,2,3} | {1,2,3} | t | t |
| 2 | {5,7} | {7,5} | f | t |
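If your CTEs do not carry an ordering column yet, `ord` could be derived with `row_number()`. This is a minimal sketch, assuming some existing column defines the intended order; `raw1` and `ts` are hypothetical names used only for this example:

```sql
WITH raw1 (ts, key, value) AS (   -- hypothetical input with an ordering column ts
   VALUES (10,1,1),(20,1,2),(30,3,4),(40,2,5),(50,2,7),(60,1,3)
   )
, t1 AS (
   SELECT row_number() OVER (ORDER BY ts) AS ord  -- deterministic only if ts is unique
        , key, value
   FROM   raw1
   )
SELECT ord, key, value
FROM   t1
ORDER  BY ord;
```

Without an `ORDER BY` inside `OVER ()`, the numbering is arbitrary, so it only helps if something in the data actually establishes the order.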