I have a basic question about how JOIN works across multiple tables. I want to count the occurrences of a foreign key in link1 and link2.
CREATE TABLE main (
id SERIAL PRIMARY KEY,
name text NOT NULL
);
CREATE TABLE link1 (
id SERIAL PRIMARY KEY,
main_id integer NOT NULL,
CONSTRAINT main_id_fk FOREIGN KEY (main_id) REFERENCES main (id)
);
-- link2 is similar to link1
Why does the query below give the product of the counts (rather than the sum) when the count is non-zero in both columns?
SELECT main.id, COUNT(link1.main_id) + COUNT(link2.main_id)
FROM main
LEFT JOIN link1 ON main.id=link1.main_id
LEFT JOIN link2 ON main.id=link2.main_id
GROUP BY main.id
2 Answers
What you see is a "proxy cross join". Aggregate first, then join:
SELECT m.id, COALESCE(l1.ct, 0) + COALESCE(l2.ct, 0) AS total_ct
FROM main m
LEFT JOIN (
SELECT main_id, count(*) AS ct
FROM link1
GROUP BY main_id
) l1 ON l1.main_id = m.id
LEFT JOIN (
SELECT main_id, count(*) AS ct
FROM link2
GROUP BY main_id
) l2 ON l2.main_id = m.id
ORDER BY m.id;
Old sqlfiddle
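To see the difference on concrete rows, here is a minimal sketch that runs the question's query and the aggregate-first query side by side. It assumes an in-memory SQLite database via Python's sqlite3 module as a stand-in for PostgreSQL (INTEGER PRIMARY KEY instead of SERIAL), and the sample rows are hypothetical, not taken from the question.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for PostgreSQL here;
# INTEGER PRIMARY KEY replaces SERIAL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE main  (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE link1 (id INTEGER PRIMARY KEY,
                        main_id INTEGER NOT NULL REFERENCES main (id));
    CREATE TABLE link2 (id INTEGER PRIMARY KEY,
                        main_id INTEGER NOT NULL REFERENCES main (id));
    -- Hypothetical data: main 1 has 3 + 2 links, main 2 has 1 + 0, main 3 has none.
    INSERT INTO main (id, name) VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO link1 (main_id) VALUES (1), (1), (1), (2);
    INSERT INTO link2 (main_id) VALUES (1), (1);
""")

# The question's query: both joins multiply rows before counting.
naive = conn.execute("""
    SELECT main.id, COUNT(link1.main_id) + COUNT(link2.main_id)
    FROM main
    LEFT JOIN link1 ON main.id = link1.main_id
    LEFT JOIN link2 ON main.id = link2.main_id
    GROUP BY main.id
    ORDER BY main.id
""").fetchall()

# Aggregate first, then join: each subquery yields at most one row per main_id.
aggregated = conn.execute("""
    SELECT m.id, COALESCE(l1.ct, 0) + COALESCE(l2.ct, 0) AS total_ct
    FROM main m
    LEFT JOIN (SELECT main_id, count(*) AS ct FROM link1 GROUP BY main_id) l1
           ON l1.main_id = m.id
    LEFT JOIN (SELECT main_id, count(*) AS ct FROM link2 GROUP BY main_id) l2
           ON l2.main_id = m.id
    ORDER BY m.id
""").fetchall()

print("naive:     ", naive)
print("aggregated:", aggregated)
```

For main.id = 1 the join produces 3 x 2 = 6 rows, so each COUNT in the naive query sees 6 rows, while the aggregated query reports 3 + 2 = 5.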
Do not multiply rows with multiple unqualified joins and count(DISTINCT ...) later to fix that mistake. It happens to work in this case since counting distinct link1.id / link2.id coincides with the desired result, but it's needlessly expensive and error-prone.
Detailed explanation and a couple of syntax variants in these related answers on SO:
-
This has better performance and works well for my use case. It should be the accepted answer. – Philip E, Mar 8, 2018 at 10:03
I'll attempt to answer it myself. Consider a LEFT JOIN between main and link1. The output would be:
main.id link1.main_id
1 1
1 1
2 2
3 NULL
4 NULL
Now do a LEFT JOIN of the above table with link2. The output would be:
main.id link1.main_id link2.main_id
1 1 NULL
1 1 NULL
2 2 2
2 2 2 -- Error : double counting for link1
3 NULL 3
4 NULL NULL
Now count the occurrences of main_id and sum them (grouped by main.id):
main.id Count
1 2
2 2 + 2
3 1
4 0
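So the two successive LEFT JOINs are applied sequentially rather than in parallel: the second join runs against the result of the first, which is why the link1 row for main.id = 2 gets counted twice. The correct approach to get the count is to run the two counts separately and then add the results.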
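The walkthrough above can be reproduced with a small sketch. It assumes an in-memory SQLite database (Python's sqlite3 module) in place of PostgreSQL, loads the sample rows implied by the tables in this answer, and then counts each link table in its own query before adding the results in Python. The variable names are illustrative only.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; the name column is omitted for brevity.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE main  (id INTEGER PRIMARY KEY);
    CREATE TABLE link1 (id INTEGER PRIMARY KEY,
                        main_id INTEGER NOT NULL REFERENCES main (id));
    CREATE TABLE link2 (id INTEGER PRIMARY KEY,
                        main_id INTEGER NOT NULL REFERENCES main (id));
    INSERT INTO main (id) VALUES (1), (2), (3), (4);
    INSERT INTO link1 (main_id) VALUES (1), (1), (2);
    INSERT INTO link2 (main_id) VALUES (2), (2), (3);
""")

# Rows produced by the two successive LEFT JOINs, before any grouping.
joined = conn.execute("""
    SELECT main.id, link1.main_id, link2.main_id
    FROM main
    LEFT JOIN link1 ON main.id = link1.main_id
    LEFT JOIN link2 ON main.id = link2.main_id
    ORDER BY main.id
""").fetchall()

# Count each link table separately, then add the two counts per main.id.
count1 = dict(conn.execute("SELECT main_id, COUNT(*) FROM link1 GROUP BY main_id"))
count2 = dict(conn.execute("SELECT main_id, COUNT(*) FROM link2 GROUP BY main_id"))
totals = {
    main_id: count1.get(main_id, 0) + count2.get(main_id, 0)
    for (main_id,) in conn.execute("SELECT id FROM main ORDER BY id")
}

for row in joined:
    print(row)
print(totals)
```

The single link1 row for main.id = 2 appears twice in the joined rows, once per matching link2 row, whereas the separate counts give 1 + 2 = 3 for that id.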
Update: Another way, according to @a1ex07, is:
SELECT main.id, COUNT(DISTINCT link1.id) + COUNT(DISTINCT link2.id)
FROM main
LEFT JOIN link1 ON main.id=link1.main_id
LEFT JOIN link2 ON main.id=link2.main_id
GROUP BY main.id
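As a quick check of the DISTINCT variant, the sketch below runs it on the same sample rows used earlier in this answer. It assumes an in-memory SQLite database (Python's sqlite3 module) rather than PostgreSQL. The result is only correct because link1.id and link2.id are unique primary keys, and the join still materializes the multiplied rows first.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; the name column is omitted for brevity.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE main  (id INTEGER PRIMARY KEY);
    CREATE TABLE link1 (id INTEGER PRIMARY KEY,
                        main_id INTEGER NOT NULL REFERENCES main (id));
    CREATE TABLE link2 (id INTEGER PRIMARY KEY,
                        main_id INTEGER NOT NULL REFERENCES main (id));
    INSERT INTO main (id) VALUES (1), (2), (3), (4);
    INSERT INTO link1 (main_id) VALUES (1), (1), (2);
    INSERT INTO link2 (main_id) VALUES (2), (2), (3);
""")

# DISTINCT on the link tables' own ids collapses the duplicated join rows.
distinct_totals = conn.execute("""
    SELECT main.id, COUNT(DISTINCT link1.id) + COUNT(DISTINCT link2.id)
    FROM main
    LEFT JOIN link1 ON main.id = link1.main_id
    LEFT JOIN link2 ON main.id = link2.main_id
    GROUP BY main.id
    ORDER BY main.id
""").fetchall()

# The join itself still produces more rows than main has (6 rows for 4 ids).
joined_row_count = conn.execute("""
    SELECT COUNT(*)
    FROM main
    LEFT JOIN link1 ON main.id = link1.main_id
    LEFT JOIN link2 ON main.id = link2.main_id
""").fetchone()[0]

print(distinct_totals)
print(joined_row_count)
```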
-
You seem to confuse 0 with NULL. The first 2 outputs won't have any zeros. link1.main_id and link2.main_id might be either equal to main.id or NULL. And why not just COUNT(DISTINCT link1.id) + COUNT(DISTINCT link2.id)? – a1ex07, Dec 30, 2014 at 15:00
@a1ex07 Thanks for pointing out the NULL issue. I'll update the answer. Regarding DISTINCT, I want the count to reflect multiple entries in any table. – user4150760, Dec 30, 2014 at 15:06
I believe DISTINCT will do that (note, it's counting distinct link1.id and link2.id, not link1.main_id / link2.main_id). Thus, in your example for main.id = 2 there will be 1 distinct link1.id and 2 distinct link2.id... – a1ex07, Dec 30, 2014 at 15:24
Right, updated the answer. Cool technique. – user4150760, Dec 30, 2014 at 17:36
-
@a1ex07: While DISTINCT does the trick here, it's a bit like putting lipstick on a pig ... – Erwin Brandstetter, Jan 3, 2015 at 5:35