I have a basic question about how JOIN works across multiple tables. I want to count the occurrences of a foreign key in link1 and link2.

CREATE TABLE main (
 id SERIAL PRIMARY KEY,
 name text NOT NULL
);
CREATE TABLE link1 (
 id SERIAL PRIMARY KEY,
 main_id integer NOT NULL,
 CONSTRAINT main_id_fk FOREIGN KEY (main_id) REFERENCES main (id)
);
-- link2 is similar to link1


Why does the query below give a product of the counts (rather than their sum) when the count is non-zero in both columns?

SELECT main.id, COUNT(link1.main_id) + COUNT(link2.main_id)
FROM main
LEFT JOIN link1 ON main.id=link1.main_id
LEFT JOIN link2 ON main.id=link2.main_id
GROUP BY main.id
asked Dec 30, 2014 at 13:54

2 Answers

What you see is a "proxy cross join". Aggregate first, then join:

SELECT m.id, COALESCE(l1.ct, 0) + COALESCE(l2.ct, 0) AS total_ct
FROM main m
LEFT JOIN (
 SELECT main_id, count(*) AS ct
 FROM link1
 GROUP BY main_id
 ) l1 ON l1.main_id = m.id
LEFT JOIN (
 SELECT main_id, count(*) AS ct
 FROM link2
 GROUP BY main_id
 ) l2 ON l2.main_id = m.id
ORDER BY m.id;
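
To see the difference concretely, here is a minimal sketch that runs both queries side by side. It assumes SQLite (through Python's sqlite3 module) as a stand-in for PostgreSQL, so SERIAL becomes INTEGER PRIMARY KEY, and the sample rows are hypothetical, not taken from the fiddle.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for PostgreSQL; rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE main  (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE link1 (id INTEGER PRIMARY KEY,
                    main_id INTEGER NOT NULL REFERENCES main (id));
CREATE TABLE link2 (id INTEGER PRIMARY KEY,
                    main_id INTEGER NOT NULL REFERENCES main (id));
INSERT INTO main (id, name) VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd');
INSERT INTO link1 (main_id) VALUES (1), (1), (2);
INSERT INTO link2 (main_id) VALUES (2), (2), (3);
""")

# The query from the question: both joins multiply rows before counting.
naive = conn.execute("""
SELECT main.id, COUNT(link1.main_id) + COUNT(link2.main_id)
FROM main
LEFT JOIN link1 ON main.id = link1.main_id
LEFT JOIN link2 ON main.id = link2.main_id
GROUP BY main.id
ORDER BY main.id
""").fetchall()

# Aggregate first, then join: each subquery yields at most one row per main_id.
aggregated = conn.execute("""
SELECT m.id, COALESCE(l1.ct, 0) + COALESCE(l2.ct, 0) AS total_ct
FROM main m
LEFT JOIN (SELECT main_id, count(*) AS ct FROM link1 GROUP BY main_id) l1
       ON l1.main_id = m.id
LEFT JOIN (SELECT main_id, count(*) AS ct FROM link2 GROUP BY main_id) l2
       ON l2.main_id = m.id
ORDER BY m.id
""").fetchall()

print("naive:     ", naive)
print("aggregated:", aggregated)
```

With this sample data, main.id = 2 has one row in link1 and two in link2. The first query reports 4 for it, the second reports 3.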


Do not multiply rows with multiple unqualified joins and then apply count(DISTINCT ...) afterwards to fix that mistake. It happens to work in this case, since counting distinct link1.id / link2.id coincides with the desired result, but it is needlessly expensive and error-prone.
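
A rough illustration of the cost, again assuming SQLite as a stand-in for PostgreSQL and hypothetical data: with 3 rows in link1 and 4 rows in link2 for the same main.id, the join has to produce 3 × 4 intermediate rows before DISTINCT collapses them again.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for PostgreSQL; rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE main  (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE link1 (id INTEGER PRIMARY KEY,
                    main_id INTEGER NOT NULL REFERENCES main (id));
CREATE TABLE link2 (id INTEGER PRIMARY KEY,
                    main_id INTEGER NOT NULL REFERENCES main (id));
INSERT INTO main (id, name) VALUES (1, 'a');
INSERT INTO link1 (main_id) VALUES (1), (1), (1);
INSERT INTO link2 (main_id) VALUES (1), (1), (1), (1);
""")

# COUNT(*) exposes how many joined rows the DISTINCT counts must wade through.
joined_rows, distinct_total = conn.execute("""
SELECT COUNT(*),
       COUNT(DISTINCT link1.id) + COUNT(DISTINCT link2.id)
FROM main
LEFT JOIN link1 ON main.id = link1.main_id
LEFT JOIN link2 ON main.id = link2.main_id
WHERE main.id = 1
""").fetchone()

print(joined_rows, distinct_total)
```

The DISTINCT total is correct (7), but it is computed over 12 joined rows, and that intermediate set grows multiplicatively with every additional joined table.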

Detailed explanation and a couple of syntax variants in these related answers on SO:

answered Jan 3, 2015 at 5:34
  • This has better performance and works well for my use case. It should be the accepted answer. Commented Mar 8, 2018 at 10:03

I'll attempt to answer it myself. Consider a LEFT JOIN between main and link1. The output would be:

main.id  link1.main_id
 1       1
 1       1
 2       2
 3       NULL
 4       NULL

Now do a LEFT JOIN of the above table with link2, output would be:

main.id  link1.main_id  link2.main_id
 1       1              NULL
 1       1              NULL
 2       2              2
 2       2              2     -- Error: double counting for link1
 3       NULL           3
 4       NULL           NULL

Now count the occurrences of main_id in each link column and sum them (grouped by main.id):

main.id  Count
 1       2
 2       2 + 2   (should be 1 + 2)
 3       1
 4       0

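So the two successive LEFT JOINs are applied sequentially rather than in parallel: the second join runs against the result of the first, so each link1 row is repeated once per matching link2 row. The correct approach to get the count would be to run two separate queries and then add the results.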
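
One possible reading of "two separate queries" is a pair of correlated subqueries, each counting one link table on its own. The sketch below assumes SQLite as a stand-in for PostgreSQL and uses the same rows as the tables above.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE main  (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE link1 (id INTEGER PRIMARY KEY,
                    main_id INTEGER NOT NULL REFERENCES main (id));
CREATE TABLE link2 (id INTEGER PRIMARY KEY,
                    main_id INTEGER NOT NULL REFERENCES main (id));
INSERT INTO main (id, name) VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd');
INSERT INTO link1 (main_id) VALUES (1), (1), (2);
INSERT INTO link2 (main_id) VALUES (2), (2), (3);
""")

# Each subquery counts one table independently, so no rows get multiplied.
totals = conn.execute("""
SELECT m.id,
       (SELECT count(*) FROM link1 l WHERE l.main_id = m.id)
     + (SELECT count(*) FROM link2 l WHERE l.main_id = m.id) AS total_ct
FROM main m
ORDER BY m.id
""").fetchall()

print(totals)
```

Since count(*) over an empty set is 0 rather than NULL, no COALESCE is needed in this variant.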

Update: Another way, according to @a1ex07, is:

SELECT main.id, COUNT(DISTINCT link1.id) + COUNT(DISTINCT link2.id)
FROM main
LEFT JOIN link1 ON main.id=link1.main_id
LEFT JOIN link2 ON main.id=link2.main_id
GROUP BY main.id
answered Dec 30, 2014 at 14:11
  • You seem to confuse 0 with NULL. The first two outputs won't contain any zeros: link1.main_id and link2.main_id are either equal to main.id or NULL. And why not just COUNT(DISTINCT link1.id) + COUNT(DISTINCT link2.id)? Commented Dec 30, 2014 at 15:00
  • @a1ex07 Thanks for pointing out the NULL issue. I'll update the answer. Regarding DISTINCT, I want the count to reflect multiple entries in any table. Commented Dec 30, 2014 at 15:06
  • I believe DISTINCT will do that (note that it counts distinct link1.id and link2.id, not link1.main_id / link2.main_id). Thus, in your example for main.id = 2, there will be 1 distinct link1.id and 2 distinct link2.id... Commented Dec 30, 2014 at 15:24
  • Right, updated the answer. Cool technique. Commented Dec 30, 2014 at 17:36
  • @a1ex07: While DISTINCT does the trick here, it's a bit like putting lipstick on a pig ... Commented Jan 3, 2015 at 5:35