I have the following setup:
- First, I create a temp table q10c_debug_sql to avoid clutter:
create table q10c_debug_sql as
SELECT
movie_id,
company_id,
company_type_id
FROM
"postgres"."imdb_int"."movie_companies"
WHERE
(company_id) IN (
SELECT
company_id
FROM
"postgres"."imdb"."q10c_company_name"
)
AND (company_type_id) NOT IN (
SELECT
company_type_id
FROM
"postgres"."imdb_int"."company_type"
)
The resulting table is empty:
postgres=# select * from q10c_debug_sql;
movie_id | company_id | company_type_id
----------+------------+-----------------
(0 rows)
- Now, I issue the following two queries
postgres=# select count(*) from (select * from imdb_int.movie_companies except select * from q10c_debug_sql) as foo;
count
---------
2549109
(1 row)
postgres=# select count(*) from (select * from imdb_int.movie_companies as a left join q10c_debug_sql as b on a.movie_id = b.movie_id and a.company_id = b.company_id and a.company_type_id = b.company_type_id) as foo;
count
---------
2609129
(1 row)
As one can see, they return different counts. On paper, these two queries are equivalent and should both return 2609129, the size of the movie_companies table:
postgres=# select count(*) from imdb_int.movie_companies;
count
---------
2609129
(1 row)
I don't know why this happens. I want to use EXCEPT for clarity, but the query gives an unexpected result. Any pointers are appreciated.
My psql version:
psql (15.3 (Ubuntu 15.3-1.pgdg20.04+1), server 13.11 (Ubuntu 13.11-1.pgdg20.04+1))
1 Answer
It turns out that EXCEPT returns only distinct values, i.e., it returns the distinct rows from the left query that are not also found in the right query. Thus, semantically, EXCEPT is not the same as a left join: the former has set semantics, while the latter has bag semantics.
Thanks to this page for the pointer. Counting the distinct rows of movie_companies gives exactly the number that EXCEPT returned:
postgres=# select count(distinct(movie_id, company_id, company_type_id)) from imdb_int.movie_companies;
count
---------
2549109
(1 row)
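To make the difference concrete, here is a minimal sketch for PostgreSQL. The tables except_demo and except_demo_empty are hypothetical stand-ins invented for this illustration (they are not part of the original setup); the first contains one duplicated row, and the second is empty, like q10c_debug_sql.

```sql
-- Hypothetical demo table (not from the original post); one row appears twice.
create temp table except_demo (movie_id int, company_id int, company_type_id int);
insert into except_demo values (1, 10, 1), (1, 10, 1), (2, 20, 2);

-- Empty table with the same columns, standing in for q10c_debug_sql.
create temp table except_demo_empty (like except_demo);

-- EXCEPT (i.e. EXCEPT DISTINCT) collapses the duplicate row: count is 2.
select count(*)
from (select * from except_demo except select * from except_demo_empty) as foo;

-- EXCEPT ALL keeps duplicates: count is 3, the size of except_demo.
select count(*)
from (select * from except_demo except all select * from except_demo_empty) as foo;
```

Applied to the query in the question, the analogous change would be to write `except all` instead of `except`. Since q10c_debug_sql is empty, that variant should count all 2609129 rows of movie_companies.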
- EXCEPT is a synonym for EXCEPT DISTINCT. There is also EXCEPT ALL. In the same way, the other set operators have two variants: UNION and UNION ALL, INTERSECT and INTERSECT ALL. – Lennart - Slava Ukraini, May 27, 2023 at 6:06
- @Lennart-SlavaUkraini Thanks for the informative comment. – zack, May 27, 2023 at 22:14
- FWIW, there are some surprising effects when using <SETOP> DISTINCT on bags. One example is that A UNION A may have lower cardinality than A. Then throw NULL into the mix and we are in for a treat ;-) – Lennart - Slava Ukraini, May 28, 2023 at 5:11
- About EXCEPT ALL: dba.stackexchange.com/a/120680/3684, stackoverflow.com/a/19364694/939860 – Erwin Brandstetter, May 30, 2023 at 4:26