My query is:
SELECT Acol1, Acol2, Bcol1, Bcol2, Ccol1, Ccol2
FROM tableA LEFT JOIN
(tableB FULL JOIN tableC ON (Bcol1 = Ccol1))
ON (Acol1 = Bcol1)
EXPLAIN ANALYZE
gives me:
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
Hash Right Join (cost=99.65..180.45 rows=1770 width=24) (actual time=0.043..0.103 rows=3 loops=1)
Hash Cond: (tableb.bcol1 = tablea.acol1)
-> Hash Left Join (cost=49.83..104.08 rows=1770 width=16) (actual time=0.011..0.062 rows=3 loops=1)
Hash Cond: (tableb.bcol1 = tablec.ccol1)
-> Seq Scan on tableb (cost=0.00..27.70 rows=1770 width=8) (actual time=0.001..0.002 rows=3 loops=1)
-> Hash (cost=27.70..27.70 rows=1770 width=8) (actual time=0.004..0.004 rows=3 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 1kB
-> Seq Scan on tablec (cost=0.00..27.70 rows=1770 width=8) (actual time=0.001..0.002 rows=3 loops=1)
-> Hash (cost=27.70..27.70 rows=1770 width=8) (actual time=0.014..0.014 rows=3 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 1kB
-> Seq Scan on tablea (cost=0.00..27.70 rows=1770 width=8) (actual time=0.009..0.011 rows=3 loops=1)
Total runtime: 0.151 ms
Postgres changed the full outerjoin between tableB
and tableC
to a right outerjoin, since later the left outerjoin with tableA
will eliminate null values anyway. It is equivalent to the original query.
However, I am hacking Postgres to implement my join enumeration related algorithm and to do experiments. I do not want Postgres to change the full outerjoin to a left outerjoin. Is there a way to do so?
2 Answers 2
You can introduce optimization barriers for your purpose.
Preface
Plain EXPLAIN
(without ANALYZE
) is enough for the purpose of this question. Parentheses around the ON
expression are just noise. Adding table aliases to be unambiguous.
We do see "Full Join" for the full join by itself:
SELECT * FROM tableB b FULL JOIN tableC c ON b.Bcol1 = c.Ccol1;
We could rewrite the full join with a subquery:
SELECT a.Acol1, a.Acol2, d.Bcol1, d.Bcol2, d.Ccol1, d.Ccol2
FROM tableA a
LEFT JOIN (
SELECT * -- sort out conflicting names with aliases
FROM tableB b FULL JOIN tableC c ON b.Bcol1 = c.Ccol1
) d ON a.Acol1 = d.Bcol1;
You have to sort out conflicting names with aliases in the subquery - but then again, you would need to do this in any case in the outer SELECT
for multiple columns of the same name in the underlying tables.
The query is still optimized as a whole since subqueries do not impose optimization barriers. You'll still see a "Left Join" or "Right Join". However, we can expand on this form to arrive at a solution:
Solution 1. OFFSET 0
hack (undocumented)
EXPLAIN
SELECT a.Acol1, a.Acol2, d.Bcol1, d.Bcol2, d.Ccol1, d.Ccol2
FROM tableA a
LEFT JOIN (
SELECT * -- you'll have to sort out conflicting names with aliases
FROM tableB b FULL JOIN tableC c ON b.Bcol1 = c.Ccol1
OFFSET 0 -- undocumented hack
) d ON a.Acol1 = d.Bcol1;
You'll see a "Full Join".
Why? As soon as a subquery uses an OFFSET
clause, the query planner / optimiser plans the subquery separately. OFFSET 0
is logical noise, but Postgres still considers the clause which makes this a query hint to materialize the subquery, effectively. (Even though Postgres does not otherwise support query hints.) It's a much debated issue. Related:
- How can I prevent Postgres from inlining a subquery?
- http://blog.2ndquadrant.com/postgresql-ctes-are-optimization-fences/
Solution 2. Use a CTE (documented)
EXPLAIN
WITH cte AS MATERIALIZED ( -- requires "MATERIALIZED" in Postgres 12 or later!
SELECT * -- you'll have to sort out conflicting names with aliases
FROM tableB b FULL JOIN tableC c ON b.Bcol1 = c.Ccol1
)
SELECT a.Acol1, a.Acol2, d.Bcol1, d.Bcol2, d.Ccol1, d.Ccol2
FROM tableA a
LEFT JOIN cte d ON a.Acol1 = d.Bcol1;
You'll see a "Full Join" as well.
The manual for Postgres 11 (before AS MATERIALIZED
was added):
A useful property of
WITH
queries is that they are evaluated only once per execution of the parent query, even if they are referred to more than once by the parent query or siblingWITH
queries. Thus, expensive calculations that are needed in multiple places can be placed within a WITH query to avoid redundant work. Another possible application is to prevent unwanted multiple evaluations of functions with side-effects. However, the other side of this coin is that the optimizer is less able to push restrictions from the parent query down into a WITH query than an ordinary subquery. TheWITH
query will generally be evaluated as written, without suppression of rows that the parent query might discard afterwards. (But, as mentioned above, evaluation might stop early if the reference(s) to the query demand only a limited number of rows.)
Since Postgres 12, the manual adds:
However, if a
WITH
query is non-recursive and side-effect-free (that is, it is aSELECT
containing no volatile functions) then it can be folded into the parent query, allowing joint optimization of the two query levels. By default, this happens if the parent query references theWITH
query just once, but not if it references theWITH
query more than once. You can override that decision by specifyingMATERIALIZED
to force separate calculation of theWITH
query, or by specifyingNOT MATERIALIZED
to force it to be merged into the parent query. The latter choice risks duplicate computation of theWITH
query, but it can still give a net savings if each usage of theWITH
query needs only a small part of theWITH
query's full output.
Bold emphasis mine.
Try fixing the code
SELECT Acol1, Acol2, K.D.Bcol1, K.D.Bcol2, K.D.Ccol1, K.D.Ccol2
FROM tableA LEFT JOIN
( SELECT D.Bcol1, D.Bcol2, D.Ccol1, D.Ccol2
FROM (tableB FULL JOIN
tableC ON
Bcol1 = Ccol1 ) D ) K
ON Acol1 = K.D.Bcol1
I'm using K.D. to indicate the data flow.
-
Hi, thanks for the answer, but I get the following error: ERROR: missing FROM-clause entry for table "d" LINE 8: ON Acol1 = K.D.Bcol1;Doris W.– Doris W.2016年03月06日 03:01:31 +00:00Commented Mar 6, 2016 at 3:01
Explore related questions
See similar questions with these tags.