How to prevent PostgreSQL from rewriting OUTER JOIN queries?

Question 1

My query is:

SELECT Acol1, Acol2, Bcol1, Bcol2, Ccol1, Ccol2
FROM tableA LEFT JOIN
 (tableB FULL JOIN tableC ON (Bcol1 = Ccol1))
 ON (Acol1 = Bcol1)

EXPLAIN ANALYZE gives me:

 QUERY PLAN 
----------------------------------------------------------------------------------------------------------------------
 Hash Right Join (cost=99.65..180.45 rows=1770 width=24) (actual time=0.043..0.103 rows=3 loops=1)
 Hash Cond: (tableb.bcol1 = tablea.acol1)
 -> Hash Left Join (cost=49.83..104.08 rows=1770 width=16) (actual time=0.011..0.062 rows=3 loops=1)
 Hash Cond: (tableb.bcol1 = tablec.ccol1)
 -> Seq Scan on tableb (cost=0.00..27.70 rows=1770 width=8) (actual time=0.001..0.002 rows=3 loops=1)
 -> Hash (cost=27.70..27.70 rows=1770 width=8) (actual time=0.004..0.004 rows=3 loops=1)
 Buckets: 1024 Batches: 1 Memory Usage: 1kB
 -> Seq Scan on tablec (cost=0.00..27.70 rows=1770 width=8) (actual time=0.001..0.002 rows=3 loops=1)
 -> Hash (cost=27.70..27.70 rows=1770 width=8) (actual time=0.014..0.014 rows=3 loops=1)
 Buckets: 1024 Batches: 1 Memory Usage: 1kB
 -> Seq Scan on tablea (cost=0.00..27.70 rows=1770 width=8) (actual time=0.009..0.011 rows=3 loops=1)
 Total runtime: 0.151 ms

Postgres changed the full outerjoin between tableB and tableC to a right outerjoin, since later the left outerjoin with tableA will eliminate null values anyway. It is equivalent to the original query.

However, I am hacking Postgres to implement my join enumeration related algorithm and to do experiments. I do not want Postgres to change the full outerjoin to a left outerjoin. Is there a way to do so?

Question 2

You can introduce optimization barriers for your purpose.

Preface

Plain EXPLAIN (without ANALYZE) is enough for the purpose of this question. Parentheses around the ON expression are just noise. Adding table aliases to be unambiguous.

We do see "Full Join" for the full join by itself:

SELECT * FROM tableB b FULL JOIN tableC c ON b.Bcol1 = c.Ccol1;

We could rewrite the full join with a subquery:

SELECT a.Acol1, a.Acol2, d.Bcol1, d.Bcol2, d.Ccol1, d.Ccol2
FROM tableA a
LEFT JOIN (
 SELECT * -- sort out conflicting names with aliases
 FROM tableB b FULL JOIN tableC c ON b.Bcol1 = c.Ccol1
 ) d ON a.Acol1 = d.Bcol1;

You have to sort out conflicting names with aliases in the subquery - but then again, you would need to do this in any case in the outer SELECT for multiple columns of the same name in the underlying tables.

The query is still optimized as a whole since subqueries do not impose optimization barriers. You'll still see a "Left Join" or "Right Join". However, we can expand on this form to arrive at a solution:

Solution 1. `OFFSET 0` hack (undocumented)

EXPLAIN
SELECT a.Acol1, a.Acol2, d.Bcol1, d.Bcol2, d.Ccol1, d.Ccol2
FROM tableA a
LEFT JOIN (
 SELECT * -- you'll have to sort out conflicting names with aliases
 FROM tableB b FULL JOIN tableC c ON b.Bcol1 = c.Ccol1
 OFFSET 0 -- undocumented hack
 ) d ON a.Acol1 = d.Bcol1;

You'll see a "Full Join".

Why? As soon as a subquery uses an OFFSET clause, the query planner / optimiser plans the subquery separately. OFFSET 0 is logical noise, but Postgres still considers the clause which makes this a query hint to materialize the subquery, effectively. (Even though Postgres does not otherwise support query hints.) It's a much debated issue. Related:

Solution 2. Use a CTE (documented)

EXPLAIN
WITH cte AS MATERIALIZED ( -- requires "MATERIALIZED" in Postgres 12 or later!
 SELECT * -- you'll have to sort out conflicting names with aliases
 FROM tableB b FULL JOIN tableC c ON b.Bcol1 = c.Ccol1
 ) 
SELECT a.Acol1, a.Acol2, d.Bcol1, d.Bcol2, d.Ccol1, d.Ccol2
FROM tableA a
LEFT JOIN cte d ON a.Acol1 = d.Bcol1;

You'll see a "Full Join" as well.

The manual for Postgres 11 (before AS MATERIALIZED was added):

A useful property of WITH queries is that they are evaluated only once per execution of the parent query, even if they are referred to more than once by the parent query or sibling WITH queries. Thus, expensive calculations that are needed in multiple places can be placed within a WITH query to avoid redundant work. Another possible application is to prevent unwanted multiple evaluations of functions with side-effects. However, the other side of this coin is that the optimizer is less able to push restrictions from the parent query down into a WITH query than an ordinary subquery. The WITH query will generally be evaluated as written, without suppression of rows that the parent query might discard afterwards. (But, as mentioned above, evaluation might stop early if the reference(s) to the query demand only a limited number of rows.)

Since Postgres 12, the manual adds:

However, if a WITH query is non-recursive and side-effect-free (that is, it is a SELECT containing no volatile functions) then it can be folded into the parent query, allowing joint optimization of the two query levels. By default, this happens if the parent query references the WITH query just once, but not if it references the WITH query more than once. You can override that decision by specifying MATERIALIZED to force separate calculation of the WITH query, or by specifying NOT MATERIALIZED to force it to be merged into the parent query. The latter choice risks duplicate computation of the WITH query, but it can still give a net savings if each usage of the WITH query needs only a small part of the WITH query's full output.

Bold emphasis mine.

db<>fiddle here
_{Old sqlfiddle}

Question 3

Try fixing the code

SELECT Acol1, Acol2, K.D.Bcol1, K.D.Bcol2, K.D.Ccol1, K.D.Ccol2 
FROM tableA LEFT JOIN 
( SELECT D.Bcol1, D.Bcol2, D.Ccol1, D.Ccol2 
 FROM (tableB FULL JOIN 
 tableC ON 
Bcol1 = Ccol1 ) D ) K 
ON Acol1 = K.D.Bcol1

I'm using K.D. to indicate the data flow.

Question 4

Hi, thanks for the answer, but I get the following error: ERROR: missing FROM-clause entry for table "d" LINE 8: ON Acol1 = K.D.Bcol1;

score 5 · Answer 1 · 2016-03-06 19:49:07Z

You can introduce optimization barriers for your purpose.

Preface

Plain EXPLAIN (without ANALYZE) is enough for the purpose of this question. Parentheses around the ON expression are just noise. Adding table aliases to be unambiguous.

We do see "Full Join" for the full join by itself:

SELECT * FROM tableB b FULL JOIN tableC c ON b.Bcol1 = c.Ccol1;

We could rewrite the full join with a subquery:

SELECT a.Acol1, a.Acol2, d.Bcol1, d.Bcol2, d.Ccol1, d.Ccol2
FROM tableA a
LEFT JOIN (
 SELECT * -- sort out conflicting names with aliases
 FROM tableB b FULL JOIN tableC c ON b.Bcol1 = c.Ccol1
 ) d ON a.Acol1 = d.Bcol1;

You have to sort out conflicting names with aliases in the subquery - but then again, you would need to do this in any case in the outer SELECT for multiple columns of the same name in the underlying tables.

The query is still optimized as a whole since subqueries do not impose optimization barriers. You'll still see a "Left Join" or "Right Join". However, we can expand on this form to arrive at a solution:

Solution 1. `OFFSET 0` hack (undocumented)

EXPLAIN
SELECT a.Acol1, a.Acol2, d.Bcol1, d.Bcol2, d.Ccol1, d.Ccol2
FROM tableA a
LEFT JOIN (
 SELECT * -- you'll have to sort out conflicting names with aliases
 FROM tableB b FULL JOIN tableC c ON b.Bcol1 = c.Ccol1
 OFFSET 0 -- undocumented hack
 ) d ON a.Acol1 = d.Bcol1;

You'll see a "Full Join".

Why? As soon as a subquery uses an OFFSET clause, the query planner / optimiser plans the subquery separately. OFFSET 0 is logical noise, but Postgres still considers the clause which makes this a query hint to materialize the subquery, effectively. (Even though Postgres does not otherwise support query hints.) It's a much debated issue. Related:

Solution 2. Use a CTE (documented)

EXPLAIN
WITH cte AS MATERIALIZED ( -- requires "MATERIALIZED" in Postgres 12 or later!
 SELECT * -- you'll have to sort out conflicting names with aliases
 FROM tableB b FULL JOIN tableC c ON b.Bcol1 = c.Ccol1
 ) 
SELECT a.Acol1, a.Acol2, d.Bcol1, d.Bcol2, d.Ccol1, d.Ccol2
FROM tableA a
LEFT JOIN cte d ON a.Acol1 = d.Bcol1;

You'll see a "Full Join" as well.

The manual for Postgres 11 (before AS MATERIALIZED was added):

A useful property of WITH queries is that they are evaluated only once per execution of the parent query, even if they are referred to more than once by the parent query or sibling WITH queries. Thus, expensive calculations that are needed in multiple places can be placed within a WITH query to avoid redundant work. Another possible application is to prevent unwanted multiple evaluations of functions with side-effects. However, the other side of this coin is that the optimizer is less able to push restrictions from the parent query down into a WITH query than an ordinary subquery. The WITH query will generally be evaluated as written, without suppression of rows that the parent query might discard afterwards. (But, as mentioned above, evaluation might stop early if the reference(s) to the query demand only a limited number of rows.)

Since Postgres 12, the manual adds:

However, if a WITH query is non-recursive and side-effect-free (that is, it is a SELECT containing no volatile functions) then it can be folded into the parent query, allowing joint optimization of the two query levels. By default, this happens if the parent query references the WITH query just once, but not if it references the WITH query more than once. You can override that decision by specifying MATERIALIZED to force separate calculation of the WITH query, or by specifying NOT MATERIALIZED to force it to be merged into the parent query. The latter choice risks duplicate computation of the WITH query, but it can still give a net savings if each usage of the WITH query needs only a small part of the WITH query's full output.

Bold emphasis mine.

db<>fiddle here
_{Old sqlfiddle}

user88899 user88899 212 bronze badges · Answer 2 · 2016-03-05 17:53:13Z

0

Try fixing the code

SELECT Acol1, Acol2, K.D.Bcol1, K.D.Bcol2, K.D.Ccol1, K.D.Ccol2 
FROM tableA LEFT JOIN 
( SELECT D.Bcol1, D.Bcol2, D.Ccol1, D.Ccol2 
 FROM (tableB FULL JOIN 
 tableC ON 
Bcol1 = Ccol1 ) D ) K 
ON Acol1 = K.D.Bcol1

I'm using K.D. to indicate the data flow.

Share

Improve this answer

edited Mar 5, 2016 at 17:56

Paul White's user avatar

Paul White ♦

95.4k30 gold badges440 silver badges689 bronze badges

answered Mar 5, 2016 at 17:53

user88899's user avatar

user88899 user88899

212 bronze badges

1

Hi, thanks for the answer, but I get the following error: ERROR: missing FROM-clause entry for table "d" LINE 8: ON Acol1 = K.D.Bcol1;

Doris W.
– Doris W.

2016年03月06日 03:01:31 +00:00
Commented Mar 6, 2016 at 3:01

Add a comment |

Stack Exchange Network

How to prevent PostgreSQL from rewriting OUTER JOIN queries?

2 Answers 2

Preface

Solution 1. `OFFSET 0` hack (undocumented)

Solution 2. Use a CTE (documented)

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

How to prevent PostgreSQL from rewriting OUTER JOIN queries?

2 Answers 2

Preface

Solution 1. OFFSET 0 hack (undocumented)

Solution 2. Use a CTE (documented)

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions

Solution 1. `OFFSET 0` hack (undocumented)