
How does the ON predicate of a Postgres LATERAL JOIN work?

Let me clarify the question a bit. I've read the official documentation and a bunch of articles about this kind of JOIN. As far as I understand, it is a foreach loop with a correlated subquery inside. It iterates over all records of a table A, lets a correlated subquery B reference columns of the "current" row, and joins the result set of B to that "current" row of A. If B returns 1 row, there is only one pair; if B returns N rows, there are N pairs, each with a duplicate of the "current" row of A. That is the same behavior as in ordinary JOINs.
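The foreach-loop mental model described above can be written down as a toy Python sketch. It only models the semantics, not how PostgreSQL actually plans or executes a lateral join, and the `routes`/`links` rows and the `lateral_join` helper are hypothetical. The ON predicate is modeled as a filter on each (A, B) pair after B has been evaluated for the current row of A, and only the LEFT variant pads an unmatched A row with NULLs (`None` here).

```python
from typing import Callable, Iterable

Row = dict


def lateral_join(
    a_rows: Iterable[Row],
    subquery: Callable[[Row], list],
    on: Callable[[Row, Row], bool] = lambda a, b: True,  # default models ON true
    left: bool = False,
) -> list:
    """Toy model of: A [LEFT] JOIN LATERAL (B) ON on(a, b)."""
    out = []
    for a in a_rows:                 # iterate over every row of A
        matched = False
        for b in subquery(a):        # B is re-evaluated for the "current" row of A
            if on(a, b):             # ON filters the pairs B just produced
                out.append((a, b))
                matched = True
        if left and not matched:     # LEFT keeps A's row, with NULLs (None) for B
            out.append((a, None))
    return out


# Hypothetical data
routes = [{"route_id": 1}, {"route_id": 2}, {"route_id": 3}]
links = [
    {"route_id": 1, "shipment_id": 10},
    {"route_id": 1, "shipment_id": 11},
    {"route_id": 2, "shipment_id": 12},
]


def correlated(r: Row) -> list:
    # Models: SELECT * FROM route_to_shipment rts WHERE rts.route_id = r.route_id
    return [l for l in links if l["route_id"] == r["route_id"]]


cross = lateral_join(routes, correlated)            # CROSS JOIN LATERAL
left = lateral_join(routes, correlated, left=True)  # LEFT JOIN LATERAL ... ON true
```

Route 1 shows up twice because B returns two rows for it, and route 3 only survives in the LEFT variant, paired with `None`.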

But why is there a need for an ON predicate at all? As I see it, in ordinary JOINs we use ON because we have a Cartesian product of two tables that has to be filtered, and that is not the case for a LATERAL JOIN, which produces the resulting pairs directly. In my experience as a developer, I've only ever seen CROSS JOIN LATERAL and LEFT JOIN LATERAL (...) ON TRUE (the latter looks quite clumsy, though), but one day a colleague showed me

SELECT r.acceptance_status, count(*) AS count
FROM route r
LEFT JOIN LATERAL (
   SELECT rts.route_id, array_agg(rts.shipment_id) AS shipment_ids
   FROM route_to_shipment rts
   WHERE rts.route_id = r.route_id
   GROUP BY rts.route_id
) rts USING (route_id)
GROUP BY r.acceptance_status;

and this blew my mind. Why USING (route_id)? We already have WHERE rts.route_id = r.route_id inside the subquery! Maybe I understand the mechanics of LATERAL joins wrong?

asked Dec 10, 2023 at 5:30
  • Your example is odd in that it generates an array in the lateral subquery but does not use it. Also, count(*) is suspicious. What exactly do you want to count? Commented Dec 11, 2023 at 12:28

2 Answers


Short answer: LEFT JOIN requires a join condition, as opposed to CROSS JOIN. The basics are in the manual.
See also:

But the join condition can still make sense: it filters which rows to attach on the right side after the lateral subquery has computed its set. Like:

SELECT r.acceptance_status
     , count(*) AS count_routes
     , count(rts.shipment_ids) AS count_routes_with_more_than_one_shipment
FROM   route r
LEFT   JOIN LATERAL (
   SELECT array_agg(rts.shipment_id) AS shipment_ids
        , count(*) AS shipments
   FROM   route_to_shipment rts
   WHERE  rts.route_id = r.route_id
   -- GROUP BY rts.route_id  -- just noise
   ) rts ON shipments > 1  -- !!!
GROUP  BY r.acceptance_status;

This returns all rows from table route, but only attaches shipment_ids where more than one related row is found in table route_to_shipment.
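Why the ON clause still does real work here can be sketched in Python with hypothetical data. The sketch encodes one PostgreSQL behavior: an aggregate query without GROUP BY always returns exactly one row, with `array_agg()` yielding NULL and `count(*)` yielding 0 when nothing matches. So every route gets exactly one lateral row, and `ON shipments > 1` decides whether that row is attached or replaced by NULLs. The names `routes`, `links` and `lateral_agg` are illustrative only.

```python
# Hypothetical data: route 1 has two shipments, route 2 has one, route 3 has none
routes = [1, 2, 3]
links = [(1, 10), (1, 11), (2, 12)]  # (route_id, shipment_id)


def lateral_agg(route_id):
    """Models: SELECT array_agg(shipment_id), count(*) ... WHERE route_id = r.route_id

    No GROUP BY, so exactly one row comes back even when nothing matches:
    array_agg() of no rows is NULL (None here) and count(*) is 0.
    """
    ids = [s for r, s in links if r == route_id]
    return [{"shipment_ids": ids or None, "shipments": len(ids)}]


result = []
for route_id in routes:
    # ON shipments > 1 filters the lateral row(s) produced for this route
    passing = [row for row in lateral_agg(route_id) if row["shipments"] > 1]
    if passing:
        result += [(route_id, row["shipment_ids"]) for row in passing]
    else:
        result.append((route_id, None))  # LEFT JOIN: route kept, right side NULL
```

All three routes come back, but only route 1 carries its array.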

There is no need to add rts.route_id to the SELECT list of the subquery.
GROUP BY rts.route_id is just noise after WHERE rts.route_id = r.route_id.
And I am still generating the array shipment_ids in vain, like your original.

This also demonstrates the different results of count(*) vs. count(shipment_ids).
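The difference comes down to NULL handling: `count(*)` counts rows, while `count(expr)` skips rows where `expr` is NULL. A small Python sketch over hypothetical joined rows (roughly what the query above produces before `GROUP BY r.acceptance_status`), with `None` standing in for NULL:

```python
from collections import defaultdict

# Hypothetical joined rows before GROUP BY: (acceptance_status, shipment_ids)
joined = [
    ("accepted", [10, 11]),  # more than one shipment: array attached
    ("accepted", None),      # ON shipments > 1 failed: NULL
    ("pending", None),
]

count_star = defaultdict(int)  # count(*)
count_ids = defaultdict(int)   # count(rts.shipment_ids)
for status, shipment_ids in joined:
    count_star[status] += 1                                # every row counts
    count_ids[status] += 0 if shipment_ids is None else 1  # NULLs are skipped
```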

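The join condition cannot be moved to the outer WHERE clause, where it would have a different effect: it would eliminate routes with fewer than two shipments altogether instead of just leaving shipment_ids NULL. You might add a HAVING clause to the subquery, though: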

SELECT r.acceptance_status
 , count(*) AS ct_routes
 , count(rts.shipment_ids) AS ct_routes_with_more_than_1_shipment
FROM route r
LEFT JOIN LATERAL (
 SELECT array_agg(rts.shipment_id) shipment_ids
 FROM route_to_shipment rts
 WHERE rts.route_id = r.route_id
 HAVING count(*) > 1 -- !!!
 ) rts ON true
GROUP BY r.acceptance_status;
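The three placements of the filter (ON, outer WHERE, HAVING inside the subquery) can be compared in a Python sketch with hypothetical data. It assumes, as in PostgreSQL, that the aggregate subquery yields one row per route unless HAVING removes it, and that a LEFT JOIN pads a route with NULL (`None`) when no lateral row is attached. The `shipment_ids` helper is illustrative only.

```python
# Hypothetical data: route 1 has two shipments, route 2 has one, route 3 has none
links = [(1, 10), (1, 11), (2, 12)]  # (route_id, shipment_id)
routes = [1, 2, 3]


def shipment_ids(route_id):
    return [s for r, s in links if r == route_id]


# LEFT JOIN LATERAL (aggregate) rts ON count > 1
# The aggregate yields one row per route; a failing ON only NULLs the right side.
with_on = []
for r in routes:
    ids = shipment_ids(r)
    with_on.append((r, ids if len(ids) > 1 else None))

# LEFT JOIN LATERAL (aggregate) rts ON true WHERE count > 1
# The outer WHERE runs after the join and removes the whole route row.
with_where = [(r, shipment_ids(r)) for r in routes if len(shipment_ids(r)) > 1]

# LEFT JOIN LATERAL (aggregate ... HAVING count(*) > 1) rts ON true
# HAVING may leave the subquery with no row; the LEFT JOIN then pads with NULL.
with_having = []
for r in routes:
    ids = shipment_ids(r)
    lateral_rows = [ids] if len(ids) > 1 else []
    with_having += [(r, row) for row in lateral_rows] or [(r, None)]
```

The ON and HAVING variants keep all three routes, while the outer WHERE keeps only route 1.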

But not every lateral subquery aggregates, and without aggregation no HAVING clause is possible. For your case:

SELECT r.acceptance_status
 , count(*) AS ct_routes
 , count(rts.shipment_ids) AS ct_routes_with_more_than_1_shipment
FROM route r
LEFT JOIN LATERAL (
 SELECT ARRAY (
 SELECT rts.shipment_id
 FROM route_to_shipment rts
 WHERE rts.route_id = r.route_id
 ) AS shipment_ids
 ) rts ON cardinality(shipment_ids) > 1 -- !!!
GROUP BY r.acceptance_status;
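This version relies on a difference between the two ways of building the array: an `ARRAY(SELECT ...)` constructor returns an empty array when the inner query finds no rows, while `array_agg()` over no rows returns NULL. So the lateral subquery here always produces one non-NULL array, and it is the ON predicate `cardinality(shipment_ids) > 1` that turns routes with fewer than two shipments into NULLs. A Python sketch of that logic, with hypothetical data and an illustrative `array_constructor` helper:

```python
links = [(1, 10), (1, 11), (2, 12)]  # hypothetical (route_id, shipment_id)
routes = [1, 2, 3]


def array_constructor(route_id):
    # Models ARRAY(SELECT ...): an empty list, never NULL, when nothing matches
    return [s for r, s in links if r == route_id]


attached = []
for r in routes:
    shipment_ids = array_constructor(r)
    # ON cardinality(shipment_ids) > 1; a failing ON leaves NULL (None) in a LEFT JOIN
    attached.append(shipment_ids if len(shipment_ids) > 1 else None)

ct_routes = len(attached)                                 # count(*)
ct_multi = sum(1 for ids in attached if ids is not None)  # count(rts.shipment_ids)
```

Without the ON predicate, `count(rts.shipment_ids)` would count every route, since an empty array is not NULL.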


This only makes sense if we are actually going to use that array, of course. In that case, an ARRAY constructor is probably the best option for your query anyway. See:

answered Dec 11, 2023 at 12:44
CREATE TABLE ta (aid INT, a INT);
CREATE TABLE tb (aid INT, b INT);
INSERT INTO ta VALUES (1,10),(2,20);
INSERT INTO tb VALUES (1,100),(1,200);
SELECT * FROM ta LEFT JOIN LATERAL (SELECT * FROM tb WHERE tb.aid = ta.aid) AS sub ON true;
 aid | a  | aid  |  b
-----+----+------+------
   1 | 10 |    1 |  100
   1 | 10 |    1 |  200
   2 | 20 | Null | Null
SELECT * FROM ta LEFT JOIN LATERAL (SELECT * FROM tb) AS sub USING (aid);
 aid | a  |  b
-----+----+------
   1 | 10 |  100
   1 | 10 |  200
   2 | 20 | Null

The USING (columns) clause does not duplicate the specified columns in the result set, whereas ON (ta.column = tb.column) does. Here the duplicated column is "aid". In an inner join on equality both columns are always equal, so the duplication is useless, which makes USING preferable; it is also more readable. In an outer join (RIGHT, LEFT, FULL) you may want both columns in the output, in order to tell whether one of them is NULL.
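The two result sets above can be reproduced with a Python sketch over the same `ta`/`tb` rows, written as tuples with `None` for NULL. In the first query the correlated WHERE does the matching and ON true attaches everything it returns; in the second the uncorrelated subquery returns all of `tb`, USING (aid) does the matching, and `aid` is emitted only once. For this LEFT JOIN the merged `aid` simply equals `ta.aid`, since every `ta` row is kept.

```python
ta = [(1, 10), (2, 20)]    # (aid, a)
tb = [(1, 100), (1, 200)]  # (aid, b)

# SELECT * FROM ta LEFT JOIN LATERAL (SELECT * FROM tb WHERE tb.aid = ta.aid) AS sub ON true
on_true = []
for aid, a in ta:
    matches = [(b_aid, b) for b_aid, b in tb if b_aid == aid]  # correlated WHERE
    # ON true keeps every match; no match at all means NULL padding (LEFT JOIN)
    on_true += [(aid, a, b_aid, b) for b_aid, b in matches] or [(aid, a, None, None)]

# SELECT * FROM ta LEFT JOIN LATERAL (SELECT * FROM tb) AS sub USING (aid)
using = []
for aid, a in ta:
    matches = [b for b_aid, b in tb if b_aid == aid]  # USING (aid) does the matching
    # the aid column appears once in the output row
    using += [(aid, a, b) for b in matches] or [(aid, a, None)]
```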

If you want a CROSS JOIN (no ON condition):

SELECT * FROM ta CROSS JOIN LATERAL (SELECT * FROM tb WHERE tb.aid = ta.aid) AS sub;

You can also use a plain JOIN and move some of the conditions that would be in the WHERE clause of the LATERAL subquery into the ON clause; the result is the same:

SELECT * FROM ta JOIN LATERAL (SELECT * FROM tb WHERE ...) AS sub ON (sub.aid = ta.aid);
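For an inner join, this equivalence can be checked with a Python sketch over the same `ta`/`tb` rows. The condition `tb.aid = ta.aid` is only an example; the statement above leaves its other conditions elided. Filtering inside the lateral subquery and filtering the produced pairs in ON give the same rows, and in both cases `aid` 2 disappears because nothing matches it.

```python
ta = [(1, 10), (2, 20)]    # (aid, a)
tb = [(1, 100), (1, 200)]  # (aid, b)

# JOIN LATERAL (SELECT * FROM tb WHERE tb.aid = ta.aid) AS sub ON true
in_subquery = [
    (aid, a, b_aid, b)
    for aid, a in ta
    for b_aid, b in tb
    if b_aid == aid  # filtered inside the lateral subquery
]

# JOIN LATERAL (SELECT * FROM tb) AS sub ON sub.aid = ta.aid
in_on = []
for aid, a in ta:
    produced = list(tb)  # the subquery returns every tb row
    in_on += [(aid, a, b_aid, b) for b_aid, b in produced if b_aid == aid]  # ON filters pairs
```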

But there is no CROSS LEFT JOIN, so if you want a LEFT JOIN LATERAL you have to write LEFT JOIN explicitly, and that requires an ON clause:

SELECT * FROM ta LEFT JOIN LATERAL (SELECT * FROM tb WHERE tb.aid = ta.aid) AS sub ON true WHERE ta.aid < 10;

Indeed, in the case of a LATERAL join, the ON clause can be superfluous (hence ON true).

answered Dec 10, 2023 at 10:38
