What's the order of operations in PostgreSQL?

Question 1

I'm a DB student, and I executed the following query to learn a couple of things at the same time (LEFT/RIGHT JOIN, UNION, WHERE + RegEx). What's troubling me is the order of execution. I have two tables, defined as follows:

create table practicaleft(
 id smallint primary key,
 nombre varchar,
 cumple date
);
create table practicaright(
 id smallint primary key,
 apellido varchar,
 cumpleanios date
);

Then, I insert some random data:

INSERT INTO practicaleft VALUES
(1, 'John', CURRENT_DATE - 1),
(5, 'Alice', CURRENT_DATE - 5),
(3, 'Bob', CURRENT_DATE - 3),
(7, 'Eva', CURRENT_DATE - 7);
INSERT INTO practicaright VALUES
(5, 'Doe', CURRENT_DATE - 5),
(6, 'Smith', CURRENT_DATE - 6),
(3, 'Johnson', CURRENT_DATE - 3),
(4, 'Brown', CURRENT_DATE - 4);

Afterwards, I execute this query:

select id, nombre
from practicaleft
where nombre similar to 'A%'
union
select pr.id, pr.apellido
from practicaright pr
where pr.id = 4 or pr.apellido ilike '_o%'
union all
select id, apellido
from practicaright 
where cumpleanios > current_date - 5;

The results? Here you go:

4 "Brown"
5 "Alice"
5 "Doe"
3 "Johnson"
3 "Johnson"
4 "Brown"

TL;DR: this query has three parts, and their results are combined first with UNION and then with UNION ALL.

Now comes the question. One might believe this is executed instruction by instruction, and so, the order should be:

5 "Alice"
5 "Doe"
3 "Johnson"
4 "Brown"
3 "Johnson"
4 "Brown"

But that isn't happening. The only way I found to fix that is to add a constant string as an extra column, like so:

select id, nombre, 'part1' as query_part
from practicaleft
where nombre similar to 'A%'
union
select pr.id, pr.apellido, 'part2' as query_part
from practicaright pr
where pr.id = 4 or pr.apellido ilike '_o%'
union all
select id, apellido, 'part3' as query_part
from practicaright 
where cumpleanios > current_date - 25;

What is happening? Did I skip over some truly important SQL mechanic?

Question 2

Order of operations is in no way related to the order of rows returned by a query.

Question 3

The first UNION does not have ALL.

Question 4

Question should be re-titled to ordering rows ... order of operations is different.

Question 5

Closely related: Are results from UNION ALL clauses always appended in order?

Question 6

Jasen's answer is correct - PostgreSQL is free to return the rows in any order it pleases unless you add an ORDER BY clause:

(SELECT ... UNION SELECT ... UNION ALL SELECT ...) ORDER BY ...;

But let me explain why PostgreSQL doesn't return the rows in the order you expect. The reason is that the first UNION is not UNION ALL. If you had used UNION ALL everywhere, PostgreSQL would execute the query like this:

EXPLAIN (COSTS OFF)
select id, nombre
from practicaleft
where nombre similar to 'A%'
union all
select pr.id, pr.apellido
from practicaright pr
where pr.id = 4 or pr.apellido ilike '_o%'
union all
select id, apellido
from practicaright 
where cumpleanios > current_date - 5;
                            QUERY PLAN
══════════════════════════════════════════════════════════════════
 Append
   ->  Seq Scan on practicaleft
         Filter: ((nombre)::text ~ '^(?:A.*)$'::text)
   ->  Seq Scan on practicaright pr
         Filter: ((id = 4) OR ((apellido)::text ~~* '_o%'::text))
   ->  Seq Scan on practicaright
         Filter: (cumpleanios > (CURRENT_DATE - 5))
(7 rows)

That is, PostgreSQL would execute the three queries and simply append the results, and you would end up with the ordering you expected.

But you used UNION the first time, and UNION eliminates duplicates. This is executed as follows:

EXPLAIN (COSTS OFF)
select id, nombre
from practicaleft
where nombre similar to 'A%'
union 
select pr.id, pr.apellido
from practicaright pr
where pr.id = 4 or pr.apellido ilike '_o%'
union all
select id, apellido
from practicaright
where cumpleanios > current_date - 5;
                                  QUERY PLAN
══════════════════════════════════════════════════════════════════════════════
 Append
   ->  HashAggregate
         Group Key: practicaleft.id, practicaleft.nombre
         ->  Append
               ->  Seq Scan on practicaleft
                     Filter: ((nombre)::text ~ '^(?:A.*)$'::text)
               ->  Seq Scan on practicaright pr
                     Filter: ((id = 4) OR ((apellido)::text ~~* '_o%'::text))
   ->  Seq Scan on practicaright
         Filter: (cumpleanios > (CURRENT_DATE - 5))
(10 rows)

PostgreSQL uses a hash aggregate to remove duplicates from the first two branches. The result rows are returned in the order they happen to have in the hash table, which is pretty random (good hash functions behave like that).
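One way to check that the row order is a side effect of how duplicates are removed is to discourage hash aggregation for the current session and look at the plan again. This is only a sketch: it assumes the planner then falls back to a sort-based strategy (typically a Sort feeding a Unique node), and the exact plan depends on your PostgreSQL version and settings. Even if the rows then come out sorted, that is still an implementation detail and not a guarantee.

```sql
-- Session-local experiment only; the setting is reset at the end.
SET enable_hashagg = off;

EXPLAIN (COSTS OFF)
select id, nombre
from practicaleft
where nombre similar to 'A%'
union
select pr.id, pr.apellido
from practicaright pr
where pr.id = 4 or pr.apellido ilike '_o%'
union all
select id, apellido
from practicaright
where cumpleanios > current_date - 5;

-- Restore the default planner behaviour.
RESET enable_hashagg;
```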

Question 7

"The result rows are returned in the order they happen to have in the hash table" - oh, interesting. I would naively have expected them to still be returned in the same order as they come from the sequence scan, just be filtered out if they already exist in the hash set.

Question 8

btw, it's a really bad idea to depend on any ordering as output from a seq scan. Postgres heap storage is unordered, and even if you control how rows are inserted into a table, they might get moved around later, for example when they are updated or when the table is rewritten. If you need a specific order, use ORDER BY.

Question 9

@Laurenz Albe: about "That is, PostgreSQL would execute the three queries and simply append the results, and you would end up with the ordering you expected." And even that is not guaranteed. If for example the optimizer/planner chooses a parallel plan, you may get different order as well.

Question 10

@ypercubeTM Theoretically you are right, but I don't think you will ever get synchronized sequential scans or parallel plans with tables that contain five rows.

Question 11

haha no, certainly not. I meant it in general.

Question 12

SQL makes no guarantee of result ordering unless you have an order by clause in your query.

If you don't say "order by" your results will come in whatever order the query planner and database engine decide is most efficient (or sufficiently efficient).

Synchronized sequential scans are a thing, where several queries scanning the same table at the same time share one pass over it, so a scan may not start at the beginning of the table. But your example tables are probably too small for even that.

When I have a string of UNION ALLs that I want in order, I add an ordering column with constant values to each branch of the query (1 as sort, 2 as sort, 3 as sort) and finish with order by sort.
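A minimal sketch of that constant-column approach, applied to the tables from the question. It assumes UNION ALL between all three branches, because with a plain UNION the extra column would also take part in duplicate elimination. The second sort key, id, is an addition here, used only to make the order within each branch deterministic.

```sql
select id, nombre, 1 as sort
from practicaleft
where nombre similar to 'A%'
union all
select pr.id, pr.apellido, 2 as sort
from practicaright pr
where pr.id = 4 or pr.apellido ilike '_o%'
union all
select id, apellido, 3 as sort
from practicaright
where cumpleanios > current_date - 5
-- ORDER BY applies to the combined result, not just the last branch.
order by sort, id;
```

If the helper column should not appear in the output, the whole statement can be wrapped in a subquery that selects only id and nombre while keeping the ORDER BY on the outer query.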

Question 13

And an ORDER BY on that column.

Question 14

I am sorry, I am not an expert in Postgres, but the question was about the order of operations, not the order of the rows in the result set. There is a UNION and a UNION ALL between three data sets. Let's say we have data set 1,2,3 UNION 3,4,5 UNION ALL 5,6,7. If the UNION is applied first, the result should be 1,2,3,4,5,5,6,7. If the UNION ALL is executed first, the result should be 1,2,3,4,5,6,7, because the implicit GROUP BY from UNION is applied in the last step. Please correct me if I am wrong.

Question 15

As per the documentation, the order is left-to-right. So while the engine is free to do whatever it thinks is optimal, logically the result will reflect this order of operations:

(A UNION B) UNION ALL C

"you can use parentheses to control the order of evaluation. Without parentheses, UNION and EXCEPT associate left-to-right"

Note that this is not in any way related to a particular ordering of the rows in the resultset.
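The difference between the two groupings can be checked with the small data sets 1,2,3 / 3,4,5 / 5,6,7 mentioned earlier in this thread. The sketch below uses inline VALUES lists; the aliases a, b, c and the column name x are made up for illustration. The ORDER BY is there only to make the output easy to compare.

```sql
-- Default, left-to-right grouping: (A UNION B) UNION ALL C
select x from (values (1), (2), (3)) as a(x)
union
select x from (values (3), (4), (5)) as b(x)
union all
select x from (values (5), (6), (7)) as c(x)
order by x;
-- 8 rows: 1, 2, 3, 4, 5, 5, 6, 7

-- Parentheses force the other grouping: A UNION (B UNION ALL C)
select x from (values (1), (2), (3)) as a(x)
union
(
    select x from (values (3), (4), (5)) as b(x)
    union all
    select x from (values (5), (6), (7)) as c(x)
)
order by x;
-- 7 rows: 1, 2, 3, 4, 5, 6, 7
```

In the first statement the duplicate 3 is removed by the UNION, but the second 5 arrives afterwards through UNION ALL and survives. In the second statement the outer UNION runs last and removes every duplicate.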

Accepted answer: the one beginning "Jasen's answer is correct", by Laurenz Albe, 2023-11-15 05:20:48Z.
