I'm a DB student, and I executed the following query to learn a couple things at the same time (LEFT/RIGHT JOIN, UNION, WHERE + RegEx). What's troubling me is the order of execution. I have two tables, as such:
create table practicaleft(
id smallint primary key,
nombre varchar,
cumple date
);
create table practicaright(
id smallint primary key,
apellido varchar,
cumpleanios date
);
Then, I insert some random data:
INSERT INTO practicaleft VALUES
(1, 'John', CURRENT_DATE - 1),
(5, 'Alice', CURRENT_DATE - 5),
(3, 'Bob', CURRENT_DATE - 3),
(7, 'Eva', CURRENT_DATE - 7);
INSERT INTO practicaright VALUES
(5, 'Doe', CURRENT_DATE - 5),
(6, 'Smith', CURRENT_DATE - 6),
(3, 'Johnson', CURRENT_DATE - 3),
(4, 'Brown', CURRENT_DATE - 4);
Afterwards, I execute this query:
select id, nombre
from practicaleft
where nombre similar to 'A%'
union
select pr.id, pr.apellido
from practicaright pr
where pr.id = 4 or pr.apellido ilike '_o%'
union all
select id, apellido
from practicaright
where cumpleanios > current_date - 5;
The results? Here you go:
4 "Brown"
5 "Alice"
5 "Doe"
3 "Johnson"
3 "Johnson"
4 "Brown"
TL;DR: this query is divided into three parts, and results are merged with the operator UNION ALL.
Now comes the question. One might believe this is executed instruction by instruction, and so, the order should be:
5 "Alice"
5 "Doe"
3 "Johnson"
4 "Brown"
3 "Johnson"
4 "Brown"
But that isn't happening. The only way to fix that is if I add some random string as a field, like so:
select id, nombre, 'part1' as query_part
from practicaleft
where nombre similar to 'A%'
union
select pr.id, pr.apellido, 'part2' as query_part
from practicaright pr
where pr.id = 4 or pr.apellido ilike '_o%'
union all
select id, apellido, 'part3' as query_part
from practicaright
where cumpleanios > current_date - 25;
What is happening? Did I skip over some truly important SQL mechanic?
- Order of operations is in no way related to the order of rows returned by a query. – mustaccio, Nov 15, 2023 at 0:26
- The first UNION does not have ALL. – jjanes, Nov 15, 2023 at 9:23
- Question should be re-titled to ordering rows ... order of operations is different. – JosephDoggie, Nov 16, 2023 at 14:41
- Closely related: Are results from UNION ALL clauses always appended in order? – Erwin Brandstetter, Nov 17, 2023 at 17:22
4 Answers
Jasen's answer is correct - PostgreSQL is free to return the rows in any order it pleases unless you add an ORDER BY clause:
(SELECT ... UNION SELECT ... UNION ALL SELECT ...) ORDER BY ...;
But let me explain why PostgreSQL doesn't return the rows in the order you expect. The reason is that the first UNION is not UNION ALL. If you had used UNION ALL everywhere, PostgreSQL would execute the query like this:
EXPLAIN (COSTS OFF)
select id, nombre
from practicaleft
where nombre similar to 'A%'
union all
select pr.id, pr.apellido
from practicaright pr
where pr.id = 4 or pr.apellido ilike '_o%'
union all
select id, apellido
from practicaright
where cumpleanios > current_date - 5;
QUERY PLAN
══════════════════════════════════════════════════════════════════
Append
-> Seq Scan on practicaleft
Filter: ((nombre)::text ~ '^(?:A.*)$'::text)
-> Seq Scan on practicaright pr
Filter: ((id = 4) OR ((apellido)::text ~~* '_o%'::text))
-> Seq Scan on practicaright
Filter: (cumpleanios > (CURRENT_DATE - 5))
(7 rows)
That is, PostgreSQL would execute the three queries and simply append the results, and you would end up with the ordering you expected.
But you used UNION the first time, and UNION eliminates duplicates. This is executed as follows:
EXPLAIN (COSTS OFF)
select id, nombre
from practicaleft
where nombre similar to 'A%'
union
select pr.id, pr.apellido
from practicaright pr
where pr.id = 4 or pr.apellido ilike '_o%'
union all
select id, apellido
from practicaright
where cumpleanios > current_date - 5;
QUERY PLAN
══════════════════════════════════════════════════════════════════════════════
Append
-> HashAggregate
Group Key: practicaleft.id, practicaleft.nombre
-> Append
-> Seq Scan on practicaleft
Filter: ((nombre)::text ~ '^(?:A.*)$'::text)
-> Seq Scan on practicaright pr
Filter: ((id = 4) OR ((apellido)::text ~~* '_o%'::text))
-> Seq Scan on practicaright
Filter: (cumpleanios > (CURRENT_DATE - 5))
(10 rows)
PostgreSQL uses a hash aggregate to remove duplicates from the first two branches. The result rows are returned in the order they happen to have in the hash table, which is pretty random (good hash functions behave like that).
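One way to convince yourself that the order comes from the deduplication strategy, and not from the query text, is to steer the planner away from hashing. This is only a sketch: it assumes you may change the standard planner setting `enable_hashagg` in your session, and the plan you get instead (typically a sort followed by a unique step) depends on your PostgreSQL version. It is an experiment, not a fix.

```sql
-- Discourage hash aggregation for this session only (experiment, not a fix).
SET enable_hashagg = off;

EXPLAIN (COSTS OFF)
select id, nombre
from practicaleft
where nombre similar to 'A%'
union
select pr.id, pr.apellido
from practicaright pr
where pr.id = 4 or pr.apellido ilike '_o%'
union all
select id, apellido
from practicaright
where cumpleanios > current_date - 5;

-- Restore the default planner behaviour.
RESET enable_hashagg;
```

If the planner switches to sorting, the rows of the first two branches come back sorted as a side effect. That is again an implementation detail and not a guarantee; only ORDER BY gives you one.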
- "The result rows are returned in the order they happen to have in the hash table" - oh, interesting. I would naively have expected them to still be returned in the same order as they come from the sequence scan, just be filtered out if they already exist in the hash set. – Bergi, Nov 15, 2023 at 20:28
- btw, it's a really bad idea to depend on any ordering as output from a seq scan. Postgres heap storage is unordered, and even if you control how rows are inserted into a table, they might get shuffled around during compaction. If you need a specific order, use ORDER BY. – Josh Berkus, Nov 15, 2023 at 20:40
- @Laurenz Albe: about "That is, PostgreSQL would execute the three queries and simply append the results, and you would end up with the ordering you expected." And even that is not guaranteed. If for example the optimizer/planner chooses a parallel plan, you may get different order as well. – ypercubeᵀᴹ, Nov 20, 2023 at 19:25
- @ypercubeᵀᴹ Theoretically you are right, but I don't think you will ever get synchronized sequential scans or parallel plans with tables that contain five rows. – Laurenz Albe, Nov 20, 2023 at 20:40
- haha no, certainly not. I meant it in general. – ypercubeᵀᴹ, Nov 21, 2023 at 12:40
SQL makes no guarantee of result ordering unless you have an ORDER BY clause in your query.
If you don't say "order by", your results will come in whatever order the query planner and database engine decide is most efficient (or sufficiently efficient).
Synchronized sequential scans are a thing: when several queries scan through the same table at the same time, a new scan can join one already in progress and start in the middle of the table. But your example tables are probably too small for even that.
When I have a string of UNION ALLs that I want in order, I add an ordering column with constant values to each branch of the query and sort on it:
1 as sort
2 as sort
3 as sort
order by sort
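Applied to the tables from the question, that technique might look like the sketch below. The column name `sort_key` is made up for illustration, and every UNION is written as UNION ALL here, so unlike the original query no duplicates are removed between the first two branches.

```sql
-- "sort_key" is a hypothetical column name; it tags each branch with a constant.
SELECT id, nombre, 1 AS sort_key
FROM practicaleft
WHERE nombre SIMILAR TO 'A%'
UNION ALL
SELECT pr.id, pr.apellido, 2
FROM practicaright pr
WHERE pr.id = 4 OR pr.apellido ILIKE '_o%'
UNION ALL
SELECT id, apellido, 3
FROM practicaright
WHERE cumpleanios > CURRENT_DATE - 5
-- The final ORDER BY applies to the whole combined result, not to the last branch.
ORDER BY sort_key, id;
```

The output column names come from the first branch, which is why the later branches do not need the alias. Adding `id` as a second sort key also fixes the order of rows within each branch.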
- And an ORDER BY on that column. – jcaron, Nov 15, 2023 at 8:54
I am sorry, I am not an expert in Postgres, but the question was about the order of operations, not the order of the rows in the result set. There is a UNION and a UNION ALL between three datasets. Let's say we have dataset 1,2,3 UNION 3,4,5 UNION ALL 5,6,7. If the UNION is applied first, the result should be 1,2,3,4,5,5,6,7. If the UNION ALL is executed first, the result should be 1,2,3,4,5,6,7, because the implicit duplicate elimination from UNION is then applied in the last step. Please correct me if I am wrong.
As per the documentation, the order is left-to-right. So while the engine is free to do whatever it thinks is optimal, logically the result will reflect this order of operations:
(A UNION B) UNION ALL C
The documentation says: "you can use parentheses to control the order of evaluation. Without parentheses, UNION and EXCEPT associate left-to-right"
Note that this is not in any way related to a particular ordering of the rows in the resultset.
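The left-to-right rule can be checked with the three small datasets 1,2,3 / 3,4,5 / 5,6,7 mentioned in the other answer. The following is a minimal sketch using VALUES lists instead of real tables; it counts rows rather than listing them, so the check does not depend on row order.

```sql
-- Default grouping, equivalent to (A UNION B) UNION ALL C:
-- the duplicate 3 is removed by UNION, the second 5 is kept by UNION ALL.
SELECT count(*) AS row_count
FROM (
    VALUES (1), (2), (3)
    UNION
    VALUES (3), (4), (5)
    UNION ALL
    VALUES (5), (6), (7)
) AS t(n);
-- row_count = 8

-- Explicit grouping A UNION (B UNION ALL C):
-- UNION now runs last and removes both duplicates.
SELECT count(*) AS row_count
FROM (
    VALUES (1), (2), (3)
    UNION
    (
        VALUES (3), (4), (5)
        UNION ALL
        VALUES (5), (6), (7)
    )
) AS t(n);
-- row_count = 7
```

The two counts differ, which shows that the parentheses change the logical result even though neither query says anything about the order of the returned rows.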