average order total. A subquery that returns exactly one value (one row, one column) is called a scalar subquery. You can drop it right where a number would go.
SELECT name, total
FROM orders
JOIN users ON users.id = orders.user_id
WHERE total > (
SELECT AVG(total) -- one single number, the average order total
FROM orders
);
The inner SELECT AVG(total) returns one number, say 45.00. SQL swaps it in, so the line becomes WHERE total > 45.00. The sauce is made, now use it.
Subquery #3: in FROM (a derived table)
You can also use a subquery as if it were a temporary table. This is called a derived table. Say you want the count of orders per user, then only the busy users:
SELECT user_id, order_count
FROM (
SELECT user_id, COUNT(*) AS order_count -- a mini-table built on the fly
FROM orders
GROUP BY user_id
) AS user_orders
WHERE order_count > 3;
The inner query builds a little table with two columns. The outer query then treats it like any normal table and filters it. Note the AS user_orders: a derived table needs a name.
A quick word on correlated subqueries
Most subqueries run once. A correlated subquery is different: it points back at the outer query, so it runs again for every single row. See the o2.user_id = o1.user_id link below:
SELECT o1.id, o1.total
FROM orders o1
WHERE o1.total > (
SELECT AVG(o2.total)
FROM orders o2
WHERE o2.user_id = o1.user_id -- depends on the current outer row
);
This finds orders bigger than that user's own average. Handy, but careful: if you have a million rows, that inner query runs a million times. It can get slow.
CTEs: the same thing, but readable
When subqueries pile up, your query turns into a nest of parentheses that nobody wants to read. A CTE (Common Table Expression) uses the WITH keyword to name a sub-step up front, like writing "Sauce" on your prep board.
Here is a nested mess:
-- Hard to read: where does one query end and the next begin?
SELECT name
FROM users
WHERE id IN (
SELECT user_id
FROM orders
WHERE total > (SELECT AVG(total) FROM orders)
);
Now the same logic with CTEs. Read it top to bottom like a story:
WITH avg_order AS (
SELECT AVG(total) AS amount FROM orders
),
big_spenders AS (
SELECT DISTINCT user_id
FROM orders, avg_order
WHERE total > avg_order.amount
)
SELECT name
FROM users
WHERE id IN (SELECT user_id FROM big_spenders);
First we name the average avg_order. Then we name the users who beat it big_spenders. Then the final query just reads "give me the names of the big spenders." Each step has a clear label.
When to use which
-
Subquery: quick, one-off, used in one spot. A simple
IN or scalar check.
-
CTE: when the query gets hard to read, or when you want to reuse the same sub-result more than once. A CTE can be referenced again and again. A buried subquery has to be copy-pasted.
Both give the same answers. The difference is how kind you are being to the next person who reads your code (often future you).
Gotchas
-
Scalar subquery that returns more than one row. If you write
WHERE total > (SELECT total FROM orders) and that inner query returns many rows, the database errors out. A > wants one value. Use AVG, MAX, or add a LIMIT 1.
-
Correlated subqueries can be slow. They run per row. If a query crawls, check whether the inner part secretly depends on the outer row, and consider a JOIN or CTE instead.
-
Forgetting to name a derived table. A subquery in
FROM must have an alias (AS user_orders), or SQL complains.
Recap
- A subquery is a query inside a query. The inner one runs first.
-
IN subquery: filter by a list of values from another table.
-
Scalar subquery: returns one value, drop it where a number goes.
-
Derived table: a subquery in
FROM, used like a temporary table (needs a name).
-
Correlated subquery: runs per row, can be slow.
- A CTE (
WITH name AS (...)) names a sub-step so the query reads top to bottom.
Your turn
Write a query that finds every product that has never been ordered. Hint: the ordered product ids live in order_items, and you want products whose id is not in that list. Try it first as a subquery, then rewrite it as a CTE. If you can explain why the CTE reads easier, you have got it.
That wraps the core querying skills. Next up: Part 11: Indexes, where we make all these queries run fast.