name, not a number. The name lives in a different table. So how do you pull two tables together into one answer? That is what a JOIN does, and once it clicks, half of SQL opens up.
The idea in one line
A JOIN matches rows from two tables using a shared id, so you can read facts from both in a single result.
Why are tables split up at all?
Look at our schema. A user has a name, email, and country. An order has a total and a status. Why not just stuff the user's name into every order row?
Because you would repeat it. If Sara places 50 orders, you would write "Sara" 50 times. Change her name once and you would have to fix 50 rows. Messy and easy to break.
So instead, the user lives once in users, and each order just holds a tiny pointer back to her:
users orders
----- ------
id | name id | user_id | total
1 | Sara 1 | 1 | 50
2 | Omar 2 | 1 | 30 <-- user_id 1 = Sara again
3 | 2 | 90 <-- user_id 2 = Omar
That user_id column is the string that ties them together. orders.user_id points at users.id. A JOIN is just you following that string.
The metaphor: matching name tags between two guest lists
Imagine a party with two clipboards. List A is "people who RSVP'd." List B is "people who paid." Each person wears a name tag with an id.
To find who both RSVP'd and paid, you walk down List A, and for each person you flip to List B looking for the same id. When the tags match, you staple their two rows together. That stapled row is one row in your result.
The big question (same as the laundry and notebook posts): what do you do with people who appear on only one list? Your answer is the whole difference between join types.
INNER JOIN: only the matches
INNER JOIN keeps only rows that have a match on both sides. No match, no row.
SELECT users.name, orders.total
FROM users
INNER JOIN orders
ON users.id = orders.user_id;
The ON part is the matching rule: "an order belongs to a user when orders.user_id equals users.id."
users.id = orders.user_id
| |
1 --- matches --- 1 keep
1 --- matches --- 1 keep
2 --- matches --- 2 keep
3 --- no order --- dropped (no match)
So a user with no orders simply does not show up. Great for "show me users with orders," bad for "show me users without orders."
LEFT JOIN: keep everyone on the left
LEFT JOIN keeps all rows from the left table, even when the right side has no match. The empty spots get filled with NULL.
SELECT users.name, orders.total
FROM users
LEFT JOIN orders
ON users.id = orders.user_id;
name | total
-----+------
Sara | 50
Sara | 30
Omar | 90
Lina | NULL <-- kept, even with no orders
This is your tool for "find what is missing." Want users who never ordered? LEFT JOIN, then keep the NULL rows. We go deep on this in the bonus post understanding-sql-left-join.md, so check that out if you want the full story.
RIGHT JOIN and FULL OUTER JOIN (the short version)
RIGHT JOIN is just LEFT JOIN with the tables flipped. It keeps everything on the right. Honestly, most people pick LEFT and reorder their tables instead of thinking backwards. Keep your brain calm and stick with LEFT.
FULL OUTER JOIN keeps both sides, matched where it can, NULL where it cannot. Heads up: MySQL does not have it (Postgres does). In MySQL you fake it by combining a LEFT and a RIGHT with UNION.
Forget the ON and boom: a cartesian explosion
If you join two tables but forget the ON rule, the database does not error. It does something scary instead: it pairs every left row with every right row.
-- DANGER: no ON
SELECT users.name, orders.total
FROM users
JOIN orders;
10 users and 1,000 orders gives you 10,000 rows of nonsense. That is a cartesian product, and it is almost never what you want. Always give your join an ON.
Joining three tables: what did each user buy?
Real questions often cross more than two tables. "What products did each user buy?" needs four: users -> orders -> order_items -> products. You just chain the joins, following the ids like stepping stones.
SELECT users.name, products.name AS product
FROM users
JOIN orders ON users.id = orders.user_id
JOIN order_items ON orders.id = order_items.order_id
JOIN products ON order_items.product_id = products.id;
users --(user_id)--> orders --(order_id)--> order_items --(product_id)--> products
Each ON is one hop. Follow the trail and you can answer almost anything about the whole schema.
Gotchas
Forgetting ON. No ON means a cartesian explosion. If your result has way too many rows, this is suspect number one.
Ambiguous column names. Both users and products have a name column. If you just write SELECT name, the database does not know which one. Use users.name or give it an alias:
SELECT u.name AS user_name, p.name AS product_name
FROM users u
JOIN orders o ON u.id = o.user_id
JOIN order_items i ON o.id = i.order_id
JOIN products p ON i.product_id = p.id;
Short aliases (u, o, p) keep long queries readable.
-
Surprise duplicate rows. A one-to-many join repeats the left row once per match. If Sara has 3 orders, her name shows up 3 times. That is correct, not a bug, but it surprises people. If you only want one row per user, combine joins with GROUP BY from Part 05.
Recap
- Tables are split so data is not repeated. Ids tie them back together.
-
INNER JOIN keeps only matching rows on both sides.
-
LEFT JOIN keeps every left row, filling missing matches with NULL.
-
RIGHT JOIN is LEFT flipped. FULL OUTER JOIN keeps both sides (no MySQL).
- The ON rule is the match condition. Forget it and you get a cartesian explosion.
- Chain joins to walk across many tables, one
ON hop at a time.
Your turn
Using the schema, write a query that lists every user's name next to the names of the products they bought. Then ask yourself: which join keeps users who bought nothing, INNER or LEFT? If you can explain it to a friend, you have got it.
Next up: Part 07: INSERT, UPDATE, DELETE, where we stop just reading data and start changing it.