-- WRONG: orders copies the user's name and email into every row
orders
+----+-----------+-------------------+--------+
| id | user_name | user_email | total |
+----+-----------+-------------------+--------+
| 10 | Sara | sara@old.com | 50.00 |
| 11 | Sara | sara@old.com | 20.00 |
| 12 | Sara | sara@old.com | 99.00 |
+----+-----------+-------------------+--------+
Looks fine, until Sara changes her email to sara@new.com. Now you have to update it in every single order row. Miss one, and your data lies to you. You will have three rows saying old.com and pretend that is still Sara.
This bug has a name: an update anomaly. The same fact lives in many places, so one fact can fall out of sync with itself.
The fix is the address book idea. Store Sara once, and point to her:
-- RIGHT: store the user once, point by id
users
+----+------+----------------+
| id | name | email |
+----+------+----------------+
| 1 | Sara | sara@new.com | <-- change it here, ONE time
+----+------+----------------+
orders
+----+---------+--------+
| id | user_id | total |
+----+---------+--------+
| 10 | 1 | 50.00 | <-- just points to Sara
| 11 | 1 | 20.00 |
| 12 | 1 | 99.00 |
+----+---------+--------+
Sara changes her email? You update one row in users. Every order is instantly correct, because no order stored a copy in the first place. One source of truth.
(You may hear the fancy names for the rules behind this: 1NF, 2NF, 3NF. They exist, you can read them later, but the plain idea "do not repeat the same fact" gets you 90% of the way.)
The full schema, all together
Here is how our four tables connect:
users
| (one-to-many: user_id on orders)
v
orders
| (many-to-many through order_items)
v
order_items -----> products
Read it as: a user has many orders, and an order links to many products through order_items.
Gotchas
-
Over-normalizing everything. You do not need a separate table for every tiny thing. A
country column on users is usually fine as plain text. Do not build a 6-table maze for data that never changes and is never reused.
-
A little duplication is sometimes okay. Storing the product's price at the time of sale inside
order_items is smart, not a sin. Prices change, but an old order should keep the price the customer actually paid. The test is: is this a snapshot (keep it) or a live fact (point to it)?
-
Forgetting the foreign key constraint. Pointing by id only helps if the id is real. Add a
FOREIGN KEY so the database refuses orphan rows (an order pointing to a user that does not exist).
Recap
-
One-to-many: foreign key on the many side (
orders.user_id).
-
Many-to-many: a join table in the middle (
order_items).
-
One-to-one: like one-to-many, but the key is
UNIQUE.
-
Normalization: store each fact once, point to it by id, change it in one spot.
- Copying the same data everywhere causes update anomalies (stale, conflicting copies).
- Do not overdo it. Snapshots like the sale price are fine to store.
Your turn
You are building a blog. A post can have many tags, and a tag can be on many posts. What three tables do you need, and which one is the join table? If you can draw the little arrow diagram for a friend, you have got it.
Next up: Part 10: Subqueries and CTEs, where we put a query inside a query and then clean it all up with a tidy WITH block.