sneaky N+1 bug that haunts almost every ORM app.
The idea in one line
An index is a sorted shortcut that lets the database jump straight to the
rows you want, instead of reading the whole table to find them.
The metaphor: a dictionary
Imagine you want the word "ninja" in a paper dictionary.
You do not start at page 1 and read every word until you hit "ninja". That
would take all day. Instead, you flip to the "N" section, then narrow down fast.
The dictionary is sorted, so you jump close and zoom in.
Now imagine a dictionary with the words in random order. To find "ninja" you
would have to read it cover to cover. Painful.
That is the whole story:
No index --> read the dictionary cover to cover (full table scan)
With index --> jump to "N", then zoom in (sorted lookup)
A table with no index is the random-order dictionary. An index puts things in
order so the database can flip right to the page.
Why a query gets slow
When you run this:
SELECT * FROM orders WHERE user_id = 42;
If there is no index on user_id, the database does a full table scan. It
literally checks every row: "is this user 42? no. is this one? no..." all the
way down.
Row 1 user_id=7 nope
Row 2 user_id=3 nope
Row 3 user_id=42 yes! keep it
...
Row 9,999,999 (still reading)
With 100 rows, that is instant. With 10 million rows, that is your 8-second page.
Adding an index
You create an index on the column you search by:
CREATE INDEX idx_orders_user_id ON orders (user_id);
Now the database keeps a sorted list of user_id values, each pointing to where
that row lives. The mental model is a B-tree: a sorted structure that lets
the database binary-search (split in half, again and again) instead of reading
everything.
Looking for user_id = 42 in a sorted index:
[ 50 ]
/ \
[ 25 ] [ 75 ]
|
42 is here-ish, jump straight to it
Each step throws away half the remaining rows. That is how it finds one row out
of millions in a blink.
When indexes help
Index the columns you actually look things up by:
- Columns in a
WHERE filter: WHERE user_id = 42
- Columns in a
JOIN ... ON: ON orders.user_id = users.id
- Columns in
ORDER BY: ORDER BY created_at
If you sort or filter or join on it a lot, it is a good index candidate.
The cost (indexes are not free)
An index is like keeping that sorted list updated. Every time you INSERT,
UPDATE, or DELETE, the database also has to fix the index. So:
- Writes get a little slower.
- Indexes take up disk space.
That is why you do not index every column. Index the ones you search on, and
leave the rest alone.
See it with EXPLAIN
You do not have to guess whether your query uses an index. Ask the database with
EXPLAIN (or EXPLAIN ANALYZE to actually run it and time it):
EXPLAIN ANALYZE SELECT * FROM orders WHERE user_id = 42;
You are looking for one word:
Seq Scan on orders --> bad, it read the whole table
Index Scan on orders --> good, it used the index
"Seq Scan" (sequential scan) means cover-to-cover reading. "Index Scan" means it
jumped to the right page. If you added an index and still see a Seq Scan, your
query is hiding an index killer (more on that below).
The N+1 problem (read this twice)
This one bites almost every web dev who uses an ORM. Say you load 50 users, then
show each user's orders. The lazy ORM does this:
1 query: SELECT * FROM users LIMIT 50; -- get the users
then for EACH user:
SELECT * FROM orders WHERE user_id = 1; -- query #2
SELECT * FROM orders WHERE user_id = 2; -- query #3
...
SELECT * FROM orders WHERE user_id = 50; -- query #51
That is 1 + 50 = 51 queries for one page. Each one is a round trip to the
database. This is the N+1 problem: 1 query to get the list, then N more, one per
item.
The fix: ask once
Grab all the orders in a single query instead of 50:
-- One batched query for everyone's orders
SELECT * FROM orders WHERE user_id IN (1, 2, 3, ..., 50);
Or join them together:
SELECT users.name, orders.total
FROM users
LEFT JOIN orders ON orders.user_id = users.id;
In ORM land this is called eager loading. You tell the ORM "load the orders
with the users" up front:
// Sequelize
User.findAll({ include: Order })
// Prisma
prisma.user.findMany({ include: { orders: true } })
// Rails / ActiveRecord
User.includes(:orders)
// Laravel / Eloquent
User::with('orders')->get()
51 queries become 2. Your slow page is fast again.
Gotchas juniors hit
-
Indexing a tiny table. A 50-row table is faster to just scan. An index
there adds work for no win. Indexes shine on big tables.
-
Indexing a low-variety column. A column like
is_active that is only
true or false barely helps. The index can only split the table in two,
so the database often just scans anyway.
-
Leading wildcard
LIKE. WHERE name LIKE '%ohn' cannot use a normal
index, because the index is sorted by the start of the word. It is like
asking the dictionary for every word that ends in "ohn". Sorting by first
letter does not help. WHERE name LIKE 'Joh%' is fine though.
Recap
- No index means a full table scan: the database reads every row.
- An index is a sorted shortcut (think B-tree) so it can jump straight in.
- Index columns used in
WHERE, JOIN ON, and ORDER BY.
- Indexes cost write speed and disk, so do not index everything.
- Use
EXPLAIN to see "Index Scan" (good) vs "Seq Scan" (bad).
- The N+1 problem is 1 + N queries. Fix it with a JOIN, a batched
WHERE ... IN (...), or your ORM's eager loading.
Your turn
You have a query: SELECT * FROM users WHERE email = 'a@b.com'. Which column
should you index, and why? And if a page loads 30 products then runs one extra
query per product to get its category, what is that bug called and how do you fix
it? Explain both to a friend and you have got it.
Next up is Part 12: Transactions, where we make several statements happen
all-or-nothing, so you never end up with a paid order that has no items.