I am looking to optimize a JOIN between tables that uses a small subset of columns in the second table.
CREATE TABLE `employees` (
`employee_id` bigint NOT NULL,
`manager_id` bigint NOT NULL,
`org_id` bigint NOT NULL,
`union_id` bigint NOT NULL
...
PRIMARY KEY (employee_id),
INDEX (union_id)
);
CREATE TABLE `managers` (
`manager_id` bigint NOT NULL,
`org_id` bigint NOT NULL,
`some_condition` boolean NOT NULL,
PRIMARY KEY (manager_id)
);
Now I want to optimize two types of queries. Both join the tables together on manager_id and org_id, and optionally apply a filter to the some_condition column.
SELECT employees.*
FROM employees
JOIN managers
ON (employees.manager_id = managers.manager_id AND employees.org_id = managers.org_id)
WHERE (employees.union_id = ? AND managers.some_condition);
SELECT employees.*
FROM employees
JOIN managers
ON (employees.manager_id = managers.manager_id AND employees.org_id = managers.org_id)
WHERE (employees.union_id = ?);
Assuming these are very large tables and employees > manager. I am trying to create an index on managers that will speed up the query. Right now the query is slow because for each row it has to read org_id and some_condition. I want to avoid going to disk if possible.
So far I have two indexes that might work:
INDEX `join_index` (`org_id`,`some_condition`)
INDEX `id_join_index` (`manager_id`, `org_id`, `some_condition`)
My main issue is that MySQL does not use either index in the EXPLAIN statement unless I force it to with use index (...)
.
Which index (if either) will speed up my query, and do I need manager_id in the index to speed up the join if I do not filter on some_condition
?
1 Answer 1
As said in the comment, you need the composite index for both tables. Also, you do need manager_id
in the index even if you're not filtering on some_condition
. Without it, MySQL can't efficiently locate the matching rows for the join.
Explore related questions
See similar questions with these tags.
(manager_id, org_id)
in both tables at least. Maybe(manager_id, org_id, union_id)
inemployees
and(manager_id, org_id, some_condition)
inmanagers
(depends on the data statistic, should be tested).