A query is painstakingly slow in both production environment and (fortunately reproducable) on my local dev box. Production is Linux and my dev is currently Windows 10 with MySQL 5.7.11:
SHOW VARIABLES LIKE "%version%";
'innodb_version', '5.7.11'
'protocol_version', '10'
'slave_type_conversions', ''
'tls_version', 'TLSv1,TLSv1.1'
'version', '5.7.11-log'
'version_comment', 'MySQL Community Server (GPL)'
'version_compile_machine', 'x86_64'
'version_compile_os', 'Win64'
The tables are InnoDB. The query involves just one table and is generated by Django
ORM. Here is the table:
select column_name, column_type, is_nullable
from information_schema.columns
where table_name='main_contactinfo'
and table_schema='live040616'
id int(11) NO
first_name varchar(100) YES
last_name varchar(100) YES
club_name varchar(100) YES
email varchar(254) YES
team_name_id int(11) YES
address varchar(100) YES
city varchar(100) YES
state varchar(100) YES
zip varchar(20) YES
contact_type varchar(100) YES
player_id int(11) YES
last_modified datetime NO
object_id varchar(300) YES
sync_inactive tinyint(1) NO
customer_account_id int(11) YES
is_asgr tinyint(1) NO
relation varchar(150) YES
note longtext YES
twitter varchar(100) YES
facebook varchar(100) YES
other longtext YES
aau longtext YES
aau_coachnum varchar(300) YES
hs_coachnum varchar(300) YES
parent_status varchar(300) YES
instagram varchar(100) YES
The table has about 21702503 record in it. The query:
SELECT main_contactinfo.id, main_contactinfo.first_name, main_contactinfo.last_name,
main_contactinfo.club_name, main_contactinfo.email, main_contactinfo.team_name_id,
main_contactinfo.address, main_contactinfo.city, main_contactinfo.state,
main_contactinfo.zip, main_contactinfo.contact_type, main_contactinfo.relation,
main_contactinfo.player_id, main_contactinfo.last_modified,
main_contactinfo.object_id, main_contactinfo.sync_inactive,
main_contactinfo.customer_account_id, main_contactinfo.is_asgr,
main_contactinfo.note, main_contactinfo.twitter, main_contactinfo.facebook,
main_contactinfo.instagram, main_contactinfo.other, main_contactinfo.aau,
main_contactinfo.aau_coachnum, main_contactinfo.hs_coachnum,
main_contactinfo.parent_status
FROM main_contactinfo
WHERE ( main_contactinfo.customer_account_id = 12345
AND ( (main_contactinfo.sync_inactive = 1
AND main_contactinfo.last_modified >= '2016-04-17 00:16:55')
OR (main_contactinfo.last_modified > '2015-09-28 17:40:21'
AND main_contactinfo.sync_inactive = 0)
OR (main_contactinfo.sync_inactive = 0
AND main_contactinfo.last_modified = '2015-09-28 17:40:21'
AND main_contactinfo.id > 19968645)
)
)
ORDER BY main_contactinfo.last_modified ASC,
main_contactinfo.id ASC
LIMIT 83;
The WHERE clause operates on the last_modified
, because it's a part of a system where only changes should be returned since a certain time stamp (datetime). The query on my machine takes ~190 seconds, 80 records returned. I see the HDD I/O light lit up during that time, and I'm running two Crucial M550 in RAID0. That's nowhere near today's nVME SSDs, but it's still ~1GBps throughput.
- I was suspicious about the
sync_inactive
(which is a boolean field on the other end of the ORM), that's not covered by an index, creating an index onsync_inactive
didn't help. - I gradually created composite indexes covering more and more columns, but even a
CREATE UNIQUE INDEX composite_index ON main_contactinfo(id, customer_account_id, last_modified, sync_inactive);
doesn't help much. - Can it be a problem that I use a range for datetime?
- Also notice that I have 3
longtext
columns involved in the query (I have no clue why theORM
includes those, I wish it'd do some lazy loading. But I removed those 3 fields from the query and those didn't help either - I wonder if there can be some unwanted conversion character set or collation conversion? What I see is that the character set of the columns are latin1, and collation is latin1_swwedish_ci, while the collation and character set enforced by the ORM side is utf8.
However the main problem seems to be by the OPTIMIZER_TRACE
is that even with a huge index which covers a lot of columns the engine chooses another index, which only covers customer_account_id
. The reason why my covering composite index was not chosen is "unknown". How can I improve that query?
"analyzing_range_alternatives": {
"range_scan_alternatives": [
{
"index": "composite_index",
"chosen": false,
"cause": "unknown"
},
{
"index": "main_contactinfo_f4b91458",
"ranges": [
"12345 <= customer_account_id <= 12345"
],
"index_dives_for_eq_ranges": true,
"rowid_ordered": true,
"using_mrr": false,
"index_only": false,
"rows": 42294,
"cost": 50754,
"chosen": true
},
{
"index": "main_contactinfo_last_modified_4769d8a97da686d4_uniq",
"ranges": [
"0x9997391a15 <= last_modified <= 0x9997391a15 AND 19968645 < id",
"0x9997391a15 < last_modified"
],
"index_dives_for_eq_ranges": true,
"rowid_ordered": false,
"using_mrr": false,
"index_only": false,
"rows": 10272032,
"cost": 1.23e7,
"chosen": false,
"cause": "cost"
}
],
"analyzing_roworder_intersect": {
"usable": false,
"cause": "too_few_roworder_scans"
}
},
More complete optimize trace (unfortunately truncated though): https://gist.github.com/MrCsabaToth/62ed290b6c8ddd4f40a172a87a8c0f84
1 Answer 1
For this WHERE
clause, I would try an index on (customer_account_id, sync_inactive, last_modified, id)
:
ALTER TABLE main_contactinfo
ADD INDEX customer_active_last_modified
( customer_account_id,
sync_inactive,
last_modified,
id
) ;
The ORDER BY
with e small LIMIT
complicates things though, so a different order of the columns in the index might be better: (customer_account_id, last_modified, id, sync_inactive)
.
-
So you say that the order of the elements of the composite index matter that much? In the 2nd point I mention that I created a unique index on
(id, customer_account_id, last_modified, sync_inactive)
. And that's the index I'm referring later in my post. I'll try your order.Csaba Toth– Csaba Toth2016年04月18日 21:06:32 +00:00Commented Apr 18, 2016 at 21:06 -
You are the man! Looks like workingCsaba Toth– Csaba Toth2016年04月18日 21:35:24 +00:00Commented Apr 18, 2016 at 21:35
-
For me your first proposed index provided fast resultsCsaba Toth– Csaba Toth2016年04月18日 21:49:33 +00:00Commented Apr 18, 2016 at 21:49
-
I wonder why the second variation could be faster. These queries are used in a REST API protocol, with a special paging criteria (hence the special WHERE clauses). Usually the page size is bigger, like 1000 or so. Once the initial download by a client is done, the query gives back only changed items, which allows differential updates. So in the initial load there are many items, while later there are lower counts.Csaba Toth– Csaba Toth2016年04月18日 22:31:49 +00:00Commented Apr 18, 2016 at 22:31
-
1I'll update the answer with more details when I find time. (more details: why your
(id, ...)
index is not useful, why this order and not some other, why the 2nd suggestion, etc.)ypercubeᵀᴹ– ypercubeᵀᴹ2016年04月19日 06:35:00 +00:00Commented Apr 19, 2016 at 6:35
Explore related questions
See similar questions with these tags.
SHOW CREATE TABLE
is a better way to show the schema.