Complex key performance problem (MySQL, InnoDB)

Question 1

A query is painstakingly slow in both production environment and (fortunately reproducable) on my local dev box. Production is Linux and my dev is currently Windows 10 with MySQL 5.7.11:

SHOW VARIABLES LIKE "%version%";
'innodb_version', '5.7.11'
'protocol_version', '10'
'slave_type_conversions', ''
'tls_version', 'TLSv1,TLSv1.1'
'version', '5.7.11-log'
'version_comment', 'MySQL Community Server (GPL)'
'version_compile_machine', 'x86_64'
'version_compile_os', 'Win64'

The tables are InnoDB. The query involves just one table and is generated by Django ORM. Here is the table:

select column_name, column_type, is_nullable
 from information_schema.columns
 where table_name='main_contactinfo'
 and table_schema='live040616'
id int(11) NO
first_name varchar(100) YES
last_name varchar(100) YES
club_name varchar(100) YES
email varchar(254) YES
team_name_id int(11) YES
address varchar(100) YES
city varchar(100) YES
state varchar(100) YES
zip varchar(20) YES
contact_type varchar(100) YES
player_id int(11) YES
last_modified datetime NO
object_id varchar(300) YES
sync_inactive tinyint(1) NO
customer_account_id int(11) YES
is_asgr tinyint(1) NO
relation varchar(150) YES
note longtext YES
twitter varchar(100) YES
facebook varchar(100) YES
other longtext YES
aau longtext YES
aau_coachnum varchar(300) YES
hs_coachnum varchar(300) YES
parent_status varchar(300) YES
instagram varchar(100) YES

The table has about 21702503 record in it. The query:

SELECT main_contactinfo.id, main_contactinfo.first_name, main_contactinfo.last_name,
 main_contactinfo.club_name, main_contactinfo.email, main_contactinfo.team_name_id,
 main_contactinfo.address, main_contactinfo.city, main_contactinfo.state,
 main_contactinfo.zip, main_contactinfo.contact_type, main_contactinfo.relation,
 main_contactinfo.player_id, main_contactinfo.last_modified,
 main_contactinfo.object_id, main_contactinfo.sync_inactive,
 main_contactinfo.customer_account_id, main_contactinfo.is_asgr,
 main_contactinfo.note, main_contactinfo.twitter, main_contactinfo.facebook,
 main_contactinfo.instagram, main_contactinfo.other, main_contactinfo.aau,
 main_contactinfo.aau_coachnum, main_contactinfo.hs_coachnum,
 main_contactinfo.parent_status
FROM main_contactinfo 
WHERE ( main_contactinfo.customer_account_id = 12345 
 AND ( (main_contactinfo.sync_inactive = 1 
 AND main_contactinfo.last_modified >= '2016-04-17 00:16:55') 
 OR (main_contactinfo.last_modified > '2015-09-28 17:40:21' 
 AND main_contactinfo.sync_inactive = 0) 
 OR (main_contactinfo.sync_inactive = 0
 AND main_contactinfo.last_modified = '2015-09-28 17:40:21'
 AND main_contactinfo.id > 19968645)
 )
 ) 
ORDER BY main_contactinfo.last_modified ASC, 
 main_contactinfo.id ASC 
LIMIT 83;

The WHERE clause operates on the last_modified, because it's a part of a system where only changes should be returned since a certain time stamp (datetime). The query on my machine takes ~190 seconds, 80 records returned. I see the HDD I/O light lit up during that time, and I'm running two Crucial M550 in RAID0. That's nowhere near today's nVME SSDs, but it's still ~1GBps throughput.

I was suspicious about the sync_inactive (which is a boolean field on the other end of the ORM), that's not covered by an index, creating an index on sync_inactive didn't help.
I gradually created composite indexes covering more and more columns, but even a CREATE UNIQUE INDEX composite_index ON main_contactinfo(id, customer_account_id, last_modified, sync_inactive); doesn't help much.
Can it be a problem that I use a range for datetime?
Also notice that I have 3 longtext columns involved in the query (I have no clue why the ORM includes those, I wish it'd do some lazy loading. But I removed those 3 fields from the query and those didn't help either
I wonder if there can be some unwanted conversion character set or collation conversion? What I see is that the character set of the columns are latin1, and collation is latin1_swwedish_ci, while the collation and character set enforced by the ORM side is utf8.

However the main problem seems to be by the OPTIMIZER_TRACE is that even with a huge index which covers a lot of columns the engine chooses another index, which only covers customer_account_id. The reason why my covering composite index was not chosen is "unknown". How can I improve that query?

 "analyzing_range_alternatives": {
 "range_scan_alternatives": [
 {
 "index": "composite_index",
 "chosen": false,
 "cause": "unknown"
 },
 {
 "index": "main_contactinfo_f4b91458",
 "ranges": [
 "12345 <= customer_account_id <= 12345"
 ],
 "index_dives_for_eq_ranges": true,
 "rowid_ordered": true,
 "using_mrr": false,
 "index_only": false,
 "rows": 42294,
 "cost": 50754,
 "chosen": true
 },
 {
 "index": "main_contactinfo_last_modified_4769d8a97da686d4_uniq",
 "ranges": [
 "0x9997391a15 <= last_modified <= 0x9997391a15 AND 19968645 < id",
 "0x9997391a15 < last_modified"
 ],
 "index_dives_for_eq_ranges": true,
 "rowid_ordered": false,
 "using_mrr": false,
 "index_only": false,
 "rows": 10272032,
 "cost": 1.23e7,
 "chosen": false,
 "cause": "cost"
 }
 ],
 "analyzing_roworder_intersect": {
 "usable": false,
 "cause": "too_few_roworder_scans"
 }
 },

More complete optimize trace (unfortunately truncated though): https://gist.github.com/MrCsabaToth/62ed290b6c8ddd4f40a172a87a8c0f84

Question 2

SHOW CREATE TABLE is a better way to show the schema.

Question 3

For this WHERE clause, I would try an index on (customer_account_id, sync_inactive, last_modified, id):

ALTER TABLE main_contactinfo
 ADD INDEX customer_active_last_modified
 ( customer_account_id, 
 sync_inactive, 
 last_modified, 
 id
 ) ;

The ORDER BY with e small LIMIT complicates things though, so a different order of the columns in the index might be better: (customer_account_id, last_modified, id, sync_inactive).

Question 4

So you say that the order of the elements of the composite index matter that much? In the 2nd point I mention that I created a unique index on (id, customer_account_id, last_modified, sync_inactive). And that's the index I'm referring later in my post. I'll try your order.

Question 5

You are the man! Looks like working

Question 6

For me your first proposed index provided fast results

Question 7

I wonder why the second variation could be faster. These queries are used in a REST API protocol, with a special paging criteria (hence the special WHERE clauses). Usually the page size is bigger, like 1000 or so. Once the initial download by a client is done, the query gives back only changed items, which allows differential updates. So in the initial load there are many items, while later there are lower counts.

Question 8

I'll update the answer with more details when I find time. (more details: why your (id, ...) index is not useful, why this order and not some other, why the 2nd suggestion, etc.)

ypercubeTM ypercubeTM 99.7k13 gold badges217 silver badges306 bronze badges · Accepted Answer · 2016-04-18 19:25:37Z

3

For this WHERE clause, I would try an index on (customer_account_id, sync_inactive, last_modified, id):

ALTER TABLE main_contactinfo
 ADD INDEX customer_active_last_modified
 ( customer_account_id, 
 sync_inactive, 
 last_modified, 
 id
 ) ;

The ORDER BY with e small LIMIT complicates things though, so a different order of the columns in the index might be better: (customer_account_id, last_modified, id, sync_inactive).

Share

Improve this answer

edited Apr 18, 2016 at 21:39

Csaba Toth's user avatar

Csaba Toth

2393 silver badges9 bronze badges

answered Apr 18, 2016 at 19:25

ypercubeTM's user avatar

ypercubeTM ypercubeTM

99.7k13 gold badges217 silver badges306 bronze badges

7

So you say that the order of the elements of the composite index matter that much? In the 2nd point I mention that I created a unique index on (id, customer_account_id, last_modified, sync_inactive). And that's the index I'm referring later in my post. I'll try your order.

Csaba Toth
– Csaba Toth

2016年04月18日 21:06:32 +00:00
Commented Apr 18, 2016 at 21:06
You are the man! Looks like working

Csaba Toth
– Csaba Toth

2016年04月18日 21:35:24 +00:00
Commented Apr 18, 2016 at 21:35
For me your first proposed index provided fast results

Csaba Toth
– Csaba Toth

2016年04月18日 21:49:33 +00:00
Commented Apr 18, 2016 at 21:49
I wonder why the second variation could be faster. These queries are used in a REST API protocol, with a special paging criteria (hence the special WHERE clauses). Usually the page size is bigger, like 1000 or so. Once the initial download by a client is done, the query gives back only changed items, which allows differential updates. So in the initial load there are many items, while later there are lower counts.

Csaba Toth
– Csaba Toth

2016年04月18日 22:31:49 +00:00
Commented Apr 18, 2016 at 22:31
1

I'll update the answer with more details when I find time. (more details: why your (id, ...) index is not useful, why this order and not some other, why the 2nd suggestion, etc.)

ypercubeᵀᴹ
– ypercubeᵀᴹ

2016年04月19日 06:35:00 +00:00
Commented Apr 19, 2016 at 6:35

| Show 2 more comments

Stack Exchange Network

Complex key performance problem (MySQL, InnoDB)

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Complex key performance problem (MySQL, InnoDB)

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions