On a MySQL 5.5.23 server (which needs to be upgraded to at least 5.7, I know), I have an InnoDB table with three relevant columns (there are a few more that are not relevant here):
TimeC (bigint, unix timestamp)
id (integer)
value (integer).
The table currently has an index on id. The table contains ~15 million rows. All three columns above allow NULL values (I suspect that's a first clue here).
The query in question looks like this:
select id, sum(value) as TotalV
from table_name
where timeC >=154013600000 and timeC < 1540742400000
group by id
order by TotalV desc;
This query takes about 280 seconds on our server.
I decided to add an index on the date-related column TimeC.
The query now takes about 260 seconds, i.e. the improvement is very small.
The "explain" utility shows, without the new index:
select_type: SIMPLE
table: TableName
type: index
possible_keys: NULL
key: TableName_CC_IDX
key_len: 768
ref: NULL
rows: 14830632
Extra: Using where; Using temporary; Using filesort
...and WITH the new index:
select_type: SIMPLE
table: TableName
type: range
possible_keys: TableName_TimeC
key: TableName_TimeC
key_len: 9
ref: NULL
rows: 540694
Extra: Using where; Using temporary; Using filesort
So my question is: why doesn't it use BOTH indexes in the latter case, when both exist and both are relevant to this query? More generally, what could I do to speed up this query?
(from Comment)
CREATE TABLE TableName (
id varchar(255) DEFAULT NULL,
timeC bigint(20) DEFAULT NULL,
value decimal(42,2) DEFAULT NULL,
KEY TableName (id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
mysql> show indexes from TableName\G
Table: TableName
Non_unique: 1
Key_name: TableName_CC_IDX
Seq_in_index: 1
Column_name: id
Collation: A
Cardinality: 29
Sub_part: NULL
Packed: NULL
Null: YES
Index_type: BTREE
Comment:
Index_comment:
You need a multicolumn index, either (id, timeC, value) or (timeC, id, value), because the existing single-column timeC and id indexes are not sufficient.
The general rule is to create an index containing all the columns used in the JOIN, WHERE, GROUP BY and ORDER BY clauses. The order in which the columns should be listed in the index varies with the exact query, the columns' cardinality and even the amount of data.
For more details on what columns to put in what order: mysql.rjweb.org/doc.php/index_cookbook_mysql – Rick James, Nov 2, 2018 at 16:59
The time change (280 s -> 260 s) is probably not significant; it more likely reflects what happened to be cached in RAM.
If that time range is a 'small' fraction of the total, then INDEX(timeC, id, value) should help, because fewer rows need to be touched and the index is "covering" (it contains every column the query uses). Please add that index and provide EXPLAIN SELECT ...
The pattern GROUP BY one-thing ORDER BY another-thing will necessitate at least one "filesort".
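To see why INDEX(timeC, id, value) cuts the rows touched but still leaves a temporary table and a filesort, here is a rough Python model of the execution. It is an illustration of the idea, not how InnoDB is implemented: the index is assumed to be a sorted list of (timeC, id, value) tuples, the WHERE clause becomes a binary-search range, GROUP BY id becomes a hash aggregation, and ORDER BY TotalV DESC is a separate sort. The function name and sample rows are made up.

```python
# Rough model of a covering-index range scan, assuming INDEX(timeC, id, value).
from bisect import bisect_left
from collections import defaultdict


def range_group_sort(index_rows, t_from, t_to):
    """index_rows: (timeC, id, value) tuples, sorted the way the index is."""
    # WHERE timeC >= t_from AND timeC < t_to -> one contiguous slice of the index
    lo = bisect_left(index_rows, (t_from,))
    hi = bisect_left(index_rows, (t_to,))
    # GROUP BY id: the slice is in timeC order, not id order, so aggregate
    # in a hash table (MySQL uses a temporary table for this)
    totals = defaultdict(int)
    for _time_c, row_id, value in index_rows[lo:hi]:
        totals[row_id] += value
    # ORDER BY TotalV DESC: a separate sort of the grouped rows (the "filesort")
    ordered = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return ordered, hi - lo  # result, number of index entries touched


if __name__ == "__main__":
    # Made-up sample rows
    index = sorted([
        (1000, "A", 5), (1500, "B", 7), (2000, "A", 3),
        (2500, "C", 1), (3000, "B", 4), (9000, "A", 100),
    ])
    print(range_group_sort(index, 1000, 3000))
    # ([('A', 8), ('B', 7), ('C', 1)], 4)
```

With INDEX(id, timeC, value) instead, rows would arrive already grouped by id, but ORDER BY TotalV would still need its own sort. And if the time range covers a large fraction of the table, the slice is most of the index, so little is saved either way.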
How much RAM do you have? What is the value of innodb_buffer_pool_size?
There may actually be an optimization improvement in 5.7 that a particular index could take advantage of.
Read about Summary Tables as a possible solution to the performance problem.
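A summary table pre-aggregates the data so that the report reads far fewer rows than the raw table. Below is a minimal Python sketch of the idea, assuming a hypothetical summary keyed by (hour, id) that is updated as rows are inserted. In MySQL this would be a separate table maintained by something like INSERT ... ON DUPLICATE KEY UPDATE or a periodic batch job; the class name, the hourly granularity and the sample data are all assumptions.

```python
# Sketch of a summary table: SUM(value) pre-aggregated per (hour, id).
# Hypothetical design; in MySQL this would be a separate table.
from collections import defaultdict

MS_PER_BUCKET = 3_600_000  # one hour; timeC is in milliseconds


class HourlySummary:
    def __init__(self):
        self.totals = defaultdict(int)  # (bucket, id) -> running SUM(value)

    def add(self, time_c, row_id, value):
        """Fold one inserted row into its (hour, id) bucket."""
        self.totals[(time_c // MS_PER_BUCKET, row_id)] += value

    def report(self, t_from, t_to):
        """Same result as the original query when t_from and t_to fall on
        bucket boundaries, but scans summary rows instead of raw rows."""
        first, end = t_from // MS_PER_BUCKET, t_to // MS_PER_BUCKET
        per_id = defaultdict(int)
        for (bucket, row_id), total in self.totals.items():
            if first <= bucket < end:
                per_id[row_id] += total
        return sorted(per_id.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    summary = HourlySummary()
    for row in [(0, "A", 5), (1_800_000, "A", 2), (3_600_000, "B", 10), (7_200_000, "A", 1)]:
        summary.add(*row)
    print(summary.report(0, 7_200_000))  # [('B', 10), ('A', 7)]
```

The trade-off is a small extra write per insert in exchange for a report that touches one row per (hour, id) rather than one per event.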
Please provide SHOW CREATE TABLE. There is no way the schema you presented to us could lead to key_len=768. That smells like VARCHAR(255) NULL CHARACTER SET utf8.
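The key_len arithmetic can be checked with a short sketch. It assumes the usual rules behind EXPLAIN's key_len: a utf8 (3-byte) character counts at its maximum width, a VARCHAR adds a 2-byte length prefix, a nullable column adds 1 byte, and a BIGINT is 8 bytes. The helper names are hypothetical.

```python
# Sketch: reproduce the key_len values seen in EXPLAIN (assumed MySQL rules).

BYTES_PER_CHAR = {"latin1": 1, "utf8": 3, "utf8mb4": 4}  # max bytes per character


def varchar_key_len(length: int, charset: str, nullable: bool) -> int:
    """key_len for an index on VARCHAR(length) (hypothetical helper)."""
    key_len = length * BYTES_PER_CHAR[charset]  # worst-case character bytes
    key_len += 2                                # VARCHAR length prefix
    if nullable:
        key_len += 1                            # NULL flag byte
    return key_len


def bigint_key_len(nullable: bool) -> int:
    """key_len for an index on a BIGINT column (hypothetical helper)."""
    return 8 + (1 if nullable else 0)


if __name__ == "__main__":
    print(varchar_key_len(255, "utf8", nullable=True))  # 768, as in the first EXPLAIN
    print(bigint_key_len(nullable=True))                # 9, as in the second EXPLAIN
    print(varchar_key_len(16, "utf8", nullable=False))  # 50 for a NOT NULL VARCHAR(16)
```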
The table
Before we fix the indexing, let's make some fixes to the table.
(Caution: each of the suggested SELECTs below requires a lengthy table scan.)
- SELECT MAX(CHAR_LENGTH(id)) FROM TableName; then consider lowering the 255 to some value not too much bigger than that max.
- 154013600000 -- Are you using Java? Do you need the milliseconds? BIGINT SIGNED is 8 bytes; INT UNSIGNED is 4 bytes.
- DECIMAL(42,2) occupies 19 bytes. Run SELECT MAX(ABS(value)) FROM TableName; and consider shrinking the size. Or consider FLOAT (4 bytes).
- All columns NULL-able? Run SELECT SUM(id IS NULL), SUM(timeC IS NULL), SUM(value IS NULL) FROM TableName; to see whether you actually have any NULLs.
- SELECT id, timeC, value, COUNT(*) AS ct FROM TableName GROUP BY id, timeC, value ORDER BY ct DESC LIMIT 5; -- to see whether you have any dups and what some of them are. If there are no dups, then one of the proposed indexes could be the PRIMARY KEY, which is essentially 'free' in InnoDB.
- "there are a few more that are not relevant here" -- Check the other columns for excessive size.
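The DECIMAL size follows MySQL's documented packing rule, which a small sketch makes explicit. It assumes that rule: integer and fractional digits are stored separately, each full group of nine digits takes 4 bytes, and leftover digits take 1 to 4 bytes. By that rule DECIMAL(42,2) works out to 19 bytes (BIGINT is a fixed 8 bytes; INT and FLOAT are 4). The function names, and the smaller DECIMAL(12,2) used for comparison, are illustrative only.

```python
# Sketch of MySQL DECIMAL storage size, following the documented packing rule.

# Bytes needed for 0..8 "leftover" digits that do not fill a group of nine
LEFTOVER_BYTES = [0, 1, 1, 2, 2, 3, 3, 4, 4]


def digits_bytes(digits: int) -> int:
    """Each full group of nine digits takes 4 bytes; leftovers take 0-4 bytes."""
    groups, leftover = divmod(digits, 9)
    return groups * 4 + LEFTOVER_BYTES[leftover]


def decimal_bytes(precision: int, scale: int) -> int:
    """Storage for DECIMAL(precision, scale); integer and fraction packed separately."""
    return digits_bytes(precision - scale) + digits_bytes(scale)


if __name__ == "__main__":
    print(decimal_bytes(42, 2))  # 19: 40 integer digits -> 18 bytes, 2 fractional -> 1
    print(decimal_bytes(18, 9))  # 8: nine digits on each side, 4 + 4
    print(decimal_bytes(12, 2))  # 6: hypothetical smaller column, if MAX(ABS(value)) < 10**10
```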
Let's get a PK added and the columns shrunk before continuing the INDEX discussion.
My comments apply to both 5.5 and 5.7. (TIMESTAMP(3), which I did not bring up, is not available until 5.6.)
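Whether the milliseconds matter is easier to judge once a timeC value is decoded. This sketch assumes timeC holds Java-style milliseconds since the Unix epoch and interprets them as UTC; the helper names are hypothetical. It shows that a millisecond value needs BIGINT, while the same instant in whole seconds would fit in a 4-byte INT UNSIGNED.

```python
# Sketch: interpret timeC as Java-style epoch milliseconds (assumption).
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT_UNSIGNED_MAX = 2**32 - 1  # largest value of a 4-byte INT UNSIGNED


def ms_to_utc(time_c_ms: int) -> datetime:
    """Convert epoch milliseconds to an aware UTC datetime, keeping the milliseconds."""
    return EPOCH + timedelta(milliseconds=time_c_ms)


def fits_int_unsigned(n: int) -> bool:
    """True if n fits in MySQL's INT UNSIGNED range."""
    return 0 <= n <= INT_UNSIGNED_MAX


if __name__ == "__main__":
    upper = 1540742400000  # upper bound from the question's WHERE clause
    print(ms_to_utc(upper).isoformat())      # 2018-10-28T16:00:00+00:00
    print(fits_int_unsigned(upper))          # False: milliseconds need BIGINT
    print(fits_int_unsigned(upper // 1000))  # True: whole seconds fit in 4 bytes
```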
CREATE TABLE TableName (id varchar(255) DEFAULT NULL, timeC bigint(20) DEFAULT NULL, value decimal(42,2) DEFAULT NULL, KEY TableName (id)) ENGINE=InnoDB DEFAULT CHARSET=utf8. ID represents a cell phone number, but it's modeled as varchar(255). – GID, Nov 1, 2018 at 8:11
I have innodb_buffer_pool_size = 1GB. I'm reluctant to add the (large) three-column index because I don't want to slow down the numerous inserts into this table for the sake of speeding up a single query that runs only occasionally. – GID, Nov 1, 2018 at 8:22
For testing purposes, and to satisfy my curiosity, I added the three-column index (on timeC, id and value, in that order) and confirmed with EXPLAIN that the query uses it. Now the query takes 277 seconds, so the gain is negligible. – GID, Nov 1, 2018 at 9:57
@GID - Let's tackle the table; it may be twice as bulky as needed. See the additions to my Answer. – Rick James, Nov 1, 2018 at 18:26
Thanks a lot for your time. Yes, we need the milliseconds, and we use Java. The value column is already decimal(42,2). There are no NULLs in those three columns, and the cellphone number is, of course, never longer than 16 characters, but the DB was designed years ago, in my absence, by Java developers, and I'm afraid we cannot really change its structure, as it is in production. How much impact does the 255 size of the ID have, though? There is one more column, varchar(2048), that also has excessive length. – GID, Nov 2, 2018 at 10:15
id ... too long.
"show indexes tells me that the TableName_CC_IDX index is a B-tree index on id." -- "show indexes tells me that": it tells that to you ONLY. Nobody else sees it.