I have a table which currently contains around 33 million rows of data, and it is constantly recording more rows second by second. The query to pull data out of this table is getting slower and slower, to the point where it is practically unusable.
My table schema is like so:
and the table has the following indexes:
The usual query from this table consists of selecting rows where the shift is equal to X, and the timestamp is between X and Y.
Can anyone provide/suggest a better table set up, or how I can improve upon the indexes to make this table more efficient?
Right now I'm trying to re-index it, as I believe the indexes have become fragmented, but MySQL keeps losing the connection while reindexing the table, so I'm not sure how I can even accomplish this. My thought is to re-create the table structure and then slowly migrate the data over using a more efficient index set.
2 Answers
As a direct answer, what your query needs is a composite index on (shift, timestamp): ALTER TABLE historical_data ADD INDEX shift_ts(shift, timestamp)
However, you have a lot of indexes, and some are redundant.
- tag_id: has low cardinality. Drop it if you don't have "WHERE tag_id=x" queries
- shift, timestamp: combine them in 1 index (This is useful for your query)
- hd_multicolumn: drop
- value: if you don't query it, drop it
- hd_multicolumn2: drop it
- [IMPORTANT] If your table is an InnoDB table, add an auto increment field, and make it a primary key
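As a rough sketch of how the cleanup above might look in one pass: the index names hd_multicolumn and hd_multicolumn2 come from the list, but everything else here is an assumption (that no primary key exists yet, that shift is numeric, that timestamp is a DATETIME or TIMESTAMP column, and the literal values in the test query). Check SHOW INDEX first and adjust.

```sql
-- Confirm the real index names before dropping anything.
SHOW INDEX FROM historical_data;

-- Drop the redundant multi-column indexes and add the composite one.
ALTER TABLE historical_data
  DROP INDEX hd_multicolumn,
  DROP INDEX hd_multicolumn2,
  ADD INDEX shift_ts (shift, `timestamp`);

-- Assumption: the table has no primary key yet. This rebuilds the table,
-- so expect it to take a while on tens of millions of rows.
ALTER TABLE historical_data
  ADD COLUMN id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY;

-- Check that the optimizer picks shift_ts (values are placeholders).
EXPLAIN
SELECT *
FROM historical_data
WHERE shift = 1
  AND `timestamp` BETWEEN '2017-10-01 00:00:00' AND '2017-10-02 00:00:00';
```

If the EXPLAIN output shows shift_ts in the key column, the old single-column indexes on shift and timestamp are probably safe to drop as well.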
- Thank you! Question: how would I go about combining shift and timestamp into 1 index? I can drop the others that are not needed. – Phil, Oct 16, 2017 at 19:28
- Yes, you can drop everything you don't need. To create the required index, I provided the statement already:
ALTER TABLE historical_data ADD INDEX shift_ts(shift, timestamp)
– Jehad Keriaki, Oct 16, 2017 at 19:35
- Why is the auto-increment field so important? – mcmillab, Dec 12, 2018 at 20:55
"shift is equal to X, and the timestamp is between X and Y" -- If this is the main query, then I recommend this:
id INT UNSIGNED AUTO_INCREMENT NOT NULL, -- make it `BIGINT` if you expect > 4 billion rows
PRIMARY KEY(shift, timestamp, id), -- In this order
INDEX(id), -- for AUTO_INCREMENT
INDEX(shift) -- DROP; it is now redundant with new PK
INDEX(tag_id) -- get rid of this; it is redundant with the next two
INDEX(tag_id, shift, timestamp) -- what query is this for?
INDEX(tag_id, timestamp) -- what query is this for?
This
- Satisfies the need for a PK on InnoDB tables,
- Provides the optimal index (shift, timestamp) for the query,
- Puts that index as the PK, for better efficiency,
- Makes it unique by adding id,
- INDEX(id) keeps AUTO_INCREMENT happy.
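Put together, the layout might look roughly like the following. This is only a sketch: the table name h_new_sketch is hypothetical, and the column list and datatypes are guesses, since the real SHOW CREATE TABLE output is not available here.

```sql
-- Hypothetical table; columns and types are assumptions, not the real schema.
CREATE TABLE h_new_sketch (
  id          INT UNSIGNED NOT NULL AUTO_INCREMENT,  -- BIGINT if > 4 billion rows
  tag_id      INT UNSIGNED NOT NULL,
  shift       TINYINT UNSIGNED NOT NULL,             -- assumes "small" shift numbers
  `timestamp` DATETIME NOT NULL,
  value       DOUBLE NULL,
  PRIMARY KEY (shift, `timestamp`, id),  -- rows are clustered in this order
  INDEX (id)                             -- AUTO_INCREMENT needs id to lead some index
) ENGINE=InnoDB;

-- The main query then reads one contiguous range of the clustered index.
SELECT *
FROM h_new_sketch
WHERE shift = 1
  AND `timestamp` BETWEEN '2017-10-01 00:00:00' AND '2017-10-02 00:00:00';
```

Because InnoDB stores the whole row in the primary key, the range scan returns the rows directly, without the extra lookup a secondary index would need.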
If you have other important queries, we must see them, else this improvement may hurt them too much.
What kind of value is shift? If it is "small" numbers, save space by using TINYINT UNSIGNED or something else smaller than a 4-byte INT.
If there are other columns that can be shrunk, let's do them at the same time. And fix anything else that might be needed. Please provide SHOW CREATE TABLE so I can provide suggestions. utf8 handles most of the world, but not all of Chinese, nor most Emoji. Perhaps you need to shift to utf8mb4 now, too.
Changing this will take a significant amount of downtime. And without a unique key, you won't be able to use pt-online-schema-change. So, plan for an outage. However, if you must continue to receive new data, consider the following:
CREATE TABLE h_new LIKE historical_data; -- copy schema
ALTER TABLE h_new ... -- to get PK, better datatypes, indexes, etc.
RENAME TABLE historical_data TO h_old,
h_new TO historical_data; -- atomically swap
then...
Copy h_old data into historical_data, chunk by chunk: See chunking for an outline of how to do it. Note: You should use timestamp for walking through the table, and not worry about it not being unique or PK. (And change from DELETE to INSERT ... SELECT....)
When finished with all the chunks, DROP TABLE h_old.
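One chunk of that copy might look something like this. It is a sketch under assumptions: the column list, the starting timestamp, and the one-hour window are all made up, and it assumes h_old still has an index that leads with timestamp so each chunk is a cheap range scan.

```sql
-- One chunk of the copy; repeat until the window passes the newest row in h_old.
-- Column list, start value, and window size are assumptions.
SET @chunk_start = '2017-01-01 00:00:00';
SET @chunk_end   = DATE_ADD(@chunk_start, INTERVAL 1 HOUR);

INSERT INTO historical_data (tag_id, shift, `timestamp`, value)
SELECT tag_id, shift, `timestamp`, value
FROM h_old
WHERE `timestamp` >= @chunk_start
  AND `timestamp` <  @chunk_end;   -- half-open window: boundary rows are copied once

-- Advance the window for the next chunk.
SET @chunk_start = @chunk_end;
```

The half-open window (>= start, < end) is what makes it safe to walk by a non-unique timestamp: rows sharing a timestamp always land in the same chunk, so nothing is skipped or copied twice.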
- I do use the tag_id in my queries. I'm sorry I overlooked it. The query joins the tags table on tags.tag_id = historical_data.tag_id and shift = X and timestamp between X and Y – Phil, Oct 17, 2017 at 12:19
- Since InnoDB tables are clustered based on the primary key, it is not recommended to have (shift, timestamp) as a primary key. It is better to have the auto incremented ID as a PK, and add an index on (shift, timestamp). In addition, uniqueness doesn't seem to be guaranteed in this table for the combination of (shift, timestamp). – Jehad Keriaki, Oct 17, 2017 at 15:24
- @Phil - That's OK. I was focusing on handling the WHERE as being the optimal strategy. So, I still recommend PK(shift, ts, ...) – Rick James, Oct 17, 2017 at 21:38
- @JehadKeriaki - Notice that I added id on the end of the PK in order to guarantee uniqueness. (True, one could mess with id. But normal usage of it as AUTO_INCREMENT keeps it unique.) – Rick James, Oct 17, 2017 at 21:40
- @JehadKeriaki - And... You get extra performance by having the WHERE handled by the PK instead of a secondary key. Secondary would involve an extra lookup to get the rest of the row. – Rick James, Oct 17, 2017 at 21:42
SHOW CREATE TABLE historical_data\G - and format using Code Block above the edit box where you wrote your question!
SHOW CREATE TABLE and the query that is getting slower and slower