MySQL - Large Table best practices/indexes

Question 1

I have a table that currently contains around 33 million rows and records more every second. The query that pulls data out of this table keeps getting slower, to the point where it is practically unusable.

My table schema is like so:

[image of the table schema, not reproduced as text]

and the table has the following indexes:

[image of the index list, not reproduced as text]

The usual query from this table consists of selecting rows where the shift is equal to X, and the timestamp is between X and Y.

Can anyone provide/suggest a better table set up, or how I can improve upon the indexes to make this table more efficient?

Right now I'm trying to re-index the table, as I believe the indexes have become fragmented, but MySQL keeps losing the connection while reindexing. So I'm not sure how I can even accomplish this. My thought is to re-create the table structure with a more efficient index set and then slowly migrate the data over.

Question 2

Please don't use images where text will do, as it would here! You could copy and paste the output of SHOW CREATE TABLE historical_data\G and format it using the Code Block button above the edit box where you wrote your question!

Question 3

@Vérace - I am sorry, I will do so from now on. Thank you.

Question 4

Don't worry too much - and welcome to the forum! :-)

Question 5

Please provide SHOW CREATE TABLE and the query that is getting slower and slower

Question 6

As a direct answer, what your query needs is a composite index on (shift, timestamp): ALTER TABLE historical_data ADD INDEX shift_ts(shift, timestamp)

However, you have a lot of indexes, and some are redundant.

tag_id: has low cardinality. Drop it if you don't have "WHERE tag_id=x" queries
shift, timestamp: combine them in 1 index (This is useful for your query)
hd_multicolumn: drop
value: if you don't query it, drop it
hd_multicolumn2: drop it
[IMPORTANT] If your table is an InnoDB table, add an auto increment field, and make it a primary key
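
A minimal sketch of the changes listed above, not a tested migration. It assumes that the names in the list (hd_multicolumn, hd_multicolumn2) are the actual index names, that the single-column indexes on shift and timestamp are named `shift` and `timestamp`, that the table has no primary key and no existing id column, and that tag_id and value are kept for now. Confirm the real names with SHOW CREATE TABLE first. The literal values in the EXPLAIN are placeholders, and each ALTER can run for a long time on a table of this size.

```sql
-- Add the composite index and drop the indexes it makes redundant.
-- Index names are assumptions; check them with SHOW CREATE TABLE.
ALTER TABLE historical_data
  ADD INDEX shift_ts (shift, `timestamp`),
  DROP INDEX `shift`,
  DROP INDEX `timestamp`,
  DROP INDEX hd_multicolumn,
  DROP INDEX hd_multicolumn2;

-- Only if the table is InnoDB and has no primary key yet (assumption).
ALTER TABLE historical_data
  ADD COLUMN id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY;

-- Check that the usual query now picks shift_ts (placeholder values).
EXPLAIN
SELECT *
FROM historical_data
WHERE shift = 1
  AND `timestamp` BETWEEN '2017-10-01 00:00:00' AND '2017-10-02 00:00:00';
```

In the EXPLAIN output, the key column should show shift_ts if the optimizer chooses the new index.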

Question 7

Thank you! A question: how would I go about combining shift and timestamp into one index? I can drop the others that are not needed.

Question 8

Yes, you can drop everything you don't need. To create the required index, I provided the statement already: ALTER TABLE historical_data ADD INDEX shift_ts(shift, timestamp)

Question 9

why is the auto-increment field so important?

Question 10

"shift is equal to X, and the timestamp is between X and Y" -- If this is the main query, then I recommend this:

id INT UNSIGNED AUTO_INCREMENT NOT NULL -- make it `BIGINT` if you expect more than 4 billion rows
PRIMARY KEY(shift, timestamp, id) -- In this order
INDEX(id), -- for AUTO_INCREMENT
INDEX(shift) -- DROP; it is now redundant with new PK
INDEX(tag_id) -- get rid of this; it is redundant with the next two
INDEX(tag_id, shift, timestamp) -- what query is this for?
INDEX(tag_id, timestamp) -- what query is this for?

This

Satisfies the need for a PK on InnoDB tables,
Provides the optimal index (shift, timestamp) for the query,
Puts that index as the PK, for better efficiency,
Makes it unique by adding id,
INDEX(id) keeps AUTO_INCREMENT happy.
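
Put together as one definition, the recommendation might look like the sketch below. The table name h_sketch is hypothetical, and the data types for tag_id, timestamp, and value are assumptions, since the real schema was posted only as an image. The two tag_id indexes are left out here because it is not yet clear which queries need them.

```sql
-- Hypothetical table name; column types are assumptions.
CREATE TABLE h_sketch (
  id          INT UNSIGNED     NOT NULL AUTO_INCREMENT,
  tag_id      INT UNSIGNED     NOT NULL,   -- assumed type
  shift       TINYINT UNSIGNED NOT NULL,   -- assumes shift holds small numbers
  `timestamp` DATETIME         NOT NULL,   -- assumed type
  `value`     DOUBLE           NULL,       -- assumed type
  PRIMARY KEY (shift, `timestamp`, id),    -- rows clustered in WHERE order; id makes the key unique
  INDEX (id)                               -- InnoDB needs the AUTO_INCREMENT column to lead some index
) ENGINE=InnoDB;
```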

If you have other important queries, we must see them, else this improvement may hurt them too much.

What kind of value is shift? If it is "small" numbers, save space by using TINYINT UNSIGNED or something else smaller than a 4-byte INT.

If there are other columns that can be shrunk, let's do them at the same time. And fix anything else that might be needed. Please provide SHOW CREATE TABLE so I can provide suggestions. utf8 handles most of the world, but not all of Chinese, nor most Emoji. Perhaps you need to shift to utf8mb4 now, too.

Changing this will take a significant amount of downtime. And without a unique key, you won't be able to use pt-online-schema-change. So, plan for an outage. However, if you must continue to receive new data, consider the following:

CREATE TABLE h_new LIKE historical_data; -- copy schema
ALTER TABLE h_new ... -- to get PK, better datatypes, indexes, etc.
RENAME TABLE historical_data TO h_old,
 h_new TO historical_data; -- atomically swap
then...

Copy h_old data into historical_data, chunk by chunk: See chunking for an outline of how to do it. Note: You should use timestamp for walking through the table, and not worry about it not being unique or PK. (And change from DELETE to INSERT ... SELECT....)

When finished with all the chunks, DROP TABLE h_old.
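
One iteration of the chunked copy might look like the sketch below. The column list (tag_id, shift, timestamp, value) is an assumption based on the names mentioned in this thread, and the one-hour chunk size is arbitrary. A script or scheduled job would repeat the INSERT and the two SET statements that follow it until @chunk_start passes the largest timestamp in h_old.

```sql
-- Start at the oldest row in the old table.
SET @chunk_start = (SELECT MIN(`timestamp`) FROM h_old);
SET @chunk_end   = @chunk_start + INTERVAL 1 HOUR;   -- chunk size is an arbitrary choice

-- Copy one chunk. The half-open range [start, end) means consecutive
-- chunks never overlap, so no row is copied twice.
-- Column list is an assumption; id is filled in by AUTO_INCREMENT.
INSERT INTO historical_data (tag_id, shift, `timestamp`, `value`)
SELECT tag_id, shift, `timestamp`, `value`
FROM h_old
WHERE `timestamp` >= @chunk_start
  AND `timestamp` <  @chunk_end;

-- Advance to the next chunk, then repeat the INSERT.
SET @chunk_start = @chunk_end;
SET @chunk_end   = @chunk_start + INTERVAL 1 HOUR;
```

Keeping each chunk small keeps each transaction short, which matters while the table is still receiving new rows.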

Question 11

I do use the tag_id in my queries. I'm sorry I overlooked it. The query joins the tags table on tags.tag_id = historical_data.tag_id and shift = X and timestamp between X and Y
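
Written out, the query described here presumably looks something like the following sketch. Apart from tags.tag_id, no column of the tags table is known from this thread, so t.* is used instead of named columns, and the literal values are placeholders.

```sql
-- Sketch of the described query; literals are placeholders.
SELECT t.*, hd.shift, hd.`timestamp`, hd.`value`
FROM historical_data AS hd
JOIN tags AS t
  ON t.tag_id = hd.tag_id
WHERE hd.shift = 1
  AND hd.`timestamp` BETWEEN '2017-10-01 00:00:00' AND '2017-10-02 00:00:00';
```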

Question 12

Since InnoDB tables are clustered on the primary key, it is not recommended to have (shift, timestamp) as the primary key. It is better to have the auto-incremented ID as the PK and add an index on (shift, timestamp). In addition, uniqueness doesn't seem to be guaranteed in this table for the combination (shift, timestamp).

Question 13

@Phil - That's OK. I was focusing on handling the WHERE as the optimal strategy. So I still recommend going with PK(shift, timestamp, ...)

Question 14

@JehadKeriaki - Notice that I added id on the end of the PK in order to guarantee uniqueness. (True, one could mess with id. But normal usage of it as AUTO_INCREMENT keeps it unique.)

Question 15

@JehadKeriaki - And... You get extra performance by having the WHERE handled by the PK instead of a secondary key. Secondary would involve an extra lookup to get the rest of the row.

Accepted answer (the post beginning "As a direct answer" above): Jehad Keriaki, 2017-10-16 19:21:10Z
