
I have a large table with a timestamp column; it holds a few million rows.

Whenever I run a range-filtered query on the timestamp column of this table, it takes a considerable amount of time (2+ seconds), depending on the size of the range. The column has an index, which leads me to believe the query should return faster.

I noticed that by dropping the index and re-adding it, the query performs easily 40% better.

Below is a simple example of how the table is structured:

create table foo (
 id int(11) not null auto_increment,
 fk_id int(11) not null,
 datecolumn timestamp,
 primary key (id, fk_id),
 key idx_timestamp (datecolumn),
 constraint fk_fk_id foreign key (fk_id) references bla (id)
) engine = InnoDB;

Any reason why re-creating the index makes the query run faster, and how can I make this part of a maintenance routine? I understand that MySQL's default InnoDB index type is a B-tree and that it can get fragmented, but there should be a way to optimize or reorganize it so that query performance is not so bad.

Note: I have run

optimize table foo;
flush tables;

And these didn't seem to solve anything related to the index.
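For reference, the drop-and-recreate that gives the improvement amounts to something like this (using the idx_timestamp name from the table definition above):

alter table foo drop index idx_timestamp;
alter table foo add index idx_timestamp (datecolumn);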

When I use EXPLAIN, it shows some indexes being used, but mainly it shows the access type range, and under that plan a very large number of rows is examined to return just a few of them.

fk_id is part of the WHERE clause (it needs to be). It is also part of the composite primary key, which means the results are deterministic.

The query is basically this:

select foo.id, foo.fk_id, foo.datecolumn, bar.id, bar.name, bar.type
 from foo
 join bar on bar.id = foo.fk_id
 where foo.datecolumn between '2016-01-01 05:00:00' and '2017-01-01 04:59:59'
 and foo.fk_id in (1, 2, 4);

Even when the query is a simple SELECT with only the datecolumn BETWEEN range, I get the same result.
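For illustration, the stripped-down variant I mean would be something like:

explain
select id, fk_id, datecolumn
 from foo
 where datecolumn between '2016-01-01 05:00:00' and '2017-01-01 04:59:59';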

asked Nov 2, 2016 at 21:20
  • When you use EXPLAIN, does it show that the index is being used? There is a good chance that it isn't, because the index probably doesn't cover your query. use-the-index-luke.com/sql/clustering/… Additional info: dev.mysql.com/doc/refman/5.7/en/… Commented Nov 2, 2016 at 22:58
  • I have a similar case: index performance decreases by a factor of 50 within 30 hours, and the query cost goes from 1.1 to 1 million in that time. I tried HASH and BTREE, with no difference except that hash takes longer to kill the server. MySQL is deeply flawed in terms of indexing when you have a write-heavy environment. Commented Jan 26, 2019 at 18:01

2 Answers


BTree blocks inherently go from 100% full (actually about 15/16) to about 69% full after a lot of random INSERTs. This may account for the 40% you are seeing. But note, it won't get any slower over time. Hence, I see frequent OPTIMIZE TABLE as a waste of time, especially if you need the table to be accessible all the time.

And your 40% improvement will rapidly degrade.

I would expect FLUSH TABLES to actually hurt -- since it will remove cached info from RAM.

In order to discuss the query, please show us the query, its EXPLAIN SELECT ..., and perhaps SHOW CREATE TABLE bla.

foo is essentially two tables, one ordered by (id, fk_id, datecolumn), the other ordered by (datecolumn, id, fk_id). Plus, I think, a secondary index of (fk_id, id) (no datecolumn).

If fk_id is part of the WHERE clause, none of them may be optimal.

Addenda

(From OP's comment:)

select foo.id, foo.fk_id, foo.datecolumn, bar.id, bar.name, bar.type
 from foo
 join bar on bar.id = foo.fk_id
 where foo.datecolumn between '2016-01-01 05:00:00'
 and '2017-01-01 04:59:59'
 and foo.fk_id in (1, 2, 4);

The optimal indexes are

foo: INDEX(fk_id, datecolumn) -- 3 disjoint ranges
bar: PRIMARY KEY(id) -- I assume it is already the PK; if not, then INDEX(id)

Note that you do not currently have the optimal index for foo. Add it, and your need for OPTIMIZE TABLE should diminish.
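For example (the index name fk_date is just illustrative):

ALTER TABLE foo ADD INDEX fk_date (fk_id, datecolumn);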

As for why the optimize seemed to help -- probably caching. How much RAM do you have? What is the value of innodb_buffer_pool_size?
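You can check the current setting with:

SHOW VARIABLES LIKE 'innodb_buffer_pool_size';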

I prefer this pattern:

 where foo.datecolumn between '2016-01-01 05:00:00'
 AND '2016-01-01 05:00:00' + INTERVAL 1 YEAR

(No hassles with leap years, datatypes, etc)
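A related variant, sketched here rather than taken from the pattern above, uses a half-open range so no inclusive end point is needed at all:

 where foo.datecolumn >= '2016-01-01 05:00:00'
 AND foo.datecolumn < '2016-01-01 05:00:00' + INTERVAL 1 YEAR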

More on indexing.

answered Nov 3, 2016 at 0:13
  • Stumbled over this one while researching my issue; take a look here: dba.stackexchange.com/questions/228115/… Index performance drops by a factor of 50 within less than 2 days, and the query cost increases 750,000-fold in that time. Recreating the index solves it (cost 1.1). It happens with HASH and BTREE indexes. Recreating the index is the ONLY way (except for OPTIMIZE) that fixes the issue for another day. Commented Jan 26, 2019 at 18:11
  • @John - I suspect there is something about indexing virtual columns that led to that problem. I have been studying MySQL indexes for nearly 2 decades; I think I know most of the quirks. Commented Jan 26, 2019 at 18:18
  • I thought so too in the beginning, but I have invested about 20 hours into this single problem by now. I have recreated the table countless times and created all variants of fields. I tried VIRTUAL, STORED, and a normal TINYINT(), with the same results. I tried BTREE and HASH indexes and can reproduce the issue at any moment. I don't know what the issue is, but it's not related to virtual/generated columns anymore. Currently I use a trigger to write the value into a TINYINT(1), and I replaced it so it only holds NULLs and a few 1's. I know you are experienced, but the issue is in front of my eyes :( Commented Jan 26, 2019 at 18:28
  • InnoDB has no HASH indexes; that keyword is silently ignored. Unfortunately, I do not have enough experience specifically with Virtual/Stored/Generated + Indexed to spot the issue. One thing to note is that low-cardinality indexes (e.g. flags, or a TINYINT with few distinct values) are rarely used. You seem to be depending on using one? Commented Jan 26, 2019 at 19:44
  • I am not using a virtual/stored/generated column anymore; it's a normal TINYINT(1) column. Thanks for the info on HASH; that explains why it didn't solve the problem. I am using it because it should be the fastest way to identify whether a row is ready to be enqueued: it switches from NULL to 1 and, once finished, from 1 back to NULL. Commented Jan 26, 2019 at 20:08

To me, the table design looks broken.

The PK is (id, fk_id), which allows multiple rows per 'id'. Is that really wanted?

The timestamp is a fully dynamic one (DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP); is that needed?

The name 'datecolumn' suggests it is a 'date', so why not use the 'date' type for it?
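For example, the column could be declared explicitly so the behaviour is a deliberate choice (a sketch; pick whichever semantics you actually need):

alter table foo modify datecolumn timestamp not null default current_timestamp; -- keeps the time, drops the implicit ON UPDATE auto-change
alter table foo modify datecolumn date not null; -- if only the calendar date matters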

answered Feb 24, 2017 at 10:47
