SELECT id_num
 , sum(sub_expired) as expired
 , max(`date`) as max_date
FROM accounts
where `date`<=20170505
group by id_num;

The accounts table has a compound index (its primary key) on (id_num, date) and about 100 million rows. This query seems pretty basic, but it takes forever and I'm not sure how to break it up to speed things up. I thought about first creating a helper table of the DISTINCT id_num values (~3 million rows), but then I'm not sure how to get the sum(expired) and max(date) columns without joining the helper against the accounts table, which would do the same work as the original query.

CREATE TABLE `accounts` (
 `id_num` int(11) NOT NULL,
 `date` date NOT NULL,
 `time` datetime NOT NULL,
 `price` decimal(10,4) NOT NULL,
 `cost` decimal(10,4) NOT NULL,
 `time_slices` int(11) NOT NULL,
 `sub_expired` tinyint(1) NOT NULL,
 PRIMARY KEY (`id_num`,`date`),
 KEY `date` (`date`),
 CONSTRAINT `accounts_ibfk_1` FOREIGN KEY (`id_num`) REFERENCES `cust` (`id`)
 ) ENGINE=InnoDB DEFAULT CHARSET=latin1

The date range goes back roughly 18 months and the data is pretty evenly distributed over all dates. The rows are being inserted chronologically and there are rarely updates/deletes.

asked Jul 26, 2017 at 5:47
  • You have multiple rows for a given id_num, but never two on the same day? (I ask because the PK is rather unusual.) Commented Jul 27, 2017 at 22:26
  • Is time the full date+time? And date is just the date part of time? Commented Jul 27, 2017 at 22:26
  • Yes, multiple rows for a given id_num, but only one per day. Also, time is full date+time and date is just date part of time as you mentioned. Commented Jul 31, 2017 at 16:07

1 Answer

ALTER TABLE accounts ADD INDEX (
    `date`,               -- to satisfy the WHERE; must be first
    id_num, sub_expired   -- to complete "covering"; must be last (either order)
);

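Because it is "covering" (it contains every column the query references), the query can be answered entirely from the index's BTree, without touching the data rows.
By putting date first, it reads only the index entries that fall within the WHERE range, which is the minimal number of rows.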
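One way to confirm that MySQL actually uses the new index as a covering index is to EXPLAIN the query (written here with the column name sub_expired, as in the CREATE TABLE). If the plan is covering, the Extra column includes "Using index". Since the WHERE range covers most of the table, the optimizer may still choose PRIMARY instead; EXPLAIN shows which index it picked.

```sql
-- Check the plan for the original query (column name as in the CREATE TABLE).
EXPLAIN
SELECT id_num
     , SUM(sub_expired) AS expired
     , MAX(`date`) AS max_date
FROM accounts
WHERE `date` <= '2017-05-05'
GROUP BY id_num;
-- In the output, `key` names the index chosen;
-- "Using index" in `Extra` means the data rows were not read.
```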

This is a rather unusual query, so don't expect the principles to carry over to other queries.

Even if you are only somewhat happy with that, I may be able to tell you how to make it 10 times as fast. But first, please provide SHOW CREATE TABLE and some clues about the date range and the distribution of dates in the table. Also, are the rows being inserted roughly chronologically? And are there any UPDATEs or DELETEs?

Summary Table?

Perhaps the table could be summarized into a Summary Table that has one row per month per id_num. The original query would then get most of its answer from that table, which could be up to about 30x faster (one row per id_num per month instead of up to one per day), and read only the days of the final, partial month from the huge table. The summary table would be augmented incrementally.
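A minimal sketch of that idea, assuming a hypothetical table named accounts_monthly keyed by (id_num, month_start), where month_start is the first day of the month. The same INSERT ... SELECT serves for the one-time backfill and for the incremental step run after each month ends; the @from/@to values are placeholders. The final query reproduces the original one for a cutoff of 2017-05-05: complete months come from the summary table, and only May 1 to 5 is read from accounts.

```sql
-- Hypothetical summary table: one row per id_num per calendar month.
CREATE TABLE accounts_monthly (
  id_num      int(11) NOT NULL,
  month_start date    NOT NULL,  -- first day of the month
  expired     int     NOT NULL,  -- SUM(sub_expired) for that month
  max_date    date    NOT NULL,  -- MAX(`date`) for that month
  PRIMARY KEY (id_num, month_start)
) ENGINE=InnoDB;

-- Roll up the complete months in [@from, @to).
-- Backfill: @from = earliest month in accounts.
-- Monthly job: [@from, @to) = the month that just ended.
SET @from := '2016-01-01', @to := '2017-07-01';  -- placeholder values
INSERT INTO accounts_monthly (id_num, month_start, expired, max_date)
SELECT id_num
     , `date` - INTERVAL (DAYOFMONTH(`date`) - 1) DAY AS month_start
     , SUM(sub_expired)
     , MAX(`date`)
FROM accounts
WHERE `date` >= @from AND `date` < @to
GROUP BY id_num, month_start;

-- Original query, cutoff 2017-05-05:
-- whole months before May 2017 from the summary, May 1-5 from accounts.
SELECT id_num
     , SUM(expired)  AS expired
     , MAX(max_date) AS max_date
FROM (
      SELECT id_num, expired, max_date
      FROM accounts_monthly
      WHERE month_start < '2017-05-01'
    UNION ALL
      SELECT id_num, sub_expired, `date`
      FROM accounts
      WHERE `date` >= '2017-05-01' AND `date` <= '2017-05-05'
) AS u
GROUP BY id_num;
```

This relies on the summary table already holding every complete month before the cutoff month; the raw-table part of the UNION ALL can use the existing `date` index.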

answered Jul 26, 2017 at 5:56
  • Ok, I'll have to ask my admin to add that index. Hopefully the query is significantly faster, but I am also curious to know how else it can be improved. I've edited my question with the information you requested. Commented Jul 27, 2017 at 5:45
  • If you did not have the occasional update/delete, it would be fairly easy to build and maintain a Summary Table and get a big speed improvement. It should be possible to use a Trigger to help with the updates/deletes. Commented Aug 12, 2017 at 19:20
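A sketch of the trigger idea for DELETEs, assuming a hypothetical summary table accounts_monthly(id_num, month_start, expired, max_date) with one row per id_num per month, where month_start is the first day of the month. Because MAX() cannot simply be decremented, the trigger rebuilds the affected month's row from the rows remaining in accounts, and only if that month had already been summarized, so a month that has not been rolled up yet is left alone. UPDATEs could be handled the same way for both the OLD and the NEW month. DELIMITER is a directive of the mysql command-line client, not SQL.

```sql
DELIMITER //
CREATE TRIGGER accounts_monthly_ad   -- hypothetical trigger name
AFTER DELETE ON accounts
FOR EACH ROW
BEGIN
  DECLARE m DATE;
  -- First day of the month the deleted row belonged to
  SET m = OLD.`date` - INTERVAL (DAYOFMONTH(OLD.`date`) - 1) DAY;

  -- Drop the now-stale summary row for that id_num and month
  DELETE FROM accounts_monthly
  WHERE id_num = OLD.id_num AND month_start = m;

  -- Rebuild it only if the month had been summarized already;
  -- the SELECT yields no row if nothing remains for that month
  IF ROW_COUNT() > 0 THEN
    INSERT INTO accounts_monthly (id_num, month_start, expired, max_date)
    SELECT id_num, m, SUM(sub_expired), MAX(`date`)
    FROM accounts
    WHERE id_num = OLD.id_num
      AND `date` >= m AND `date` < m + INTERVAL 1 MONTH
    GROUP BY id_num;
  END IF;
END//
DELIMITER ;
```

The rebuild reads only one id_num's rows for one month through the (id_num, date) primary key, so it stays cheap for the rare delete.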
