DB: Amazon RDS MySQL (OS: Linux, 2 vCPU, Memory: 8GB)
I have a table with almost 14M rows of data.
CREATE TABLE `meterreadings` (
`Id` bigint(20) NOT NULL AUTO_INCREMENT,
`meterid` varchar(16) DEFAULT NULL,
`metervalue` int(11) DEFAULT NULL,
`date_time` timestamp NULL DEFAULT NULL,
PRIMARY KEY (`Id`),
KEY `meterid` (`meterid`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=latin1;
As you can see, I use an index on meterid.
Another table stores device IDs (around 100 rows of data):
CREATE TABLE `devices` (
`Id` bigint(20) NOT NULL AUTO_INCREMENT,
`meterid` varchar(16) DEFAULT NULL,
`location` varchar(8) DEFAULT NULL,
PRIMARY KEY (`Id`),
UNIQUE KEY `meterid_UNIQUE` (`meterid`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=latin1;
To get 15-minute aggregated data, I use the query below:
SELECT AVG(metervalue) as value
, DATE_FORMAT(date_time, "%d %b %Y %H:%i") as label
FROM meterreadings
WHERE meterid IN (SELECT meterid from devices)
AND date_time BETWEEN '2018-07-23' AND '2018-07-24'
GROUP BY DATE(date_time), HOUR(date_time), MINUTE(date_time) DIV 15
ORDER BY date_time ASC;
Query performance is very bad - it takes around 12 seconds to execute, and causes a temporary spike in DB server usage as well.
EXPLAIN on this query returned this:
id  select_type  table          type   possible_keys   key             key_len  ref              rows  Extra
1   SIMPLE       devices        index  meterid_UNIQUE  meterid_UNIQUE  19       NULL             125   Using where; Using index; Using temporary; Using filesort
1   SIMPLE       meterreadings  ref    meterid         meterid         19       devices.meterid  322   Using where
I dropped the index on meterreadings and, surprisingly, the query performance is better - about 6 seconds now. I am still wondering why.
EXPLAIN on the query after dropping the index
id  select_type  table          type  possible_keys   key             key_len  ref                    rows      Extra
1   SIMPLE       meterreadings  ALL   NULL            NULL            NULL     NULL                   14580167  Using where; Using temporary; Using filesort
1   SIMPLE       devices        ref   meterid_UNIQUE  meterid_UNIQUE  19       meterreadings.meterid  1         Using index
I am currently running the query on the table without the index - is there a way I can optimize the table / query to make the operation faster (like a composite index on two columns)?
[The table is growing by around 40 rows per second.]
3 Answers
You should play with it a bit, because it might not be clear beforehand what solution will produce the best results.
A few points to consider:
- It is very likely that an index on the date_time column will serve you better, as it is more selective for this query.
- Composite indexes are usually a good idea, but please make sure to choose the column order correctly: (date_time, meterid) vs. (meterid, date_time). In most cases it makes more sense to leave columns with dense values (e.g. dates, floats) at the end, as any column in the index following them is unlikely to have much effect. (Try (meterid, date_time) for an index.)
- Subselects might force the optimizer to use a specific plan. Try converting it into a join if possible (see the sketch after this list).
- Why have WHERE meterid IN (SELECT meterid FROM devices) at all? Aren't all the devices represented in both tables?
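To illustrate the last two points, here is one way the subselect could be rewritten as a join (a sketch only; it assumes devices really does list every meterid you want included, and it keeps the rest of your query as-is apart from the date-range fix):

SELECT AVG(mr.metervalue) AS value,
       DATE_FORMAT(mr.date_time, "%d %b %Y %H:%i") AS label
FROM meterreadings mr
JOIN devices d ON d.meterid = mr.meterid        -- replaces the IN (SELECT ...) subselect
WHERE mr.date_time >= '2018-07-23'
  AND mr.date_time <  '2018-07-23' + INTERVAL 1 DAY
GROUP BY DATE(mr.date_time), HOUR(mr.date_time), MINUTE(mr.date_time) DIV 15
ORDER BY mr.date_time ASC;

Because devices.meterid is UNIQUE, the join cannot duplicate rows, so the AVG result is the same as with the subselect.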
In devices you have
PRIMARY KEY (`Id`),
UNIQUE KEY `meterid_UNIQUE` (`meterid`)
Get rid of Id and change to
PRIMARY KEY(meterid)
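A sketch of that change (it assumes nothing else references devices.Id, e.g. via a foreign key):

ALTER TABLE devices
    DROP COLUMN Id,                        -- also removes the old PRIMARY KEY
    DROP KEY meterid_UNIQUE,               -- redundant once meterid is the PK
    MODIFY meterid varchar(16) NOT NULL,   -- a PRIMARY KEY column must be NOT NULL
    ADD PRIMARY KEY (meterid);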
To be clean you should use 15-minute intervals everywhere, not
DATE_FORMAT(date_time, "%d %b %Y %H:%i") as label
Instead, consider
FLOOR(UNIX_TIMESTAMP(date_time) / (15*60))
which can be converted back via
FROM_UNIXTIME(... * (15*60))
and formatted. For example:
mysql> SELECT NOW(), DATE_FORMAT(
FROM_UNIXTIME(
FLOOR(UNIX_TIMESTAMP(now()) / (15*60))
*(15*60)
), "%d %b %Y %H:%i") as label;
+---------------------+-------------------+
| NOW()               | label             |
+---------------------+-------------------+
| 2018-08-21 13:43:42 | 21 Aug 2018 13:30 |
+---------------------+-------------------+
SELECT AVG(metervalue) as value,
DATE_FORMAT(
FROM_UNIXTIME(
FLOOR(UNIX_TIMESTAMP(date_time) / (15*60))
*(15*60)
), "%d %b %Y %H:%i") as label
FROM meterreadings
WHERE date_time >= '2018-07-23'
AND date_time < '2018-07-23' + INTERVAL 1 DAY -- bug fix
GROUP BY meterid, -- Don't you want this, too?
label
ORDER BY meterid,
label;
I fixed the case where you were including two midnights in one day.
For more efficiency, change
PRIMARY KEY (`Id`),
KEY `meterid` (`meterid`)
to this if your query is the main one
PRIMARY KEY (date_time, Id), -- to make it a range scan
INDEX(id) -- to keep AUTO_INCREMENT happy
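A sketch of applying that change (rebuilding a 14M-row table takes a while, and this assumes date_time never contains NULL):

ALTER TABLE meterreadings
    DROP PRIMARY KEY,
    ADD PRIMARY KEY (date_time, Id),   -- clusters rows by time, so the date filter becomes a range scan
    ADD INDEX (Id);                    -- the AUTO_INCREMENT column must still lead some index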
If you can be sure that there are never two readings for a meter in the same second, get rid of Id
and have
PRIMARY KEY(date_time, meterid)
(Again, this may not be optimal, depending on what other queries you have.)
All of that will help some. If you want another 10x speedup, build and maintain Summary tables.
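For reference, a 15-minute summary table could look roughly like this (a sketch; the table name, column names, and the @from/@to window variables are made up for illustration, and a scheduled job would keep it filled):

CREATE TABLE meterreadings_15min (
    meterid    varchar(16) NOT NULL,
    slot_start timestamp   NOT NULL,        -- start of the 15-minute interval
    sum_value  bigint      NOT NULL,
    cnt        int         NOT NULL,
    PRIMARY KEY (meterid, slot_start)
) ENGINE=InnoDB;

-- Run periodically; @from/@to are placeholders for the window of rows not yet summarized.
INSERT INTO meterreadings_15min (meterid, slot_start, sum_value, cnt)
SELECT meterid,
       FROM_UNIXTIME(FLOOR(UNIX_TIMESTAMP(date_time) / (15*60)) * (15*60)) AS slot_start,
       SUM(metervalue),
       COUNT(*)
FROM meterreadings
WHERE date_time >= @from AND date_time < @to
GROUP BY meterid, slot_start
ON DUPLICATE KEY UPDATE
    sum_value = sum_value + VALUES(sum_value),
    cnt       = cnt + VALUES(cnt);

Reports then read the small table, computing AVG as SUM(sum_value) / SUM(cnt).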
As @akuzminsky indicated, a generated column can help. The obvious expression would be UNIX_TIMESTAMP(date_time) DIV (15*60); however, UNIX_TIMESTAMP is one of the functions not allowed in a generated column. So, despite the ugliness of the expression below, it does result in interval15 being the timestamp rounded down to 15 minutes.
ALTER TABLE meterreadings ADD interval15 timestamp AS (
SUBTIME(date_time,
CONCAT("0:", MINUTE(date_time) MOD 15, ":", SECOND(date_time),".", MICROSECOND(date_time)))),
ADD INDEX interval15 (interval15);
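As a quick sanity check (the literal value here is just an example), the expression rounds a timestamp down to the quarter hour:

SELECT SUBTIME('2018-07-23 13:43:42',
               CONCAT("0:", MINUTE('2018-07-23 13:43:42') MOD 15, ":",
                      SECOND('2018-07-23 13:43:42'), ".",
                      MICROSECOND('2018-07-23 13:43:42'))) AS interval15;
-- rounds down to 2018-07-23 13:30:00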
The new query is:
SELECT AVG(metervalue) as value
, DATE_FORMAT(interval15, "%d %b %Y %H:%i") as label
FROM meterreadings
WHERE interval15 >= '2018-07-23'
AND interval15 < '2018-07-23' + INTERVAL 1 DAY
GROUP BY interval15
ORDER BY interval15 ASC;
If you were optimizing only this query, you could append metervalue to the interval15 index, and then the result would come from the index alone.
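For instance (a sketch; this trades extra index size for a covering read):

ALTER TABLE meterreadings
    DROP INDEX interval15,
    ADD INDEX interval15 (interval15, metervalue);   -- covering index: the query is answered from the index alone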
Comment: using TO_SECONDS in the generated expression might have been possible. Didn't test it. – danblack, Aug 22, 2018 at 3:14
Comment: consider a composite index on (date_time, meterid), or even a covering index (date_time, meterid, metervalue).