I'm using MySQL 5.5.
Here is my table:
CREATE TABLE `temperature_information` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`device` int(11) NOT NULL,
`temperature` int(11) NOT NULL,
`date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
KEY `device` (`device`),
KEY `date` (`date`),
KEY `idx` (`device`, `date`)
) ENGINE=InnoDB AUTO_INCREMENT=25602738 DEFAULT CHARSET=latin1
This table has ~50 million records.
Here is my query:
SELECT date as ValueDate, MAX(temperature)
FROM (
SELECT date, temperature FROM temperature_information WHERE device = 1111
ORDER BY temperature DESC) c
GROUP BY DATE(ValueDate),HOUR(ValueDate)
This query returns the maximum temperature for each day. Its execution time is ~0.9 s, and the subquery alone takes 0.003 s.
I have separate indexes on the date and device columns and a multi-column index idx on device,date. The EXPLAIN says it uses the device index for the query, which is good. The subquery is very fast.
But to get the MAX temperature for each day I need to use GROUP BY. I know that applying functions to an indexed column prevents the index from being used, but I don't know a workaround that is efficient and produces the same results.
Here is the EXPLAIN output:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE temperature_information ref device,idx device 4 const 29330 Using where; Using temporary; Using filesort
My question:
Is it possible to write a query that would be more efficient and produce the same results, or should I process the rows returned by the subquery and find the MAX temperature for each day myself (this would be written in C)?
The subquery returns 20-40k rows on average.
PS. I know you can remove the subquery, but I left it like this to clarify the question.
Creating an index on those (3) columns improved performance ~20 times. But I have another concern regarding indexes. What I provided is not the full table; it has other columns such as signal. I'm planning to run the same queries to find the MAX for each day on these columns too. Does adding too many indexes make SELECT slower for other queries? I know that inserts and updates will be slower.
1 Answer
- The subquery is not needed at all.
- The ORDER BY inside a subquery like this (without a LIMIT) makes no sense.
- The external query has SELECT date but GROUP BY DATE(date), HOUR(date). This, while allowed in older MySQL versions (i.e. before 5.7), is not valid SQL.
I suggest you rewrite:
SELECT
DATE(date) AS value_date,
HOUR(date) AS value_hour,
MAX(temperature) AS max_temperature
FROM temperature_information
WHERE device = 1111
GROUP BY DATE(date), HOUR(date) ;
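The query above returns one row per hour of each day, matching the original GROUP BY. If one row per day is what is actually wanted, as the question's wording suggests, a minimal variant (a sketch, not tested against the asker's data) groups by the date part only:

```sql
-- Sketch: one row per day instead of one row per hour of each day.
SELECT
    DATE(`date`)     AS value_date,
    MAX(temperature) AS max_temperature
FROM temperature_information
WHERE device = 1111            -- same example device id as in the question
GROUP BY DATE(`date`);
```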
Regarding performance:
- An index on (device, date, temperature) will make the query more efficient than the current indexes on (device) alone and on (device, date). If you add this 3-column index, you could drop the other two ("device" and "idx") indexes.
- Another option would be to store the date and hour parts in separate columns and add a 4-column index on (device, date_part, hour_part, temperature).
- If you move to version 5.7, you could have the date and hour parts as generated columns. See MySQL docs: Generated Columns.
- After the comment/edit that there are many more data columns, and since it looks like you'll be running analytic types of queries, another option would be to change the PRIMARY KEY to (device, date) or (device, date_part, hour_part, min_sec_part). For an InnoDB table, this effectively clusters the data in the way you want it for this query. Of course, you should first test this alternative design and how it affects other queries, too.
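As a rough sketch of the first suggestion, the 3-column index could be added as shown here. The index name device_date_temperature is only an illustrative choice, not something from the question, and building an index on a table of this size can take a while, so it is worth trying on a copy first.

```sql
-- Sketch: add the 3-column index suggested above.
-- The index name is hypothetical; pick whatever fits your naming scheme.
ALTER TABLE temperature_information
    ADD INDEX device_date_temperature (device, `date`, temperature);

-- Optional, once the new index is confirmed to be used:
-- the single-column and 2-column indexes share its leading columns.
ALTER TABLE temperature_information
    DROP INDEX device,
    DROP INDEX idx;

-- Check the plan of the rewritten query.
EXPLAIN
SELECT
    DATE(`date`)     AS value_date,
    HOUR(`date`)     AS value_hour,
    MAX(temperature) AS max_temperature
FROM temperature_information
WHERE device = 1111
GROUP BY DATE(`date`), HOUR(`date`);
```

In the EXPLAIN output, the thing to look for is the new index in the key column and "Using index" in Extra, which would mean the query is answered from the index alone without reading the table rows.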
- Thank you for your answer. Creating an index on those columns improved performance ~20 times. But I have another concern regarding indexes. What I provided is not the full table; it has other columns such as signal. I'm planning to run the same queries to find the max for each day on these columns too. Will adding too many indexes make SELECT slower for other queries? I know that inserts and updates will be slower. – Vygintas B, Nov 7, 2017 at 7:54
- Extra indexes consume RAM (the InnoDB buffer pool). Of course, each insert/update/delete needs to update the indexes, so some slowdown takes place. But if you have to choose between a 20x (or 1000x) boost and a 0.5% slowdown, I'm sure what you'll choose. – Kondybas, Nov 7, 2017 at 10:23
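One way to limit the number of indexes for the follow-up case, offered here as an assumption rather than something proposed in the thread, is a single wider index that covers several measured columns, so that one query can compute all the maxima. The column name signal comes from the comment; its type is unknown, and the index name is hypothetical. Note that SIGNAL is a reserved word as of MySQL 5.5, so the column has to be quoted with backticks.

```sql
-- Sketch, assuming a column named `signal` exists alongside temperature.
-- One wider covering index instead of one index per measured column.
ALTER TABLE temperature_information
    ADD INDEX device_date_measurements (device, `date`, temperature, `signal`);

-- Both maxima in one pass over the same index range.
SELECT
    DATE(`date`)     AS value_date,
    HOUR(`date`)     AS value_hour,
    MAX(temperature) AS max_temperature,
    MAX(`signal`)    AS max_signal
FROM temperature_information
WHERE device = 1111
GROUP BY DATE(`date`), HOUR(`date`);
```

The trade-off is a larger index that takes more space and more work per write, so it would need to be measured against separate 3-column indexes on the real data.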
KEY idx - why not call it device_date_idx? Much more meaningful and better for debugging!