I am trying to select a range of data for multiple devices (unique serial numbers) from a historical table and was wondering why there is such a big difference in time for the following queries:
Basically I am trying to use the IN clause to indicate what items I want to fetch data for. If I "hard code" the items in the IN clause, the query is fast, if I use a subquery or join to select the items the performance is poor.
This query completes in 0.15s and returns 7382 rows.
SELECT `readings`.* FROM `readings`
WHERE
(SerialNumber IN ('091146000121', *snip 25*, '091146000556'))
AND (readings.time >= 1325404800)
AND (readings.time < 1326317400)
ORDER BY `time` ASC
The same query rewritten using a subquery to get the serial numbers takes over 30 seconds, and seems to spend most of its time in the Preparing state. It returns the same data as the first query.
SELECT `readings`.* FROM `readings`
WHERE
(SerialNumber IN (SELECT `boards`.`id` AS `SerialNumber` FROM `boards` WHERE (siteId = '1')))
AND (readings.time >= 1325404800)
AND (readings.time < 1326317400)
ORDER BY `time` ASC
The subquery returns the same values that are in the first query, but as stated, this takes a lot longer to run. Are they not functionally equivalent?
Here is the explain for both queries:
+----+-------------+----------+-------+---------------+---------+---------+------+------+-----------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+-------+---------------+---------+---------+------ +------+-----------------------------+
| 1 | SIMPLE | readings | range | PRIMARY,time | PRIMARY | 22 | NULL | 7339 | Using where; Using filesort |
+----+-------------+----------+-------+---------------+---------+---------+------+------+-----------------------------+
+----+--------------------+----------+-----------------+----------------+---------+---------+------+---------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+----------+-----------------+----------------+---------+---------+------+---------+-------------+
| 1 | PRIMARY | readings | range | time | time | 4 | NULL | 6353234 | Using where |
| 2 | DEPENDENT SUBQUERY | boards | unique_subquery | PRIMARY,siteId | PRIMARY | 18 | func | 1 | Using where |
+----+--------------------+----------+-----------------+----------------+---------+---------+------+---------+-------------+
For some reason the query with the subselect isn't using the primary key. I tried using USE INDEX, but that actually made it take a lot longer.
The readings table has PRIMARY KEY SerialNumber, time with an index on time.
The boards table has PRIMARY KEY id (SerialNumber) and index on siteId.
The MySQL version I'm using is 5.5.8-log MySQL Community Server (GPL)
I'm just wondering why the performance of both queries isn't very similar. Thanks.
Update: Here are the create table statements:
mysql> SHOW CREATE TABLE readings\G
*************************** 1. row ***************************
Table: readings
Create Table: CREATE TABLE `readings` (
`time` int(11) NOT NULL,
`boxsn` varchar(16) NOT NULL,
`rev` varchar(16) NOT NULL,
`schema` tinyint(3) unsigned NOT NULL,
`interval` smallint(5) unsigned NOT NULL,
`relay` tinyint(4) NOT NULL,
`inputV` decimal(10,6) NOT NULL,
`inputA` decimal(10,6) NOT NULL,
`outputV` decimal(10,6) NOT NULL,
`outputA` decimal(10,6) NOT NULL,
`phase` tinyint(4) NOT NULL,
`outputVA` decimal(10,6) NOT NULL,
`watts` decimal(10,6) NOT NULL DEFAULT '0.000000',
`var` decimal(10,6) NOT NULL,
`kiloVAHours` decimal(9,9) DEFAULT '0.000000000',
`kilowattHours` decimal(9,9) NOT NULL,
`kilovarHours` decimal(9,9) NOT NULL,
PRIMARY KEY (`boxsn`,`time`),
KEY `time` (`time`),
KEY `boxsn_time_ndx` (`boxsn`,`time`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.00 sec)
mysql> SHOW CREATE TABLE boards\G
*************************** 1. row ***************************
Table: boards
Create Table: CREATE TABLE `boards` (
`id` varchar(16) NOT NULL,
`siteId` int(11) NOT NULL,
`groupId` int(11) DEFAULT '0',
`lastReport` int(11) DEFAULT NULL,
`lastIp` varchar(15) DEFAULT '0.0.0.0',
`label` varchar(24) DEFAULT '',
PRIMARY KEY (`id`),
KEY `siteId` (`siteId`),
KEY `siteId_id_ndx` (`siteId`,`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 ROW_FORMAT=DYNAMIC
1 Answer 1
Refactor the query as follows:
SELECT
readings.*
FROM
(
SELECT boxsn FROM readings
WHERE (time >= 1325404800)
AND (time < 1326317400)
ORDER BY `time` ASC
) readings_keys
LEFT JOIN
(
SELECT id AS boxsn FROM boards WHERE siteId = '1'
) boards
USING (boxsn)
LEFT JOIN readings
USING (boxsn)
;
Make sure you have the following indexes:
ALTER TABLE boards ADD INDEX siteId_id_ndx (siteId,id);
ALTER TABLE readings ADD INDEX time_boxsn_ndx (time,boxsn);
You can drop the other index
ALTER TABLE readings DROP INDEX boxsn_time_ndx;
You should definitely see a dramatic improvement in performance as the tables grow.
In your case,
- The first EXPLAIN plan says you have to perform a lookup of SerialNumber for each row in
readings
against a list of value in memory - The second EXPLAIN plan says you have to perform a lookup of SerialNumber for each row in
readings
against a table.
UPDATE 2012年01月12日 14:03 EDT
I refactored it again to make sure the readings
keys and boards
keys are combined correctly before retrieving the data from the readings
table:
SELECT
readings.*
FROM
(
SELECT A.* FROM
(
SELECT boxsn FROM readings
WHERE (time >= 1325404800)
AND (time < 1326317400)
ORDER BY `time` ASC
) A
LEFT JOIN
(
SELECT id AS boxsn
FROM boards
WHERE siteId = '1'
) B
USING (boxsn)
WHERE B.boxsn IS NOT NULL
) readings_keys
LEFT JOIN readings
USING (boxsn)
;
-
Thanks for the answer, I get a syntax error when I try to run that query, I added a comma after
) readings
and now the syntax error isnear USING(SerialNumber)
and I'm not sure what I need to change at this point, sorry.drew010– drew0102012年01月12日 00:53:12 +00:00Commented Jan 12, 2012 at 0:53 -
Sorry, I forgot the INNER JOIN between the subqueries. I updated it now !!!RolandoMySQLDBA– RolandoMySQLDBA2012年01月12日 01:40:08 +00:00Commented Jan 12, 2012 at 1:40
-
Hey Rolando, I created those indexes and tried the query but it still runs pretty slow and seems to process a lot of unnecessary records. Here are some details on pastebin that show the explain and the indexes. Also I just noticed the index SerialNumber_time_ndx is the same as my primary key so I probably shouldn't have added it. Any help is appreciated.drew010– drew0102012年01月12日 17:53:08 +00:00Commented Jan 12, 2012 at 17:53
-
Please do
SHOW CREATE TABLE readings\G
andSHOW CREATE TABLE boards\G
and display these in your question on setup another pastebin with it.RolandoMySQLDBA– RolandoMySQLDBA2012年01月12日 18:03:48 +00:00Commented Jan 12, 2012 at 18:03 -
I updated the question with the table structures.drew010– drew0102012年01月12日 18:09:17 +00:00Commented Jan 12, 2012 at 18:09