I'm trying to show a table, for each ID, to see if there is data available. In this case, its measurements. If there's no data, that means the logger isn't doing its job properly.
I currently have two tables: data
and times
. data
contains id
, datetime
, sensor_id
, and value
. times
contains id
and value
.
Times is filled with 00:00
, 00:01
, 00:02
, etc., all the way through to 23:59
.
I have this query:
SELECT t.`value`, d.`sensor_id`, COUNT(t.`value`) as `numrows`
FROM `data` d
RIGHT JOIN `times` t
ON d.`datetime` LIKE CONCAT('% ', t.`value`, '%')
WHERE d.`datetime` LIKE '%2014ドル-11-05%'
AND d.`sensor_id` IN(1,2,3,4,5,999)
GROUP BY d.`sensor_id`, t.`value`
ORDER BY d.`sensor_id` ASC, t.`id` ASC
while($s = $select->fetch_assoc()) {
$checkArray[$s['sensor_id']][$s['value']] = $s['numrows'];
}
foreach($checkArray as $key => $arr) {
echo 'Sensor: ' . $key;
for($i = 0; $i <= 23; $i++) {
for($j = 0; $j <= 59; $j++) {
$time = strlen($i) == 1 ? '0' . $i : '' . $i;
$time .= ':';
$time .= strlen($j) == 1 ? '0' .$j : '' . $j;
if(isset($arr[$time]) && $arr[$time] >= 1) { //See if has at least one row
echo 'YES DATA FOR ' . $time . '<br>';
}
}
}
}
Of course, I sort this in a table, and the result is this:
Result
Just for sensors 1, 2, 3, 4, and 5, the load time is over 5.5 seconds. I don't know how to optimize this further. I've put indexes on the queried columns, but I can't think of anything else.
My SHOW CREATE TABLE
for data
:
CREATE TABLE `data` (
`id` int(13) NOT NULL AUTO_INCREMENT,
`sensor_id` int(13) NOT NULL,
`datetime` datetime NOT NULL,
`value` float NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `sensor_id_2` (`sensor_id`,`datetime`,`value`),
KEY `sensor_id` (`sensor_id`),
KEY `value` (`value`),
KEY `datetime` (`datetime`),
KEY `sensor_id_3` (`sensor_id`),
KEY `datetime_2` (`datetime`)
) ENGINE=InnoDB AUTO_INCREMENT=103921 DEFAULT CHARSET=utf8
And my SHOW CREATE TABLE
for times
:
CREATE TABLE `minutes` (
`id` int(13) NOT NULL AUTO_INCREMENT,
`value` varchar(16) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `id_3` (`id`),
KEY `value` (`value`),
KEY `id` (`id`),
KEY `value_2` (`value`),
KEY `id_2` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1441 DEFAULT CHARSET=utf8
2 Answers 2
Is this the only query on data
that matters? You have too many simple indexes. For example, there's no reason to have an index on just sensor_id
if you already have one with sensor_id
in the front.
CREATE TABLE `data` (
`id` int(13) NOT NULL AUTO_INCREMENT,
`sensor_id` int(13) NOT NULL,
`datetime` datetime NOT NULL,
`value` float NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `sensor_id_2` (`sensor_id`,`datetime`,`value`)
)
The minutes table is even worse. There's only two columns but you have six indexes (the PRIMARY KEY
is an index on id
). Three would be serious overkill.
CREATE TABLE `minutes` (
`id` int(13) NOT NULL AUTO_INCREMENT,
`value` varchar(16) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `value_id` (`value`, `id`),
UNIQUE KEY `id_value` (`id`, `value`)
)
Your query looks weird to me. You're doing string operations (LIKE
) on a datetime
field. This probably means that you are skipping the index and almost certainly that you aren't using the index to limit the possible rows. A more common form would be
SELECT t.`value`, d.`sensor_id`, COUNT(t.`value`) as `numrows`
FROM `data` d
RIGHT JOIN `times` t
ON TIME_FORMAT(d.`datetime`, '%H:%i') = t.`value`
WHERE d.`datetime` >= '2014-11-05 00:00:00'
AND d.`datetime` <= '2014-11-05 23:59:00'
AND d.`sensor_id` IN (1,2,3,4,5,999)
GROUP BY d.`sensor_id`, t.`value`
ORDER BY d.`sensor_id` ASC, t.`value` ASC
And you should be aware that when you specify IN
, you are repeating the query once for each value. So this query would run six times. You may find it easier to run the six queries on the PHP side, as they'll be smaller. This can also help your programming logic.
You might want to look up EXPLAIN
plans, as they should help hint at what your problems are. You might consider posting your SQL and explain plan to Stack Overflow (hint, just add EXPLAIN
before the SELECT
and look at the results in an admin tool like phpMySQL).
It's possible that the thing to do is to replace your DATETIME
column with a DATE
column and a minutes_id
column. Then your join would be on a fast integer join and you could do a straightforward date check.
Rather than doing a RIGHT JOIN
, you might be better off with a regular inner join. Then look for missing times in PHP. Outer joins (left and right) are much slower than properly indexed inner joins. You may find it easier to look for the missing times in PHP than to have your database do it.
I find it questionable to read all the data in and then use it. It's often better to read the data and then use it immediately.
$times = array();
for ( $hour = 0; $hour < 24; $hour++ ) {
for ( $minute = 0; $minute < 60; $minute++ ) {
$times[] = sprintf('%02u:%02u', $hour, $minute);
}
}
reset($times);
$last_sensor = 0; // assumes that there is no sensor 0
while ( $s = $select->fetch_assoc() ) {
if ( $s['sensor_id'] != $last_sensor ) {
echo 'Sensor: ' . $s['sensor_id'] . PHP_EOL;
$last_sensor = $s['sensor_id'];
// clear any remaining times from the last sensor
while ( list(, $time) = each($times) ) {
echo "$time 0" . PHP_EOL;
}
reset($times);
}
// iterate through until we find the current time
list(, $time) = each($times);
while ( $time != $s['value'] ) {
// mark the missing times
echo "$time 0" . PHP_EOL;
list(, $time) = each($times);
}
echo "$time {$s['numrows']}" . PHP_EOL;
}
SQL
I would write the query as
SELECT t.`value` AS time_of_day, d.`sensor_id`
FROM `times` t
LEFT OUTER JOIN `data` d
ON t.`mysql_hour_minute` = EXTRACT(HOUR_MINUTE FROM d.`datetime`)
WHERE
d.`datetime` BETWEEN '2014-11-05' AND '2014-11-05 23:59:59'
AND d.`sensor_id` IN (1, 2, 3, 4, 5, 999)
GROUP BY t.`value`, d.`sensor_id`
HAVING COUNT(t.`value`) > 0
ORDER BY d.`sensor_id` ASC, t.`value` ASC;
Note the following changes:
- Changed
RIGHT JOIN
toLEFT OUTER JOIN
, as left joins are more common and natural. Also, it matches the order in which the selected columns are presented. - Use the appropriate datetime function to avoid the
LIKE
operator for the join condition. Note that you'll have to add a column (which I've calledmysql_hour_minute
) to thetimes
table, with integer values 0, 1, 2, ..., 59, 100, 101, ..., 159, 200, ..., 2359. - To accomplish
WHERE d.
datetimeLIKE '...'
, the database would need to stringify every item. Stringifying is expensive in the first place; having to do it for every row is even worse. If you use aBETWEEN
operator, then it can take advantage of an index. (And I hope you do have an index on thedatetime
column.) - If all you need to know is whether the sensors collected any data in each time slot, you could just put a
HAVING
condition on the query instead of requesting the count.
In addition, based on the fact that you accidentally left a $
in the example date, I suspect that you are composing the query by string interpolation, rather than by parameter substitution. You might have an SQL injection vulnerability.
PHP
A more idiomatic way to repeat a loop 24 times is:
for ($i = 0; $i < 24; $i++) { ... }
as it includes the number 24. Starting from 0 but using the <=
comparison is a bit odd.
The loops would be better written as
for ($h = 0; $h < 24; $h++) {
for ($m = 0; $m < 60; $m++) {
$time = sprintf('%02d:%02d', $h, $m);
...
}
}
times
orminutes
? \$\endgroup\$