I have been trying to solve the following problem for about an hour and have not made any progress.
Okay, I have a table (MyISAM):
+---------+-------------+------+-----+-------------------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------+-------------+------+-----+-------------------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| http | smallint(3) | YES | MUL | 200 | |
| elapsed | float(6,3) | NO | | NULL | |
| cached | tinyint(1) | YES | | NULL | |
| ip | int(11) | NO | | NULL | |
| date | timestamp | NO | MUL | CURRENT_TIMESTAMP | |
+---------+-------------+------+-----+-------------------+----------------+
Please don't mind the indexes; I've been playing around trying to find a solution. Now, here's my query:
SELECT http,
COUNT( http ) AS count
FROM reqs
WHERE DATE(date) >= cast(date_sub(date(NOW()),interval 24 hour) as datetime)
GROUP BY http
ORDER BY count;
The table stores information about incoming web requests, so it's a rather big database.
+-----------+
| count(id) |
+-----------+
| 782412 |
+-----------+
Note that there's no better way of setting a primary key, as the id column is the only unique identifier I have. The query above takes about 0.6-1.6 seconds to run.
Which index would be clever? I figured that indexing date will give me "bad" cardinality and thus MySQL won't use it. http is also a bad choice as there are only about 20 different possible values.
Thanks for your help!
Update 1: I've added an index on (http, date) as ypercube suggested:
mysql> CREATE INDEX httpDate ON reqs (http, date);
and used his query, but it performed equally badly. The added index:
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| reqs | 0 | PRIMARY | 1 | id | A | 798869 | NULL | NULL | | BTREE | |
| reqs | 1 | httpDate | 1 | http | A | 19 | NULL | NULL | YES | BTREE | |
| reqs | 1 | httpDate | 2 | date | A | 99858 | NULL | NULL | | BTREE | |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
and the EXPLAIN output:
+----+--------------------+-------+-------+---------------+----------+---------+------+-------+-----------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-------+-------+---------------+----------+---------+------+-------+-----------------------------------------------------------+
| 1 | PRIMARY | r | range | NULL | httpDate | 3 | NULL | 20 | Using index for group-by; Using temporary; Using filesort |
| 2 | DEPENDENT SUBQUERY | ri | ref | httpDate | httpDate | 3 | func | 41768 | Using where; Using index |
+----+--------------------+-------+-------+---------------+----------+---------+------+-------+-----------------------------------------------------------+
MySQL server version:
mysql> SHOW VARIABLES LIKE "%version%";
+-------------------------+---------------------+
| Variable_name | Value |
+-------------------------+---------------------+
| protocol_version | 10 |
| version | 5.1.73 |
| version_comment | Source distribution |
| version_compile_machine | x86_64 |
| version_compile_os | redhat-linux-gnu |
+-------------------------+---------------------+
5 rows in set (0.00 sec)
2 Answers
I have three suggestions
SUGGESTION #1 : Rewrite the query
You should rewrite the query as follows
SELECT http,
COUNT( http ) AS count
FROM reqs
WHERE date >= ( DATE(NOW() - INTERVAL 1 DAY) + INTERVAL 0 SECOND )
GROUP BY http
ORDER BY count;
or
SELECT * FROM
(
SELECT http,
COUNT( http ) AS count
FROM reqs
WHERE date >= ( DATE(NOW() - INTERVAL 1 DAY) + INTERVAL 0 SECOND )
GROUP BY http
) A ORDER BY count;
The WHERE clause should not wrap the indexed column in a function. Leaving date bare on the left side of the comparison, with all the computation on the right side, makes it easier for the Query Optimizer to use an index against it.
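As a rough way to see the difference, the two forms of the predicate can be compared with EXPLAIN. This is only a sketch: `CURDATE() - INTERVAL 1 DAY` is my own shorthand for the same "midnight yesterday" cutoff, and it assumes some index that starts with date exists.

```sql
-- Column wrapped in DATE(): the optimizer cannot use an index on `date`
-- for a range scan, so every row (or every index entry) is examined.
EXPLAIN
SELECT http, COUNT(http) AS count
FROM reqs
WHERE DATE(date) >= CURDATE() - INTERVAL 1 DAY
GROUP BY http;

-- Bare column compared against a constant expression: the predicate is
-- eligible for a range scan on an index whose first column is `date`.
EXPLAIN
SELECT http, COUNT(http) AS count
FROM reqs
WHERE date >= CURDATE() - INTERVAL 1 DAY
GROUP BY http;
```

Comparing the type, key and rows columns of the two plans should show whether the rewrite changes anything on your data.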
SUGGESTION #2 : Supporting Index
I would also suggest a different index
ALTER TABLE reqs ADD INDEX date_http_ndx (date,http); -- not (http,date)
I suggest this order of columns because the date entries would all be contiguous in the index. Then, the query simply collects http values without skipping gaps in http.
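To check whether the index is actually chosen, something like the following could be run. It is a sketch under assumptions: the index name date_http_ndx comes from the ALTER TABLE above, the FORCE INDEX hint is only there in case the optimizer prefers another index, and `CURDATE() - INTERVAL 1 DAY` is shorthand for the same cutoff.

```sql
-- Ask for the plan while hinting the (date, http) index.
-- Remove the FORCE INDEX clause to see what the optimizer picks by itself.
EXPLAIN
SELECT http, COUNT(http) AS count
FROM reqs FORCE INDEX (date_http_ndx)
WHERE date >= CURDATE() - INTERVAL 1 DAY
GROUP BY http
ORDER BY count;
```

Because the query only reads date and http, the index should be able to answer it without touching the table rows; whether that ends up faster depends on how many rows fall inside the date range.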
SUGGESTION #3 : Bigger Key Buffer (Optional)
MyISAM only uses index caching. Since the query should not touch the .MYD file, you should use a slightly bigger MyISAM Key Buffer.
To set it to 256M
SET @newsize = 1024 * 1024 * 256;
SET GLOBAL key_buffer_size = @newsize;
Then, set it in my.cnf
[mysqld]
key_buffer_size = 256M
A restart of MySQL is not required.
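To confirm the new size took effect and to get a rough idea of whether the key buffer is large enough, the server variables and status counters can be inspected. This is a sketch; how to interpret the numbers depends on your workload.

```sql
-- Current key buffer size in bytes (268435456 for 256M).
SHOW VARIABLES LIKE 'key_buffer_size';

-- Key_read_requests counts index block requests served via the key cache;
-- Key_reads counts the requests that had to go to disk.
-- A high Key_reads / Key_read_requests ratio suggests the buffer is too small.
SHOW GLOBAL STATUS LIKE 'Key_read%';
```

Run the status query before and after a few executions of the slow query, since the counters are cumulative since server start.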
Give it a Try !!!
- I tried the queries you gave me. #1 performed about as well as the other suggestion or my own; the second one actually performed worse. Same for the supporting index: it made performance drop by about 75 percent. I'm going to try the bigger key buffer now, thank you anyway! – Robin Heller, Aug 10, 2014 at 21:59
- I accepted your answer although it didn't fix the problem; with a bigger key buffer, however, it performed somewhat better. Closing this as it's the best solution of all given. Thank you! – Robin Heller, Aug 12, 2014 at 13:25
- For Suggestion #2 to work, it may be necessary to add "USE INDEX" or "FORCE INDEX" to the query; at least that is what I had to do in order to speed up my query after creating an index like that. – Johano Fierra, Nov 7, 2017 at 14:38
Change your date column type to an integer and store the date as a Unix timestamp in it. A timestamp is a lot larger than an int. You'd get some bang out of that.
- Are you kidding? Both INT and TIMESTAMP need 4 bytes. – ypercubeᵀᴹ, Aug 10, 2014 at 19:30
- Not to mention that you lose all the datetime functions when you are storing dates or timestamps as integers. – ypercubeᵀᴹ, Aug 10, 2014 at 19:44
… http column being nullable. I'll investigate tomorrow, if I find time.
… http NOT NULL) and copying all data to it (except the rows with http NULL, of course).