I have two tables, 'auth' and 'ips', which I'd like to join in a query.
Table 'auth':
column name | data type | Extra |
---|---|---|
id | int | AI |
date | varchar(6) | - |
timestamp | varchar(8) | - |
result | varchar(8) | - |
ip | varchar(15) | - |
user | text | - |
service | varchar(4) | - |
Created by:\
CREATE TABLE `auth` (
`id` int PRIMARY KEY AUTO_INCREMENT,
`date` varchar(6),
`timestamp` varchar(8),
`result` varchar(8),
`ip` varchar(15),
`user` text,
`service` varchar(4)
);
No indexes.
Example row:
1 | Aug 29 | 03:39:31 | Failed | 39.109.71.83 | etherpad | ssh2 |
---|
Context: I have a program running locally which monitors the /var/log/auth.log file and inserts ssh login attempts into this table. Contains about 3,200 rows and counting.
Table 'ips':
column name | data type |
---|---|
start | int (unsigned) |
end | int (unsigned) |
country | text |
Created by:
CREATE TABLE `ips` (
`start` int UNSIGNED,
`end` int UNSIGNED,
`country` text,
INDEX(`start`, `end`)
);
Index for start and end (in that order), the table is also ordered by start (ascending) and rows start and end only contain unique values.
Example row:
16777216 | 16777471 | United States of America |
---|
Context: this table stores IP ranges and the country to which it (very likely) is assigned to. IPs are stored as unsigned int as one would get from using INET_ATON. Contains 486,257 rows and is not added to anymore, only rarely updated in its entirety.
Now to the queries:
The query I'm currently using:
SELECT auth.date, auth.timestamp, auth.result, auth.ip, auth.user,
ips.country
FROM `auth`
INNER JOIN ips ON (INET_ATON(auth.ip) >= ips.start
AND INET_ATON(auth.ip) < ips.end )
LIMIT 100;
This query takes approximately 9.7 seconds on my system, not even mentioning going through all 3200+ rows.
EXPLAIN query gives the following result:
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | SIMPLE | auth | NULL | ALL | NULL | NULL | NULL | NULL | 5443 | 100.00 | NULL |
1 | SIMPLE | ips | NULL | ALL | index | NULL | NULL | NULL | 466996 | 11.11 | Range checked for each record (index map: 0x1) |
If I change the auth.ip references in the 'ON' clause to a static IP (let's make one up: 200.200.200.200), we get the following query:
SELECT auth.date, auth.timestamp, auth.result, auth.ip, auth.user,
ips.country
FROM `auth`
INNER JOIN ips ON (INET_ATON("200.200.200.200") >= ips.start
AND INET_ATON("200.200.200.200") < ips.end )
LIMIT 100;
This query returns in just 0.15 seconds.
EXPLAIN query gives the following result:
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | SIMPLE | auth | NULL | ALL | NULL | NULL | NULL | NULL | 5443 | 100.00 | NULL |
1 | SIMPLE | ips | NULL | range | index | index | 4 | NULL | 234558 | 33.33 | Using index condition; Using join buffer (hash join) |
- Why is the first query taking roughly 63 times longer than the second query?
- Can the first query be improved regarding performance?
MySQL server version 8.0.26
If I missed something let me know, as it is my first post.
1 Answer 1
The direct answer:
INET_ATON(auth.ip)
is not sargable.
INET_ATON("200.200.200.200")
is evaluated before starting to run the query.
There is no simple way to make the range test efficient. Here is a not-so-simple solution: http://mysql.rjweb.org/doc.php/ipranges that does a very efficient job.
-
Please make sure you denote that you wrote the content presented in the link. Something like "Here is a post I wrote showing how to efficiently do a range test." Also, you should add one or two salient points (preferably code, with an explanation) from the target post into your answer here. As it is this answer is little more than self promotion since it doesn't provide a clear answer without visiting your site.2021年08月31日 18:51:53 +00:00Commented Aug 31, 2021 at 18:51