MySQL select with between optimization

Question 1

I got issues with some geo IP query. So basically here is create code of table:

CREATE TABLE `geo_ip_city` (
 `id` INT(5) UNSIGNED NOT NULL AUTO_INCREMENT,
 `begin` BIGINT(20) NOT NULL DEFAULT '0',
 `end` BIGINT(20) NOT NULL DEFAULT '0',
 `code` VARCHAR(2) NOT NULL DEFAULT '',
 `city` TINYINT(3) UNSIGNED NOT NULL DEFAULT '0',
 PRIMARY KEY (`id`),
 INDEX `begin` (`begin`),
 INDEX `end` (`end`),
 INDEX `code` (`code`)
)

And query:

SELECT * FROM `geo_ip_city` USE INDEX ( `end` ) WHERE `begin` <= 2523596988 AND `end` >= 2523596988 LIMIT 1

Table size is about 4.5m records.

This query runs when all other criteria doesn't work, but still on peak times quite often.

In worst case scenario (where nothing found) it executes for 1.9444 seconds.

Here is explain:

id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE geo_ip_city range end end 8 NULL 2297506 Using index condition; Using where

So when i have alot of requests for this feature, my servers going crazy.

Question:

Can I do something with this table or query to increase performance (maybe partitioning or complex keys)? Or should I watch in some other way?

Question 2

Can I ask why you have LIMIT 1? If you remove it, is there a chance that you get more than 1 row?

Question 3

In other words, are the intervals (begin, end) overlapping or not?

Question 4

No they aren't. LIMIT 1 here to prevent scanning table if result already found.

Question 5

If you are 100% sure that the intervals (begin, end) are never going to be overlapping, you can use this query, which only needs an index on (begin) or (begin, end) and will be much more efficient than what you have:

SELECT t.*
FROM 
 ( SELECT g.* 
 FROM geo_ip_city AS g
 WHERE g.begin <= 2523596988 
 ORDER BY g.begin DESC 
 -- ORDER BY g.begin DESC, g.end DESC
 LIMIT 1
 ) AS t
WHERE t.end >= 2523596988 ;

The only problem with the above is that this constraint (non-overlapping intervals) is not enforced by the database. Postgres has a nice EXCLUDE feature which can be used for exactly such a constraint - but MySQL cannot do this via DDL alone. Your application or procedures have to enforce it.

So, if by accident two or more overlapping intervals have been inserted into the table, the query may return 0 rows - while a match actually exists.

Question 6

Wow. Thats incredible fast. Lemme check on real data. Intervals can't overlap, and if they will it's issue anyway. So data consistency won't suffer.

Question 7

Your conditions are searching for ranges. Now imagine you're looking in a dictionary and looking up all words where the first letter is "greater" than A. Of how much use is the index? You want to narrow the search range down as much as possible. MySQL most of the time can only use one index per table. Combine those begin and end indexes.

CREATE TABLE `geo_ip_city` (
 `id` INT(5) UNSIGNED NOT NULL AUTO_INCREMENT,
 `begin` BIGINT(20) NOT NULL DEFAULT '0',
 `end` BIGINT(20) NOT NULL DEFAULT '0',
 `code` VARCHAR(2) NOT NULL DEFAULT '',
 `city` TINYINT(3) UNSIGNED NOT NULL DEFAULT '0',
 PRIMARY KEY (`id`),
 INDEX `idx_begin_end` (`begin`, `end`)
);

I also removed the index on code because it's also pretty much useless. It consists of two letters. Assuming there are no special characters/symbols included, this leaves you with 26*26=676 possible values for this column. That's a selectivity of 676 / 4.500.000 = 0.0001. You want to have the selectivity as near to 1 as possible.

Question 8

I will try it, actually i thought about it. Indexing will take alot, so i needed to check this. I will try to do it and will response here after that. Thank you.

Question 9

Tried, added complex index. Query now executes about 2 seconds, so no win.

Question 10

if you trying to use the same query then it could be optimized adding composite index to geo_ip_city

the new index could be

ALTER TABLE geo_ip_city ADD INDEX ind_begin_end (begin, end)

with this you would need to remove you use index in the select statment to be

SELECT * FROM geo_ip_city WHERE begin <= 2523596988 AND end >= 2523596988 LIMIT 1

ypercubeTM ypercubeTM 99.7k13 gold badges217 silver badges306 bronze badges · Accepted Answer · 2014-11-24 12:42:50Z

If you are 100% sure that the intervals (begin, end) are never going to be overlapping, you can use this query, which only needs an index on (begin) or (begin, end) and will be much more efficient than what you have:

SELECT t.*
FROM 
 ( SELECT g.* 
 FROM geo_ip_city AS g
 WHERE g.begin <= 2523596988 
 ORDER BY g.begin DESC 
 -- ORDER BY g.begin DESC, g.end DESC
 LIMIT 1
 ) AS t
WHERE t.end >= 2523596988 ;

The only problem with the above is that this constraint (non-overlapping intervals) is not enforced by the database. Postgres has a nice EXCLUDE feature which can be used for exactly such a constraint - but MySQL cannot do this via DDL alone. Your application or procedures have to enforce it.

So, if by accident two or more overlapping intervals have been inserted into the table, the query may return 0 rows - while a match actually exists.

Wow. Thats incredible fast. Lemme check on real data. Intervals can't overlap, and if they will it's issue anyway. So data consistency won't suffer.

Stack Exchange Network

MySQL select with between optimization

3 Answers 3

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

MySQL select with between optimization

3 Answers 3

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions