I've a big table (MariaDB 11.2) with many fields, but I'll resume the table like this:
CREATE TABLE `GT_CalendariProfessional` (
`CalendariProfessionalID` BIGINT(20) NOT NULL AUTO_INCREMENT,
`DataInici` DATETIME(6) NOT NULL,
`DataFi` DATETIME(6) NOT NULL,
PRIMARY KEY (`CalendariProfessionalID`) USING BTREE,
INDEX `DataInici_DataFi` (`DataInici`, `DataFi`) USING BTREE,
INDEX `DataInici` (`DataInici`) USING BTREE
) COLLATE='utf8mb4_unicode_ci' ENGINE=InnoDB;
Example data:
CalendariProfessionalID;DataInici;DataFi
11535212;2024年07月08日 08:00:00,000000;2024年07月08日 15:00:00,000000;
I've a complex query that filter the data by dates (DataInici, DataFi) and it doesn't use indexes. I tried to simplify the query at maximum, and now it uses indexes but they filter almost all the rows:
SELECT COUNT(1) FROM GT_CalendariProfessional
Result: 9195499
EXPLAIN SELECT cp.CalendariProfessionalId
FROM GT_CalendariProfessional cp
WHERE '2024-05-20 00:00:00' <= cp.DataFi AND cp.DataInici <= '2024-05-27 23:59:59'
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
---|---|---|---|---|---|---|---|---|---|
1 | SIMPLE | cp | range | DataInici_DataFi,DataInici | DataInici_DataFi | 8 | \N | 9136502 | Using where; Using index |
If I add more fields to the SELECT, it doesn't use indexes:
EXPLAIN SELECT *
FROM GT_CalendariProfessional cp
WHERE '2024-05-20 00:00:00' <= cp.DataFi AND cp.DataInici <= '2024-05-27 23:59:59'
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
---|---|---|---|---|---|---|---|---|---|
1 | SIMPLE | cp | ALL | DataInici_DataFi,DataInici | \N | \N | \N | 9136502 | Using where |
So it seems that there's no difference between using or not using the DateTime index. Why is that?
Thx
PD1. The fraction of data returned is around 0,5% !
SELECT COUNT(1)
FROM GT_CalendariProfessional cp
WHERE '2024-05-20 00:00:00' <= cp.DataFi AND cp.DataInici <= '2024-05-27 23:59:59'
Result: 50278
PD2. Tried changing WHERE order unsuccessful:
EXPLAIN SELECT * FROM GT_CalendariProfessional cp WHERE cp.DataFi >='2024-05-20 00:00:00' AND cp.DataInici <= '2024-05-27 23:59:59'
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
---|---|---|---|---|---|---|---|---|---|
1 | SIMPLE | cp | ALL | \N | \N | \N | \N | 8992598 | Using where |
PD3. Tried casting the string to DATETIME unsuccessful:
EXPLAIN SELECT *
FROM GT_CalendariProfessional cp
WHERE STR_TO_DATE('2024-05-20 00:00:00', '%Y-%m-%d %H:%i:%s') <= cp.DataFi AND cp.DataInici <= STR_TO_DATE('2024-05-27 23:59:59', '%Y-%m-%d %H:%i:%s');
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
---|---|---|---|---|---|---|---|---|---|
1 | SIMPLE | cp | ALL | \N | \N | \N | \N | 8992598 | Using where |
PD4. Tried this and it uses index, but is very slow!
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
---|---|---|---|---|---|---|---|---|---|
1 | PRIMARY | GT_CalendariProfessional | range | DataInici_DataFi | DataInici_DataFi | 8 | \N | 9017998 | Using where; Using index |
4 | INTERSECT | GT_CalendariProfessional | index | \N | DataInici_DataFi | 16 | \N | 9017998 | Using where; Using index |
\N | INTERSECT RESULT | <intersect1,4> | ALL | \N | \N | \N | \N | \N |
PD5. Tried ANALYZE TABLE ... PERSISTENT FOR ALL and no changes! 😥
1 Answer 1
This is to be expected.
DATETIME
is a red herring. But the different column names is important. Let me rewrite the query with fewer keystrokes...
The SELECT
has two unrelated ranges; MySQL cannot do much to optimize this query:
SELECT * FROM t WHERE a > 5 AND b < 9
Even with INDEX(a,b)
, the best that the Optimizer can do is to
- Drill into the index at a=5.
- Scan forward until the end of the index, checking each row for
b<9
. - Each time both filters pass, reach over into the data's BTree to find the columns requested by
*
.
If a > 5
occurs more than about 20% of the table, the Optimizer will decide that a table scan is faster. Hence the INDEX
is shunned.
If, instead, the query were
SELECT a,b FROM t WHERE a > 5 AND b < 9
then the index would be used but for a different reason. In this case, the index is "covering" in that all columns anywhere in the SELECT
are in the index. This obviates Step 3 above.
The benefit of Ergest's extra index (with the columns swapped) is that the Optimizer will do the following:
- Estimate effort for using
INDEX(a,b)
-- That is, see how many rows witha>4
. - Estimate effort for using
INDEX(b,a)
-- That is, see how many rows withb<9
. - Decide which one is likely to be faster.
- Do the three steps I originally mentioned with just the 'better' index.
I have demonstrated such with INDEX(lat, lng), INDEX(lng, lat)
in Find Nearest That blog discusses various ways to search two-dimensional data.
INDEX DataFi_DataInici (DataFi,DataInici)
SELECT CalendariProfessionalId FROM ... WHERE '2024年05月20日 00:00:00' <= cp.DataFi AND cp.DataInici <= '2024年05月27日 23:59:59'
NOW uses the new INDEX DataFi_DataInici, and the EXPLAIN says that 'rows' are only 2304278 instead of 8992598 ! That's good. But in terms of speed, the query lasts the same. If I add more fields to the SELECT, the EXPLAIN tells that the query doesn't use indexes 😣 I don't understand 1) Why 2M rows, if the COUNT(1) tells tha only 40K is returned and 2) why number of fields are important on index used.WHERE cp.DataFi >='2024年05月20日 00:00:00'
? 1. I can't tell , normally it shouldn't . 2. Think of a phone book , when you have to search Ergest (FirstName), Basha(LastName) ,Tirana(Location) . where FirstName = ? , LastName = ? .. . Now which one will be faster for you to find if all the columns are ordered in the book or un-ordered ? In simple terms same applies on MySQL using indexes . Not only the columns but the order of indexes matter. Rick has a good explanation on your specific case