Why MariaDB 11 isn't using DateTime index?

Question 1

I've a big table (MariaDB 11.2) with many fields, but I'll resume the table like this:

CREATE TABLE `GT_CalendariProfessional` (
 `CalendariProfessionalID` BIGINT(20) NOT NULL AUTO_INCREMENT,
 `DataInici` DATETIME(6) NOT NULL,
 `DataFi` DATETIME(6) NOT NULL,
 PRIMARY KEY (`CalendariProfessionalID`) USING BTREE,
 INDEX `DataInici_DataFi` (`DataInici`, `DataFi`) USING BTREE,
 INDEX `DataInici` (`DataInici`) USING BTREE
) COLLATE='utf8mb4_unicode_ci' ENGINE=InnoDB;

Example data:

CalendariProfessionalID;DataInici;DataFi
11535212;2024年07月08日 08:00:00,000000;2024年07月08日 15:00:00,000000;

I've a complex query that filter the data by dates (DataInici, DataFi) and it doesn't use indexes. I tried to simplify the query at maximum, and now it uses indexes but they filter almost all the rows:

SELECT COUNT(1) FROM GT_CalendariProfessional

Result: 9195499

EXPLAIN SELECT cp.CalendariProfessionalId
 FROM GT_CalendariProfessional cp
 WHERE '2024-05-20 00:00:00' <= cp.DataFi AND cp.DataInici <= '2024-05-27 23:59:59'

id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
1	SIMPLE	cp	range	DataInici_DataFi,DataInici	DataInici_DataFi	8	\N	9136502	Using where; Using index

If I add more fields to the SELECT, it doesn't use indexes:

EXPLAIN SELECT *
 FROM GT_CalendariProfessional cp
 WHERE '2024-05-20 00:00:00' <= cp.DataFi AND cp.DataInici <= '2024-05-27 23:59:59'

id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
1	SIMPLE	cp	ALL	DataInici_DataFi,DataInici	\N	\N	\N	9136502	Using where

So it seems that there's no difference between using or not using the DateTime index. Why is that?

Thx

PD1. The fraction of data returned is around 0,5% !

SELECT COUNT(1)
 FROM GT_CalendariProfessional cp
 WHERE '2024-05-20 00:00:00' <= cp.DataFi AND cp.DataInici <= '2024-05-27 23:59:59'

Result: 50278

PD2. Tried changing WHERE order unsuccessful:

EXPLAIN SELECT * FROM GT_CalendariProfessional cp WHERE cp.DataFi >='2024-05-20 00:00:00' AND cp.DataInici <= '2024-05-27 23:59:59'

id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
1	SIMPLE	cp	ALL	\N	\N	\N	\N	8992598	Using where

PD3. Tried casting the string to DATETIME unsuccessful:

EXPLAIN SELECT *
 FROM GT_CalendariProfessional cp
 WHERE STR_TO_DATE('2024-05-20 00:00:00', '%Y-%m-%d %H:%i:%s') <= cp.DataFi AND cp.DataInici <= STR_TO_DATE('2024-05-27 23:59:59', '%Y-%m-%d %H:%i:%s');

id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
1	SIMPLE	cp	ALL	\N	\N	\N	\N	8992598	Using where

PD4. Tried this and it uses index, but is very slow!

id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
1	PRIMARY	GT_CalendariProfessional	range	DataInici_DataFi	DataInici_DataFi	8	\N	9017998	Using where; Using index
4	INTERSECT	GT_CalendariProfessional	index	\N	DataInici_DataFi	16	\N	9017998	Using where; Using index
\N	INTERSECT RESULT	<intersect1,4>	ALL	\N	\N	\N	\N	\N

PD5. Tried ANALYZE TABLE ... PERSISTENT FOR ALL and no changes! 😥

Question 2

Maybe adding another index INDEX DataFi_DataInici (DataFi,DataInici)

Question 3

@ErgestBasha I forgot! Now I'm tried and the

SELECT CalendariProfessionalId FROM ... WHERE '2024年05月20日 00:00:00' <= cp.DataFi AND cp.DataInici <= '2024年05月27日 23:59:59'

NOW uses the new INDEX DataFi_DataInici, and the EXPLAIN says that 'rows' are only 2304278 instead of 8992598 ! That's good. But in terms of speed, the query lasts the same. If I add more fields to the SELECT, the EXPLAIN tells that the query doesn't use indexes 😣 I don't understand 1) Why 2M rows, if the COUNT(1) tells tha only 40K is returned and 2) why number of fields are important on index used.

Question 4

@Dimas Still I do not understand why using WHERE '2024年05月20日 00:00:00' <= cp.DataFi and not WHERE cp.DataFi >='2024年05月20日 00:00:00'? 1. I can't tell , normally it shouldn't . 2. Think of a phone book , when you have to search Ergest (FirstName), Basha(LastName) ,Tirana(Location) . where FirstName = ? , LastName = ? .. . Now which one will be faster for you to find if all the columns are ordered in the book or un-ordered ? In simple terms same applies on MySQL using indexes . Not only the columns but the order of indexes matter. Rick has a good explanation on your specific case

Question 5

@ErgestBasha is not the same? I took the idea from here: stackoverflow.com/a/12849999/952576. I tried changing the order of the comparison and I can't note any difference? Furthermore, I will read thought the Rick answer and the link, there are much to learn! Thank you

Question 6

This is to be expected.

DATETIME is a red herring. But the different column names is important. Let me rewrite the query with fewer keystrokes...

The SELECT has two unrelated ranges; MySQL cannot do much to optimize this query:

SELECT * FROM t WHERE a > 5 AND b < 9

Even with INDEX(a,b), the best that the Optimizer can do is to

Drill into the index at a=5.
Scan forward until the end of the index, checking each row for b<9.
Each time both filters pass, reach over into the data's BTree to find the columns requested by *.

If a > 5 occurs more than about 20% of the table, the Optimizer will decide that a table scan is faster. Hence the INDEX is shunned.

If, instead, the query were

SELECT a,b FROM t WHERE a > 5 AND b < 9

then the index would be used but for a different reason. In this case, the index is "covering" in that all columns anywhere in the SELECT are in the index. This obviates Step 3 above.

The benefit of Ergest's extra index (with the columns swapped) is that the Optimizer will do the following:

Estimate effort for using INDEX(a,b) -- That is, see how many rows with a>4.
Estimate effort for using INDEX(b,a) -- That is, see how many rows with b<9.
Decide which one is likely to be faster.
Do the three steps I originally mentioned with just the 'better' index.

I have demonstrated such with INDEX(lat, lng), INDEX(lng, lat) in Find Nearest That blog discusses various ways to search two-dimensional data.

Rick James Rick James 80.7k5 gold badges52 silver badges119 bronze badges · Answer 1 · 2024-05-28 17:05:27Z

This is to be expected.

DATETIME is a red herring. But the different column names is important. Let me rewrite the query with fewer keystrokes...

The SELECT has two unrelated ranges; MySQL cannot do much to optimize this query:

SELECT * FROM t WHERE a > 5 AND b < 9

Even with INDEX(a,b), the best that the Optimizer can do is to

Drill into the index at a=5.
Scan forward until the end of the index, checking each row for b<9.
Each time both filters pass, reach over into the data's BTree to find the columns requested by *.

If a > 5 occurs more than about 20% of the table, the Optimizer will decide that a table scan is faster. Hence the INDEX is shunned.

If, instead, the query were

SELECT a,b FROM t WHERE a > 5 AND b < 9

then the index would be used but for a different reason. In this case, the index is "covering" in that all columns anywhere in the SELECT are in the index. This obviates Step 3 above.

The benefit of Ergest's extra index (with the columns swapped) is that the Optimizer will do the following:

Estimate effort for using INDEX(a,b) -- That is, see how many rows with a>4.
Estimate effort for using INDEX(b,a) -- That is, see how many rows with b<9.
Decide which one is likely to be faster.
Do the three steps I originally mentioned with just the 'better' index.

I have demonstrated such with INDEX(lat, lng), INDEX(lng, lat) in Find Nearest That blog discusses various ways to search two-dimensional data.

Stack Exchange Network

Why MariaDB 11 isn't using DateTime index?

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Why MariaDB 11 isn't using DateTime index?

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions