Recently I upgraded MySQL on my server from newest 5.7.x to newest 8.0.x and I imported my entire database with multi-indexes that I created when I was using MySQL 5.7.x. These indexes optimize SELECT queries that run with WHERE clause on few columns (both INT and VARCHAR columns) on a table that has ~ 8 million records (no JOINs) and they used to work very well, but strangly after the upgrade they do NOT work at all... some queries on this table are MUCH slower :-( We don't want to stay with MySQL 5.7 as it will be discontinued this year. Have you had any similar problem as I have? I tweaked my.ini the same way I did in MySQL 5.7 by increasing buffer sizes, etc. Please help. I will provide more details later if it's necessary like the table structure, example queries and my.ini . Thank you very much.
EDIT: TABLE AND INDEXES DEFINITIONS:
CREATE TABLE IF NOT EXISTS
event
(id
int(11) NOT NULL AUTO_INCREMENT,imei
char(50) NOT NULL,cradle_id
varchar(45) NOT NULL,company_id
int(10) unsigned NOT NULL,event_type
int(10) NOT NULL,lat
double NOT NULL,lon
double NOT NULL,
speed
double NOT NULL,dtime
datetime NOT NULL,force_level
double NOT NULL DEFAULT '0',duration
double NOT NULL,
force_hist
varchar(300) NOT NULL DEFAULT 'non',lr_force_hist
varchar(300) NOT NULL DEFAULT 'non',speed_hist
varchar(250) NOT NULL DEFAULT 'non',milage
double NOT NULL,filter
int(11) DEFAULT '1',filter_reason
int(11) DEFAULT NULL,road_type
int(11) DEFAULT NULL,calibration_variance
double DEFAULT '0',
reason
int(11) DEFAULT NULL,reason_other
text,
distance_adjustment
float DEFAULT NULL,max_speed
smallint(6) DEFAULT NULL,location_name
varchar(225) DEFAULT NULL,
device_id
char(17) DEFAULT NULL,utc_offset
int(11) NOT NULL DEFAULT '0',duplicated
int(11) DEFAULT '0',
PRIMARY KEY (id),
KEY event_index (imei,dtime,event_type,filter),
KEY event_index2 (device_id,dtime,event_type,filter),
KEY event_index3 (filter)
) ENGINE=InnoDB AUTO_INCREMENT=40026132 DEFAULT CHARSET=utf8;
SLOW QUERY LOG:
Query_time: 2.986858 Lock_time: 0.000001
Rows_sent: 1 Rows_examined: 42355 SET timestamp=1680260078;
select *
from event
where ( ( imei = "0e69eb4e-9999-494c-873d-9416d3569ab0"
and dtime > "2021年12月16日 13:23:50" )
or ( imei = "ae7f20e8-34d8-48e4-99a1-ff6197bc16ff"
and dtime > "2021年11月17日 14:44:05"
and dtime < "2021年12月16日 13:23:49"
)
or ( imei = "bd415fb6-9a36-493f-ba65-fd54db7e6fce"
and dtime > "2021年01月21日 08:46:21"
and dtime < "2021年11月17日 14:44:04"
)
or ( imei = "357577093449854"
and dtime > "2018年10月07日 19:45:26"
and dtime < "2021年01月21日 08:46:20"
)
or ( imei = "359751080340701"
and dtime > "2017年07月31日 11:58:57"
and dtime < "2018年10月07日 19:45:25"
)
or ( imei = "358979073825062"
and dtime > "2016年10月11日 09:02:37"
and dtime < "2017年07月31日 11:58:56" )
)
and filter != 3
order by dtime desc, id desc
limit 1;
EXPLAIN of the above query on MySQL 8.0:
1 SIMPLE event range event_index3,event_index event_index 155 82900 50.00 Using index condition; Using MRR; Using filesort
EXPLAIN of the above query on MySQL 5.7:
1 SIMPLE event range event_index3,event_index event_index 155 82900 50.00 Using index condition; Using filesort
The query above takes 1.167099 second on MySQL 8.0, but is super fast on MySQL 5.7, only 0.1 second on MySQL 5.7
This happens with default MySQL 8.0 settings in my.ini and also with tweaked my.ini (increased buffer sizes). Both of the databases have KEYRING plugin enabled.
I tried turning off different indexes (setting them INVISIBLE) on MySQL 8.0 but it didn't speed up the query at all.
The server has NVME disk that reads with speed ~ 12 GB/s.
There are about 8 million records in that table. Unfortunatelly I cannot attach entire dump of data here, because it would violate privacy of clients. I can create some dummy data though if you need it to help me. Thank you.
4 Answers 4
You need to provide more information, such as table and index definition, the slow query and the execution plans on 5.7 and 8.0, statistics of tables and columns referenced in predicates.
-
Hello. Thank you for your response. I EDITED my original question. I hope the information that I attached will help here.Programming Caveman– Programming Caveman2023年03月30日 10:31:23 +00:00Commented Mar 30, 2023 at 10:31
Looks MRR causing the performance regression. You actually have a better solution to avoid such a performance regression: try to add a composite index on (imei, dtime desc, id desc). It will avoid using MRR and filesort.
-- explanation
You only take one row(limit 1),so you should avoid sorting, which is very expensive(sort all but only take one).
To take advantage of the order of an index, the index you use can only take equal conditions in predicate, which is why range conditions shouldn't be considered.
For the new query, my suggestion will be rewriting it to following union
query with an index on event(imei, dtime desc, id desc)
.
select PawDT_1680490904511.*
from
((select *, event.dtime, event.id
from event
where event.imei = '358979073825062' and event.dtime > '2016-10-11 09:02:37'
and event.dtime < '2017-07-31 11:58:56' and event.filter <> 3 order by event.dtime desc, event.id desc limit 1)
union
(select *, event.dtime, event.id
from event
where event.imei = '0e69eb4e-9999-494c-873d-9416d3569ab0'
and event.dtime > '2021-12-16 13:23:50' and event.filter <> 3 order by event.dtime desc, event.id desc limit 1)
union
(select *, event.dtime, event.id
from event
where event.imei = 'ae7f20e8-34d8-48e4-99a1-ff6197bc16ff'
and event.dtime > '2021-11-17 14:44:05' and event.dtime < '2021-12-16 13:23:49'
and event.filter <> 3 order by event.dtime desc, event.id desc limit 1)
union
(select *, event.dtime, event.id
from event
where event.imei = 'bd415fb6-9a36-493f-ba65-fd54db7e6fce'
and event.dtime > '2021-01-21 08:46:21' and event.dtime < '2021-11-17 14:44:04'
and event.filter <> 3 order by event.dtime desc, event.id desc limit 1)
union
(select *, event.dtime, event.id
from event
where event.imei = '357577093449854' and event.dtime > '2018-10-07 19:45:26'
and event.dtime < '2021-01-21 08:46:20' and event.filter <> 3 order by event.dtime desc, event.id desc limit 1)
union
(select *, event.dtime, event.id
from event
where event.imei = '359751080340701' and event.dtime > '2017-07-31 11:58:57'
and event.dtime < '2018-10-07 19:45:25' and event.filter <> 3 order by event.dtime desc, event.id desc limit 1)) as PawDT_1680490904511
order by PawDT_1680490904511.dtime desc, PawDT_1680490904511.id desc limit 1
The logic is the same for each union
branch.
-
Hi. Why not to include "filter" column as well? As why did you propose such order of columns for new index?Programming Caveman– Programming Caveman2023年03月31日 10:39:31 +00:00Commented Mar 31, 2023 at 10:39
-
It accually worked but I need to understand WHY.Programming Caveman– Programming Caveman2023年03月31日 11:02:52 +00:00Commented Mar 31, 2023 at 11:02
-
However when I add more imei and dtime conditions to the query then it still uses MRR and becomes much slower.Programming Caveman– Programming Caveman2023年03月31日 11:07:57 +00:00Commented Mar 31, 2023 at 11:07
-
I edited my question AGAIN. I posted longer query which has more conditions and this one doesn't work fast in MySQL 8 even with your proposed index (and all previous indexes turned off). What would you suggest for this case?Programming Caveman– Programming Caveman2023年03月31日 12:46:59 +00:00Commented Mar 31, 2023 at 12:46
-
Added explanation of previous workable suggestion and the solution to the new query.Tomas.Nemec– Tomas.Nemec2023年04月03日 03:36:33 +00:00Commented Apr 3, 2023 at 3:36
Tomas almost has it. The issue is that OR
is essentially un-optimizable. The workaround is to do a UNION ALL
Change his answer to
SELECT ...
FROM (
( SELECT ... ORDER BY ... LIMIT 1 )
UNION ALL
( SELECT ... ORDER BY ... LIMIT 1 )
UNION ALL
( SELECT ... ORDER BY ... LIMIT 1 )
UNION ALL
...
)
ORDER BY ... LIMIT 1 -- yes, repeated
Without the ORDER BY ... LIMIT 1, it will collect all possible rows before reaching the final LIMIT
. This could lead to a bulky temp table and filesort.
Since each test looks for a different imei
, I feel safe is using UNION ALL
instead of the slower UNION
(which implies DISTINCT
).
I think INDEX(imei, dtime)
(or at least starting with those 2 columns) will be optimal.
This formulation and index should work well for any MySQL version.
... when I add more imei and dtime conditions ...
Consider creating and populating a table that contains those 3 values. Then JOIN
to event
-- no need for OR
or UNION
. And it may be even faster.
-
have you got any solution for the performance issue? After upgrading MySQL version 5.7 to 8.0.mysql_aurora.3.04.0, I am facing the same issue.Ketan Panchal– Ketan Panchal2023年12月07日 13:48:17 +00:00Commented Dec 7, 2023 at 13:48
I'm experiencing the same thing when doing "in" queries with thousands of values. It looks like the range optimizer changed how it estimates the memory usage. The range optimizer tries to estimate how much memory the query will use, and compares it to the variable range_optimizer_max_mem_size. If it comes over this value then it reverts to a full table scan. The default value for range_optimizer_max_mem_size is around 8 megabytes, and this allows for values ~35k values, which lines up with this estimation technique.
But specifically in mysql 8, if the table has multiple indexes in addition to the one used then it seems to be counting those somehow. If I have a table with
create table parameter_test
(
id int not null auto_increment primary key,
v1 varchar(64),
v2 varchar(64),
v3 varchar(64),
v4 varchar(64),
v5 varchar(64),
index ix_v1 (v1),
index ix_v2 (v2),
index ix_v3 (v3),
index ix_v4 (v4),
index ix_v5 (v5)
);
Then I do
select
*
from
parameter_test
where
id in (
1000001,1000002,1000003,1000004,1000005,1000006... (thousands of entries)
);
it only goes to 9000 before it switches to a full table scan.
If I drop the varchar indexes then it gets to 34k before it switches to a full table scan.
I can't find any documentation about this change, and I don't know if the memory usage changed, or just the estimation of it changes.
-
The "cost-based" Optimizer works with a lot of numbers. It is not easy to figure out how it comes up with 9K and 34K. The bottom line is that "IN"has multiple ways of being performed, and the Optimizer sometimes fails to pick the "best" way.Rick James– Rick James2023年12月09日 18:15:18 +00:00Commented Dec 9, 2023 at 18:15
Explore related questions
See similar questions with these tags.