I have a problem with my database. I have a production server running MySQL 5.5.33 and a local test server running MySQL 5.5.5. The only other difference I can think of is that the prod DB is being replicated
They both run the same exact SQL databases (local is a replica of the prod environment).
I need to create a materialized view in MySQL which will contain a subset of the record_status_events table. Below you will find the query that does this. This same exact query works in under a second on local, but halts on prod. It gets stuck in the "Sending data" status. If I export the data and import it manually it works fine. It is just the INSERT INTO ... SELECT combo causing the problems.
The underlying query I am trying to insert runs under 150ms and has 50k entries.
I tried:
- Changing the DB engine
- Wrapping it in a transaction
- Removing all indexes
- Removing the primary key
- Using the auto increment and ignoring the key from the source table
DROP TABLE IF EXISTS `hlh_record_status_submission`;
CREATE TABLE `hlh_record_status_submission` (
`id` int(10) unsigned NOT NULL,
`user_id` int(10) unsigned NOT NULL,
`record_id` int(10) unsigned NOT NULL,
`status` tinyint(3) unsigned NOT NULL,
`created_at` datetime NOT NULL,
PRIMARY KEY (`id`),
KEY `Created At` (`created_at`),
KEY `status` (`status`),
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
INSERT INTO `hlh_record_status_submission`
SELECT * FROM `hlh_record_status_events`
WHERE id IN (SELECT MIN(id) FROM `hlh_record_status_events`
WHERE status IN (21,24,31,40) GROUP BY record_id);
-
Maybe some other transaction is still holding some locks?sticky bit– sticky bit2021年02月01日 20:58:48 +00:00Commented Feb 1, 2021 at 20:58
2 Answers 2
Give this formulation a try:
INSERT INTO `hlh_record_status_submission`
SELECT rse.* FROM `hlh_record_status_events` AS rse
JOIN ( SELECT MIN(id) AS min_id
FROM `hlh_record_status_events`
WHERE status IN (21,24,31,40)
GROUP BY record_id
) AS x ON min_Id = rse.id;
and replace INDEX(status)
with
INDEX(status, record_id, id)
Also, you should really change all your tables to InnoDB.
The subquery (a "derived" table) will be faster because of the index starting with what is in its WHERE
clause and because that index is "covering". The JOINing
to the outer part is (I assume) using the PRIMARY KEY(id)
making that efficient.
Historically, the construct IN ( SELECT ... )
has been notoriously inefficient, so I try to avoid it. JOIN
and EXISTS ( SELECT ... )
are sometimes viable alternatives.
More discussion of building optimal indexes: http://mysql.rjweb.org/doc.php/index_cookbook_mysql
-
This actually worked. The formulation you provided. Can you explain why so i can solve such issues in the future?IgorR– IgorR2021年02月02日 09:44:00 +00:00Commented Feb 2, 2021 at 9:44
-
1@IgorR - I added a few paragraphs to my Answer.Rick James– Rick James2021年02月02日 17:22:50 +00:00Commented Feb 2, 2021 at 17:22
Saw this and decided to weigh in a bit. Usually, first task is simplification so:
DROP TABLE IF EXISTS `hlh_record_status_submission`;
CREATE TABLE `hlh_record_status_submission` (
`id` int(10) unsigned NOT NULL,
`user_id` int(10) unsigned NOT NULL,
`record_id` int(10) unsigned NOT NULL,
`status` tinyint(3) unsigned NOT NULL,
`created_at` datetime NOT NULL,
PRIMARY KEY (`id`),
KEY `Created At` (`created_at`),
KEY `status` (`status`),
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
With MinEventForRecord as
(
SELECT `record_id`, MIN(`id`) as MinID
FROM `hlh_record_status_events`
WHERE status IN (21,24,31,40)
GROUP BY record_id
), MinEvents as
(
SELECT `id`,`user_id`, `record_id`, `status`,`created_at`
FROM `hlh_record_status_events` rse
WHERE exists (select 'x' from MinEventForRecord mer where mer.MinID=rse.id)
)
INSERT INTO `hlh_record_status_submission`
SELECT * FROM MinEvents
So here the intent is to break down the query into digestible pieces so that you (having the bigger picture) can figure out where the actual issue resides.