I have a query that is taking much longer than I expect it should, and I can not figure out how to speed it up.
For a little bit of background on the table structure, these are audited records. There's a parent record, which has many child records. If a parent record is modified, an audit is written for it (but not for the children). If a child record is modified, an audit record is written for both the parent and the child (but not the other children).
Both of these tables have an integer ID and a revision number. The child table also has a column for the parent's ID. The way these are accessed generally is for the parent record to be selected, then the children records with the most recent revisions to the parent's revision number are brought in.
For example, if parent A has a revision of 100, and child B has records with revisions of 101, 98, and 90. Revision 98 will be brought in for child B.
Now, this is usually done on a per-parent basis, and the query is pretty quick. I'm trying to make a similar query that will give me all of the relevant IDs and revision numbers for each child. This is my query -
SELECT c.id AS cId, MAX(c.REV) AS cRev FROM Parent_AUD p
INNER JOIN Child_AUD c ON p.id = c.parent_id AND p.REV >= c.REV
GROUP BY p.id, p.REV, c.id
This takes over 20 minutes to run (I've never actually let it finish, so I'm not sure how long exactly it takes). There's about 2 millions parent records and 13 million child records, which I don't think warrants that much time.
This is the output of EXPLAIN -
+------+-------------+--------+-------+----------------------------------------------------+-----------------------------+---------+----------------------+---------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+--------+-------+----------------------------------------------------+-----------------------------+---------+----------------------+---------+----------------------------------------------+
| 1 | SIMPLE | parent | index | PRIMARY,FK_PARENT_REV | FK_PARENT_REV | 4 | NULL | 2150722 | Using index; Using temporary; Using filesort |
| 1 | SIMPLE | child | ref | FK_CHILD_REV,parent_id_index | parent_id_index | 9 | db.p.id | 68 | Using where; Using index |
+------+-------------+--------+-------+----------------------------------------------------+-----------------------------+---------+----------------------+---------+----------------------------------------------+
I have single-column indexes on the parent ID, the child ID, the parent revision number, the child revision number, and the parent ID within the child table.
Does anybody have any ideas on how I can speed this up?
EDIT: This is the output of SHOW CREATE TABLE for both tables (with unrelevant column names changed) -
Parent_AUD | CREATE TABLE `Parent_AUD` (
`id` bigint(20) NOT NULL,
`REV` int(11) NOT NULL,
`REVTYPE` tinyint(4) DEFAULT NULL,
`dateModified` datetime(6) DEFAULT NULL,
`colOne` longblob DEFAULT NULL,
`colTwo` int(11) DEFAULT NULL,
`colThree` bigint(20) DEFAULT NULL,
`colFour` bigint(20) DEFAULT NULL,
`colFive` varchar(255) DEFAULT NULL,
`colSix` tinyint(1) DEFAULT NULL,
`colSeven` varchar(255) DEFAULT NULL,
`colEight` bigint(20) DEFAULT NULL,
`colNine` tinyint(1) DEFAULT NULL,
`colTen` bigint(20) DEFAULT NULL,
PRIMARY KEY (`id`,`REV`),
KEY `FK_PARENT_REV` (`REV`),
CONSTRAINT `FK_PARENT_REV` FOREIGN KEY (`REV`) REFERENCES `REVINFO` (`REV`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
Child_AUD | CREATE TABLE `Child_AUD` (
`id` bigint(20) NOT NULL,
`REV` int(11) NOT NULL,
`REVTYPE` tinyint(4) DEFAULT NULL,
`colOne` longblob DEFAULT NULL,
`colTwo` longblob DEFAULT NULL,
`colThree` bit(1) DEFAULT NULL,
`colFour` int(11) DEFAULT NULL,
`colFive` longblob DEFAULT NULL,
`parent_id` bigint(20) DEFAULT NULL,
`colSix` varchar(255) DEFAULT NULL,
`colSeven` tinyint(1) DEFAULT NULL,
`colEight` longblob DEFAULT NULL,
`colNine` tinyint(1) DEFAULT NULL,
`colTen` tinyint(1) DEFAULT NULL,
`dateModified` datetime(6) DEFAULT NULL,
PRIMARY KEY (`id`,`REV`),
KEY `FK_CHILD_REV` (`REV`),
KEY `parent_id_index` (`parent_id`),
CONSTRAINT `FK_CHILD_REV` FOREIGN KEY (`REV`) REFERENCES `REVINFO` (`REV`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
2 Answers 2
KEY `parent_id_index` (`parent_id`),
-->
KEY (parent_id, REV, id)
The order is critical. First it filters on parent_id =
, then no a range of REV
, and finally has id
to make the index "covering".
-
This brought the query down to 7m40s. Which is within the timeframe that I need.Corey– Corey2019年08月26日 16:02:19 +00:00Commented Aug 26, 2019 at 16:02
-
@Corey see also 'groupwise-max'; see the tag I added.Rick James– Rick James2019年08月26日 16:09:02 +00:00Commented Aug 26, 2019 at 16:09
Since MAX will never increase due to the JOIN, you might try to filter using EXISTS instead:
SELECT c.id AS cId, MAX(c.REV) AS cRev
FROM Child_AUD c
WHERE EXISTS (
SELECT 1
FROM Parent_AUD p
WHERE p.id = c.parent_id
AND p.REV >= c.REV
)
GROUP BY c.id;
An index like:
CREATE INDEX ON Child_AUD (ID, PARENT_ID, REV);
may help.
-
This may have done the trick, looks like it's down to 3 minutes now!Corey– Corey2019年08月23日 15:55:48 +00:00Commented Aug 23, 2019 at 15:55
-
Do you have a rough estimate on how many child rows that has null as parent_id?Lennart - Slava Ukraini– Lennart - Slava Ukraini2019年08月23日 16:08:19 +00:00Commented Aug 23, 2019 at 16:08
-
-
I've added an index in the answerLennart - Slava Ukraini– Lennart - Slava Ukraini2019年08月23日 16:23:16 +00:00Commented Aug 23, 2019 at 16:23
-
Ah, actually I realized that this won't work, as I do need the grouping by p.REV in order to find the biggest child revision on a per-parent revision basis. Not sure if there's a way to get around the join to the parent table for this.Corey– Corey2019年08月23日 16:38:34 +00:00Commented Aug 23, 2019 at 16:38
Explore related questions
See similar questions with these tags.
GROUP BY
are from two different tables there isn't much you can do apart from increasingsort_buffer_size
and/or using faster storage. May be a composite index onChild_AUD (id, parent_id, rev)
orChild_AUD (parent_id, rev, id)
.c.id
in the GROUP BY clause? That seems to defeat the purpose of grouping.