1

I have a query that is taking much longer than I expect it should, and I can not figure out how to speed it up.

For a little bit of background on the table structure, these are audited records. There's a parent record, which has many child records. If a parent record is modified, an audit is written for it (but not for the children). If a child record is modified, an audit record is written for both the parent and the child (but not the other children).

Both of these tables have an integer ID and a revision number. The child table also has a column for the parent's ID. The way these are accessed generally is for the parent record to be selected, then the children records with the most recent revisions to the parent's revision number are brought in.

For example, if parent A has a revision of 100, and child B has records with revisions of 101, 98, and 90. Revision 98 will be brought in for child B.

Now, this is usually done on a per-parent basis, and the query is pretty quick. I'm trying to make a similar query that will give me all of the relevant IDs and revision numbers for each child. This is my query -

SELECT c.id AS cId, MAX(c.REV) AS cRev FROM Parent_AUD p 
INNER JOIN Child_AUD c ON p.id = c.parent_id AND p.REV >= c.REV
GROUP BY p.id, p.REV, c.id

This takes over 20 minutes to run (I've never actually let it finish, so I'm not sure how long exactly it takes). There's about 2 millions parent records and 13 million child records, which I don't think warrants that much time.

This is the output of EXPLAIN -

+------+-------------+--------+-------+----------------------------------------------------+-----------------------------+---------+----------------------+---------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+--------+-------+----------------------------------------------------+-----------------------------+---------+----------------------+---------+----------------------------------------------+
| 1 | SIMPLE | parent | index | PRIMARY,FK_PARENT_REV | FK_PARENT_REV | 4 | NULL | 2150722 | Using index; Using temporary; Using filesort |
| 1 | SIMPLE | child | ref | FK_CHILD_REV,parent_id_index | parent_id_index | 9 | db.p.id | 68 | Using where; Using index |
+------+-------------+--------+-------+----------------------------------------------------+-----------------------------+---------+----------------------+---------+----------------------------------------------+

I have single-column indexes on the parent ID, the child ID, the parent revision number, the child revision number, and the parent ID within the child table.

Does anybody have any ideas on how I can speed this up?

EDIT: This is the output of SHOW CREATE TABLE for both tables (with unrelevant column names changed) -

Parent_AUD | CREATE TABLE `Parent_AUD` (
 `id` bigint(20) NOT NULL,
 `REV` int(11) NOT NULL,
 `REVTYPE` tinyint(4) DEFAULT NULL,
 `dateModified` datetime(6) DEFAULT NULL,
 `colOne` longblob DEFAULT NULL,
 `colTwo` int(11) DEFAULT NULL,
 `colThree` bigint(20) DEFAULT NULL,
 `colFour` bigint(20) DEFAULT NULL,
 `colFive` varchar(255) DEFAULT NULL,
 `colSix` tinyint(1) DEFAULT NULL,
 `colSeven` varchar(255) DEFAULT NULL,
 `colEight` bigint(20) DEFAULT NULL,
 `colNine` tinyint(1) DEFAULT NULL,
 `colTen` bigint(20) DEFAULT NULL,
 PRIMARY KEY (`id`,`REV`),
 KEY `FK_PARENT_REV` (`REV`),
 CONSTRAINT `FK_PARENT_REV` FOREIGN KEY (`REV`) REFERENCES `REVINFO` (`REV`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
Child_AUD | CREATE TABLE `Child_AUD` (
 `id` bigint(20) NOT NULL,
 `REV` int(11) NOT NULL,
 `REVTYPE` tinyint(4) DEFAULT NULL,
 `colOne` longblob DEFAULT NULL,
 `colTwo` longblob DEFAULT NULL,
 `colThree` bit(1) DEFAULT NULL,
 `colFour` int(11) DEFAULT NULL,
 `colFive` longblob DEFAULT NULL,
 `parent_id` bigint(20) DEFAULT NULL,
 `colSix` varchar(255) DEFAULT NULL,
 `colSeven` tinyint(1) DEFAULT NULL,
 `colEight` longblob DEFAULT NULL,
 `colNine` tinyint(1) DEFAULT NULL,
 `colTen` tinyint(1) DEFAULT NULL,
 `dateModified` datetime(6) DEFAULT NULL,
 PRIMARY KEY (`id`,`REV`),
 KEY `FK_CHILD_REV` (`REV`),
 KEY `parent_id_index` (`parent_id`),
 CONSTRAINT `FK_CHILD_REV` FOREIGN KEY (`REV`) REFERENCES `REVINFO` (`REV`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
asked Aug 23, 2019 at 14:41
10
  • Since the columns in GROUP BY are from two different tables there isn't much you can do apart from increasing sort_buffer_size and/or using faster storage. May be a composite index on Child_AUD (id, parent_id, rev) or Child_AUD (parent_id, rev, id). Commented Aug 23, 2019 at 15:19
  • Why are you grouping on more columns than you have in the select clause? Nothing wrong with that, but how do you identify to which Group each row belongs? Commented Aug 23, 2019 at 15:30
  • @Lennart Thanks, that's a good point, I think those are leftovers from when I was coming up with the query. I don't think I need the groupings on that parent table. Commented Aug 23, 2019 at 15:33
  • Why is the c.id in the GROUP BY clause? That seems to defeat the purpose of grouping. Commented Aug 23, 2019 at 15:43
  • @ypercubeTM I would expect to need the c.id in the GROUP BY clause, as I need to get the max REV on a per-child basis. Commented Aug 23, 2019 at 15:46

2 Answers 2

0
KEY `parent_id_index` (`parent_id`),

-->

KEY (parent_id, REV, id)

The order is critical. First it filters on parent_id =, then no a range of REV, and finally has id to make the index "covering".

answered Aug 23, 2019 at 22:42
2
  • This brought the query down to 7m40s. Which is within the timeframe that I need. Commented Aug 26, 2019 at 16:02
  • @Corey see also 'groupwise-max'; see the tag I added. Commented Aug 26, 2019 at 16:09
0

Since MAX will never increase due to the JOIN, you might try to filter using EXISTS instead:

SELECT c.id AS cId, MAX(c.REV) AS cRev 
FROM Child_AUD c
WHERE EXISTS (
 SELECT 1
 FROM Parent_AUD p 
 WHERE p.id = c.parent_id 
 AND p.REV >= c.REV
)
GROUP BY c.id;

An index like:

CREATE INDEX ON Child_AUD (ID, PARENT_ID, REV);

may help.

answered Aug 23, 2019 at 15:37
6
  • This may have done the trick, looks like it's down to 3 minutes now! Commented Aug 23, 2019 at 15:55
  • Do you have a rough estimate on how many child rows that has null as parent_id? Commented Aug 23, 2019 at 16:08
  • About a million. Commented Aug 23, 2019 at 16:18
  • I've added an index in the answer Commented Aug 23, 2019 at 16:23
  • Ah, actually I realized that this won't work, as I do need the grouping by p.REV in order to find the biggest child revision on a per-parent revision basis. Not sure if there's a way to get around the join to the parent table for this. Commented Aug 23, 2019 at 16:38

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.