How to Speed Up my SQL Query?

Question 1

I have a query that is taking much longer than I expect it should, and I can not figure out how to speed it up.

For a little bit of background on the table structure, these are audited records. There's a parent record, which has many child records. If a parent record is modified, an audit is written for it (but not for the children). If a child record is modified, an audit record is written for both the parent and the child (but not the other children).

Both of these tables have an integer ID and a revision number. The child table also has a column for the parent's ID. The way these are accessed generally is for the parent record to be selected, then the children records with the most recent revisions to the parent's revision number are brought in.

For example, if parent A has a revision of 100, and child B has records with revisions of 101, 98, and 90. Revision 98 will be brought in for child B.

Now, this is usually done on a per-parent basis, and the query is pretty quick. I'm trying to make a similar query that will give me all of the relevant IDs and revision numbers for each child. This is my query -

SELECT c.id AS cId, MAX(c.REV) AS cRev FROM Parent_AUD p 
INNER JOIN Child_AUD c ON p.id = c.parent_id AND p.REV >= c.REV
GROUP BY p.id, p.REV, c.id

This takes over 20 minutes to run (I've never actually let it finish, so I'm not sure how long exactly it takes). There's about 2 millions parent records and 13 million child records, which I don't think warrants that much time.

This is the output of EXPLAIN -

+------+-------------+--------+-------+----------------------------------------------------+-----------------------------+---------+----------------------+---------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+--------+-------+----------------------------------------------------+-----------------------------+---------+----------------------+---------+----------------------------------------------+
| 1 | SIMPLE | parent | index | PRIMARY,FK_PARENT_REV | FK_PARENT_REV | 4 | NULL | 2150722 | Using index; Using temporary; Using filesort |
| 1 | SIMPLE | child | ref | FK_CHILD_REV,parent_id_index | parent_id_index | 9 | db.p.id | 68 | Using where; Using index |
+------+-------------+--------+-------+----------------------------------------------------+-----------------------------+---------+----------------------+---------+----------------------------------------------+

I have single-column indexes on the parent ID, the child ID, the parent revision number, the child revision number, and the parent ID within the child table.

Does anybody have any ideas on how I can speed this up?

EDIT: This is the output of SHOW CREATE TABLE for both tables (with unrelevant column names changed) -

Parent_AUD | CREATE TABLE `Parent_AUD` (
 `id` bigint(20) NOT NULL,
 `REV` int(11) NOT NULL,
 `REVTYPE` tinyint(4) DEFAULT NULL,
 `dateModified` datetime(6) DEFAULT NULL,
 `colOne` longblob DEFAULT NULL,
 `colTwo` int(11) DEFAULT NULL,
 `colThree` bigint(20) DEFAULT NULL,
 `colFour` bigint(20) DEFAULT NULL,
 `colFive` varchar(255) DEFAULT NULL,
 `colSix` tinyint(1) DEFAULT NULL,
 `colSeven` varchar(255) DEFAULT NULL,
 `colEight` bigint(20) DEFAULT NULL,
 `colNine` tinyint(1) DEFAULT NULL,
 `colTen` bigint(20) DEFAULT NULL,
 PRIMARY KEY (`id`,`REV`),
 KEY `FK_PARENT_REV` (`REV`),
 CONSTRAINT `FK_PARENT_REV` FOREIGN KEY (`REV`) REFERENCES `REVINFO` (`REV`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
Child_AUD | CREATE TABLE `Child_AUD` (
 `id` bigint(20) NOT NULL,
 `REV` int(11) NOT NULL,
 `REVTYPE` tinyint(4) DEFAULT NULL,
 `colOne` longblob DEFAULT NULL,
 `colTwo` longblob DEFAULT NULL,
 `colThree` bit(1) DEFAULT NULL,
 `colFour` int(11) DEFAULT NULL,
 `colFive` longblob DEFAULT NULL,
 `parent_id` bigint(20) DEFAULT NULL,
 `colSix` varchar(255) DEFAULT NULL,
 `colSeven` tinyint(1) DEFAULT NULL,
 `colEight` longblob DEFAULT NULL,
 `colNine` tinyint(1) DEFAULT NULL,
 `colTen` tinyint(1) DEFAULT NULL,
 `dateModified` datetime(6) DEFAULT NULL,
 PRIMARY KEY (`id`,`REV`),
 KEY `FK_CHILD_REV` (`REV`),
 KEY `parent_id_index` (`parent_id`),
 CONSTRAINT `FK_CHILD_REV` FOREIGN KEY (`REV`) REFERENCES `REVINFO` (`REV`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

Question 2

Since the columns in GROUP BY are from two different tables there isn't much you can do apart from increasing sort_buffer_size and/or using faster storage. May be a composite index on Child_AUD (id, parent_id, rev) or Child_AUD (parent_id, rev, id).

Question 3

Why are you grouping on more columns than you have in the select clause? Nothing wrong with that, but how do you identify to which Group each row belongs?

Question 4

@Lennart Thanks, that's a good point, I think those are leftovers from when I was coming up with the query. I don't think I need the groupings on that parent table.

Question 5

Why is the c.id in the GROUP BY clause? That seems to defeat the purpose of grouping.

Question 6

@ypercubeTM I would expect to need the c.id in the GROUP BY clause, as I need to get the max REV on a per-child basis.

Question 7

KEY `parent_id_index` (`parent_id`),

-->

KEY (parent_id, REV, id)

The order is critical. First it filters on parent_id =, then no a range of REV, and finally has id to make the index "covering".

Question 8

This brought the query down to 7m40s. Which is within the timeframe that I need.

Question 9

@Corey see also 'groupwise-max'; see the tag I added.

Question 10

Since MAX will never increase due to the JOIN, you might try to filter using EXISTS instead:

SELECT c.id AS cId, MAX(c.REV) AS cRev 
FROM Child_AUD c
WHERE EXISTS (
 SELECT 1
 FROM Parent_AUD p 
 WHERE p.id = c.parent_id 
 AND p.REV >= c.REV
)
GROUP BY c.id;

An index like:

CREATE INDEX ON Child_AUD (ID, PARENT_ID, REV);

may help.

Question 11

This may have done the trick, looks like it's down to 3 minutes now!

Question 12

Do you have a rough estimate on how many child rows that has null as parent_id?

Question 13

About a million.

Question 14

I've added an index in the answer

Question 15

Ah, actually I realized that this won't work, as I do need the grouping by p.REV in order to find the biggest child revision on a per-parent revision basis. Not sure if there's a way to get around the join to the parent table for this.

Rick James Rick James 80.7k5 gold badges52 silver badges119 bronze badges · Accepted Answer · 2019-08-23 22:42:55Z

0

KEY `parent_id_index` (`parent_id`),

-->

KEY (parent_id, REV, id)

The order is critical. First it filters on parent_id =, then no a range of REV, and finally has id to make the index "covering".

Share

Improve this answer

answered Aug 23, 2019 at 22:42

Rick James's user avatar

Rick James Rick James

80.7k5 gold badges52 silver badges119 bronze badges

2

This brought the query down to 7m40s. Which is within the timeframe that I need.

Corey
– Corey

2019年08月26日 16:02:19 +00:00
Commented Aug 26, 2019 at 16:02
@Corey see also 'groupwise-max'; see the tag I added.

Rick James
– Rick James

2019年08月26日 16:09:02 +00:00
Commented Aug 26, 2019 at 16:09

Add a comment |

Stack Exchange Network

How to Speed Up my SQL Query?

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

How to Speed Up my SQL Query?

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions