OK, so first off, I am going to provide a bit of information on the database structure.
I have 3 indexed tables that are used in this query (one of them should probably be a weak entity, but this is how it was when I took over):
person
    person_id (pk)
classes
    class_id (pk)
    course_id (fk)
learners_to_classes
    id (pk)
    person_id (fk)
    class_id (fk)
The person table has about 46,100 records, learners_to_classes has about 51,100 records, and classes has about 1,670 records.
The query I am working on is built dynamically. I am working on a person search feature that lists the records from person, plus an indicator of whether that person has completed 2 particular course types, filtered by a number of parameters.
The issue I run into, however, is the query time when either no parameters are supplied (which would probably be rare) or the only parameters are a combination of completed course types.
For example, a search for people who have aTrained set to Yes produces the following query:
SELECT
    p.*,
    IF(SUM(c.course_id = 2) > 0, 'Yes', 'No') AS `cTrained`,
    IF(SUM(c.course_id = 4) > 0, 'Yes', 'No') AS `aTrained`
FROM
    `person` p
    LEFT JOIN
    (SELECT `ltc`.`person_id`, `c`.`class_id`, `c`.`course_id`
     FROM `classes` c
     INNER JOIN `learners_to_classes` ltc ON `ltc`.`class_id` = `c`.`class_id`) AS c
    ON `c`.`person_id` = `p`.`person_id`
GROUP BY `p`.`person_id`
HAVING SUM(c.course_id = 4) > 0
ORDER BY p.`firstname` ASC
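To make the conditional aggregation in this query concrete, here is a minimal sketch that runs it against an in-memory SQLite database with three hypothetical people (Ann took courses 2 and 4, Bob only course 4, Cy has no classes). Assumptions: SQLite stands in for MySQL, so IF() is rewritten as CASE, and the sample rows and the make_db() helper are invented for illustration. The key trick is that c.course_id = 2 evaluates to 1 or 0, so SUM() counts how many of a person's classes belong to that course.

```python
import sqlite3


def make_db():
    """In-memory SQLite stand-in for the MySQL schema, with hypothetical sample rows:
    Ann took courses 2 and 4, Bob only course 4, Cy has no classes."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE person (person_id INTEGER PRIMARY KEY, firstname TEXT);
        CREATE TABLE classes (class_id INTEGER PRIMARY KEY, course_id INTEGER);
        CREATE TABLE learners_to_classes (
            id INTEGER PRIMARY KEY, person_id INTEGER, class_id INTEGER);
        INSERT INTO person VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
        INSERT INTO classes VALUES (10, 2), (11, 4);
        INSERT INTO learners_to_classes (person_id, class_id)
            VALUES (1, 10), (1, 11), (2, 11);
    """)
    return db


# The question's first query, with MySQL's IF() rewritten as CASE for SQLite.
# (c.course_id = 2) is 1 or 0 per row, so SUM() counts classes of that course.
FIRST_QUERY = """
    SELECT p.firstname,
           CASE WHEN SUM(c.course_id = 2) > 0 THEN 'Yes' ELSE 'No' END AS cTrained,
           CASE WHEN SUM(c.course_id = 4) > 0 THEN 'Yes' ELSE 'No' END AS aTrained
    FROM person p
    LEFT JOIN (SELECT ltc.person_id, c.class_id, c.course_id
               FROM classes c
               JOIN learners_to_classes ltc ON ltc.class_id = c.class_id) AS c
           ON c.person_id = p.person_id
    GROUP BY p.person_id
    HAVING SUM(c.course_id = 4) > 0
    ORDER BY p.firstname
"""

rows = make_db().execute(FIRST_QUERY).fetchall()
print(rows)
```

This prints [('Ann', 'Yes', 'Yes'), ('Bob', 'No', 'Yes')]. Cy only gets a NULL row from the LEFT JOIN, so SUM() is NULL for him and the HAVING clause drops him, which is the behaviour the answers below point out.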
Searching for people who have cTrained = 'Yes' and aTrained = 'Yes' generates the following query:
SELECT
    p.*,
    IF(SUM(c.course_id = 2) > 0, 'Yes', 'No') AS `cTrained`,
    IF(SUM(c.course_id = 4) > 0, 'Yes', 'No') AS `aTrained`
FROM
    `person` p
    LEFT JOIN
    (SELECT `ltc`.`person_id`, `c`.`class_id`, `c`.`course_id`
     FROM `classes` c
     INNER JOIN `learners_to_classes` ltc ON `ltc`.`class_id` = `c`.`class_id`) AS c
    ON `c`.`person_id` = `p`.`person_id`
GROUP BY `p`.`person_id`
HAVING SUM(c.course_id = 2) > 0
   AND SUM(c.course_id = 4) > 0
ORDER BY `firstname` ASC
These queries, however, are taking over 100 seconds to execute, and I can't think of a way to optimize them. I was hoping someone here might have an idea.
First, get rid of the subquery. It is not needed, and it can interfere with optimization. Second, you don't need a LEFT JOIN, because the HAVING clause requires matches.
SELECT p.*,
       IF(SUM(c.course_id = 2) > 0, 'Yes', 'No') AS cTrained,
       IF(SUM(c.course_id = 4) > 0, 'Yes', 'No') AS aTrained
FROM person p
JOIN learners_to_classes ltc
  ON p.person_id = ltc.person_id
JOIN classes c
  ON ltc.class_id = c.class_id
GROUP BY p.person_id
HAVING SUM(c.course_id = 4) > 0
ORDER BY p.firstname ASC
For this query, you want the obvious indexes: learners_to_classes(person_id, class_id) and classes(class_id). The first seems to be missing.
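As a sketch of this answer's two suggestions together, the block below adds the composite index on learners_to_classes(person_id, class_id) and runs the flattened, inner-joined query on the same kind of hypothetical SQLite sample data. Assumptions: SQLite stands in for MySQL (IF() becomes CASE), the index name ltc_person_class is invented, and classes(class_id) needs no extra index because it is already the primary key.

```python
import sqlite3


def make_db():
    """In-memory SQLite stand-in for the MySQL schema, with hypothetical sample rows:
    Ann took courses 2 and 4, Bob only course 4, Cy has no classes."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE person (person_id INTEGER PRIMARY KEY, firstname TEXT);
        CREATE TABLE classes (class_id INTEGER PRIMARY KEY, course_id INTEGER);
        CREATE TABLE learners_to_classes (
            id INTEGER PRIMARY KEY, person_id INTEGER, class_id INTEGER);
        INSERT INTO person VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
        INSERT INTO classes VALUES (10, 2), (11, 4);
        INSERT INTO learners_to_classes (person_id, class_id)
            VALUES (1, 10), (1, 11), (2, 11);
    """)
    return db


db = make_db()
# The composite index recommended above (hypothetical name); classes(class_id)
# is already covered by that table's primary key.
db.execute("CREATE INDEX ltc_person_class ON learners_to_classes (person_id, class_id)")

# No derived table, and plain inner joins: HAVING already discards people
# without a matching class, so a LEFT JOIN would add nothing here.
REWRITTEN_QUERY = """
    SELECT p.firstname,
           CASE WHEN SUM(c.course_id = 2) > 0 THEN 'Yes' ELSE 'No' END AS cTrained,
           CASE WHEN SUM(c.course_id = 4) > 0 THEN 'Yes' ELSE 'No' END AS aTrained
    FROM person p
    JOIN learners_to_classes ltc ON p.person_id = ltc.person_id
    JOIN classes c ON ltc.class_id = c.class_id
    GROUP BY p.person_id
    HAVING SUM(c.course_id = 4) > 0
    ORDER BY p.firstname
"""

rows = db.execute(REWRITTEN_QUERY).fetchall()
# Column 1 of each PRAGMA index_list row is the index name.
index_names = [row[1] for row in db.execute("PRAGMA index_list('learners_to_classes')")]
print(rows, index_names)
```

The result matches the original query's. On the real MySQL tables, putting EXPLAIN in front of the query is the way to check whether the new index is actually used.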
Not every person has completed a course, but having just written that, it occurs to me that if either of the flags is set to Yes, they would have, and an inner join would do the trick. – user348120, May 19, 2015 at 23:19
If you can provide an SQL Fiddle with sample data, that would help a lot. In the meantime, please test this query:
SELECT
    p.*,
    c.cTrained,
    c.aTrained
FROM
    `person` p
    INNER JOIN
    (SELECT
         `ltc`.`person_id`,
         IF(SUM(c.course_id = 2) > 0, 'Yes', 'No') AS `cTrained`,
         IF(SUM(c.course_id = 4) > 0, 'Yes', 'No') AS `aTrained`
     FROM `classes` c
     INNER JOIN `learners_to_classes` ltc
         ON `ltc`.`class_id` = `c`.`class_id`
     GROUP BY `ltc`.`person_id`
     HAVING aTrained = 'Yes'
    ) AS c
    ON `c`.`person_id` = `p`.`person_id`
ORDER BY p.`firstname` ASC
This query is equivalent to your first query. It should return the same result, but faster, because the SUM and GROUP BY are done only in the subquery, over learners_to_classes and classes, before the result is joined to person.
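A minimal sketch of this pre-aggregation idea on hypothetical SQLite sample data: the derived table computes one row of flags per person_id, and only those rows are joined back to person. Assumptions: SQLite stands in for MySQL, IF() becomes CASE, and the HAVING clause repeats the SUM() expression instead of using the aTrained alias, so it does not depend on alias resolution in HAVING. The derived table selects only person_id and aggregated columns.

```python
import sqlite3


def make_db():
    """In-memory SQLite stand-in for the MySQL schema, with hypothetical sample rows:
    Ann took courses 2 and 4, Bob only course 4, Cy has no classes."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE person (person_id INTEGER PRIMARY KEY, firstname TEXT);
        CREATE TABLE classes (class_id INTEGER PRIMARY KEY, course_id INTEGER);
        CREATE TABLE learners_to_classes (
            id INTEGER PRIMARY KEY, person_id INTEGER, class_id INTEGER);
        INSERT INTO person VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
        INSERT INTO classes VALUES (10, 2), (11, 4);
        INSERT INTO learners_to_classes (person_id, class_id)
            VALUES (1, 10), (1, 11), (2, 11);
    """)
    return db


# Aggregate per person inside the derived table, filter there with HAVING,
# then join the already-reduced set of per-person flags back to person.
PRE_AGGREGATED_QUERY = """
    SELECT p.firstname, agg.cTrained, agg.aTrained
    FROM person p
    JOIN (SELECT ltc.person_id,
                 CASE WHEN SUM(c.course_id = 2) > 0 THEN 'Yes' ELSE 'No' END AS cTrained,
                 CASE WHEN SUM(c.course_id = 4) > 0 THEN 'Yes' ELSE 'No' END AS aTrained
          FROM classes c
          JOIN learners_to_classes ltc ON ltc.class_id = c.class_id
          GROUP BY ltc.person_id
          HAVING SUM(c.course_id = 4) > 0) AS agg
      ON agg.person_id = p.person_id
    ORDER BY p.firstname
"""

rows = make_db().execute(PRE_AGGREGATED_QUERY).fetchall()
print(rows)
```

Because person is joined only after grouping, the GROUP BY works on the narrow learners_to_classes/classes rows rather than on rows that also carry every person column.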
Hi Alex, thanks for your response. The reason I chose a left join is that not every person is in learners_to_classes. I adjusted my solution so that the type of join depends on whether I'm searching for people who have done a course or not. – user348120, May 20, 2015 at 2:35
Since you use HAVING SUM(c.course_id = 4) > 0, you never get records that have no courses. – Alex, May 20, 2015 at 3:02
Yes, but the HAVING clause is not always in the query; it is only there if I am searching for people who have or have not done a specific course. If that isn't in the criteria, the HAVING clause doesn't appear. I should have been clearer in my previous comment: I adjusted my scripts to check whether the HAVING clauses are going to be used. If they are, I use an inner join; if not, I use the left join. – user348120, May 20, 2015 at 4:13
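The dynamic query building described in this thread can be sketched as a small function that picks the join type from the filters. This is a hedged illustration, not the poster's actual script: build_search_query, COURSE_IDS, and the SQLite sample data are all hypothetical, and IF() is again written as CASE. Following the earlier observation that an inner join is only safe when a flag is set to Yes, it uses an inner join only when some filter is 'Yes'. A 'No' filter keeps the LEFT JOIN and wraps the sum in COALESCE, because a person with no classes at all has SUM(...) = NULL and would otherwise fail "= 0".

```python
import sqlite3

# Hypothetical mapping from search-form flag to course_id, taken from the question.
COURSE_IDS = {"cTrained": 2, "aTrained": 4}


def make_db():
    """In-memory SQLite stand-in for the MySQL schema, with hypothetical sample rows:
    Ann took courses 2 and 4, Bob only course 4, Cy has no classes."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE person (person_id INTEGER PRIMARY KEY, firstname TEXT);
        CREATE TABLE classes (class_id INTEGER PRIMARY KEY, course_id INTEGER);
        CREATE TABLE learners_to_classes (
            id INTEGER PRIMARY KEY, person_id INTEGER, class_id INTEGER);
        INSERT INTO person VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
        INSERT INTO classes VALUES (10, 2), (11, 4);
        INSERT INTO learners_to_classes (person_id, class_id)
            VALUES (1, 10), (1, 11), (2, 11);
    """)
    return db


def build_search_query(filters):
    """filters maps a flag name to 'Yes' or 'No'; omitted flags are not filtered on."""
    flag_columns = ",\n           ".join(
        f"CASE WHEN SUM(c.course_id = {cid}) > 0 THEN 'Yes' ELSE 'No' END AS {name}"
        for name, cid in COURSE_IDS.items()
    )
    having = []
    for name, wanted in filters.items():
        cid = COURSE_IDS[name]  # unknown flag names raise KeyError instead of reaching SQL
        if wanted == "Yes":
            having.append(f"SUM(c.course_id = {cid}) > 0")
        elif wanted == "No":
            # Under a LEFT JOIN, people with no classes have SUM(...) = NULL; treat as 0.
            having.append(f"COALESCE(SUM(c.course_id = {cid}), 0) = 0")
        else:
            raise ValueError(f"{name} must be 'Yes' or 'No', got {wanted!r}")
    # Inner join only when some course must have been completed; with no filter
    # or only 'No' filters, people with no classes must still be returned.
    join = "JOIN" if "Yes" in filters.values() else "LEFT JOIN"
    sql = (
        f"SELECT p.firstname,\n           {flag_columns}\n"
        "FROM person p\n"
        f"{join} learners_to_classes ltc ON ltc.person_id = p.person_id\n"
        f"{join} classes c ON c.class_id = ltc.class_id\n"
        "GROUP BY p.person_id\n"
    )
    if having:
        sql += "HAVING " + " AND ".join(having) + "\n"
    return sql + "ORDER BY p.firstname"


db = make_db()
for filters in ({}, {"aTrained": "Yes"}, {"cTrained": "No"}):
    print(filters, db.execute(build_search_query(filters)).fetchall())
```

Only values from COURSE_IDS are interpolated into the SQL, and the filter values are checked against 'Yes'/'No', so raw form input never ends up in the query string.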
Might you be able to avoid the huge joins by (a) creating an intermediary HEAP (MEMORY) table, or (b) processing some of the data with PHP arrays?
I've used both of those in the past to gain 100-fold efficiency increases for complex queries. :-)
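Option (b) can be sketched as follows, in Python dictionaries rather than PHP arrays, and without the HEAP-table variant. Assumptions: SQLite stands in for MySQL, and search_in_app, COURSE_IDS, and the sample rows are hypothetical. Instead of one GROUP BY/HAVING over a three-way join, it runs two simple queries: one pass over the enrolment rows builds a person_id to set-of-course_ids map, and one pass over person computes the Yes/No flags and applies the filters in memory.

```python
import sqlite3
from collections import defaultdict

# Hypothetical mapping from search-form flag to course_id, taken from the question.
COURSE_IDS = {"cTrained": 2, "aTrained": 4}


def make_db():
    """In-memory SQLite stand-in for the MySQL schema, with hypothetical sample rows:
    Ann took courses 2 and 4, Bob only course 4, Cy has no classes."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE person (person_id INTEGER PRIMARY KEY, firstname TEXT);
        CREATE TABLE classes (class_id INTEGER PRIMARY KEY, course_id INTEGER);
        CREATE TABLE learners_to_classes (
            id INTEGER PRIMARY KEY, person_id INTEGER, class_id INTEGER);
        INSERT INTO person VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
        INSERT INTO classes VALUES (10, 2), (11, 4);
        INSERT INTO learners_to_classes (person_id, class_id)
            VALUES (1, 10), (1, 11), (2, 11);
    """)
    return db


def search_in_app(db, filters):
    """Compute the Yes/No flags in application memory instead of with GROUP BY/HAVING."""
    # Pass 1 over enrolments: person_id -> set of course_ids that person has taken.
    completed = defaultdict(set)
    for person_id, course_id in db.execute(
        "SELECT ltc.person_id, c.course_id "
        "FROM learners_to_classes ltc JOIN classes c ON c.class_id = ltc.class_id"
    ):
        completed[person_id].add(course_id)
    # Pass 2 over person: derive the flags and keep rows matching every filter.
    rows = []
    for person_id, firstname in db.execute(
        "SELECT person_id, firstname FROM person ORDER BY firstname"
    ):
        taken = completed.get(person_id, set())
        flags = {name: "Yes" if cid in taken else "No" for name, cid in COURSE_IDS.items()}
        if all(flags[name] == wanted for name, wanted in filters.items()):
            rows.append((firstname, flags["cTrained"], flags["aTrained"]))
    return rows


db = make_db()
print(search_in_app(db, {"aTrained": "Yes"}))
```

The trade-off is that all enrolment pairs are pulled into memory; with the row counts given in the question, that is roughly 51,100 (person_id, course_id) pairs.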
Please demonstrate to the OP how to do this. – nomistic, May 19, 2015 at 23:24