2

First, some information on the database structure.

I have 3 indexed tables that are used in this query (one of them should probably be a weak entity, but this is how it was when I took over):

person
  person_id (pk)
classes
  class_id (pk)
  course_id (fk)
learners_to_classes
  id (pk)
  person_id (fk)
  class_id (fk)

The person table has about 46,100 records, learners_to_classes has about 51,100 records, and classes has about 1,670 records.

The query I am working on is built dynamically. I am working on a person search feature which lists the records from person, plus an indicator of whether that person has completed 2 particular course types, based on a number of parameters.

The issue I have come across, however, is the query time when either no parameters are supplied (which would probably be rare) or the only parameters are a combination of the completed course types.

For example, asking for a list of people who have aTrained set to Yes produces the following query:

SELECT 
 p.*,
 IF(sum(c.course_id = 2) > 0, 'Yes','No') as `cTrained`,
 IF(sum(c.course_id = 4) > 0, 'Yes', 'No') as `aTrained`
FROM
 (`person` p)
LEFT JOIN
 (SELECT 
 `ltc`.`person_id`, `c`.`class_id`, `c`.`course_id`
 FROM
 (`classes` c)
 INNER JOIN `learners_to_classes` ltc ON `ltc`.`class_id` = `c`.`class_id`) AS c ON `c`.`person_id` = `p`.`person_id`
GROUP BY `p`.`person_id`
HAVING sum(c.course_id = 4) > 0
ORDER BY p.`firstname` asc

Searching for people who have cTrained = 'Yes' and aTrained = 'Yes' generates the following query:

SELECT 
 p.*,
 IF(sum(c.course_id = 2) > 0, 'Yes', 'No') as `cTrained`,
 IF(sum(c.course_id = 4) > 0, 'Yes', 'No') as `aTrained`
FROM
 (`person` p)
 LEFT JOIN
 (SELECT 
 `ltc`.`person_id`, `c`.`class_id`, `c`.`course_id`
 FROM
 (`classes` c)
 INNER JOIN `learners_to_classes` ltc ON `ltc`.`class_id` = `c`.`class_id`) AS c ON `c`.`person_id` = `p`.`person_id`
GROUP BY `p`.`person_id`
HAVING sum(c.course_id = 2) > 0
 AND sum(c.course_id = 4) > 0
ORDER BY `firstname` asc

However, these queries take over 100 seconds to execute, and I can't think of a way to optimize them. I was hoping someone here might have an idea.

asked May 19, 2015 at 23:10

3 Answers

4

First, get rid of the subquery. It is not needed and it can interfere with optimization. Second, you don't need a left join, because the having clause is requiring matches.

SELECT p.*,
 IF(sum(c.course_id = 2) > 0, 'Yes', 'No') as cTrained,
 IF(sum(c.course_id = 4) > 0, 'Yes', 'No') as aTrained
FROM person p JOIN
 learners_to_classes ltc
 ON p.person_id = ltc.person_id JOIN
 classes c
 ON ltc.class_id = c.class_id
GROUP BY p.person_id
HAVING sum(c.course_id = 4) > 0
ORDER BY p.firstname asc

For this query, you want the obvious indexes: learners_to_classes(person_id, class_id) and classes(class_id). The first seems to be missing.
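To try the index and the rewritten query on a small scale, here is a minimal sketch. It makes several assumptions that are not from the question: it uses Python's sqlite3 with an in-memory SQLite database as a stand-in for MySQL, replaces MySQL's IF() with CASE, invents three sample people, and uses the hypothetical index name idx_ltc_person_class.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory stand-in for the three tables (sample rows are invented)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE person (person_id INTEGER PRIMARY KEY, firstname TEXT);
        CREATE TABLE classes (class_id INTEGER PRIMARY KEY, course_id INTEGER);
        CREATE TABLE learners_to_classes (
            id INTEGER PRIMARY KEY,
            person_id INTEGER REFERENCES person(person_id),
            class_id INTEGER REFERENCES classes(class_id)
        );
        -- Hypothetical index name. The column order follows the suggestion above.
        CREATE INDEX idx_ltc_person_class
            ON learners_to_classes (person_id, class_id);

        INSERT INTO person VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cat');
        INSERT INTO classes VALUES (10, 2), (11, 4);
        INSERT INTO learners_to_classes (person_id, class_id)
            VALUES (1, 10), (1, 11), (2, 10);
    """)
    return conn


# Same shape as the rewritten query. CASE replaces MySQL's IF() for SQLite.
A_TRAINED_QUERY = """
    SELECT p.person_id, p.firstname,
           CASE WHEN SUM(c.course_id = 2) > 0 THEN 'Yes' ELSE 'No' END AS cTrained,
           CASE WHEN SUM(c.course_id = 4) > 0 THEN 'Yes' ELSE 'No' END AS aTrained
    FROM person p
    JOIN learners_to_classes ltc ON p.person_id = ltc.person_id
    JOIN classes c ON ltc.class_id = c.class_id
    GROUP BY p.person_id
    HAVING SUM(c.course_id = 4) > 0
    ORDER BY p.firstname ASC
"""


def a_trained_people(conn):
    """Return the people who completed course 4, with both flags."""
    return conn.execute(A_TRAINED_QUERY).fetchall()
```

On MySQL itself, running EXPLAIN on the query before and after adding the index is the usual way to check whether the index is actually used.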

answered May 19, 2015 at 23:17
1
  • Not every person has completed a course, but having just written that, it occurs to me that if either of the flags is set to Yes, they would have, and an inner join would do the trick. Commented May 19, 2015 at 23:19
1

If you can provide an SQL Fiddle with a data sample, that would help a lot. In the meantime, please test this query:

SELECT 
 p.*,
 c.cTrained,
 c.aTrained
FROM
 (`person` p)
INNER JOIN
 (SELECT 
 `ltc`.`person_id`,
 IF(sum(c.course_id = 2) > 0, 'Yes','No') as `cTrained`,
 IF(sum(c.course_id = 4) > 0, 'Yes', 'No') as `aTrained`
 FROM
 `classes` c
 INNER JOIN `learners_to_classes` ltc 
 ON `ltc`.`class_id` = `c`.`class_id`
 GROUP BY `ltc`.`person_id`
 HAVING aTrained = 'Yes'
 ) AS c 
ON `c`.`person_id` = `p`.`person_id`
ORDER BY p.`firstname` asc

This query is analogous to your first query. It should return the same result, but faster, because the SUM and GROUP BY are done only in the subquery, before the join to person.
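A quick way to check that grouping before the join gives the same rows as grouping after it is to run both shapes on a toy data set. The sketch below is only that, under stated assumptions: SQLite (via Python's sqlite3) stands in for MySQL, CASE replaces IF(), the HAVING clause repeats the SUM expression instead of using the aTrained alias, and the four sample people are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (person_id INTEGER PRIMARY KEY, firstname TEXT);
    CREATE TABLE classes (class_id INTEGER PRIMARY KEY, course_id INTEGER);
    CREATE TABLE learners_to_classes (
        id INTEGER PRIMARY KEY, person_id INTEGER, class_id INTEGER);
    -- Invented sample rows: Ann took both courses, Bob only course 2,
    -- Cat only course 4, Dan took nothing.
    INSERT INTO person VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cat'), (4, 'Dan');
    INSERT INTO classes VALUES (10, 2), (11, 4);
    INSERT INTO learners_to_classes (person_id, class_id)
        VALUES (1, 10), (1, 11), (2, 10), (3, 11);
""")

# CASE stands in for MySQL's IF(). The aggregates themselves are unchanged.
FLAGS = """
    CASE WHEN SUM(c.course_id = 2) > 0 THEN 'Yes' ELSE 'No' END AS cTrained,
    CASE WHEN SUM(c.course_id = 4) > 0 THEN 'Yes' ELSE 'No' END AS aTrained
"""

# Shape of the question's query: join everything, then group the joined rows.
GROUP_AFTER_JOIN = f"""
    SELECT p.person_id, p.firstname, {FLAGS}
    FROM person p
    LEFT JOIN (SELECT ltc.person_id AS person_id, cl.course_id AS course_id
               FROM classes cl
               JOIN learners_to_classes ltc ON ltc.class_id = cl.class_id
              ) AS c ON c.person_id = p.person_id
    GROUP BY p.person_id
    HAVING SUM(c.course_id = 4) > 0
    ORDER BY p.firstname ASC
"""

# Shape of this answer's query: group inside the subquery, then join once.
GROUP_BEFORE_JOIN = f"""
    SELECT p.person_id, p.firstname, t.cTrained, t.aTrained
    FROM person p
    JOIN (SELECT ltc.person_id AS person_id, {FLAGS}
          FROM classes c
          JOIN learners_to_classes ltc ON ltc.class_id = c.class_id
          GROUP BY ltc.person_id
          HAVING SUM(c.course_id = 4) > 0
         ) AS t ON t.person_id = p.person_id
    ORDER BY p.firstname ASC
"""

rows_after = conn.execute(GROUP_AFTER_JOIN).fetchall()
rows_before = conn.execute(GROUP_BEFORE_JOIN).fetchall()
```

Equal results on toy data say nothing about speed. The timing claim still needs to be tested against the real tables.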

answered May 19, 2015 at 23:30
3
  • Hi Alex, thanks for your response. The reason I chose a left join is that not every 'person' is in learners_to_classes. I adjusted my solution so that the type of join depends on whether I'm searching for people who have done a course or not. Commented May 20, 2015 at 2:35
  • Since you use HAVING sum(c.course_id = 4) > 0, you never get records that have no courses. Commented May 20, 2015 at 3:02
  • Yes, but the HAVING clause is not always in the query. It is only there if I am searching for people who have or have not done a specific course. If this isn't in the criteria, the HAVING clause doesn't come up. I should have been clearer in my previous reply: I adjusted my scripts to check whether a HAVING clause is going to be used. If it is, I use an inner join. If not, I use the left join. Commented May 20, 2015 at 4:13
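The join-switching idea from these comments could be sketched roughly as below. This is not the OP's actual script: build_person_search and its filters argument are hypothetical names, and Python is used only for illustration. One caveat the sketch tries to handle: when the only filters are "has not done" ones, people with no classes at all also qualify, so the inner join is only used when at least one filter requires a completed course.

```python
# Flag name -> course_id, as used in the question's queries.
COURSE_IDS = {"cTrained": 2, "aTrained": 4}


def build_person_search(filters):
    """Build the search SQL (MySQL syntax) for a dict such as {"aTrained": True}.

    True means "has completed", False means "has not completed".
    Flags missing from the dict are not filtered on.
    """
    having = []
    for flag, wanted in filters.items():
        total = f"SUM(c.course_id = {COURSE_IDS[flag]})"
        # With a LEFT JOIN, people without classes yield NULL sums, hence COALESCE.
        having.append(f"{total} > 0" if wanted else f"COALESCE({total}, 0) = 0")

    # An inner join is only safe when some filter requires a completed course.
    # Otherwise people with no classes at all must be kept, so use LEFT JOIN.
    join = "JOIN" if any(filters.values()) else "LEFT JOIN"

    sql = (
        "SELECT p.*,\n"
        " IF(SUM(c.course_id = 2) > 0, 'Yes', 'No') AS cTrained,\n"
        " IF(SUM(c.course_id = 4) > 0, 'Yes', 'No') AS aTrained\n"
        "FROM person p\n"
        f"{join} learners_to_classes ltc ON p.person_id = ltc.person_id\n"
        f"{join} classes c ON ltc.class_id = c.class_id\n"
        "GROUP BY p.person_id\n"
    )
    if having:
        sql += "HAVING " + "\n AND ".join(having) + "\n"
    return sql + "ORDER BY p.firstname ASC"
```

For example, build_person_search({"aTrained": True}) uses inner joins with a HAVING clause, while build_person_search({}) falls back to left joins with no HAVING clause at all.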
0

Might you be able to avoid the huge joins by: a) creating an intermediary HEAP type table; b) processing some of the data with PHP arrays?

I've used both of those in the past to gain 100-fold efficiency increases for complex queries.

:-)
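One possible reading of option (a), as a rough sketch only: compute the per-person flags once into an intermediary table, then join person against that much smaller table. Everything here is an assumption rather than the method this answer had in mind: a SQLite TEMP table (via Python's sqlite3) stands in for a MySQL MEMORY table (HEAP is the older name for that engine), person_flags, c_done and a_done are hypothetical names, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (person_id INTEGER PRIMARY KEY, firstname TEXT);
    CREATE TABLE classes (class_id INTEGER PRIMARY KEY, course_id INTEGER);
    CREATE TABLE learners_to_classes (
        id INTEGER PRIMARY KEY, person_id INTEGER, class_id INTEGER);
    -- Invented sample rows.
    INSERT INTO person VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cat');
    INSERT INTO classes VALUES (10, 2), (11, 4);
    INSERT INTO learners_to_classes (person_id, class_id)
        VALUES (1, 10), (1, 11), (2, 10);

    -- Intermediary table with one row per learner.
    -- TEMP here stands in for a MySQL MEMORY (HEAP) table.
    CREATE TEMP TABLE person_flags AS
        SELECT ltc.person_id AS person_id,
               MAX(c.course_id = 2) AS c_done,
               MAX(c.course_id = 4) AS a_done
        FROM learners_to_classes ltc
        JOIN classes c ON c.class_id = ltc.class_id
        GROUP BY ltc.person_id;
""")

# The search itself no longer aggregates. It only reads the precomputed flags.
rows = conn.execute("""
    SELECT p.person_id, p.firstname,
           CASE WHEN COALESCE(f.c_done, 0) = 1 THEN 'Yes' ELSE 'No' END AS cTrained,
           CASE WHEN COALESCE(f.a_done, 0) = 1 THEN 'Yes' ELSE 'No' END AS aTrained
    FROM person p
    LEFT JOIN person_flags f ON f.person_id = p.person_id
    ORDER BY p.firstname ASC
""").fetchall()
```

The trade-off is that the intermediary table has to be rebuilt or refreshed whenever learners_to_classes changes.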

answered May 19, 2015 at 23:20
1
  • Please demonstrate to the OP how to do this. Commented May 19, 2015 at 23:24
