4

I have a query which combines a LEFT JOIN and subquery. The dataset is quite big and the time to execute the statement is over 70 seconds.

SELECT
 s.siblings,
 l.id
FROM
 `list` l
 INNER JOIN
 child c ON c.id = l.child_id
 INNER JOIN
 parent p ON p.id = c.parent_id
 LEFT JOIN (
 SELECT COUNT(c.id) AS siblings, c.id, c.parent_id
 FROM child c
 GROUP BY c.id
 ) AS s ON s.parent_id = c.parent_id AND s.id != c.id
WHERE
 l.country = 1
GROUP BY l.id, s.siblings
ORDER BY l.dateadded

This query should return all lists for a country. Each list is specific to a unique baby. For each list I would like to return a count of the number of children that have the same parent.

If I remove the LEFT JOIN subquery the fetch time is 0.1 seconds. Is there a way to make the query more efficient?

asked Mar 11, 2016 at 11:54
4
  • This query would result in a syntax error. Please correct it first. Then try to explain what it is supposed to do, because there are two places (one inside the subquery and one in the outer query) where it uses columns in the SELECT that are not in the GROUP BY list. This is non-standard SQL and can give unpredictable, non-repeatable results (i.e., useless results). Commented Mar 11, 2016 at 12:03
  • @ypercubeTM would the syntax error be caused by list, as it is a reserved keyword? I have updated the question to fix this and added an explanation of the statement. Commented Mar 11, 2016 at 12:32
  • Could you include an EXPLAIN of your query in your question? Commented Mar 11, 2016 at 13:25
  • How many rows does the child table have and how many in each country? Does the query return anything else but 1 in the siblings column? Commented Mar 11, 2016 at 16:59

3 Answers

3

The main reason for the slow query is the join on a subquery (a derived table). The rows it produces are materialized, and the join against them typically cannot use the indexes of the underlying table. On top of that, you not only join with a derived table, you also group the total result by one of its columns: GROUP BY l.id, s.siblings

In this case it could help to:

  • create a temporary table from the subquery; it could also include whatever logic is needed to return the correct parent_id
  • create index on this table
  • use temporary table in join
  • drop temporary table

This could have variants, but it is often faster and puts less load on the server than a complicated set of subqueries with joins.
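The four steps above could look roughly like the following. This is only a sketch under assumptions: the temporary table name tmp_family_size and the index name idx_parent are made up for illustration, and the aggregation is done per parent_id (rather than per child id as in the question's subquery) because that matches the stated goal of counting children with the same parent.

```sql
-- 1. Materialize the aggregate once (tmp_family_size is a hypothetical name).
CREATE TEMPORARY TABLE tmp_family_size AS
SELECT parent_id, COUNT(*) AS cnt
FROM child
GROUP BY parent_id;

-- 2. Index the join column so the join below can do lookups
--    (idx_parent is a hypothetical name).
CREATE INDEX idx_parent ON tmp_family_size (parent_id);

-- 3. Join against the temporary table instead of the derived table.
--    cnt includes the child itself, so subtract 1 to get the siblings.
SELECT
  t.cnt - 1 AS siblings,
  l.id
FROM `list` AS l
INNER JOIN child AS c ON c.id = l.child_id
INNER JOIN tmp_family_size AS t ON t.parent_id = c.parent_id
WHERE l.country = 1
ORDER BY l.dateadded;

-- 4. Clean up.
DROP TEMPORARY TABLE IF EXISTS tmp_family_size;
```

Whether this actually beats a single query depends on the data and the server version, so it is worth comparing the EXPLAIN output and timings of both.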

Paul White
answered Mar 11, 2016 at 12:50
1

The query has a lot of unnecessary complications. In addition, the GROUP BY c.id in the derived table (assuming that child (id) is the primary key of that table) seems completely redundant. The siblings column should therefore always be 1, which is probably not the wanted result.
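A small illustration of why grouping by the primary key always yields 1. The table child_demo and its rows are invented purely for this example and are not part of the question's schema:

```sql
-- Hypothetical demo table: three children of parent 10, one child of parent 20.
CREATE TEMPORARY TABLE child_demo (
  id        INT PRIMARY KEY,
  parent_id INT NOT NULL
);
INSERT INTO child_demo (id, parent_id)
VALUES (1, 10), (2, 10), (3, 10), (4, 20);

-- Same shape as the derived table in the question: grouping by the
-- primary key puts each row in its own group, so the count is 1 everywhere.
SELECT id, parent_id, COUNT(id) AS siblings
FROM child_demo
GROUP BY id;

-- Grouping by parent_id gives the family size instead:
-- parent 10 has 3 children, parent 20 has 1.
SELECT parent_id, COUNT(*) AS cnt
FROM child_demo
GROUP BY parent_id;

DROP TEMPORARY TABLE child_demo;
```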

This would do what you want (find the number of siblings) in a much simpler way:

SELECT
 s.cnt - 1 AS siblings,
 l.id
FROM
 `list` AS l
 INNER JOIN
 child AS c ON c.id = l.child_id
 INNER JOIN 
 ( SELECT c.parent_id, COUNT(*) AS cnt
 FROM child AS c
 GROUP BY c.parent_id
 ) AS s ON s.parent_id = c.parent_id
WHERE
 l.country = 1
ORDER BY l.dateadded ;
Paul White
answered Mar 11, 2016 at 16:57
-1
s.siblings,
...
JOIN ( SELECT ... )
 -->
( SELECT COUNT(*) FROM ... ) AS siblings

That is, get rid of the LEFT JOIN and replace it with a subquery in the SELECT.
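Spelled out against the tables from the question, that suggestion might look like the sketch below. The alias c2 and the exclusion of the child itself (c2.id <> c.id) are assumptions about the intent; an index on child (parent_id), if one does not already exist, would likely be needed for the correlated subquery to be fast.

```sql
SELECT
  -- Correlated subquery: count the other children that share this child's parent.
  ( SELECT COUNT(*)
    FROM child AS c2
    WHERE c2.parent_id = c.parent_id
      AND c2.id <> c.id
  ) AS siblings,
  l.id
FROM `list` AS l
INNER JOIN child AS c ON c.id = l.child_id
WHERE l.country = 1
ORDER BY l.dateadded;
```

Because the subquery runs only for the rows that survive the WHERE clause, it avoids aggregating the whole child table when only one country is requested.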

answered Mar 12, 2016 at 4:17
