The following query is taking over 800ms to run, and returning 300 rows. When deployed to SQL Azure, it takes much longer on an affordable price tier.
SELECT
Tests.Id,
Tests.Title,
Tests.AuthorId,
Tests.[Status],
Users.FirstName + ' ' + Users.LastName AS AuthorName,
(SELECT
COUNT(1)
FROM Results LEFT JOIN Users ON Results.UserId = Users.Id
WHERE
Results.TestId = Tests.Id AND
Results.MarkedBy IS NULL AND
Results.QuestionNumber >= 1 AND
EXISTS (
(SELECT ClassName FROM UserClasses WHERE UserClasses.UserId = Users.Id)
INTERSECT
(SELECT ClassName FROM TestClasses WHERE TestClasses.TestId = Tests.Id)
INTERSECT
(SELECT ClassName FROM UserClasses WHERE UserId = @teacherId)
)
) AS UnmarkedCount,
(CASE WHEN EXISTS (SELECT 1 FROM Results WHERE Results.TestId = Tests.Id)
THEN CAST(1 AS BIT)
ELSE CAST(0 AS BIT) END
) AS AnyResults,
(SELECT Stuff((SELECT ',' + ClassName FROM
(
(SELECT ClassName FROM TestClasses WHERE TestClasses.TestId = Tests.Id)
INTERSECT
(SELECT ClassName FROM UserClasses WHERE UserId = @teacherId)
) x FOR XML PATH ('')),1,1,'')
) AS Classes
FROM
Tests INNER JOIN Users ON Tests.AuthorId = Users.Id
WHERE
Users.SchoolId = @schoolId AND Tests.Status <= 4
An overview of the schema:
Users
include students and teachers.UserClasses
matches many users to many class names.TestClasses
matches many tests to many class names.- Each test in
Tests
can have multipleResults
- one per question per student.
The query returns a list of tests, using subqueries to find:
UnmarkedCount
: How many unmarked results exist for this test, where the intersection of the following is not empty:- The classes of the student of this result
- The test's classes
- The teacher's classes
AnyResults
: Are there any results for this test?Classes
: As a comma-separated list, which of the teacher's classes are assigned to this test?
Note that if we remove the condition where three queries are intersected, the execution time is reduced to 150ms. However, this logic is required.
How could this be improved?
Further Details:
Query Execution Plan
This is an extract from the query execution plan, where the heavy lifting seems to occur. I can't see anywhere suggesting indexes.
Business logic
The procedure returns a list of all tests at a given school. For each test, it calculates:
- UnmarkedCount: How many results are not yet marked for students in classes which are both allocated to this test and taught by the current user?
- Classes: Which of the classes allocated to this test does the current user teach?
2 Answers 2
Let's just focus on this part, because that's where your performance goes:
SELECT
COUNT(1)
FROM Results LEFT JOIN Users ON Results.UserId = Users.Id
WHERE
Results.TestId = Tests.Id AND
Results.MarkedBy IS NULL AND
Results.QuestionNumber >= 1 AND
EXISTS (
(SELECT ClassName FROM UserClasses WHERE UserClasses.UserId = Users.Id)
INTERSECT
(SELECT ClassName FROM TestClasses WHERE TestClasses.TestId = Tests.Id)
INTERSECT
(SELECT ClassName FROM UserClasses WHERE UserId = @teacherId)
)
That pattern of EXISTS (... INTERSECT ...)
is better written as a chain of INNER JOIN
.
The query optimizer of your database already did that internally as well, but it chose the wrong order for the join, resulting in overly large temporary result sets. Especially when joining UserClasses
straight on Results
without applying the more selective filter by @teacherId
first.
SELECT
COUNT(1)
FROM Results
INNER JOIN TestClasses ON
TestClasses.TestId = Tests.Id AND
TestClasses.TestId = Results.TestId
INNER JOIN UserClasses AS TeacherClass ON
TeacherClass.UserId = @teacherId AND
TeacherClass.ClassName = TestClasses.ClassName
INNER JOIN UserClasses AS UserClass ON
UserClass.UserId = Results.UserId AND
UserClass.ClassName = TestClasses.ClassName
WHERE
Results.MarkedBy IS NULL AND
Results.QuestionNumber >= 1
I reordered the JOIN
clauses to ensure that the product remains as small as possible after each single step. Further, I eliminated the unnecessary join on the User
schema.
However, you don't actually need full INNER JOIN
either. If your database system supports that, you can safely replace the 2nd and 3rd of the INNER JOIN
with LEFT SEMI JOIN
operators instead.
So much for fixing the inner select. But as a matter of fact, now we don't even need to do it as a subquery any more, but can just handle if as a LEFT JOIN
with COUNT
and GROUP BY
on the outmost query.
Whether this actually gains any performance needs to be tested.
There are also a couple of flaws in your database scheme:
Take the UserClasses
table schema. You are abusing it to describe both the roles of teacher and student for any given class, without distinguishing between these two. I suspect you coded the user role into the Users
schema instead, but it would have been better to store different roles in different schemes.
You are apparently storing class names as string literals in multiple schemes, this is an indirect violation of the 2NF, but even worse, it requires string comparisons to match the corresponding columns against each other. This should be refactored ASAP.
There also appears to be a possible design flaw in Results
. If the same test is reused by two different classes, and a pupil is enrolled into both, his test results are now shared between both classes. Test results should probably linked to a specific enrollment to a class, rather than just to the generic test. This also allows to simplify this query further, as the most expensive part of joining on UserClass
for querying pupil enrollment is then obsolete.
-
\$\begingroup\$ Huge thanks for your detailed response, which not only answers my question but also gives general advice about my database which I really appreciate. I have a lot to learn, and you have given some helpful pointers! Please could you show me the SQL you are suggesting when you say "now we don't even need to do it as a subquery any more..."? \$\endgroup\$James– James2016年09月07日 22:29:03 +00:00Commented Sep 7, 2016 at 22:29
EXISTS (
(SELECT ClassName FROM UserClasses WHERE UserClasses.UserId = Users.Id)
INTERSECT
(SELECT ClassName FROM TestClasses WHERE TestClasses.TestId = Tests.Id)
INTERSECT
(SELECT ClassName FROM UserClasses WHERE UserId = @teacherId)
)
(SELECT Stuff((SELECT ',' + ClassName FROM
(
(SELECT ClassName FROM TestClasses WHERE TestClasses.TestId = Tests.Id)
INTERSECT
(SELECT ClassName FROM UserClasses WHERE UserId = @teacherId)
) x FOR XML PATH ('')),1,1,'')
) AS Classes
I'm assuming that ClassName
is a string column of some type. Then those intersections are a bit of a red flag. If you can refactor the schema so that there's an integer surrogate key, it seems to me that it would be worth trying that and profiling again.
INTERSECTION
clauses as an independent subquery and thenJOIN
it? The business logic in that subquery is not clear to me, so I hesitate to try to do that myself. Maybe you could edit your question to spell out that logic more clearly. \$\endgroup\$