The following query is taking over 800ms to run, and returning 300 rows. When deployed to SQL Azure, it takes much longer on an affordable price tier.
SELECT
Tests.Id,
Tests.Title,
Tests.AuthorId,
Tests.[Status],
Users.FirstName + ' ' + Users.LastName AS AuthorName,
(SELECT
COUNT(1)
FROM Results LEFT JOIN Users ON Results.UserId = Users.Id
WHERE
Results.TestId = Tests.Id AND
Results.MarkedBy IS NULL AND
Results.QuestionNumber >= 1 AND
EXISTS (
(SELECT ClassName FROM UserClasses WHERE UserClasses.UserId = Users.Id)
INTERSECT
(SELECT ClassName FROM TestClasses WHERE TestClasses.TestId = Tests.Id)
INTERSECT
(SELECT ClassName FROM UserClasses WHERE UserId = @teacherId)
)
) AS UnmarkedCount,
(CASE WHEN EXISTS (SELECT 1 FROM Results WHERE Results.TestId = Tests.Id)
THEN CAST(1 AS BIT)
ELSE CAST(0 AS BIT) END
) AS AnyResults,
(SELECT Stuff((SELECT ',' + ClassName FROM
(
(SELECT ClassName FROM TestClasses WHERE TestClasses.TestId = Tests.Id)
INTERSECT
(SELECT ClassName FROM UserClasses WHERE UserId = @teacherId)
) x FOR XML PATH ('')),1,1,'')
) AS Classes
FROM
Tests INNER JOIN Users ON Tests.AuthorId = Users.Id
WHERE
Users.SchoolId = @schoolId AND Tests.Status <= 4
An overview of the schema:
Usersinclude students and teachers.UserClassesmatches many users to many class names.TestClassesmatches many tests to many class names.- Each test in
Testscan have multipleResults- one per question per student.
The query returns a list of tests, using subqueries to find:
UnmarkedCount: How many unmarked results exist for this test, where the intersection of the following is not empty:- The classes of the student of this result
- The test's classes
- The teacher's classes
AnyResults: Are there any results for this test?Classes: As a comma-separated list, which of the teacher's classes are assigned to this test?
Note that if we remove the condition where three queries are intersected, the execution time is reduced to 150ms. However, this logic is required.
How could this be improved?
Further Details:
Query Execution Plan
This is an extract from the query execution plan, where the heavy lifting seems to occur. I can't see anywhere suggesting indexes.
Business logic
The procedure returns a list of all tests at a given school. For each test, it calculates:
- UnmarkedCount: How many results are not yet marked for students in classes which are both allocated to this test and taught by the current user?
- Classes: Which of the classes allocated to this test does the current user teach?
2 Answers 2
Let's just focus on this part, because that's where your performance goes:
SELECT
COUNT(1)
FROM Results LEFT JOIN Users ON Results.UserId = Users.Id
WHERE
Results.TestId = Tests.Id AND
Results.MarkedBy IS NULL AND
Results.QuestionNumber >= 1 AND
EXISTS (
(SELECT ClassName FROM UserClasses WHERE UserClasses.UserId = Users.Id)
INTERSECT
(SELECT ClassName FROM TestClasses WHERE TestClasses.TestId = Tests.Id)
INTERSECT
(SELECT ClassName FROM UserClasses WHERE UserId = @teacherId)
)
That pattern of EXISTS (... INTERSECT ...) is better written as a chain of INNER JOIN.
The query optimizer of your database already did that internally as well, but it chose the wrong order for the join, resulting in overly large temporary result sets. Especially when joining UserClasses straight on Results without applying the more selective filter by @teacherId first.
SELECT
COUNT(1)
FROM Results
INNER JOIN TestClasses ON
TestClasses.TestId = Tests.Id AND
TestClasses.TestId = Results.TestId
INNER JOIN UserClasses AS TeacherClass ON
TeacherClass.UserId = @teacherId AND
TeacherClass.ClassName = TestClasses.ClassName
INNER JOIN UserClasses AS UserClass ON
UserClass.UserId = Results.UserId AND
UserClass.ClassName = TestClasses.ClassName
WHERE
Results.MarkedBy IS NULL AND
Results.QuestionNumber >= 1
I reordered the JOIN clauses to ensure that the product remains as small as possible after each single step. Further, I eliminated the unnecessary join on the User schema.
However, you don't actually need full INNER JOIN either. If your database system supports that, you can safely replace the 2nd and 3rd of the INNER JOIN with LEFT SEMI JOIN operators instead.
So much for fixing the inner select. But as a matter of fact, now we don't even need to do it as a subquery any more, but can just handle if as a LEFT JOIN with COUNT and GROUP BY on the outmost query.
Whether this actually gains any performance needs to be tested.
There are also a couple of flaws in your database scheme:
Take the UserClasses table schema. You are abusing it to describe both the roles of teacher and student for any given class, without distinguishing between these two. I suspect you coded the user role into the Users schema instead, but it would have been better to store different roles in different schemes.
You are apparently storing class names as string literals in multiple schemes, this is an indirect violation of the 2NF, but even worse, it requires string comparisons to match the corresponding columns against each other. This should be refactored ASAP.
There also appears to be a possible design flaw in Results. If the same test is reused by two different classes, and a pupil is enrolled into both, his test results are now shared between both classes. Test results should probably linked to a specific enrollment to a class, rather than just to the generic test. This also allows to simplify this query further, as the most expensive part of joining on UserClass for querying pupil enrollment is then obsolete.
-
\$\begingroup\$ Huge thanks for your detailed response, which not only answers my question but also gives general advice about my database which I really appreciate. I have a lot to learn, and you have given some helpful pointers! Please could you show me the SQL you are suggesting when you say "now we don't even need to do it as a subquery any more..."? \$\endgroup\$James– James2016年09月07日 22:29:03 +00:00Commented Sep 7, 2016 at 22:29
EXISTS (
(SELECT ClassName FROM UserClasses WHERE UserClasses.UserId = Users.Id)
INTERSECT
(SELECT ClassName FROM TestClasses WHERE TestClasses.TestId = Tests.Id)
INTERSECT
(SELECT ClassName FROM UserClasses WHERE UserId = @teacherId)
)
(SELECT Stuff((SELECT ',' + ClassName FROM
(
(SELECT ClassName FROM TestClasses WHERE TestClasses.TestId = Tests.Id)
INTERSECT
(SELECT ClassName FROM UserClasses WHERE UserId = @teacherId)
) x FOR XML PATH ('')),1,1,'')
) AS Classes
I'm assuming that ClassName is a string column of some type. Then those intersections are a bit of a red flag. If you can refactor the schema so that there's an integer surrogate key, it seems to me that it would be worth trying that and profiling again.
INTERSECTIONclauses as an independent subquery and thenJOINit? The business logic in that subquery is not clear to me, so I hesitate to try to do that myself. Maybe you could edit your question to spell out that logic more clearly. \$\endgroup\$