Consolidating multiple related queries, each using different window and/or criteria for counting records

Question 1

I'm accumulating results for multiple contingency tables, one for each Test. As an example:

Contingency Table for TestA (one for each test, and for each word)
 EventsWithWord EventsWithoutWord
TestA passed n11 n10 
TestA failed n01 n00

For analysis, I need the number of events matching each of the following categories (this is how R expects the data results):

TestA passed for each word: n11
TestA results available for each word: n_1 (sum of n11 and n01)
TestA passed: n1_ (sum of n11 and n10)
TestA results available: n__

Table "TestResults": columns: CompanyID nvarchar(20) NOT NULL, TestName nvarchar(255) NOT NULL, CompanyResults bit NULL

Table "Data" (one row for each event): Columns: EventID int NOT NULL, CompanyID nvarchar(20) NOT NULL

Table "WordEventMap": Columns: EventID int NOT NULL, Word nvarchar(255) NOT NULL

Is there a way of combining some of the queries (e.g., with windowing functions)? Some of these tables are very large--40 million rows currently in WordEventMap and 1.3 million rows in Data, and I expect the results table will have about 4 million rows (40 different tests, 100,000 different words). The current execution plan does not consolidate any of the Common Table Expressions.

Current query:

WITH CTE1
AS (SELECT
 COUNT(DISTINCT Data.EventID) AS n__,
 TestResults.TestName
FROM TestResults
INNER JOIN Data
 ON TestResults.CompanyID = Data.CompanyID
WHERE TestResults.CompanyResults IS NOT NULL
GROUP BY TestResults.TestName),
CTE2
AS (SELECT
 COUNT(DISTINCT Data.[EventID]) AS n11,
 TestResults.TestName,
 WordEventMap.Word
FROM TestResults
INNER JOIN Data
 ON TestResults.CompanyID = Data.CompanyID
INNER JOIN WordEventMap
 ON Data.[EventID] = WordEventMap.EventID
WHERE TestResults.CompanyResults = 1
GROUP BY TestResults.TestName,
 WordEventMap.Word),
CTE3
AS (SELECT
 COUNT(DISTINCT Data.[EventID]) AS n1_,
 TestResults.TestName
FROM TestResults
INNER JOIN Data
 ON TestResults.CompanyID = Data.CompanyID
WHERE TestResults.CompanyResults = 1
GROUP BY TestResults.TestName),
CTE4
AS (SELECT
 COUNT(DISTINCT Data.[EventID]) AS n_1,
 TestResults.TestName,
 WordEventMap.Word
FROM TestResults
INNER JOIN Data
 ON TestResults.CompanyID = Data.CompanyID
INNER JOIN WordEventMap
 ON Data.[EventID] = WordEventMap.EventID
WHERE TestResults.CompanyResults IS NOT NULL
GROUP BY TestResults.TestName,
 WordEventMap.Word)
SELECT
 CTE2.TestName,
 CTE2.Word,
 n11,
 n1_,
 n_1,
 n__
FROM CTE2
INNER JOIN CTE4
 ON CTE2.Word = CTE4.Word
 AND CTE2.TestName = CTE4.TestName
INNER JOIN CTE1
 ON CTE2.TestName = CTE1.TestName
INNER JOIN CTE3
 ON CTE2.TestName = CTE3.TestName

Here's a SQL fiddle using the above structure and the below data. The above query is saved as the View [Results]. http://sqlfiddle.com/#!6/a8155e/1

Sample data:

Table TestResults
CompanyA TestA 1
CompanyA TestB 0
CompanyB TestA NULL
CompanyB TestB 1
CompanyC TestA 0
CompanyC TestB 1
Table Data
1 CompanyA
2 CompanyA
3 CompanyB
4 CompanyC
5 CompanyB
Table WordEventMap
1 airplane
1 tightrope
1 eggplant
2 eggplant
2 aardvark
2 eggbeater
3 airplane
3 aardvark
3 spaghetti
4 airplane
4 eggplant
4 wikipedia
5 eggplant
5 eggbeater
5 tightrope
5 licorice
Results Set
TestA aardvark 1 2 1 3
TestB aardvark 1 3 2 5
TestA airplane 1 2 2 3
TestB airplane 2 3 3 5
TestA eggbeater 1 2 1 3
TestA eggplant 1 2 2 3
TestB eggplant 2 3 3 5
TestB licorice 1 3 1 5
TestB spaghetti 1 3 1 5
TestA tightrope 1 2 1 3
TestB tightrope 1 3 2 5
TestB wikipedia 1 3 1 5

Question 2

I would not expect the execution plan to ever "consolidate" CTEs. Primer: #BackToBasics : Common Table Expressions (CTEs). Can you provide a few rows of sample data (preferably at least two rows per / that fall into each CTE), desired results, and tag with the version of SQL Server you're using?

Question 3

By consolidate, I mean that a CTE is not a stand-alone query, and sql server can produce an execution plan in which the CTE is not its own branch. See scarydba.com/2015/01/14/common-table-expressions-not-tables . I'll work on getting the clarifying information you're asking for and will add it to the question's text.

Question 4

When you think that optimizer should be able to combine two CTEs into one why don't you write a single CTE, human beings are still smarter than optimizers :-)

As there's only a different WHERE-condition the CTEs 2&4 and 1&3 can be combined using conditional aggregation:

WITH CTE1
AS (SELECT
 COUNT(DISTINCT Data.EventID) AS n__,
 COUNT(DISTINCT CASE WHEN TestResults.CompanyResults = 1 THEN Data.[EventID] END) AS n1_, 
 TestResults.TestName
FROM TestResults
INNER JOIN Data
 ON TestResults.CompanyID = Data.CompanyID
WHERE TestResults.CompanyResults IS NOT NULL
GROUP BY TestResults.TestName),
CTE2
AS (SELECT
 COUNT(DISTINCT Data.[EventID]) AS n_1,
 COUNT(DISTINCT CASE WHEN TestResults.CompanyResults = 1 THEN Data.[EventID] END) AS n11,
 TestResults.TestName,
 WordEventMap.Word
FROM TestResults
INNER JOIN Data
 ON TestResults.CompanyID = Data.CompanyID
INNER JOIN WordEventMap
 ON Data.[EventID] = WordEventMap.EventID
WHERE TestResults.CompanyResults IS NOT NULL
GROUP BY TestResults.TestName,
 WordEventMap.Word)
SELECT
 CTE2.TestName,
 CTE2.Word,
 n11,
 n1_,
 n_1,
 n__
FROM CTE2
INNER JOIN CTE1
 ON CTE2.TestName = CTE1.TestName
WHERE n11 > 0 
;

But the main performance problem are those DISTINCTs, based on uniqueness in your actual data you might be able to remove some (for your example data no DISTINCT is needed).

Question 5

Thanks. Removing the DISTINCT cut the execution time in half. Using your combination of CTEs cut the time by 2/3s.

dnoeth dnoeth 4,21613 silver badges14 bronze badges · Accepted Answer · 2017-07-19 08:57:54Z

When you think that optimizer should be able to combine two CTEs into one why don't you write a single CTE, human beings are still smarter than optimizers :-)

As there's only a different WHERE-condition the CTEs 2&4 and 1&3 can be combined using conditional aggregation:

WITH CTE1
AS (SELECT
 COUNT(DISTINCT Data.EventID) AS n__,
 COUNT(DISTINCT CASE WHEN TestResults.CompanyResults = 1 THEN Data.[EventID] END) AS n1_, 
 TestResults.TestName
FROM TestResults
INNER JOIN Data
 ON TestResults.CompanyID = Data.CompanyID
WHERE TestResults.CompanyResults IS NOT NULL
GROUP BY TestResults.TestName),
CTE2
AS (SELECT
 COUNT(DISTINCT Data.[EventID]) AS n_1,
 COUNT(DISTINCT CASE WHEN TestResults.CompanyResults = 1 THEN Data.[EventID] END) AS n11,
 TestResults.TestName,
 WordEventMap.Word
FROM TestResults
INNER JOIN Data
 ON TestResults.CompanyID = Data.CompanyID
INNER JOIN WordEventMap
 ON Data.[EventID] = WordEventMap.EventID
WHERE TestResults.CompanyResults IS NOT NULL
GROUP BY TestResults.TestName,
 WordEventMap.Word)
SELECT
 CTE2.TestName,
 CTE2.Word,
 n11,
 n1_,
 n_1,
 n__
FROM CTE2
INNER JOIN CTE1
 ON CTE2.TestName = CTE1.TestName
WHERE n11 > 0 
;

But the main performance problem are those DISTINCTs, based on uniqueness in your actual data you might be able to remove some (for your example data no DISTINCT is needed).

Thanks. Removing the DISTINCT cut the execution time in half. Using your combination of CTEs cut the time by 2/3s.

Stack Exchange Network

Consolidating multiple related queries, each using different window and/or criteria for counting records

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Consolidating multiple related queries, each using different window and/or criteria for counting records

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions