0

I'm accumulating results for multiple contingency tables, one for each Test. As an example:

Contingency Table for TestA (one for each test, and for each word)
 EventsWithWord EventsWithoutWord
TestA passed n11 n10 
TestA failed n01 n00 

For analysis, I need the number of events matching each of the following categories (this is how R expects the data results):

  • TestA passed for each word: n11
  • TestA results available for each word: n_1 (sum of n11 and n01)
  • TestA passed: n1_ (sum of n11 and n10)
  • TestA results available: n__

Table "TestResults": columns: CompanyID nvarchar(20) NOT NULL, TestName nvarchar(255) NOT NULL, CompanyResults bit NULL

Table "Data" (one row for each event): Columns: EventID int NOT NULL, CompanyID nvarchar(20) NOT NULL

Table "WordEventMap": Columns: EventID int NOT NULL, Word nvarchar(255) NOT NULL

Is there a way of combining some of the queries (e.g., with windowing functions)? Some of these tables are very large--40 million rows currently in WordEventMap and 1.3 million rows in Data, and I expect the results table will have about 4 million rows (40 different tests, 100,000 different words). The current execution plan does not consolidate any of the Common Table Expressions.

Current query:

WITH CTE1
AS (SELECT
 COUNT(DISTINCT Data.EventID) AS n__,
 TestResults.TestName
FROM TestResults
INNER JOIN Data
 ON TestResults.CompanyID = Data.CompanyID
WHERE TestResults.CompanyResults IS NOT NULL
GROUP BY TestResults.TestName),
CTE2
AS (SELECT
 COUNT(DISTINCT Data.[EventID]) AS n11,
 TestResults.TestName,
 WordEventMap.Word
FROM TestResults
INNER JOIN Data
 ON TestResults.CompanyID = Data.CompanyID
INNER JOIN WordEventMap
 ON Data.[EventID] = WordEventMap.EventID
WHERE TestResults.CompanyResults = 1
GROUP BY TestResults.TestName,
 WordEventMap.Word),
CTE3
AS (SELECT
 COUNT(DISTINCT Data.[EventID]) AS n1_,
 TestResults.TestName
FROM TestResults
INNER JOIN Data
 ON TestResults.CompanyID = Data.CompanyID
WHERE TestResults.CompanyResults = 1
GROUP BY TestResults.TestName),
CTE4
AS (SELECT
 COUNT(DISTINCT Data.[EventID]) AS n_1,
 TestResults.TestName,
 WordEventMap.Word
FROM TestResults
INNER JOIN Data
 ON TestResults.CompanyID = Data.CompanyID
INNER JOIN WordEventMap
 ON Data.[EventID] = WordEventMap.EventID
WHERE TestResults.CompanyResults IS NOT NULL
GROUP BY TestResults.TestName,
 WordEventMap.Word)
SELECT
 CTE2.TestName,
 CTE2.Word,
 n11,
 n1_,
 n_1,
 n__
FROM CTE2
INNER JOIN CTE4
 ON CTE2.Word = CTE4.Word
 AND CTE2.TestName = CTE4.TestName
INNER JOIN CTE1
 ON CTE2.TestName = CTE1.TestName
INNER JOIN CTE3
 ON CTE2.TestName = CTE3.TestName

Here's a SQL fiddle using the above structure and the below data. The above query is saved as the View [Results]. http://sqlfiddle.com/#!6/a8155e/1

Sample data:

Table TestResults
CompanyA TestA 1
CompanyA TestB 0
CompanyB TestA NULL
CompanyB TestB 1
CompanyC TestA 0
CompanyC TestB 1
Table Data
1 CompanyA
2 CompanyA
3 CompanyB
4 CompanyC
5 CompanyB
Table WordEventMap
1 airplane
1 tightrope
1 eggplant
2 eggplant
2 aardvark
2 eggbeater
3 airplane
3 aardvark
3 spaghetti
4 airplane
4 eggplant
4 wikipedia
5 eggplant
5 eggbeater
5 tightrope
5 licorice
Results Set
TestA aardvark 1 2 1 3
TestB aardvark 1 3 2 5
TestA airplane 1 2 2 3
TestB airplane 2 3 3 5
TestA eggbeater 1 2 1 3
TestA eggplant 1 2 2 3
TestB eggplant 2 3 3 5
TestB licorice 1 3 1 5
TestB spaghetti 1 3 1 5
TestA tightrope 1 2 1 3
TestB tightrope 1 3 2 5
TestB wikipedia 1 3 1 5
asked Jul 18, 2017 at 21:12
2
  • I would not expect the execution plan to ever "consolidate" CTEs. Primer: #BackToBasics : Common Table Expressions (CTEs). Can you provide a few rows of sample data (preferably at least two rows per / that fall into each CTE), desired results, and tag with the version of SQL Server you're using? Commented Jul 18, 2017 at 21:50
  • By consolidate, I mean that a CTE is not a stand-alone query, and sql server can produce an execution plan in which the CTE is not its own branch. See scarydba.com/2015/01/14/common-table-expressions-not-tables . I'll work on getting the clarifying information you're asking for and will add it to the question's text. Commented Jul 18, 2017 at 23:12

1 Answer 1

3

When you think that optimizer should be able to combine two CTEs into one why don't you write a single CTE, human beings are still smarter than optimizers :-)

As there's only a different WHERE-condition the CTEs 2&4 and 1&3 can be combined using conditional aggregation:

WITH CTE1
AS (SELECT
 COUNT(DISTINCT Data.EventID) AS n__,
 COUNT(DISTINCT CASE WHEN TestResults.CompanyResults = 1 THEN Data.[EventID] END) AS n1_, 
 TestResults.TestName
FROM TestResults
INNER JOIN Data
 ON TestResults.CompanyID = Data.CompanyID
WHERE TestResults.CompanyResults IS NOT NULL
GROUP BY TestResults.TestName),
CTE2
AS (SELECT
 COUNT(DISTINCT Data.[EventID]) AS n_1,
 COUNT(DISTINCT CASE WHEN TestResults.CompanyResults = 1 THEN Data.[EventID] END) AS n11,
 TestResults.TestName,
 WordEventMap.Word
FROM TestResults
INNER JOIN Data
 ON TestResults.CompanyID = Data.CompanyID
INNER JOIN WordEventMap
 ON Data.[EventID] = WordEventMap.EventID
WHERE TestResults.CompanyResults IS NOT NULL
GROUP BY TestResults.TestName,
 WordEventMap.Word)
SELECT
 CTE2.TestName,
 CTE2.Word,
 n11,
 n1_,
 n_1,
 n__
FROM CTE2
INNER JOIN CTE1
 ON CTE2.TestName = CTE1.TestName
WHERE n11 > 0 
;

But the main performance problem are those DISTINCTs, based on uniqueness in your actual data you might be able to remove some (for your example data no DISTINCT is needed).

answered Jul 19, 2017 at 8:57
1
  • Thanks. Removing the DISTINCT cut the execution time in half. Using your combination of CTEs cut the time by 2/3s. Commented Jul 19, 2017 at 15:29

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.