In SQL Server 2016 I have a scenario where data will be processed according to different aggregation functions in a large GROUP BY ROLLUP. I would like to have a stored procedure with a parameter that specifies which aggregation function to use to describe the groupings, in a way that does not risk SQL injection and takes advantage of compilation (it is a heavy stored procedure).
My thought is to use a collection of queries, each summarizing the data's groupings with a particular aggregate function (e.g. agg.DataMin, agg.DataMedian, agg.DataWeightedAverage, and so on), and then select among them with the parameter in a CTE:
WITH AggData AS
(
SELECT * FROM agg.DataMin WHERE @AggFunction = 1
UNION ALL
SELECT * FROM agg.DataMedian WHERE @AggFunction = 2
UNION ALL
SELECT * FROM agg.DataWeightedAverage WHERE @AggFunction = 3
)
SELECT ...
My concerns are query performance and industry best practice. The data table is of a reasonable size (2+ Gig). I will have to add many aggregate queries with some being inline table-valued functions for some leave-out aggregations.
In the above, will the queries/table-valued functions only execute when @AggFunction matches the WHERE condition, or will they all execute and be filtered after the results are returned? If the latter, is there a method to short-circuit the evaluation of the unneeded queries at run-time? Also, is there some standard method to perform this in SQL that I have overlooked?
1 Answer
Contradiction Detection could kick in to make sure only one of the statements is run, and in my simple test it did as long as there was a statement-level recompile hint, but why risk it? For example:
USE tempdb
GO
-- CREATE SCHEMA agg
--DROP TABLE agg.DataMin
--DROP TABLE agg.DataMedian
--DROP TABLE agg.DataWeightedAverage
--GO
CREATE TABLE agg.DataMin ( x INT PRIMARY KEY )
CREATE TABLE agg.DataMedian ( x INT PRIMARY KEY )
CREATE TABLE agg.DataWeightedAverage ( x INT PRIMARY KEY )
GO
INSERT INTO agg.DataMin ( x )
SELECT object_id FROM sys.all_objects
INSERT INTO agg.DataMedian ( x )
SELECT object_id FROM sys.all_objects WHERE type = 'P'
INSERT INTO agg.DataWeightedAverage ( x )
SELECT object_id FROM sys.all_objects WHERE type = 'X'
GO
-- Are there some situations when it wouldn't...
DECLARE @AggFunction INT = 1
;WITH AggData AS
(
SELECT * FROM agg.DataMin WHERE @AggFunction = 1
UNION ALL
SELECT * FROM agg.DataMedian WHERE @AggFunction = 2
UNION ALL
SELECT * FROM agg.DataWeightedAverage WHERE @AggFunction = 3
)
SELECT *
FROM AggData
OPTION ( RECOMPILE )
My results: [screenshot: "Recompile in action"]
In this simple example, only one table is scanned on the left with the recompile, and 3 tables are scanned on the right, without the recompile. The recompile hint allows the optimizer to "see" the parameter value and act accordingly. In a stored procedure where parameter sniffing would be used, a recompile would also be needed to get the same behaviour, either at statement or stored-proc level.
However, I cannot prove that contradiction detection will always occur, even with a recompile; you can't prove a negative. There may be situations where it does not happen even with the hint; excessive query complexity springs to mind.
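The stored-procedure case described above can be sketched as follows. This is only an illustration under assumptions: dbo.GetAggData is a hypothetical procedure name, and the agg.* tables are the single-column test tables created earlier. It shows the statement-level hint only; whether a procedure-level WITH RECOMPILE produces the same simplified plan is worth confirming in the actual execution plan.

```sql
-- Sketch only: dbo.GetAggData is a hypothetical name;
-- agg.DataMin etc. are the test tables created earlier in this answer.
CREATE PROCEDURE dbo.GetAggData
    @AggFunction INT
AS
BEGIN
    SET NOCOUNT ON;

    WITH AggData AS
    (
        SELECT x FROM agg.DataMin WHERE @AggFunction = 1
        UNION ALL
        SELECT x FROM agg.DataMedian WHERE @AggFunction = 2
        UNION ALL
        SELECT x FROM agg.DataWeightedAverage WHERE @AggFunction = 3
    )
    SELECT x
    FROM AggData
    OPTION ( RECOMPILE );  -- statement is recompiled on each call, so the optimizer sees the actual parameter value
END
GO

-- Example call; check the actual plan to see which tables are touched
EXEC dbo.GetAggData @AggFunction = 2;
```

The trade-off is a compilation on every call, which may or may not matter for a heavy procedure that runs infrequently.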
Also, there is no real advantage to using the CTE in your example, so why not keep it simple? You could just write some simple procedural SQL with IF...ELSE, which would guarantee only one of your statements would fire, e.g.
DECLARE @AggFunction INT = 99
IF @AggFunction = 1
SELECT * FROM agg.DataMin
ELSE IF @AggFunction = 2
SELECT * FROM agg.DataMedian
ELSE IF @AggFunction = 3
SELECT * FROM agg.DataWeightedAverage
ELSE
RAISERROR( 'Unknown value for parameter @AggFunction (%i).', 16, 1, @AggFunction )
Add some parameter checking while you're at it. Hopefully this meets your requirement of guaranteeing that only one statement runs, and is safe and simple to implement.
HTH
- I suspect that those filters will have startup predicates and only one branch was in fact scanned. – Martin Smith, Jun 6, 2016 at 16:32
- If used in a larger query, though, this pattern might not always get optimised like that, and even if it does, the cardinality estimates will likely be less accurate, as it is the same plan irrespective of which branch will be executed. – Martin Smith, Jun 6, 2016 at 16:43
- @Martin Yes, that's exactly why I'm a big fan of dynamic SQL for this. Compile a plan for each possible branch; that compilation overhead will be worth it in the long run. And if parameter variance or data skew leads to parameter sniffing issues, you can always compile every time, too. – Aaron Bertrand, Jun 6, 2016 at 16:46
- @wBob If I use a CREATE TYPE TABLE with the IFs and embed that into the GROUP BY ROLLUP, will I get the benefit of a precompiled sproc? All the result sets from the aggregated views/table-valued functions will have the same schema. – Edmund, Jun 6, 2016 at 17:35
- All procs must be compiled before execution. Some sections of the proc may compile separately (e.g. dynamic SQL) and some sections may recompile (e.g. triggered by schema change, forced recompile). Table types basically behave like table variables, so watch out for those estimated rowcounts of 1. This may not matter if you're not using them in joins. However, I can't see the value of using them here; if you're inserting into the type to present elsewhere, why not just present it? I worked up an example here, see if it helps. – wBob, Jun 7, 2016 at 15:04
@AggFunction is an expression that your code expects.
SET @sql += CASE @AggFunction WHEN 1 THEN N'agg.DataMin' WHEN 2 THEN N'agg.DataMedian' WHEN 3 THEN N'agg.DataWeightedAverage' ELSE NULL END
I'm not sure how this is vulnerable to SQL injection.
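A minimal sketch of the dynamic SQL approach raised in the comments, assuming the single-column agg.* test tables from the answer. The variable names @source and @sql are illustrative. Because @AggFunction is an integer and the object names come from a hard-coded CASE, no caller-supplied text is concatenated into the statement, and each distinct statement text can get its own cached plan.

```sql
-- Sketch only: the mapping from @AggFunction to object names is fixed in code.
DECLARE @AggFunction INT = 2;

DECLARE @source NVARCHAR(128) =
    CASE @AggFunction
        WHEN 1 THEN N'agg.DataMin'
        WHEN 2 THEN N'agg.DataMedian'
        WHEN 3 THEN N'agg.DataWeightedAverage'
    END;  -- no ELSE branch, so an unmapped value yields NULL

-- Reject anything outside the known list before building the statement
IF @source IS NULL
BEGIN
    RAISERROR( 'Unknown value for parameter @AggFunction (%i).', 16, 1, @AggFunction );
    RETURN;
END

DECLARE @sql NVARCHAR(MAX) = N'SELECT x FROM ' + @source + N';';

EXEC sys.sp_executesql @sql;
```

In a real procedure the SELECT would be the full GROUP BY ROLLUP query, with any remaining values passed as parameters to sp_executesql instead of being concatenated.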