I have a table that stores the results of queries that are run at least once a day. There's a bit
column that represents whether the row is from the most recent run of a particular query with a particular set of arguments. (Yes, it's a functional dependency, but a necessary denormalization for performance, since most queries on this table are only interested in the most recent result.)
Since this bit
column is almost always a false value, I'm looking for the best way to tune queries returning only the true values. Partitioning isn't an option (Standard Edition). It seems like making the column SPARSE would be an interesting solution, but I believe that would require me to change the column to nullable and use NULL rather than 0 for false values. Seems a little kludgy.
Is there an option similar to SPARSE that would optimize space/performance for a non-null bit column with mostly (well over 99%) false values?
Pinal Dave's article indicates that both zero and null values are optimized, but this doesn't seem right to me, since these are different values -- unless MSSQL is using the same mechanism for non-null columns to indicate the default value. This would be great if it were true, but the BOL doesn't mention this.
1 Answer 1
A filtered index (WHERE IsMostRecentRun = 1
) sounds like a better idea to me than using sparse. If you can make it so that false
is instead represented by null
, you may be able to do both, but while that will potentially save some space in the base table, I suspect the bigger gain would be in query performance from the filtered index - as long as it's covering. If you need too many columns to cover and/or need to also filter or join on other columns, then you may find a balancing act between the improvements you get from the seek or range scan on the filtered index and the costs of things like lookups to get at the rest of the columns.
That all said, it seems suspicious you would need an additional column to serve as a flag for this. Isn't this information redundant? Can't it be determined by other data? In which case I would focus on index tuning to optimize the use of the existing columns instead of storing and maintaining redundant data.
(Also I don't think Pinal is correct.)
-
Thanks Aaron. It is redundant, but the alternative requires a correlated subquery that is easily an order of magnitude slower (even with a covering index). Table partitioning would be a great solution, but I only have the Standard license. I'm looking forward to getting into the office tomorrow and running some tests with a filtered index.richardtallent– richardtallent2015年01月26日 06:38:39 +00:00Commented Jan 26, 2015 at 6:38
Explore related questions
See similar questions with these tags.
NULL
and0
values usingSPARSE
column, and Neither the msdn link nor the Pinal's article says anything about optimizing a sql query by using aSPARSE
column.