Our resident database expert is telling us that numbers tables are invaluable. I don't quite understand why. Here's a numbers table:
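USE Model
GO
-- Objects created in model are copied into every database created afterwards.
-- One row per integer 0..65535.
CREATE TABLE Numbers
(
    Number INT NOT NULL,
    CONSTRAINT PK_Numbers
        PRIMARY KEY CLUSTERED (Number)
        WITH FILLFACTOR = 100
)

-- spt_values rows of type 'P' are the integers 0..2047; take 0..255 twice and
-- cross join them, so (a * 256) + b covers 0..65535 exactly once.
INSERT INTO Numbers
SELECT
    (a.Number * 256) + b.Number AS Number
FROM
    (
        SELECT number
        FROM master..spt_values
        WHERE
            type = 'P'
            AND number <= 255
    ) a (Number),
    (
        SELECT number
        FROM master..spt_values
        WHERE
            type = 'P'
            AND number <= 255
    ) b (Number)
GO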
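To see why the cross join yields every value from 0 to 65535 exactly once (each a in 0..255 pairs with each b in 0..255, and a * 256 + b is simply the two-digit base-256 number "ab"), here is a sketch of the same trick in SQLite via Python's built-in sqlite3 module. SQLite has no master..spt_values, so a small recursive CTE stands in for it; that substitution is an assumption of this sketch, not part of the original script.

```python
import sqlite3


def build_numbers(conn: sqlite3.Connection) -> None:
    """Fill Numbers with 0..65535 using the 256 x 256 cross-join trick."""
    conn.execute("CREATE TABLE Numbers (Number INTEGER NOT NULL PRIMARY KEY)")
    conn.execute(
        """
        WITH RECURSIVE seq256(n) AS (      -- stand-in for spt_values 0..255
            SELECT 0
            UNION ALL
            SELECT n + 1 FROM seq256 WHERE n < 255
        )
        INSERT INTO Numbers (Number)
        SELECT a.n * 256 + b.n             -- base-256 "digits" a and b
        FROM seq256 AS a CROSS JOIN seq256 AS b
        """
    )


conn = sqlite3.connect(":memory:")
build_numbers(conn)
print(conn.execute("SELECT COUNT(*), MIN(Number), MAX(Number) FROM Numbers").fetchone())
```

Because Number is the primary key, any duplicate produced by the join would make the insert fail, so a successful run plus a count of 65536 confirms the values are distinct.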
Per the blog post, the rationale given is:
Numbers tables are truly invaluable. I use them all of the time for string manipulation, simulating window functions, populating test tables with lots of data, eliminating cursor logic, and many other tasks that would be incredibly difficult without them.
But I don't understand exactly what those uses are. Can you provide some compelling, specific examples of where a "numbers table" saves you a ton of work in SQL Server, and explain why we should have them?
4 Answers
I've seen many uses when you need to project 'missing data'. For example, you have a time series (an access log, for instance) and you want to show the number of hits per day for the past 30 days (think analytics dashboard). If you do a select count(...) from ... group by day you will get a count per day, but the result will only have a row for each day on which you actually had at least one access. If instead you first project a table of days from your numbers table (select dateadd(day, -number, today) as day from numbers where number < 30) and then left join it with the counts (or outer apply, whatever you fancy), you will get a result that has a count of 0 for the days with no access. This is just one example. Of course, one may argue that the presentation layer of your dashboard could handle the missing days and just show a 0 instead, but some tools (e.g. SSRS) will simply not be able to handle this.
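A minimal sketch of that gap-filling query, using Python's built-in sqlite3 module as a stand-in for SQL Server, so SQLite's date() takes the place of dateadd(). The access_log table, its ts column and the create_numbers helper are hypothetical names for illustration only.

```python
import sqlite3


def create_numbers(conn: sqlite3.Connection, first: int, last: int) -> None:
    """Hypothetical helper: a Numbers table holding first..last, filled set-based."""
    conn.execute("CREATE TABLE Numbers (Number INTEGER NOT NULL PRIMARY KEY)")
    conn.execute(
        "WITH RECURSIVE seq(n) AS (SELECT ? UNION ALL SELECT n + 1 FROM seq WHERE n < ?) "
        "INSERT INTO Numbers (Number) SELECT n FROM seq",
        (first, last),
    )


def hits_per_day(conn: sqlite3.Connection, today: str, days: int) -> list:
    """(day, hits) for each of the `days` days ending at `today`, zero-filled."""
    return conn.execute(
        """
        SELECT d.day, COALESCE(c.hits, 0) AS hits
        FROM (
            -- one row per day from the numbers table: today, today - 1, ...
            SELECT date(:today, '-' || Number || ' days') AS day
            FROM Numbers
            WHERE Number < :days
        ) AS d
        LEFT JOIN (
            -- the plain GROUP BY: only days with at least one hit appear here
            SELECT date(ts) AS day, COUNT(*) AS hits
            FROM access_log
            GROUP BY date(ts)
        ) AS c ON c.day = d.day
        ORDER BY d.day
        """,
        {"today": today, "days": days},
    ).fetchall()


conn = sqlite3.connect(":memory:")
create_numbers(conn, 0, 999)
conn.execute("CREATE TABLE access_log (ts TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO access_log (ts) VALUES (?)",
    [("2024-03-01 10:00:00",), ("2024-03-01 11:30:00",), ("2024-03-03 09:15:00",)],
)
for day, hits in hits_per_day(conn, "2024-03-03", 3):
    print(day, hits)  # 2024-03-02 is reported with 0 even though it has no log rows
```

The numbers table drives the row set, so the shape of the result depends only on the requested window, never on which days happen to have data.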
Other examples I've seen used similar time series tricks (date/time +/- number) to do all sorts of window calculations. In general, whenever you would use a for loop with a well-known number of iterations in an imperative language, the declarative, set-based nature of SQL lets you use a trick based on a numbers table instead.
BTW, I feel the need to point out that even though using a numbers table feels like imperative, procedural execution, you should not fall into the fallacy of assuming it is imperative. Let me give an example:
#include <stdio.h>
int main(void) {
    int x;
    for (int i = 0; i < 1000000; ++i)
        x = i;
    printf("%d\n", x);
    return 0;
}
This program will output 999999; that is pretty much guaranteed.
Let's try the same in SQL Server using a numbers table. First, create a table of 1,000,000 numbers:
create table numbers (number int not null primary key);
go
declare @i int = 0
      , @j int = 0;
set nocount on;
begin transaction
while @i < 1000
begin
    set @j = 0;
    while @j < 1000
    begin
        -- deliberately out of key order: 0, 1000, 2000, ..., then 1, 1001, ...
        insert into numbers (number)
        values (@j*1000+@i);
        set @j += 1;
    end
    -- commit every 1000 rows and report progress
    commit;
    raiserror (N'Inserted %d*1000', 0, 0, @i)
    begin transaction;
    set @i += 1;
end
commit
go
Now let's do the 'for loop':
declare @x int;
select @x = number
from numbers with(nolock);
select @x as [@x];
The result is:
@x
-----------
88698
If you're now having a WTF moment (after all, number is the clustered primary key!), the trick is called an allocation order scan, and I did not insert @j*1000+@i by accident... You could also venture a guess that the result is caused by parallelism, and sometimes that may be the correct answer.
There are many trolls under this bridge, and I mentioned some in On SQL Server boolean operator short-circuit and T-SQL functions do no imply a certain order of execution.
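SQLite does not perform allocation-order scans, so the sketch below (Python's sqlite3 standing in for SQL Server) cannot reproduce the 88698 result. What it does show is the fix: if you want the value of the "last iteration", state that order explicitly with an aggregate or an ORDER BY rather than relying on the order in which rows happen to be read. In T-SQL the same idea is simply select @x = max(number) from numbers;. The table here is scaled down to 10,000 rows and filled in the same scattered key order as the loop above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE numbers (number INTEGER NOT NULL PRIMARY KEY)")
# Same scattered insert order as the T-SQL loop, scaled down to 100 x 100:
# 0, 100, 200, ..., 9900, then 1, 101, ... (i is the outer counter, j the inner one).
conn.execute(
    """
    WITH RECURSIVE d(n) AS (SELECT 0 UNION ALL SELECT n + 1 FROM d WHERE n < 99)
    INSERT INTO numbers (number)
    SELECT j.n * 100 + i.n
    FROM d AS i CROSS JOIN d AS j
    ORDER BY i.n, j.n
    """
)


def last_number(conn: sqlite3.Connection) -> tuple:
    """The 'last' value, defined explicitly instead of by scan order."""
    by_max = conn.execute("SELECT MAX(number) FROM numbers").fetchone()[0]
    by_order = conn.execute(
        "SELECT number FROM numbers ORDER BY number DESC LIMIT 1"
    ).fetchone()[0]
    return by_max, by_order


print(last_number(conn))  # (9999, 9999) regardless of insertion order
```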
I've found a numbers table quite useful in a variety of situations.
At Why should I consider using an auxiliary numbers table?, written in 2004, I show a few examples:
- Parsing a string
- Finding identity gaps
- Generating date ranges (e.g. populating a calendar table, which can also be invaluable)
- Generating time slices
- Generating IP ranges
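As an illustration of the identity-gaps case, here is a sketch using Python's sqlite3 rather than SQL Server (the orders table, its id column and the create_numbers helper are hypothetical): every number between 1 and the current maximum id that has no matching row is a gap.

```python
import sqlite3


def create_numbers(conn: sqlite3.Connection, first: int, last: int) -> None:
    """Hypothetical helper: a Numbers table holding first..last, filled set-based."""
    conn.execute("CREATE TABLE Numbers (Number INTEGER NOT NULL PRIMARY KEY)")
    conn.execute(
        "WITH RECURSIVE seq(n) AS (SELECT ? UNION ALL SELECT n + 1 FROM seq WHERE n < ?) "
        "INSERT INTO Numbers (Number) SELECT n FROM seq",
        (first, last),
    )


def find_gaps(conn: sqlite3.Connection) -> list:
    """Ids between 1 and MAX(id) that are missing from the orders table."""
    rows = conn.execute(
        """
        SELECT n.Number
        FROM Numbers AS n
        WHERE n.Number BETWEEN 1 AND (SELECT MAX(id) FROM orders)
          AND NOT EXISTS (SELECT 1 FROM orders AS o WHERE o.id = n.Number)
        ORDER BY n.Number
        """
    ).fetchall()
    return [number for (number,) in rows]


conn = sqlite3.connect(":memory:")
create_numbers(conn, 1, 10000)
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO orders (id) VALUES (?)", [(1,), (2,), (4,), (7,)])
print(find_gaps(conn))  # ids 3, 5 and 6 were never used (or were deleted)
```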
At Bad habits to kick : using loops to populate large tables, I show how a numbers table can be used to make short work of inserting a lot of rows (as opposed to the knee-jerk approach of using a while loop).
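A minimal sketch of that idea, with Python's sqlite3 standing in for SQL Server and a hypothetical test_rows table: a single INSERT ... SELECT against the numbers table replaces a WHILE loop that would insert one row per iteration, with every column derived from Number.

```python
import sqlite3


def create_numbers(conn: sqlite3.Connection, first: int, last: int) -> None:
    """Hypothetical helper: a Numbers table holding first..last, filled set-based."""
    conn.execute("CREATE TABLE Numbers (Number INTEGER NOT NULL PRIMARY KEY)")
    conn.execute(
        "WITH RECURSIVE seq(n) AS (SELECT ? UNION ALL SELECT n + 1 FROM seq WHERE n < ?) "
        "INSERT INTO Numbers (Number) SELECT n FROM seq",
        (first, last),
    )


def populate_test_rows(conn: sqlite3.Connection, row_count: int) -> None:
    """Fill the hypothetical test_rows table with row_count synthetic rows in one statement."""
    conn.execute(
        "CREATE TABLE test_rows (id INTEGER PRIMARY KEY, label TEXT NOT NULL, bucket INTEGER NOT NULL)"
    )
    conn.execute(
        """
        INSERT INTO test_rows (id, label, bucket)
        SELECT Number, 'item-' || Number, Number % 10   -- every column derived from Number
        FROM Numbers
        WHERE Number BETWEEN 1 AND ?
        """,
        (row_count,),
    )


conn = sqlite3.connect(":memory:")
create_numbers(conn, 1, 100000)
populate_test_rows(conn, 50000)
print(conn.execute("SELECT COUNT(*) FROM test_rows").fetchone()[0])
```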
At Processing a list of integers : my approach and More on splitting lists : custom delimiters, preventing duplicates, and maintaining order, I show how to use a numbers table to split a string (e.g. a set of comma-separated values) and provide performance comparisons between this and other methods. More info on splitting and other string handling:
- Split strings the right way – or the next best way
- Splitting Strings : A Follow-Up
- Comparing string splitting / concatenation methods
- Removing Duplicates from Strings in SQL Server
- Performance Surprises and Assumptions : STRING_SPLIT()
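For flavour, here is a sketch of the basic numbers-table split, using Python's sqlite3 as a stand-in. SQLite's substr() and instr() play the roles of SUBSTRING and CHARINDEX; split_list and create_numbers are hypothetical names, and this is not the code from the posts above. Each number is a candidate position, and an element starts there when the character just before it (in the list with a delimiter prepended) is the delimiter. Ordering by Number keeps the original order, and empty elements are preserved.

```python
import sqlite3


def create_numbers(conn: sqlite3.Connection, first: int, last: int) -> None:
    """Hypothetical helper: a Numbers table holding first..last, filled set-based."""
    conn.execute("CREATE TABLE Numbers (Number INTEGER NOT NULL PRIMARY KEY)")
    conn.execute(
        "WITH RECURSIVE seq(n) AS (SELECT ? UNION ALL SELECT n + 1 FROM seq WHERE n < ?) "
        "INSERT INTO Numbers (Number) SELECT n FROM seq",
        (first, last),
    )


def split_list(conn: sqlite3.Connection, items: str, delim: str = ",") -> list:
    """Split items on a single-character delimiter, keeping order and empty elements."""
    if len(delim) != 1:
        raise ValueError("this sketch only handles single-character delimiters")
    rows = conn.execute(
        """
        SELECT substr(:s, Number, instr(substr(:s || :d, Number), :d) - 1)
        FROM Numbers
        WHERE Number <= length(:s) + 1
          AND substr(:d || :s, Number, 1) = :d   -- an element starts at Number
        ORDER BY Number
        """,
        {"s": items, "d": delim},
    ).fetchall()
    return [item for (item,) in rows]


conn = sqlite3.connect(":memory:")
create_numbers(conn, 1, 8001)
print(split_list(conn, "a,bb,,ccc"))
print(split_list(conn, "10|20|30", "|"))
```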
And in The SQL Server Numbers Table, Explained - Part 1, I give some background about the concept and have future posts in store to detail specific applications.
There are many other uses; those are just a few that have stood out to me enough to write about them.
And like @gbn, I have a few answers on Stack Overflow and on this site that use a numbers table as well.
Finally, I have a series of blog posts about generating sets without looping, which in part show the performance advantage of using a numbers table compared to most other methods (Remus' quirky outlier aside):
Here's a great example from Adam Machanic that I used recently:
CREATE FUNCTION dbo.GetSubstringCount
(
    @InputString TEXT,
    @SubString VARCHAR(200),
    @NoisePattern VARCHAR(20)
)
RETURNS INT
WITH SCHEMABINDING
AS
BEGIN
    RETURN
    (
        -- each row of dbo.Numbers is a candidate start position in @InputString
        SELECT COUNT(*)
        FROM dbo.Numbers N
        WHERE
            SUBSTRING(@InputString, N.Number, LEN(@SubString)) = @SubString
            -- the character after the match must not match @NoisePattern
            AND PATINDEX(@NoisePattern, SUBSTRING(@InputString, N.Number + LEN(@SubString), 1)) = 0
            -- nor may the character before it (check skipped when @NoisePattern = '')
            AND 0 =
                CASE
                    WHEN @NoisePattern = '' THEN 0
                    ELSE PATINDEX(@NoisePattern, SUBSTRING(@InputString, N.Number - 1, 1))
                END
    )
END
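Here is a rough SQLite analogue via Python's sqlite3, offered as a stand-in rather than Adam Machanic's code. SQLite has no PATINDEX, so this sketch uses GLOB with a character class for the noise check. Unlike PATINDEX under a case-insensitive collation, GLOB is case-sensitive, hence a class such as '[A-Za-z]'. Like the original, it counts overlapping matches. substring_count and create_numbers are hypothetical names.

```python
import sqlite3
from typing import Optional


def create_numbers(conn: sqlite3.Connection, first: int, last: int) -> None:
    """Hypothetical helper: a Numbers table holding first..last, filled set-based."""
    conn.execute("CREATE TABLE Numbers (Number INTEGER NOT NULL PRIMARY KEY)")
    conn.execute(
        "WITH RECURSIVE seq(n) AS (SELECT ? UNION ALL SELECT n + 1 FROM seq WHERE n < ?) "
        "INSERT INTO Numbers (Number) SELECT n FROM seq",
        (first, last),
    )


def substring_count(conn: sqlite3.Connection, text: str, sub: str,
                    noise: Optional[str] = None) -> int:
    """Count (possibly overlapping) occurrences of sub in text.

    noise is an optional GLOB character class such as '[A-Za-z]': a match is
    ignored when the character just before or just after it is in that class.
    """
    sql = """
        SELECT COUNT(*)
        FROM Numbers
        WHERE Number <= length(:text)          -- each Number is a candidate start
          AND substr(:text, Number, length(:sub)) = :sub
    """
    if noise is not None:
        sql += """
          AND (Number = 1 OR NOT (substr(:text, Number - 1, 1) GLOB :noise))
          AND NOT (substr(:text, Number + length(:sub), 1) GLOB :noise)
        """
    return conn.execute(sql, {"text": text, "sub": sub, "noise": noise}).fetchone()[0]


conn = sqlite3.connect(":memory:")
create_numbers(conn, 1, 8000)
print(substring_count(conn, "then the theme", "the"))              # every "the"
print(substring_count(conn, "then the theme", "the", "[A-Za-z]"))  # whole word only
```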
I used something similar with a CTE to find a specific occurrence of a substring (e.g. "find the 3rd pipe in this string") when working with correlated delimited data:
declare @TargetStr varchar(8000),
        @SearchedStr varchar(8000),
        @Occurrence int
set @TargetStr='a'
set @SearchedStr='abbabba'
set @Occurrence=3;
-- spt_values type 'P' only holds 0..2047, so this version handles strings up to that length
WITH Occurrences AS (
    SELECT Number,
           ROW_NUMBER() OVER(ORDER BY Number) AS Occurrence
    FROM master.dbo.spt_values
    WHERE Number BETWEEN 1 AND LEN(@SearchedStr) AND type='P'
      AND SUBSTRING(@SearchedStr,Number,LEN(@TargetStr))=@TargetStr)
SELECT Number   -- returns 7, the position of the 3rd 'a' in 'abbabba'
FROM Occurrences
WHERE Occurrence=@Occurrence
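The same lookup as a sketch in SQLite via Python's sqlite3 (a stand-in for SQL Server; nth_occurrence and create_numbers are hypothetical names). It uses ORDER BY ... LIMIT/OFFSET instead of ROW_NUMBER() so it does not depend on the SQLite build supporting window functions, and it reads from a real numbers table instead of spt_values.

```python
import sqlite3
from typing import Optional


def create_numbers(conn: sqlite3.Connection, first: int, last: int) -> None:
    """Hypothetical helper: a Numbers table holding first..last, filled set-based."""
    conn.execute("CREATE TABLE Numbers (Number INTEGER NOT NULL PRIMARY KEY)")
    conn.execute(
        "WITH RECURSIVE seq(n) AS (SELECT ? UNION ALL SELECT n + 1 FROM seq WHERE n < ?) "
        "INSERT INTO Numbers (Number) SELECT n FROM seq",
        (first, last),
    )


def nth_occurrence(conn: sqlite3.Connection, target: str, searched: str,
                   occurrence: int) -> Optional[int]:
    """1-based position of the occurrence-th match of target in searched, or None."""
    if occurrence < 1:
        raise ValueError("occurrence must be >= 1")
    row = conn.execute(
        """
        SELECT Number
        FROM Numbers
        WHERE Number BETWEEN 1 AND length(:searched)
          AND substr(:searched, Number, length(:target)) = :target
        ORDER BY Number
        LIMIT 1 OFFSET :skip      -- skip the first occurrence - 1 matches
        """,
        {"target": target, "searched": searched, "skip": occurrence - 1},
    ).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
create_numbers(conn, 1, 8000)
print(nth_occurrence(conn, "a", "abbabba", 3))
print(nth_occurrence(conn, "|", "x|y|z|w", 3))
```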
If you don't have a numbers table, the alternative is to use a loop of some sort. Basically, a numbers table allows you to do set-based iteration, without cursors or loops.
Comment (Remus Rusanu, Jan 25, 2012): And the mandatory warning about the lurking danger of doing string manipulation in inline TVFs: T-SQL functions do no imply a certain order of execution
I would use a numbers table whenever I need a SQL equivalent of Enumerable.Range. For example, I just used it in an answer on this site: calculating number of permutations
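A sketch of that Enumerable.Range equivalence, again via Python's sqlite3 as a stand-in for SQL Server. sql_range and create_numbers are hypothetical names, and the result is capped by the size of the numbers table (10,000 rows here).

```python
import sqlite3


def create_numbers(conn: sqlite3.Connection, first: int, last: int) -> None:
    """Hypothetical helper: a Numbers table holding first..last, filled set-based."""
    conn.execute("CREATE TABLE Numbers (Number INTEGER NOT NULL PRIMARY KEY)")
    conn.execute(
        "WITH RECURSIVE seq(n) AS (SELECT ? UNION ALL SELECT n + 1 FROM seq WHERE n < ?) "
        "INSERT INTO Numbers (Number) SELECT n FROM seq",
        (first, last),
    )


def sql_range(conn: sqlite3.Connection, start: int, count: int) -> list:
    """Rough SQL analogue of Enumerable.Range(start, count) over a 0-based Numbers table."""
    rows = conn.execute(
        "SELECT :start + Number FROM Numbers WHERE Number < :count ORDER BY Number",
        {"start": start, "count": count},
    ).fetchall()
    return [value for (value,) in rows]


conn = sqlite3.connect(":memory:")
create_numbers(conn, 0, 9999)
print(sql_range(conn, 5, 4))
```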