I have a column of ints that each represent an occurrence of a signal, and I'm trying to add a column that shows the count of consecutive rows.
If my data looks like this
724
727
728
733
735
737
743
747
749
the resulting data with a consecutive row count column would look like this
724 1
727 1
728 2
729 3
735 1
737 1
743 1
744 2
748 1
I've done it using a looping function, but I'm trying to figure out how to do it using a CTE. Here is a sample of my latest attempt:
DECLARE @d TABLE ( signal INT )
INSERT INTO @d
SELECT 724
UNION
SELECT 727
UNION
SELECT 728
UNION
SELECT 729
UNION
SELECT 735
UNION
SELECT 737
UNION
SELECT 743
UNION
SELECT 744
UNION
SELECT 748 ;
WITH a AS ( SELECT signal,
ROW_NUMBER() OVER ( ORDER BY signal ) AS marker
FROM @d
) ,
b AS ( SELECT a1.signal,
CASE ( a1.signal - a2.signal )
WHEN 1 THEN 1
ELSE 0
END consecutiveMarker
FROM a a1
INNER JOIN a a2 ON a2.marker = a1.marker - 1
)
SELECT *
FROM b
Produces these results
signal consecutiveMarker
727 0
728 1
729 1
735 0
737 0
743 0
744 1
748 0
The first obvious issue is that the first signal in a series is missing. Barring that, I thought I could then pass this to another CTE with a ROW_NUMBER partitioned on consecutiveMarker. That didn't work because all the marked rows ended up in a single partition. I couldn't find a way to tell the partitioning that one series is separate from the next.
Any help is appreciated.
There seems to be a mismatch between source data and desired results. – Martin Smith, Oct 9, 2011 at 14:49
3 Answers
The general name for this type of query is "gaps and islands". One approach is below. If you can have duplicates in the source data, you might need DENSE_RANK rather than ROW_NUMBER.
WITH DATA(C) AS
(
SELECT 724 UNION ALL
SELECT 727 UNION ALL
SELECT 728 UNION ALL
SELECT 729 UNION ALL
SELECT 735 UNION ALL
SELECT 737 UNION ALL
SELECT 743 UNION ALL
SELECT 744 UNION ALL
SELECT 747 UNION ALL
SELECT 749
), T1 AS
(
SELECT C,
C - ROW_NUMBER() OVER (ORDER BY C) AS Grp
FROM DATA)
SELECT C,
ROW_NUMBER() OVER (PARTITION BY Grp ORDER BY C) AS Consecutive
FROM T1
Returns
C Consecutive
----------- --------------------
724 1
727 1
728 2
729 3
735 1
737 1
743 1
744 2
747 1
749 1
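To see why this works, it can help to look at the intermediate Grp value. Within a run of consecutive values, C and ROW_NUMBER() both increase by 1 per row, so their difference stays constant, and it changes as soon as there is a gap. A minimal sketch on a subset of the data above (the Rn alias is only added here for illustration):

```sql
WITH DATA(C) AS
(
SELECT 727 UNION ALL
SELECT 728 UNION ALL
SELECT 729 UNION ALL
SELECT 735
)
SELECT C,
       ROW_NUMBER() OVER (ORDER BY C) AS Rn,      -- 1, 2, 3, 4
       C - ROW_NUMBER() OVER (ORDER BY C) AS Grp  -- 726, 726, 726, 731
FROM DATA
ORDER BY C;
```

The three consecutive values share Grp = 726, while 735 gets its own Grp, so partitioning by Grp restarts the numbering for each island.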
In SQL Server 2012 you can also do this using LAG and the window functions, e.g.
DECLARE @d TABLE ( signal INT PRIMARY KEY)
INSERT INTO @d
VALUES
( 724 ),
( 727 ),
( 728 ),
( 729 ),
( 735 ),
( 737 ),
( 743 ),
( 744 ),
( 748 )
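SELECT signal
, ROW_NUMBER() OVER ( PARTITION BY island ORDER BY signal ) AS consecutive
FROM
(
SELECT signal
-- the running count of island starts gives every island its own number
, SUM( is_start ) OVER ( ORDER BY signal ROWS UNBOUNDED PRECEDING ) AS island
FROM
(
SELECT signal
-- 1 when the previous signal is not signal - 1 (or there is no previous row)
, CASE WHEN LAG(signal) OVER( ORDER BY signal ) = signal - 1 THEN 0 ELSE 1 END AS is_start
FROM @d
) x
) y
ORDER BY signal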
As usual with such problems, it is very easy to accomplish in Java or C++ or C#.
If you really need to do it in the database, you can use an RDBMS with fast cursors, such as Oracle, write a simple cursor, and enjoy fast performance without having to write anything complex.
If you need to do it in T-SQL, and you cannot change database design, Itzik Ben-Gan has written up several solutions in "MVP Deep Dives vol 1", and some new solutions using OLAP functions in his new book about window functions in SQL 2012.
Alternatively, you can add another column consecutiveMarker to your table, and store precalculated values in it. We can use constraints to ensure that pre-calculated data is always valid. If anyone is interested, I can explain how.