Select longest continuous sequence

Question 1

I am trying to construct a query in PostgreSQL 9.0 that gets the longest sequence of continuous rows for a specific column.

Consider the following table:

lap_id (serial), lap_no (int), car_type (enum), race_id (int FK)

Where lap_no is unique for each (race_id, car_type).

I would like the query to produce the longest sequence for a given race_id and car_type, so it would return an int (or long) that is highest.

With the following data:

1, 1, red, 1
2, 2, red, 1
3, 3, red, 1
4, 4, red, 1
5, 1, blue, 1
6, 5, red, 1
7, 2, blue, 1
8, 1, green, 1

For car_type = red and race_id = 1 the query would return 5 as the longest sequence of the lap_no field.

I found a similar question here however my situation is a bit more straightforward.

_{(I would also like to know the longest sequence for a given car_type for all races, but was planning to work that out myself.)}

Question 2

Your description results in a table definition like this:

CREATE TABLE tbl (
 lap_id serial PRIMARY KEY
 , lap_no int NOT NULL
 , car_type enum NOT NULL
 , race_id int NOT NULL -- REFERENCES ...
 , UNIQUE(race_id, car_type, lap_no)
);

General solution for this class of problems

To get the longest sequence (1 result, longest of all, arbitrary pick if there are ties):

SELECT race_id, car_type, count(*) AS seq_len
FROM (
 SELECT *, count(*) FILTER (WHERE step)
 OVER (ORDER BY race_id, car_type, lap_no) AS grp
 FROM (
 SELECT *, (lag(lap_no) OVER (PARTITION BY race_id, car_type ORDER BY lap_no) + 1)
 IS DISTINCT FROM lap_no AS step
 FROM tbl
 ) x
 ) y
GROUP BY race_id, car_type, grp
ORDER BY seq_len DESC
LIMIT 1;

count(*) FILTER (WHERE step) only counts TRUE (= step to next group), which results in a new number for every new group.

Related procedural solution with plpgsql:

GROUP BY and aggregate sequential numeric values

If the top requirement is performance, the plpgsql function is typically faster in this particular case because it can calculate the result in a single scan.

Faster for consecutive numbers

We can capitalize on the fact that consecutive lap_no define a sequence, for a much simpler and faster version:

SELECT race_id, car_type, count(*) AS seq_len
FROM (
 SELECT race_id, car_type
 , row_number() OVER (PARTITION BY race_id, car_type ORDER BY lap_no) - lap_no AS grp
 FROM tbl
 ) x
GROUP BY race_id, car_type, grp
ORDER BY seq_len DESC
LIMIT 1;

Consecutive laps end up in the same grp. Every missing lap results in a lower grp per partition.

This relies on (race_id, car_type, lap_no) being UNIQUE NOT NULL. NULL values or duplicates could break the logic.

Discussion of Jack's simpler alternative

@Jack's version effectively counts all laps (rows) where the previous lap_no in this race_id had the same car_type. That's simpler and faster and correct - as long as each car_type can only have one sequence per race_id.

But for a task that simple the query could be simpler, yet. It would follow logically that all lap_no per (car_type, race_id) must be in sequence, and we could just count the laps:

SELECT race_id, car_type, count(*) AS seq_len
FROM tbl
GROUP BY race_id, car_type
ORDER BY seq_len DESC
LIMIT 1;

If, on the other hand, one car_type can have multiple separate sequences per race_id (and the question does not specify otherwise), Jack's version will fail.

Faster for a given race / car type

In reply to the comment / clarifications in the question: restricting the query to one given (race_id, car_type) will make it much faster, of course:

SELECT count(*) AS seq_len
FROM (
 SELECT row_number() OVER (ORDER BY lap_no) - lap_no AS grp
 FROM tbl
 WHERE race_id = 1
 AND car_type = 'red'
 ) x
GROUP BY grp
ORDER BY seq_len DESC
LIMIT 1;

db<>fiddle here
_{Old SQL Fiddle}

Index

Key to top performance is a fitting index (except for the mentioned procedural solution working with a single sequential scan). A multicolumn index like this serves best:

CREATE INDEX tbl_mult_idx ON tbl (race_id, car_type, lap_no);

If your table has the UNIQUE constraint I assumed at the top, that is implemented with just this (unique) index internally, and you do not need to create another index.

Question 3

Hi Erwin, thanks that does the job, however it takes ~17sec on my database! Dont suppose you could provide a modification so it takes race_id and car_type as parameters rather than comparing the entire table? (I have tried re-writing it and keep running into errors)

Question 4

create table tbl (lap_no int, car_type text, race_id int);

insert into tbl values (1,'red',1),(2,'red',1),(3,'red',1),(4,'red',1),
 (1,'blue',1),(5,'red',1),(2,'blue',1),(1,'green',1);

select car_type, race_id, sum(case when lap_no=(prev+1) then 1 else 0 end)+1 seq_len
from ( select *, lag(lap_no) over (partition by car_type, race_id order by lap_no) prev 
 from tbl ) z
group by car_type, race_id
order by seq_len desc limit 1;

/*
|car_type|race_id|seq_len|
|:-------|------:|------:|
|red | 1| 5|
*/

Question 5

or perhaps sum((lap_no=(prev+1))::integer)+1 but I'm not sure that is easier to read

score 23 · Accepted Answer · 2013-02-25 18:18:24Z

Your description results in a table definition like this:

CREATE TABLE tbl (
 lap_id serial PRIMARY KEY
 , lap_no int NOT NULL
 , car_type enum NOT NULL
 , race_id int NOT NULL -- REFERENCES ...
 , UNIQUE(race_id, car_type, lap_no)
);

General solution for this class of problems

To get the longest sequence (1 result, longest of all, arbitrary pick if there are ties):

SELECT race_id, car_type, count(*) AS seq_len
FROM (
 SELECT *, count(*) FILTER (WHERE step)
 OVER (ORDER BY race_id, car_type, lap_no) AS grp
 FROM (
 SELECT *, (lag(lap_no) OVER (PARTITION BY race_id, car_type ORDER BY lap_no) + 1)
 IS DISTINCT FROM lap_no AS step
 FROM tbl
 ) x
 ) y
GROUP BY race_id, car_type, grp
ORDER BY seq_len DESC
LIMIT 1;

count(*) FILTER (WHERE step) only counts TRUE (= step to next group), which results in a new number for every new group.

Related procedural solution with plpgsql:

GROUP BY and aggregate sequential numeric values

If the top requirement is performance, the plpgsql function is typically faster in this particular case because it can calculate the result in a single scan.

Faster for consecutive numbers

We can capitalize on the fact that consecutive lap_no define a sequence, for a much simpler and faster version:

SELECT race_id, car_type, count(*) AS seq_len
FROM (
 SELECT race_id, car_type
 , row_number() OVER (PARTITION BY race_id, car_type ORDER BY lap_no) - lap_no AS grp
 FROM tbl
 ) x
GROUP BY race_id, car_type, grp
ORDER BY seq_len DESC
LIMIT 1;

Consecutive laps end up in the same grp. Every missing lap results in a lower grp per partition.

This relies on (race_id, car_type, lap_no) being UNIQUE NOT NULL. NULL values or duplicates could break the logic.

Discussion of Jack's simpler alternative

@Jack's version effectively counts all laps (rows) where the previous lap_no in this race_id had the same car_type. That's simpler and faster and correct - as long as each car_type can only have one sequence per race_id.

But for a task that simple the query could be simpler, yet. It would follow logically that all lap_no per (car_type, race_id) must be in sequence, and we could just count the laps:

SELECT race_id, car_type, count(*) AS seq_len
FROM tbl
GROUP BY race_id, car_type
ORDER BY seq_len DESC
LIMIT 1;

If, on the other hand, one car_type can have multiple separate sequences per race_id (and the question does not specify otherwise), Jack's version will fail.

Faster for a given race / car type

In reply to the comment / clarifications in the question: restricting the query to one given (race_id, car_type) will make it much faster, of course:

SELECT count(*) AS seq_len
FROM (
 SELECT row_number() OVER (ORDER BY lap_no) - lap_no AS grp
 FROM tbl
 WHERE race_id = 1
 AND car_type = 'red'
 ) x
GROUP BY grp
ORDER BY seq_len DESC
LIMIT 1;

db<>fiddle here
_{Old SQL Fiddle}

Index

Key to top performance is a fitting index (except for the mentioned procedural solution working with a single sequential scan). A multicolumn index like this serves best:

CREATE INDEX tbl_mult_idx ON tbl (race_id, car_type, lap_no);

If your table has the UNIQUE constraint I assumed at the top, that is implemented with just this (unique) index internally, and you do not need to create another index.

Hi Erwin, thanks that does the job, however it takes ~17sec on my database! Dont suppose you could provide a modification so it takes race_id and car_type as parameters rather than comparing the entire table? (I have tried re-writing it and keep running into errors)

Stack Exchange Network

Select longest continuous sequence

2 Answers 2

General solution for this class of problems

Faster for consecutive numbers

Discussion of Jack's simpler alternative

Faster for a given race / car type

Index

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions