14

I am trying to construct a query in PostgreSQL 9.0 that gets the longest sequence of continuous rows for a specific column.

Consider the following table:

lap_id (serial), lap_no (int), car_type (enum), race_id (int FK)

Where lap_no is unique for each (race_id, car_type).

I would like the query to produce the longest sequence for a given race_id and car_type, so it would return an int (or long) that is highest.

With the following data:

1, 1, red, 1
2, 2, red, 1
3, 3, red, 1
4, 4, red, 1
5, 1, blue, 1
6, 5, red, 1
7, 2, blue, 1
8, 1, green, 1

For car_type = red and race_id = 1 the query would return 5 as the longest sequence of the lap_no field.

I found a similar question here however my situation is a bit more straightforward.

(I would also like to know the longest sequence for a given car_type for all races, but was planning to work that out myself.)

Erwin Brandstetter
186k28 gold badges463 silver badges636 bronze badges
asked Feb 25, 2013 at 16:06
0

2 Answers 2

23

Your description results in a table definition like this:

CREATE TABLE tbl (
 lap_id serial PRIMARY KEY
 , lap_no int NOT NULL
 , car_type enum NOT NULL
 , race_id int NOT NULL -- REFERENCES ...
 , UNIQUE(race_id, car_type, lap_no)
);

General solution for this class of problems

To get the longest sequence (1 result, longest of all, arbitrary pick if there are ties):

SELECT race_id, car_type, count(*) AS seq_len
FROM (
 SELECT *, count(*) FILTER (WHERE step)
 OVER (ORDER BY race_id, car_type, lap_no) AS grp
 FROM (
 SELECT *, (lag(lap_no) OVER (PARTITION BY race_id, car_type ORDER BY lap_no) + 1)
 IS DISTINCT FROM lap_no AS step
 FROM tbl
 ) x
 ) y
GROUP BY race_id, car_type, grp
ORDER BY seq_len DESC
LIMIT 1;

count(*) FILTER (WHERE step) only counts TRUE (= step to next group), which results in a new number for every new group.

Related procedural solution with plpgsql:

If the top requirement is performance, the plpgsql function is typically faster in this particular case because it can calculate the result in a single scan.

Faster for consecutive numbers

We can capitalize on the fact that consecutive lap_no define a sequence, for a much simpler and faster version:

SELECT race_id, car_type, count(*) AS seq_len
FROM (
 SELECT race_id, car_type
 , row_number() OVER (PARTITION BY race_id, car_type ORDER BY lap_no) - lap_no AS grp
 FROM tbl
 ) x
GROUP BY race_id, car_type, grp
ORDER BY seq_len DESC
LIMIT 1;

Consecutive laps end up in the same grp. Every missing lap results in a lower grp per partition.

This relies on (race_id, car_type, lap_no) being UNIQUE NOT NULL. NULL values or duplicates could break the logic.

Discussion of Jack's simpler alternative

@Jack's version effectively counts all laps (rows) where the previous lap_no in this race_id had the same car_type. That's simpler and faster and correct - as long as each car_type can only have one sequence per race_id.

But for a task that simple the query could be simpler, yet. It would follow logically that all lap_no per (car_type, race_id) must be in sequence, and we could just count the laps:

SELECT race_id, car_type, count(*) AS seq_len
FROM tbl
GROUP BY race_id, car_type
ORDER BY seq_len DESC
LIMIT 1;

If, on the other hand, one car_type can have multiple separate sequences per race_id (and the question does not specify otherwise), Jack's version will fail.

Faster for a given race / car type

In reply to the comment / clarifications in the question: restricting the query to one given (race_id, car_type) will make it much faster, of course:

SELECT count(*) AS seq_len
FROM (
 SELECT row_number() OVER (ORDER BY lap_no) - lap_no AS grp
 FROM tbl
 WHERE race_id = 1
 AND car_type = 'red'
 ) x
GROUP BY grp
ORDER BY seq_len DESC
LIMIT 1;

db<>fiddle here
Old SQL Fiddle

Index

Key to top performance is a fitting index (except for the mentioned procedural solution working with a single sequential scan). A multicolumn index like this serves best:

CREATE INDEX tbl_mult_idx ON tbl (race_id, car_type, lap_no);

If your table has the UNIQUE constraint I assumed at the top, that is implemented with just this (unique) index internally, and you do not need to create another index.

answered Feb 25, 2013 at 18:18
1
  • Hi Erwin, thanks that does the job, however it takes ~17sec on my database! Dont suppose you could provide a modification so it takes race_id and car_type as parameters rather than comparing the entire table? (I have tried re-writing it and keep running into errors) Commented Feb 25, 2013 at 19:28
7
create table tbl (lap_no int, car_type text, race_id int);
insert into tbl values (1,'red',1),(2,'red',1),(3,'red',1),(4,'red',1),
 (1,'blue',1),(5,'red',1),(2,'blue',1),(1,'green',1);
select car_type, race_id, sum(case when lap_no=(prev+1) then 1 else 0 end)+1 seq_len
from ( select *, lag(lap_no) over (partition by car_type, race_id order by lap_no) prev 
 from tbl ) z
group by car_type, race_id
order by seq_len desc limit 1;
/*
|car_type|race_id|seq_len|
|:-------|------:|------:|
|red | 1| 5|
*/
answered Feb 25, 2013 at 19:53
1
  • or perhaps sum((lap_no=(prev+1))::integer)+1 but I'm not sure that is easier to read Commented Feb 25, 2013 at 20:08

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.