2

Tricky for me to define the problem, maybe you understand it through my dummy data.

I have this data:

PK, TaskPK
1, 1 
2, 1 
3, 2 
4, 2 
5, 5 
6, 1 
7, 1 
8, 2 
9, 2 
10, 5 
11, 5

Now I have to count TaskPK so, I make this query

Select PK, TaskPK, Count(*)
From tbl
Group by TaskPK

It brought this result

TaskPK, Count(*)
1, 4
2, 4
5, 3

But I want slight different result

Like this

TaskPK, Count(*)
1, 2
2, 2
5, 1
1, 2
2, 2
5, 2

The above result based on consecutive data occurrence, as TaskPK start with 1 (it group together), then it change it 2 (it group together), then 5 (it group together) taskPK. But as TaskPK again shifts to 1, then it should group seperatly not link with previous occurrence of 1, this task seperately count.

Is this possible?

joanolo
13.7k8 gold badges39 silver badges67 bronze badges
asked Jul 1, 2017 at 7:44
1
  • Are you working with MySQL or you can use MariaDB that has window functions ? Commented Jul 1, 2017 at 11:42

1 Answer 1

5

Problems of this kind are a variation of what is normally referred to as Gaps and Islands. You can see a good overview at The SQL of Gaps and Islands in Sequences

Step by step approach

This can be done with several intermediate steps. First, let's make a (derived) table, that we will store as t1, finding out at which points you start a new series of TaskPK values.

CREATE TEMPORARY TABLE t1 AS
SELECT
 PK, TaskPK, 
 case when 
 (SELECT TaskPK 
 FROM tbl t_prv 
 WHERE t_prv.PK < tbl.PK 
 ORDER BY PK DESC 
 LIMIT 1
 ) <=> TaskPK 
 then 0
 else 1 
 end AS start_of_series
FROM
 tbl ;

t1 contains:

PK | TaskPK | start_of_series
-: | -----: | --------------:
 1 | 1 | 1
 2 | 1 | 0
 3 | 2 | 1
 4 | 2 | 0
 5 | 5 | 1
 6 | 1 | 1
 7 | 1 | 0
 8 | 2 | 1
 9 | 2 | 0
10 | 5 | 1
11 | 5 | 0

NOTE: the <=> operator in MySQL is equivalent to NOT DISTINCT FROM in standard SQL. It allows for equality comparison with NULL values as well as NON-NULL ones.

Now that we know where every series starts, we can give each PK a series_number by adding together all the current and previous values of the column start_of_series (that's a cumulative sum). Without window functions, this must be done with an (awful, inefficient) subquery.

We store the result in a table t2:

CREATE TEMPORARY TABLE t2 AS
SELECT
 PK, TaskPK, 
 (SELECT 
 sum(start_of_series) 
 FROM 
 t1 
 WHERE 
 t1.PK <= tbl.PK
 ) AS series_number
FROM
 tbl 
;

The content of t2 is:

PK | TaskPK | series_number
-: | -----: | ------------:
 1 | 1 | 1
 2 | 1 | 1
 3 | 2 | 2
 4 | 2 | 2
 5 | 5 | 3
 6 | 1 | 4
 7 | 1 | 4
 8 | 2 | 5
 9 | 2 | 5
10 | 5 | 6
11 | 5 | 6

At this point, what you want to do is just GROUP BY series_number, and take the value of TaskPK (just min(TaskPK) will do this job), together with a count(*):

SELECT
 min(TaskPK) AS TaskPK, count(*) AS count
FROM
 t2
GROUP BY
 series_number
ORDER BY
 series_number

This is the result you actually wanted:

TaskPK | count
-----: | ----:
 1 | 2
 2 | 2
 5 | 1
 1 | 2
 2 | 2
 5 | 2

Putting everything together

By substituting t2 and t1 by their definitions, you can put everything together in a single query:

SELECT
 min(TaskPK) AS TaskPK, count(*) AS count
FROM
 (SELECT
 PK, TaskPK, 
 (SELECT 
 sum(start_of_series) 
 FROM 
 (SELECT
 PK, TaskPK, 
 case when 
 (SELECT TaskPK FROM tbl t_prv WHERE t_prv.PK < tbl.PK ORDER BY PK DESC LIMIT 1) <=> TaskPK 
 then 0
 else 1 
 end AS start_of_series 
 FROM
 tbl
 ) AS t1 
 WHERE 
 t1.PK <= tbl.PK
 ) AS series_number
 FROM
 tbl
 ) AS t2
GROUP BY
 series_number
ORDER BY
 series_number
;
TaskPK | count
-----: | ----:
 1 | 2
 2 | 2
 5 | 1
 1 | 2
 2 | 2
 5 | 2

This query is quite complicated (there are five SELECT within it), and can probably be still simplified some.

You can find the whole code and data at dbfiddle here

This can be very hihgly simplified if you can use MariaDB's WINDOW functions, or MySQL 8.0 Window functions (not yet released for production as of now).

answered Jul 1, 2017 at 12:29
1
  • Thanks a lot man, you just opened up a whole new world for me! Commented Jul 25, 2022 at 5:06

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.