Tricky for me to define the problem, maybe you understand it through my dummy data.
I have this data:
PK, TaskPK
1, 1
2, 1
3, 2
4, 2
5, 5
6, 1
7, 1
8, 2
9, 2
10, 5
11, 5
Now I have to count TaskPK
so, I make this query
Select PK, TaskPK, Count(*)
From tbl
Group by TaskPK
It brought this result
TaskPK, Count(*)
1, 4
2, 4
5, 3
But I want slight different result
Like this
TaskPK, Count(*)
1, 2
2, 2
5, 1
1, 2
2, 2
5, 2
The above result based on consecutive data occurrence, as TaskPK start with 1 (it group together), then it change it 2 (it group together), then 5 (it group together) taskPK. But as TaskPK again shifts to 1, then it should group seperatly not link with previous occurrence of 1, this task seperately count.
Is this possible?
-
Are you working with MySQL or you can use MariaDB that has window functions ?joanolo– joanolo2017年07月01日 11:42:05 +00:00Commented Jul 1, 2017 at 11:42
1 Answer 1
Problems of this kind are a variation of what is normally referred to as Gaps and Islands. You can see a good overview at The SQL of Gaps and Islands in Sequences
Step by step approach
This can be done with several intermediate steps. First, let's make a (derived) table, that we will store as t1
, finding out at which points you start a new series of TaskPK
values.
CREATE TEMPORARY TABLE t1 AS
SELECT
PK, TaskPK,
case when
(SELECT TaskPK
FROM tbl t_prv
WHERE t_prv.PK < tbl.PK
ORDER BY PK DESC
LIMIT 1
) <=> TaskPK
then 0
else 1
end AS start_of_series
FROM
tbl ;
t1
contains:
PK | TaskPK | start_of_series -: | -----: | --------------: 1 | 1 | 1 2 | 1 | 0 3 | 2 | 1 4 | 2 | 0 5 | 5 | 1 6 | 1 | 1 7 | 1 | 0 8 | 2 | 1 9 | 2 | 0 10 | 5 | 1 11 | 5 | 0
NOTE: the <=> operator in MySQL is equivalent to NOT DISTINCT FROM
in standard SQL. It allows for equality comparison with NULL values as well as NON-NULL ones.
Now that we know where every series starts, we can give each PK
a series_number
by adding together all the current and previous values of the column start_of_series
(that's a cumulative sum). Without window functions, this must be done with an (awful, inefficient) subquery.
We store the result in a table t2
:
CREATE TEMPORARY TABLE t2 AS
SELECT
PK, TaskPK,
(SELECT
sum(start_of_series)
FROM
t1
WHERE
t1.PK <= tbl.PK
) AS series_number
FROM
tbl
;
The content of t2
is:
PK | TaskPK | series_number -: | -----: | ------------: 1 | 1 | 1 2 | 1 | 1 3 | 2 | 2 4 | 2 | 2 5 | 5 | 3 6 | 1 | 4 7 | 1 | 4 8 | 2 | 5 9 | 2 | 5 10 | 5 | 6 11 | 5 | 6
At this point, what you want to do is just GROUP BY
series_number, and take the value of TaskPK
(just min(TaskPK)
will do this job), together with a count(*)
:
SELECT
min(TaskPK) AS TaskPK, count(*) AS count
FROM
t2
GROUP BY
series_number
ORDER BY
series_number
This is the result you actually wanted:
TaskPK | count -----: | ----: 1 | 2 2 | 2 5 | 1 1 | 2 2 | 2 5 | 2
Putting everything together
By substituting t2
and t1
by their definitions, you can put everything together in a single query:
SELECT
min(TaskPK) AS TaskPK, count(*) AS count
FROM
(SELECT
PK, TaskPK,
(SELECT
sum(start_of_series)
FROM
(SELECT
PK, TaskPK,
case when
(SELECT TaskPK FROM tbl t_prv WHERE t_prv.PK < tbl.PK ORDER BY PK DESC LIMIT 1) <=> TaskPK
then 0
else 1
end AS start_of_series
FROM
tbl
) AS t1
WHERE
t1.PK <= tbl.PK
) AS series_number
FROM
tbl
) AS t2
GROUP BY
series_number
ORDER BY
series_number
;
TaskPK | count -----: | ----: 1 | 2 2 | 2 5 | 1 1 | 2 2 | 2 5 | 2
This query is quite complicated (there are five SELECT
within it), and can probably be still simplified some.
You can find the whole code and data at dbfiddle here
This can be very hihgly simplified if you can use MariaDB's WINDOW functions, or MySQL 8.0 Window functions (not yet released for production as of now).
-
Thanks a lot man, you just opened up a whole new world for me!Lukas Pierce– Lukas Pierce2022年07月25日 05:06:15 +00:00Commented Jul 25, 2022 at 5:06