How to count three consecutive records with same values in a column based on the value of another column in PostgreSQL?

Question 1

I have a table attendance as follows:

create table attendance (id int, name varchar(30), status varchar(10), date date)

This table has the following records:

insert into attendance values (1,'John','absent','2016-03-01');
insert into attendance values (1,'John','absent','2016-03-02');
insert into attendance values (1,'John','absent','2016-03-03');
insert into attendance values (2,'Sam','present','2016-03-04');
insert into attendance values (3,'Sam','absent','2016-03-05');
insert into attendance values (1,'John','absent','2016-03-06');
insert into attendance values (1,'John','absent','2016-03-07');
insert into attendance values (1,'John','absent','2016-03-08');
insert into attendance values (1,'John','present','2016-03-09');
insert into attendance values (1,'John','absent','2016-03-10');
insert into attendance values (1,'John','absent','2016-03-11');
insert into attendance values (1,'John','present','2016-03-12'); 
insert into attendance values (1,'John','absent','2016-03-13');

Now I want to count that how many times a person has three consecutive absent records.
The result should be as follows:

id name count
 1 John 2

because John is absent for three consecutive days two times.

If they have 6 absents on 6 consecutive dates, then it should count 2 i.e (3+3).

It should count 1 for 3 consecutive absents on 3 consecutive dates i.e if John is absent for 1,2 and 3 march, it should count 1. but if John is absent for 1,2 and 4 march then it should not count 1.
4 or 5 absents should be counted as 1.

If there is no entry for a date, It is considered 'Present'.

Any help is appreciated.

Question 2

Another solution. Recursive part creates groups of rows of status "absent". The solution uses recursive CTEs and window functions.

WITH RECURSIVE a(id, name, date, n) as (
 SELECT id, name, q.date, 1 as n
 FROM (
 SELECT id, name, date,
 date-lag(date) OVER (PARTITION BY name ORDER BY date) as lag
 FROM attendance
 WHERE status='absent'
 ) q
 WHERE lag >1 or lag is null
 UNION
 SELECT a.id, a.name, a.date, a.n + 1
 FROM a
 JOIN attendance at ON (
 a.id = at.id and at.name = a.name and at.date = a.date + n)
 WHERE at.status='absent'
)
SELECT id, name, sum(long_absences) FROM (
 SELECT id, name, count(*)/3 as long_absences
 FROM a
 GROUP BY id, name, date having count(*) >=3
 ) as absences
GROUP BY id, name;

Question 3

I beleive @user103153 solution is better so I'll just add this as an example of another technique. I did not have PostgreSQL to try with so some adjustments likely have to be made. I also use a calendar table, in psql either a recursive CTE or a generating expression can be used:

We start by constructing the domain of names and dates:

select distinct d, name 
from calendar c 
cross join attendance a

from there we can construct the status:

select nc.name, nc.d, a.date, a.name as aname
 , coalesce(a.status, 'present') as status 
from ( 
 select distinct d, name 
 from calendar c 
 cross join attendance a
) as nc
left join attendance a 
 on a.name = nc.name
 and a.date = nc.d

Then by using a common trick we can construct groups of consecutive name, date and status:

select name, d, status, row_number() over (partition by name order by d) 
 - row_number() over (partition by name, status order by d) as grp 
from (
 select nc.name, nc.d, a.date, a.name as aname
 , coalesce(a.status, 'present') as status 
 from ( 
 select distinct d, name 
 from calendar c cross join attendance a
 ) as nc
 left join attendance a 
 on a.name = nc.name
 and a.date = nc.d
)

Next step is to determine the start and stop for each interval where status is 'absent':

select name, min(d), max(d) from (
 select name, d, status, row_number() over (partition by name order by d) 
 - row_number() over (partition by name, status order by d) as grp 
 from (
 select nc.name, nc.d, a.date, a.name as aname
 , coalesce(a.status, 'present') as status 
 from ( 
 select distinct d, name 
 from calendar c 
 cross join attendance a
 ) as nc
 left join attendance a 
 on a.name = nc.name
 and a.date = nc.d
 ) 
) 
where status = 'absent' 
group by grp, name order by min(d);
NAME 2 3 
------------------------------ ---------- ----------
John 03/01/2016 03/03/2016
Sam 03/05/2016 03/05/2016
John 03/06/2016 03/08/2016
John 03/10/2016 03/11/2016
John 03/13/2016 03/13/2016

Finally (left as an exercise :-) we take the length of the interval and integer divide that with 3 and filter out the ones with 0. Then it's just a matter of summing the number per name.

Question 4

I think I cobbled together something that does what you're looking for. I used several subqueries to check for consecutive absences, and also to avoid double dipping on a long absence (meaning 6 days in a row should count ONLY the 2 combined marks from the first day, and not the individual marks on absences 2-4, 3-5, and 4-6).

Would be curious to see if there's a more elegant solution from someone else.

 select 
 z.name, 
 sum(z.absent_marks) 
 from (
 select 
 sub1.* 
 from (
 select 
 x.name, 
 x.adate, 
 floor(max(x.timeshift)/3)::integer as absent_marks
 from (
 select 
 a.name,
 a.date as adate,
 b.date as bdate,
 dense_rank() over (partition by a.name, a.date order by b.date) as timeshift
 from attendance a
 left join attendance b 
 on a."name" = b."name" 
 and a.date <= b.date
 and a.status = b.status 
 where a.status = 'absent'
 ) x 
 where x.timeshift - 1 = datediff('day', x.adate, x.bdate)
 group by x.name, x.adate
 ) sub1
 left join ( 
 select 
 x.name, 
 x.adate, 
 floor(max(x.timeshift)/3)::integer as absent_marks
 from (
 select 
 a.name,
 a.date as adate,
 b.date as bdate,
 dense_rank() over (partition by a.name, a.date order by b.date) as timeshift
 from attendance a
 left join attendance b 
 on a."name" = b."name" 
 and a.date <= b.date
 and a.status = b.status 
 where a.status = 'absent'
 ) x 
 where x.timeshift - 1 = datediff('day', x.adate, x.bdate)
 group by x.name, x.adate) sub2
 on sub1.adate = sub2.adate + 1
 and sub1.name = sub2.name 
 where (sub2.absent_marks = 0 or sub2.absent_marks is null)
 and sub1.absent_marks != 0
 ) z
 group by
 z.name

Question 5

It did not give me the output. And instead of datediff() we have date_part in PostgreSQL. @peach

user103153 user103153 764 bronze badges · Accepted Answer · 2016-08-09 21:47:07Z

Another solution. Recursive part creates groups of rows of status "absent". The solution uses recursive CTEs and window functions.

WITH RECURSIVE a(id, name, date, n) as (
 SELECT id, name, q.date, 1 as n
 FROM (
 SELECT id, name, date,
 date-lag(date) OVER (PARTITION BY name ORDER BY date) as lag
 FROM attendance
 WHERE status='absent'
 ) q
 WHERE lag >1 or lag is null
 UNION
 SELECT a.id, a.name, a.date, a.n + 1
 FROM a
 JOIN attendance at ON (
 a.id = at.id and at.name = a.name and at.date = a.date + n)
 WHERE at.status='absent'
)
SELECT id, name, sum(long_absences) FROM (
 SELECT id, name, count(*)/3 as long_absences
 FROM a
 GROUP BY id, name, date having count(*) >=3
 ) as absences
GROUP BY id, name;

Stack Exchange Network

How to count three consecutive records with same values in a column based on the value of another column in PostgreSQL?

3 Answers 3

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

How to count three consecutive records with same values in a column based on the value of another column in PostgreSQL?

3 Answers 3

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions