I have a table attendance as follows:
create table attendance (id int, name varchar(30), status varchar(10), date date)
This table has the following records:
insert into attendance values (1,'John','absent','2016-03-01');
insert into attendance values (1,'John','absent','2016-03-02');
insert into attendance values (1,'John','absent','2016-03-03');
insert into attendance values (2,'Sam','present','2016-03-04');
insert into attendance values (3,'Sam','absent','2016-03-05');
insert into attendance values (1,'John','absent','2016-03-06');
insert into attendance values (1,'John','absent','2016-03-07');
insert into attendance values (1,'John','absent','2016-03-08');
insert into attendance values (1,'John','present','2016-03-09');
insert into attendance values (1,'John','absent','2016-03-10');
insert into attendance values (1,'John','absent','2016-03-11');
insert into attendance values (1,'John','present','2016-03-12');
insert into attendance values (1,'John','absent','2016-03-13');
Now I want to count that how many times a person has three consecutive absent records.
The result should be as follows:
id name count
1 John 2
because John is absent for three consecutive days two times.
If they have 6 absents on 6 consecutive dates, then it should count 2 i.e (3+3).
It should count 1 for 3 consecutive absents on 3 consecutive dates i.e if John is absent for 1,2 and 3 march, it should count 1. but if John is absent for 1,2 and 4 march then it should not count 1.
4 or 5 absents should be counted as 1.
If there is no entry for a date, It is considered 'Present'.
Any help is appreciated.
3 Answers 3
Another solution. Recursive part creates groups of rows of status "absent". The solution uses recursive CTEs and window functions.
WITH RECURSIVE a(id, name, date, n) as (
SELECT id, name, q.date, 1 as n
FROM (
SELECT id, name, date,
date-lag(date) OVER (PARTITION BY name ORDER BY date) as lag
FROM attendance
WHERE status='absent'
) q
WHERE lag >1 or lag is null
UNION
SELECT a.id, a.name, a.date, a.n + 1
FROM a
JOIN attendance at ON (
a.id = at.id and at.name = a.name and at.date = a.date + n)
WHERE at.status='absent'
)
SELECT id, name, sum(long_absences) FROM (
SELECT id, name, count(*)/3 as long_absences
FROM a
GROUP BY id, name, date having count(*) >=3
) as absences
GROUP BY id, name;
I beleive @user103153 solution is better so I'll just add this as an example of another technique. I did not have PostgreSQL to try with so some adjustments likely have to be made. I also use a calendar table, in psql either a recursive CTE or a generating expression can be used:
We start by constructing the domain of names and dates:
select distinct d, name
from calendar c
cross join attendance a
from there we can construct the status:
select nc.name, nc.d, a.date, a.name as aname
, coalesce(a.status, 'present') as status
from (
select distinct d, name
from calendar c
cross join attendance a
) as nc
left join attendance a
on a.name = nc.name
and a.date = nc.d
Then by using a common trick we can construct groups of consecutive name, date and status:
select name, d, status, row_number() over (partition by name order by d)
- row_number() over (partition by name, status order by d) as grp
from (
select nc.name, nc.d, a.date, a.name as aname
, coalesce(a.status, 'present') as status
from (
select distinct d, name
from calendar c cross join attendance a
) as nc
left join attendance a
on a.name = nc.name
and a.date = nc.d
)
Next step is to determine the start and stop for each interval where status is 'absent':
select name, min(d), max(d) from (
select name, d, status, row_number() over (partition by name order by d)
- row_number() over (partition by name, status order by d) as grp
from (
select nc.name, nc.d, a.date, a.name as aname
, coalesce(a.status, 'present') as status
from (
select distinct d, name
from calendar c
cross join attendance a
) as nc
left join attendance a
on a.name = nc.name
and a.date = nc.d
)
)
where status = 'absent'
group by grp, name order by min(d);
NAME 2 3
------------------------------ ---------- ----------
John 03/01/2016 03/03/2016
Sam 03/05/2016 03/05/2016
John 03/06/2016 03/08/2016
John 03/10/2016 03/11/2016
John 03/13/2016 03/13/2016
Finally (left as an exercise :-) we take the length of the interval and integer divide that with 3 and filter out the ones with 0. Then it's just a matter of summing the number per name.
I think I cobbled together something that does what you're looking for. I used several subqueries to check for consecutive absences, and also to avoid double dipping on a long absence (meaning 6 days in a row should count ONLY the 2 combined marks from the first day, and not the individual marks on absences 2-4, 3-5, and 4-6).
Would be curious to see if there's a more elegant solution from someone else.
select
z.name,
sum(z.absent_marks)
from (
select
sub1.*
from (
select
x.name,
x.adate,
floor(max(x.timeshift)/3)::integer as absent_marks
from (
select
a.name,
a.date as adate,
b.date as bdate,
dense_rank() over (partition by a.name, a.date order by b.date) as timeshift
from attendance a
left join attendance b
on a."name" = b."name"
and a.date <= b.date
and a.status = b.status
where a.status = 'absent'
) x
where x.timeshift - 1 = datediff('day', x.adate, x.bdate)
group by x.name, x.adate
) sub1
left join (
select
x.name,
x.adate,
floor(max(x.timeshift)/3)::integer as absent_marks
from (
select
a.name,
a.date as adate,
b.date as bdate,
dense_rank() over (partition by a.name, a.date order by b.date) as timeshift
from attendance a
left join attendance b
on a."name" = b."name"
and a.date <= b.date
and a.status = b.status
where a.status = 'absent'
) x
where x.timeshift - 1 = datediff('day', x.adate, x.bdate)
group by x.name, x.adate) sub2
on sub1.adate = sub2.adate + 1
and sub1.name = sub2.name
where (sub2.absent_marks = 0 or sub2.absent_marks is null)
and sub1.absent_marks != 0
) z
group by
z.name
-
It did not give me the output. And instead of datediff() we have date_part in PostgreSQL. @peachAbdul Raheem Ghani– Abdul Raheem Ghani2016年08月09日 07:07:28 +00:00Commented Aug 9, 2016 at 7:07