I have data about people traveling in different countries like this:
country | begintimestamp | distance
Germany | 2015年01月01日 00:00:00 | 100
Germany | 2015年01月01日 01:12:13 | 30
France | 2015年01月01日 02:13:14 | 40
France | 2015年01月01日 03:14:15 | 20
Spain | 2015年01月01日 04:15:16 | 10
France | 2015年01月01日 05:16:17 | 30
France | 2015年01月01日 05:17:18 | 5
Germany | 2015年01月01日 06:18:19 | 3
What I need is to be able to receive a result like this - the distance
of consecutive rows summed with the earliest begintimestamp
:
country | begintimestamp | distance
Germany | 2015年01月01日 00:00:00 | 130 // 100+30, the distance of two first rows summed.
France | 2015年01月01日 02:13:14 | 60 // 40+20
Spain | 2015年01月01日 04:15:16 | 10 //
France | 2015年01月01日 05:16:17 | 35 // 30+5
Germany | 2015年01月01日 06:18:19 | 3
I've tried to play around with PG window functions but have not been able to come up with anything that would lead me closer to the result.
1 Answer 1
select min(country) as country,
min(begintimestamp) as first_begin_ts,
sum(distance) as distance
from (
select t1.*,
sum(group_flag) over (order by begintimestamp) as grp
from (
select *,
case
when lag(country) over (order by begintimestamp) = country then null
else 1
end as group_flag
from travel
) t1
) t2
group by grp
order by first_begin_ts;
The inner most query (alias t1
) creates a number each time the country changes). The second level query (alias t2
) then does a running sum on those flags, which essentially gives each consecutive set of countries a different number. The outer most query then groups by that number and sums the distance. The min(country)
is necessary to make the group by
operator happy, but as all rows with the same grp
have the same country anyway, it doesn't matter.
SQLFiddle: http://sqlfiddle.com/#!15/fe341/1