How to get aggregate data from a dynamic number of related rows in adjacent table

Question 1

EDIT: Unknowing of the rule that prohibits cross-posting, I also asked this on Stackoverflow and chose an answer over there. Since there's another (fully working) answer in this thread though, I won't delete it. But for the solution I chose, check out this thread - https://stackoverflow.com/questions/52024244/how-to-get-aggregate-data-from-a-dynamic-number-of-related-rows-in-adjacent-tabl

I have a table of matches played, roughly looking like this:

player_id | match_id | result | opponent_rank
----------------------------------------------
82 | 2847 | w | 42
82 | 3733 | w | 185
82 | 4348 | l | 10
82 | 5237 | w | 732
82 | 5363 | w | 83
82 | 7274 | w | 6
51 | 2347 | w | 39
51 | 3746 | w | 394
51 | 5037 | l | 90
... | ... | ... | ...

To get all the winning streaks (not just top streak by any player), I use this query:

SELECT player.tag, s.streak, match.date, s.player_id, s.match_id FROM (
 SELECT streaks.streak, streaks.player_id, streaks.match_id FROM (
 SELECT w1.player_id, max(w1.match_id) AS match_id, count(*) AS streak FROM (
 SELECT w2.player_id, w2.match_id, w2.win, w2.date, sum(w2.grp) OVER w AS grp FROM (
 SELECT m.player_id, m.match_id, m.win, m.date, (m.win = false AND LAG(m.win, 1, true) OVER w = true)::integer AS grp FROM matches_m AS m
 WHERE matches_m.opponent_position<'100'
 WINDOW w AS (PARTITION BY m.player_id ORDER BY m.date, m.match_id)
 ) AS w2
 WINDOW w AS (PARTITION BY w2.player_id ORDER BY w2.date, w2.match_id)
 ) AS w1
 WHERE w1.win = true
 GROUP BY w1.player_id, w1.grp
 ORDER BY w1.player_id DESC, count(*) DESC
 ) AS streaks
 ORDER BY streaks.streak DESC
 LIMIT 100
 ) AS s
LEFT JOIN player ON player.id = s.player_id
LEFT JOIN match ON match.id = s.match_id

And the result looks like this (note that this is not a fixed table/view, as the query above can be extended by certain parameters such as nationality, date range, ranking of players, etc):

player_id | match_id | streak
-------------------------------
82 | 3733 | 2
82 | 7274 | 3
51 | 3746 | 2
... | ... | ...

What I want to add now is a bunch of aggregate data to provide details about the winning streaks. For starters, I'd like to know the average rank of the opponents during each those streaks. Other data are the duration of the streak in time, first and last date, opponent name who ended the streak or if it's still ongoing, and so on. I've tried various things - CTE, some elaborate joins, unions, or adding them in as lag functions in the existing code. But I'm completely stuck how to solve this.

As is obvious from the code, my SQL skills are very basic, so please excuse any mistakes or inefficient statements. Also new to DBA so let me know if my question can be phrased better. For complete context, I'm using Postgres 9.4 on Debian, the matches_m table is a materialized view with 550k lines (query takes 2.5s right now). The data comes from http://aligulac.com/about/db/, I just mirror it to create the aforementioned view.

Question 2

For a beginner in sql you sure picked some fairly advanced concept to start with ;-) Your question seems to boil down to what is known as island and gaps problems. I find it easiest to handle these kinds of problems by creating a group for consecutive events. One trick to accomplish this is to use two enumerations and calculate the difference:

row_number() over ( partition by player_id
 order by match_id )
row_number() over ( partition by player_id, result
 order by match_id )

if the difference between first and second grp changes it means that a change in result occurred. Since you will probably use this grp for several things I put it in a CTE:

with t as (
 select player_id, match_id, result, opponent_rank
 , row_number() over ( partition by player_id
 order by match_id )
 -
 row_number() over ( partition by player_id, result
 order by match_id ) as grp
 from matches
)
select player_id, match_id, result, opponent_rank, grp
 , count(1) over (partition by player_id, grp
 order by match_id) as streak
 , avg(opponent_rank) over (partition by player_id, grp) as avg_rnk 
from t
where result = 'w' and player_id = 82
order by player_id, match_id;

I believe this should give you a start, so I'll stop there. Welcome to the forum btw.

Question 3

Thanks for your reply! It works, but for some reason it's super slow (35s instead of 2.5s). I'm not quite sure why, but I'll look into that in the future as I like the concept of CTE. For now though, I've chosen another solution from a cross-post of this (didn't know of that rule): stackoverflow.com/questions/52024244/… - and thanks for the welcome! I might want to add that I'm not entirely "new" to SQL, but have been using only "simple" queries for about... 15 years ;-)

Question 4

Out of curiosity, does it make a difference if you skip the CTE, and select from the expression inside the CTE? I.e.

select player_id, ... from ( select player_id, match_id, result, opponent_rank , row_number() over ( ... ) as t where ...

?

score 0 · Answer 1 · 2018-08-26 11:11:37Z

For a beginner in sql you sure picked some fairly advanced concept to start with ;-) Your question seems to boil down to what is known as island and gaps problems. I find it easiest to handle these kinds of problems by creating a group for consecutive events. One trick to accomplish this is to use two enumerations and calculate the difference:

row_number() over ( partition by player_id
 order by match_id )
row_number() over ( partition by player_id, result
 order by match_id )

if the difference between first and second grp changes it means that a change in result occurred. Since you will probably use this grp for several things I put it in a CTE:

with t as (
 select player_id, match_id, result, opponent_rank
 , row_number() over ( partition by player_id
 order by match_id )
 -
 row_number() over ( partition by player_id, result
 order by match_id ) as grp
 from matches
)
select player_id, match_id, result, opponent_rank, grp
 , count(1) over (partition by player_id, grp
 order by match_id) as streak
 , avg(opponent_rank) over (partition by player_id, grp) as avg_rnk 
from t
where result = 'w' and player_id = 82
order by player_id, match_id;

I believe this should give you a start, so I'll stop there. Welcome to the forum btw.

Thanks for your reply! It works, but for some reason it's super slow (35s instead of 2.5s). I'm not quite sure why, but I'll look into that in the future as I like the concept of CTE. For now though, I've chosen another solution from a cross-post of this (didn't know of that rule): stackoverflow.com/questions/52024244/… - and thanks for the welcome! I might want to add that I'm not entirely "new" to SQL, but have been using only "simple" queries for about... 15 years ;-)
Out of curiosity, does it make a difference if you skip the CTE, and select from the expression inside the CTE? I.e. select player_id, ... from ( select player_id, match_id, result, opponent_rank , row_number() over ( ... ) as t where ... ?

Stack Exchange Network

How to get aggregate data from a dynamic number of related rows in adjacent table

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

How to get aggregate data from a dynamic number of related rows in adjacent table

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions