EDIT: Unknowing of the rule that prohibits cross-posting, I also asked this on Stackoverflow and chose an answer over there. Since there's another (fully working) answer in this thread though, I won't delete it. But for the solution I chose, check out this thread - https://stackoverflow.com/questions/52024244/how-to-get-aggregate-data-from-a-dynamic-number-of-related-rows-in-adjacent-tabl
I have a table of matches played, roughly looking like this:
player_id | match_id | result | opponent_rank
----------------------------------------------
82 | 2847 | w | 42
82 | 3733 | w | 185
82 | 4348 | l | 10
82 | 5237 | w | 732
82 | 5363 | w | 83
82 | 7274 | w | 6
51 | 2347 | w | 39
51 | 3746 | w | 394
51 | 5037 | l | 90
... | ... | ... | ...
To get all the winning streaks (not just top streak by any player), I use this query:
SELECT player.tag, s.streak, match.date, s.player_id, s.match_id FROM (
SELECT streaks.streak, streaks.player_id, streaks.match_id FROM (
SELECT w1.player_id, max(w1.match_id) AS match_id, count(*) AS streak FROM (
SELECT w2.player_id, w2.match_id, w2.win, w2.date, sum(w2.grp) OVER w AS grp FROM (
SELECT m.player_id, m.match_id, m.win, m.date, (m.win = false AND LAG(m.win, 1, true) OVER w = true)::integer AS grp FROM matches_m AS m
WHERE matches_m.opponent_position<'100'
WINDOW w AS (PARTITION BY m.player_id ORDER BY m.date, m.match_id)
) AS w2
WINDOW w AS (PARTITION BY w2.player_id ORDER BY w2.date, w2.match_id)
) AS w1
WHERE w1.win = true
GROUP BY w1.player_id, w1.grp
ORDER BY w1.player_id DESC, count(*) DESC
) AS streaks
ORDER BY streaks.streak DESC
LIMIT 100
) AS s
LEFT JOIN player ON player.id = s.player_id
LEFT JOIN match ON match.id = s.match_id
And the result looks like this (note that this is not a fixed table/view, as the query above can be extended by certain parameters such as nationality, date range, ranking of players, etc):
player_id | match_id | streak
-------------------------------
82 | 3733 | 2
82 | 7274 | 3
51 | 3746 | 2
... | ... | ...
What I want to add now is a bunch of aggregate data to provide details about the winning streaks. For starters, I'd like to know the average rank of the opponents during each those streaks. Other data are the duration of the streak in time, first and last date, opponent name who ended the streak or if it's still ongoing, and so on. I've tried various things - CTE, some elaborate joins, unions, or adding them in as lag functions in the existing code. But I'm completely stuck how to solve this.
As is obvious from the code, my SQL skills are very basic, so please excuse any mistakes or inefficient statements. Also new to DBA so let me know if my question can be phrased better. For complete context, I'm using Postgres 9.4 on Debian, the matches_m table is a materialized view with 550k lines (query takes 2.5s right now). The data comes from http://aligulac.com/about/db/, I just mirror it to create the aforementioned view.
1 Answer 1
For a beginner in sql you sure picked some fairly advanced concept to start with ;-) Your question seems to boil down to what is known as island and gaps
problems. I find it easiest to handle these kinds of problems by creating a group for consecutive events. One trick to accomplish this is to use two enumerations and calculate the difference:
row_number() over ( partition by player_id
order by match_id )
row_number() over ( partition by player_id, result
order by match_id )
if the difference between first and second grp changes it means that a change in result occurred. Since you will probably use this grp for several things I put it in a CTE:
with t as (
select player_id, match_id, result, opponent_rank
, row_number() over ( partition by player_id
order by match_id )
-
row_number() over ( partition by player_id, result
order by match_id ) as grp
from matches
)
select player_id, match_id, result, opponent_rank, grp
, count(1) over (partition by player_id, grp
order by match_id) as streak
, avg(opponent_rank) over (partition by player_id, grp) as avg_rnk
from t
where result = 'w' and player_id = 82
order by player_id, match_id;
I believe this should give you a start, so I'll stop there. Welcome to the forum btw.
-
1Thanks for your reply! It works, but for some reason it's super slow (35s instead of 2.5s). I'm not quite sure why, but I'll look into that in the future as I like the concept of CTE. For now though, I've chosen another solution from a cross-post of this (didn't know of that rule): stackoverflow.com/questions/52024244/… - and thanks for the welcome! I might want to add that I'm not entirely "new" to SQL, but have been using only "simple" queries for about... 15 years ;-)hcac– hcac2018年08月26日 14:08:21 +00:00Commented Aug 26, 2018 at 14:08
-
Out of curiosity, does it make a difference if you skip the CTE, and select from the expression inside the CTE? I.e.
select player_id, ... from ( select player_id, match_id, result, opponent_rank , row_number() over ( ... ) as t where ...
?Lennart - Slava Ukraini– Lennart - Slava Ukraini2018年08月27日 11:28:10 +00:00Commented Aug 27, 2018 at 11:28
Explore related questions
See similar questions with these tags.