I need some guidance on this query, which is too slow:
SELECT DISTINCT ON (id)
id,
views - lead(views)
OVER (PARTITION BY id
ORDER BY update_date DESC) vdiff
FROM videoupdate;
With 10 million+ rows it takes ~30 seconds. I created a multicolumn index, which reduced the time from the original 1 minute. I want to see the difference in views between rows, partitioned by id. Some thoughts I had:
- After each table update, run CREATE TABLE AS with the query and select from the resulting table.
- Move old data to a backup and shrink the table.
- Look into a data warehouse?
- Change the database schema?
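The first idea above (precompute the result after each update, then select from it) could be sketched roughly as follows. This swaps CREATE TABLE AS for a materialized view, which is an assumption, as is PostgreSQL 9.3 or later. The name video_vdiff is hypothetical, and the ORDER BY is added on the assumption that the latest row per id is the one wanted.

```sql
-- Hypothetical object name: video_vdiff (not from the original post).
-- Assumes PostgreSQL 9.3+ so that materialized views are available.
CREATE MATERIALIZED VIEW video_vdiff AS
SELECT DISTINCT ON (id)
       id,
       views - lead(views) OVER (PARTITION BY id ORDER BY update_date DESC) AS vdiff
FROM videoupdate
-- Assumption: keep the most recent row per id.
ORDER BY id, update_date DESC;

-- Re-run after each batch of changes to videoupdate.
REFRESH MATERIALIZED VIEW video_vdiff;

-- Reads then hit the precomputed result instead of the 10M-row table.
SELECT id, vdiff FROM video_vdiff;
```

The trade-off is that the result is only as fresh as the last refresh, and each refresh still pays the full cost of the query.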
Evan Carroll
1 Answer
Following @a_horse_with_no_name's suggestion again, because he's really smart, though super, super-resistant to using the Post Your Answer functionality:
SELECT DISTINCT ON (id)
  id,
  views - lead(views) OVER (PARTITION BY id ORDER BY update_date DESC) AS vdiff
FROM (
  -- Number the rows per id, newest first, so only the two latest are kept.
  SELECT id,
    views,
    update_date,
    row_number() OVER (PARTITION BY id ORDER BY update_date DESC) AS rn
  FROM videoupdate
) AS t
WHERE rn <= 2
ORDER BY id, update_date DESC;
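Both window functions sort by update_date DESC within each id, so an index in that order may let the planner avoid a separate sort. The question does not show the definition of its multicolumn index, so the following is only a guess at a suitable one, and the index name is hypothetical.

```sql
-- Hypothetical index name; the column order mirrors
-- PARTITION BY id ORDER BY update_date DESC.
CREATE INDEX videoupdate_id_update_date_idx
    ON videoupdate (id, update_date DESC);

-- Inspect the plan to see whether the index is actually used.
EXPLAIN (ANALYZE, BUFFERS)
SELECT id,
       views,
       update_date,
       row_number() OVER (PARTITION BY id ORDER BY update_date DESC) AS rn
FROM videoupdate;
```

Whether this helps depends on the data and the planner's cost estimates, so the EXPLAIN output is the thing to check.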
answered Sep 9, 2017 at 16:46
- Thank you very much for your answer! I will try it out as soon as possible. @evancarrol – Misa, Sep 9, 2017 at 17:02
- @Misa not yet, it's not right. – Evan Carroll, Sep 9, 2017 at 17:02
- @Misa try that. =) – Evan Carroll, Sep 9, 2017 at 17:04
- DISTINCT ON without an ORDER BY? I'm assuming it's ordered by update_date DESC
- … distinct on and use a row_number() over the same window as the lead() function and use that row number to get the distinct ID.
- … lead() and WHERE runs before the window function. So the distinct on is still the best bet.
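One possible reading of the row_number() suggestion is sketched below, using the table and column names from the question. Because WHERE is evaluated before window functions at the same query level, both lead() and row_number() are computed in a subquery and the filter is applied outside it. This is an illustration of the idea, not a tested replacement for the answer above.

```sql
SELECT id, vdiff
FROM (
  SELECT id,
         -- lead() still sees every row for the id here, because no
         -- WHERE has removed rows at this level.
         views - lead(views) OVER w AS vdiff,
         row_number() OVER w AS rn
  FROM videoupdate
  WINDOW w AS (PARTITION BY id ORDER BY update_date DESC)
) AS t
-- Filtering on rn is only possible outside the subquery;
-- window functions are not allowed in WHERE.
WHERE rn = 1;
```

This returns the newest row per id with its difference from the previous row, but it computes lead() over the whole table, which is the cost the rn <= 2 pre-filter in the answer tries to cut down.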