How to get next/previous partition count using window function?

Question 1

I'm trying to get the current_count, next_count, and previous_count for date.month partitions with window functions:

WITH tmp1 AS (
SELECT location.city, date.month, date.year
 , COUNT(*) OVER W AS current_count, COUNT(*) OVER W2 AS 
count_next
FROM fact, crime, date, location 
WHERE fact.location_key = location.location_key and crime.crime_key = 
fact.crime_key and fact.date_key = date.date_key
WINDOW W AS (
 PARTITION BY location.city, date.year, date.month
 ORDER BY date.month
 ),
 W2 AS (
 PARTITION BY location.city, date.year, date.month
 ORDER BY date.month
 RANGE BETWEEN CURRENT ROW AND FOLLOWING
 )
)
SELECT *
FROM tmp1 
GROUP BY city, year, month,current_count, count_next

However, the code above works for current_count only, the count_next shows the same result as the current_count.

How to show the count for next month and previous month? With window functions?

Question 2

Postgres version, actual (minimal) table definitions, sample values and desired result would be instrumental for the question.

Question 3

Do the aggregation first. Then use the lead() and lag() window functions. Looks like you were aiming for something like this:

SELECT *
 , lead(curr_count) OVER w AS next_count
 , lag(curr_count) OVER w AS prev_count
FROM (
 SELECT l.city, d.year, d.month, count(*) AS curr_count
 FROM fact f
 JOIN crime c ON c.crime_key = f.crime_key -- see below
 JOIN date d ON d.date_key = f.date_key
 JOIN location l ON l.location_key = f.location_key
 GROUP BY l.city, d.year, d.month
 ) sub
WINDOW w AS (PARTITION BY city ORDER BY year, month)
ORDER BY city, year, month;

You can use a CTE or a subquery (like I did) to apply the logic in two steps. May be easier to understand at first. But you can add window functions over aggregate functions in the same query level. See:

Best way to get result count before LIMIT was applied

So this is equivalent and shorter (probably same query plan / performance):

SELECT l.city, d.year, d.month
 , count(*) AS curr_count
 , lead(count(*)) OVER w AS next_count
 , lag(count(*)) OVER w AS prev_count
FROM fact f
JOIN crime c USING (crime_key) -- see below
JOIN date d USING (date_key)
JOIN location l USING (location_key)
GROUP BY l.city, d.year, d.month
WINDOW w AS (PARTITION BY l.city ORDER BY d.year, d.month);

Use equivalent explicit join syntax. Much easier to read.

The USING shorthand in the join clause of my second query is only legit if column names are unambiguous. Else stick to the explicit ON syntax in the first query.

Why join to table crime? Does the join eliminate or multiply rows? Else you can remove it completely.

Use table aliases to keep the noise low.

You had PARTITION BY location.city, date.year, date.month ORDER BY date.month. But I suggest you really want PARTITION BY city ORDER BY year, month to make the query work across year boundaries?

It would probably be more efficient to use a single column of type date (4 bytes) instead of year and month. (But don't name it just "date".)

Note how I skipped ORDER BY city, year, month in the second query. You normally get rows sorted that way anyway due to the window definition, but there are no guarantees without explicit ORDER BY. And you did not specify that order is required. If so, you might want to be explicit about this and add ORDER BY.

score 2 · Answer 1 · 2020-03-15 04:03:45Z

Do the aggregation first. Then use the lead() and lag() window functions. Looks like you were aiming for something like this:

SELECT *
 , lead(curr_count) OVER w AS next_count
 , lag(curr_count) OVER w AS prev_count
FROM (
 SELECT l.city, d.year, d.month, count(*) AS curr_count
 FROM fact f
 JOIN crime c ON c.crime_key = f.crime_key -- see below
 JOIN date d ON d.date_key = f.date_key
 JOIN location l ON l.location_key = f.location_key
 GROUP BY l.city, d.year, d.month
 ) sub
WINDOW w AS (PARTITION BY city ORDER BY year, month)
ORDER BY city, year, month;

You can use a CTE or a subquery (like I did) to apply the logic in two steps. May be easier to understand at first. But you can add window functions over aggregate functions in the same query level. See:

Best way to get result count before LIMIT was applied

So this is equivalent and shorter (probably same query plan / performance):

SELECT l.city, d.year, d.month
 , count(*) AS curr_count
 , lead(count(*)) OVER w AS next_count
 , lag(count(*)) OVER w AS prev_count
FROM fact f
JOIN crime c USING (crime_key) -- see below
JOIN date d USING (date_key)
JOIN location l USING (location_key)
GROUP BY l.city, d.year, d.month
WINDOW w AS (PARTITION BY l.city ORDER BY d.year, d.month);

Use equivalent explicit join syntax. Much easier to read.

The USING shorthand in the join clause of my second query is only legit if column names are unambiguous. Else stick to the explicit ON syntax in the first query.

Why join to table crime? Does the join eliminate or multiply rows? Else you can remove it completely.

Use table aliases to keep the noise low.

You had PARTITION BY location.city, date.year, date.month ORDER BY date.month. But I suggest you really want PARTITION BY city ORDER BY year, month to make the query work across year boundaries?

It would probably be more efficient to use a single column of type date (4 bytes) instead of year and month. (But don't name it just "date".)

Note how I skipped ORDER BY city, year, month in the second query. You normally get rows sorted that way anyway due to the window definition, but there are no guarantees without explicit ORDER BY. And you did not specify that order is required. If so, you might want to be explicit about this and add ORDER BY.

Stack Exchange Network

How to get next/previous partition count using window function?

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

How to get next/previous partition count using window function?

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions