1

I'm trying to get the current_count, next_count, and previous_count for date.month partitions with window functions:

WITH tmp1 AS (
SELECT location.city, date.month, date.year
 , COUNT(*) OVER W AS current_count, COUNT(*) OVER W2 AS 
count_next
FROM fact, crime, date, location 
WHERE fact.location_key = location.location_key and crime.crime_key = 
fact.crime_key and fact.date_key = date.date_key
WINDOW W AS (
 PARTITION BY location.city, date.year, date.month
 ORDER BY date.month
 ),
 W2 AS (
 PARTITION BY location.city, date.year, date.month
 ORDER BY date.month
 RANGE BETWEEN CURRENT ROW AND FOLLOWING
 )
)
SELECT *
FROM tmp1 
GROUP BY city, year, month,current_count, count_next

However, the code above works for current_count only, the count_next shows the same result as the current_count.

How to show the count for next month and previous month? With window functions?

Erwin Brandstetter
186k28 gold badges464 silver badges636 bronze badges
asked Mar 14, 2020 at 22:34
1
  • Postgres version, actual (minimal) table definitions, sample values and desired result would be instrumental for the question. Commented Mar 15, 2020 at 4:19

1 Answer 1

2

Do the aggregation first. Then use the lead() and lag() window functions. Looks like you were aiming for something like this:

SELECT *
 , lead(curr_count) OVER w AS next_count
 , lag(curr_count) OVER w AS prev_count
FROM (
 SELECT l.city, d.year, d.month, count(*) AS curr_count
 FROM fact f
 JOIN crime c ON c.crime_key = f.crime_key -- see below
 JOIN date d ON d.date_key = f.date_key
 JOIN location l ON l.location_key = f.location_key
 GROUP BY l.city, d.year, d.month
 ) sub
WINDOW w AS (PARTITION BY city ORDER BY year, month)
ORDER BY city, year, month;

You can use a CTE or a subquery (like I did) to apply the logic in two steps. May be easier to understand at first. But you can add window functions over aggregate functions in the same query level. See:

So this is equivalent and shorter (probably same query plan / performance):

SELECT l.city, d.year, d.month
 , count(*) AS curr_count
 , lead(count(*)) OVER w AS next_count
 , lag(count(*)) OVER w AS prev_count
FROM fact f
JOIN crime c USING (crime_key) -- see below
JOIN date d USING (date_key)
JOIN location l USING (location_key)
GROUP BY l.city, d.year, d.month
WINDOW w AS (PARTITION BY l.city ORDER BY d.year, d.month);

Use equivalent explicit join syntax. Much easier to read.

The USING shorthand in the join clause of my second query is only legit if column names are unambiguous. Else stick to the explicit ON syntax in the first query.

Why join to table crime? Does the join eliminate or multiply rows? Else you can remove it completely.

Use table aliases to keep the noise low.

You had PARTITION BY location.city, date.year, date.month ORDER BY date.month. But I suggest you really want PARTITION BY city ORDER BY year, month to make the query work across year boundaries?

It would probably be more efficient to use a single column of type date (4 bytes) instead of year and month. (But don't name it just "date".)

Note how I skipped ORDER BY city, year, month in the second query. You normally get rows sorted that way anyway due to the window definition, but there are no guarantees without explicit ORDER BY. And you did not specify that order is required. If so, you might want to be explicit about this and add ORDER BY.

answered Mar 15, 2020 at 4:03

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.