Postgres Nested WHEN Aggregate Function

Question 1

In PostgreSQL (version 9.4) I am trying to construct a query to determine in various tranches how many times various numbers appear in the data set between ranges. When I query group-ing by "SettlementPointPrice" the count() function works correctly bucketing prices into individual tranches as I would expect. However, this creates hundreds of rows. I am looking for the ability (sub-query?) to aggregate the sum of count()'s in each bucket into one single row. What is the best way to manage this in SQL?

I am using a statement like this (full SQL below) for each bucket/tranche:

CASE WHEN (round(sum("DA-A"."SettlementPointPrice"),2)) BETWEEN 0 AND 10
 THEN count(*) ELSE 0 END AS "DA 0ドル - 10ドル",

When I group by "SettlementPointPrice" (two prices in this example) to confirm the data counts the buckets correctly. As the table below shows.

Raw data from individual two days:

Row | "SettlementPointPrice" | 0-10 | 11-20 | 21-30
1 | 18 | 0 | 1 | 0
2 | 22 | 0 | 0 | 1

However, I am unable to get the aggregated summation by grouping them all together. I assume this is a sub-query?

I would like the result to be as such:

Row | 0-10 | 11-20 | 21-30
1 | 0 | 1 | 1

Full SQL code:

SELECT
 "DA-A"."SettlementPointPrice",
 CASE WHEN (round(sum("DA-A"."SettlementPointPrice"),2)) BETWEEN 0 AND 10
 THEN count(*) ELSE 0 END AS "DA 0ドル - 10ドル",
 CASE WHEN (round(sum("DA-A"."SettlementPointPrice"),2)) BETWEEN 11 AND 20
 THEN COUNT(*) ELSE 0 END AS "DA 11ドル - 20ドル",
 CASE WHEN (round(sum("DA-A"."SettlementPointPrice"),2)) BETWEEN 21 AND 30
 THEN COUNT(*) ELSE 0 END AS "DA 21ドル - 30ドル"
FROM 
 public.da "DA-A", 
 public.rt_aggregate "RT-A"
WHERE 
 "RT-A"."DeliveryDate" = "DA-A"."DeliveryDate" AND
 "RT-A"."SettlementPointName" = "DA-A"."SettlementPointName" AND
 "DA-A"."SettlementPointName" = 'John' AND 
 "DA-A"."DeliveryDate" >= '2015-02-01' AND
 "DA-A"."DeliveryDate" <= '2015-02-20' AND
 ("RT-A"."DeliveryHour" = 14) and 
 date_part('hour', "DA-A"."DeliveryHour") = "RT-A"."DeliveryHour"
GROUP BY
 "DA-A"."SettlementPointPrice",
 "DA-A"."SettlementPointName"

Question 2

As always your version of Postgres, please. And the name is PostgreSQL or Postgres for short. Never "Postgre".

Question 3

When you write grouping them all together, do you mean all selected rows, so that you get a single row aggregating all counts? Your column "Row" is a bit confusing, that's just noise, right? And it's unclear where the column Source DA in the result comes from. Does not agree with the query you show.

Question 4

EB: Yes, I mean aggregating all the rows and the count values into a single value. For example say there are ten rows (ten days) I just want the summation of those ten days sum(count(*)) as an example by bucket.

Question 5

EB: "Row" is just just noise, but also highlights how I am getting two rows now, but really just want one single row with everything as a sum in each bucket.

Question 6

EB: Sorry, Source DA should read: "SettlementPointPrice". Fixed.

Question 7

After some processing this boiled down to:

While your predicate d."SettlementPointName" = 'John' is filtering a single value for "SettlementPointName" anyway, simplify to:

SELECT count( d."SettlementPointPrice" < 10.5 OR NULL) AS da_00_10
 , count(d."SettlementPointPrice" >= 10.5 AND d."SettlementPointPrice" < 20.5 OR NULL) AS da_11_20
 , count(d."SettlementPointPrice" >= 20.5 AND d."SettlementPointPrice" < 30.5 OR NULL) AS da_21_30
FROM public.da d
JOIN public.rt_aggregate r USING ("DeliveryDate", "SettlementPointName")
WHERE d."SettlementPointName" = 'John'
AND d."DeliveryDate" >= '2015-02-01'
AND d."DeliveryDate" <= '2015-02-20'
AND r."DeliveryHour" = 14
AND date_part('hour', d."DeliveryHour") = r."DeliveryHour";

About the counting technique:

For absolute performance, is SUM faster or COUNT?

Or better, yet, use the new aggregate filter technique in pg 9.4:

SELECT d."SettlementPointName"
 , count(*) FILTER (WHERE d."SettlementPointPrice" < 10.5) AS da_00_10
 , count(*) FILTER (WHERE d."SettlementPointPrice" >= 10.5
 AND d."SettlementPointPrice" < 20.5) AS da_11_20
 , count(*) FILTER (WHERE d."SettlementPointPrice" >= 20.5
 AND d."SettlementPointPrice" < 30.5) AS da_21_30
FROM public.da d
JOIN public.rt_aggregate r USING ("DeliveryDate", "SettlementPointName")
WHERE d."DeliveryDate" >= '2015-02-01'
AND d."DeliveryDate" <= '2015-02-20'
AND r."DeliveryHour" = 14
AND date_part('hour', d."DeliveryHour") = r."DeliveryHour"
GROUP BY 1;

This time, selecting all names and returning one row per name like you asked in the comment.

Details for FILTER:

Return counts for multiple ranges in a single SELECT statement

Question 8

@JAS: Consider the added notes.

Question 9

One additional question to your code. If remove the WHERE d."SettlementPointName" = 'John' and want a GROUP BY d."SettlementPointName" how is that done through the sub select?

Question 10

Your question is a bit light on your actual objective. My guess is you just need the second query, where you can just add GROUP BY d."SettlementPointName" to get one row with counts per name. No subquery.

Question 11

EB: Your second update with additional SQL solved what I was looking for:

score 0 · Accepted Answer · 2015-02-26 04:42:31Z

After some processing this boiled down to:

While your predicate d."SettlementPointName" = 'John' is filtering a single value for "SettlementPointName" anyway, simplify to:

SELECT count( d."SettlementPointPrice" < 10.5 OR NULL) AS da_00_10
 , count(d."SettlementPointPrice" >= 10.5 AND d."SettlementPointPrice" < 20.5 OR NULL) AS da_11_20
 , count(d."SettlementPointPrice" >= 20.5 AND d."SettlementPointPrice" < 30.5 OR NULL) AS da_21_30
FROM public.da d
JOIN public.rt_aggregate r USING ("DeliveryDate", "SettlementPointName")
WHERE d."SettlementPointName" = 'John'
AND d."DeliveryDate" >= '2015-02-01'
AND d."DeliveryDate" <= '2015-02-20'
AND r."DeliveryHour" = 14
AND date_part('hour', d."DeliveryHour") = r."DeliveryHour";

About the counting technique:

For absolute performance, is SUM faster or COUNT?

Or better, yet, use the new aggregate filter technique in pg 9.4:

SELECT d."SettlementPointName"
 , count(*) FILTER (WHERE d."SettlementPointPrice" < 10.5) AS da_00_10
 , count(*) FILTER (WHERE d."SettlementPointPrice" >= 10.5
 AND d."SettlementPointPrice" < 20.5) AS da_11_20
 , count(*) FILTER (WHERE d."SettlementPointPrice" >= 20.5
 AND d."SettlementPointPrice" < 30.5) AS da_21_30
FROM public.da d
JOIN public.rt_aggregate r USING ("DeliveryDate", "SettlementPointName")
WHERE d."DeliveryDate" >= '2015-02-01'
AND d."DeliveryDate" <= '2015-02-20'
AND r."DeliveryHour" = 14
AND date_part('hour', d."DeliveryHour") = r."DeliveryHour"
GROUP BY 1;

This time, selecting all names and returning one row per name like you asked in the comment.

Details for FILTER:

Return counts for multiple ranges in a single SELECT statement

One additional question to your code. If remove the WHERE d."SettlementPointName" = 'John' and want a GROUP BY d."SettlementPointName" how is that done through the sub select?
Your question is a bit light on your actual objective. My guess is you just need the second query, where you can just add GROUP BY d."SettlementPointName" to get one row with counts per name. No subquery.
EB: Your second update with additional SQL solved what I was looking for:

Stack Exchange Network

Postgres Nested WHEN Aggregate Function

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

Postgres Nested WHEN Aggregate Function

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions