This is similar to Postgresql group by and count() problem, missing row with 0 result but different in genesis and details.
I have a table of temperatures in Celsius and dates and I use generate_series()
and grouping to identify the number of days in a month which are below a temperature for a range of temperatures. This is great, except I'd also like to include the months where there are no days below a temperature. Is this possible without a join?
I'm using Postgres 16
CREATE TABLE public.daily (
dom date NOT NULL,
tavg numeric( 3, 1 ) NOT NULL,
CONSTRAINT unique_weather_dt UNIQUE( dom ) )
INSERT INTO daily(dom, tavg)
VALUES
('1969-07-11', 13.9)
, ('1969-07-14', 16.1)
;
SELECT (date_trunc('month'::text, (daily.dom)::timestamp))::date AS month,
tp.dg AS upper_limit,
count(tp.dg) AS days
FROM daily,
generate_series(5, 14) tp(dg)
WHERE (daily.tavg <= (tp.dg)::numeric)
GROUP BY tp.dg, ((date_trunc('month'::text, (daily.dom)::timestamp ))::date)
ORDER BY tp.dg, ((date_trunc('month'::text, (daily.dom)::timestamp ))::date);
I think one solution would simple be to create a table for all items with the series with days set to 0 and use an outer join which to include rows when the month is missing in the original query but I'm struggling to do this elegantly. Thus far I've resorted to left joining later on after processing (including pivoting).
2 Answers 2
This computes the number of days for every given temperature (upper_limit
) in every month. If no rows qualify for month, a row with a count of 0 is filled in:
SELECT m.month
, tp.upper_limit
, count(d.dom) AS days
FROM generate_series(5, 14) tp(upper_limit) -- given range of temperature limits
CROSS JOIN ( -- full range of months
SELECT generate_series(a, z, interval '1 month')::date AS month -- full range of months
FROM (
SELECT date_trunc('month', min(dom)::timestamp) AS a
, date_trunc('month', max(dom)::timestamp) AS z
FROM daily d
) sub
) m
LEFT JOIN daily d ON d.tavg <= tp.upper_limit
AND d.dom >= m.month
AND d.dom < m.month + interval '1 month'
GROUP BY 1, 2
ORDER BY 1, 2;
In the fiddle, March is missing, and May is too warm. Either gets a count of 0 in the result.
I took lower and upper bound for the range of months from min & max in the table in subquery m
. Provide any range in its place.
-
Thanks, as ever, for a helpful reply. I'd also been looking at using a union or except using a correlated subquery: which months, upper_limits are in the set "no_days" but haven't got it working yet. I think the main difference is the left join you apply.Charlie Clark– Charlie Clark2024年04月19日 09:25:31 +00:00Commented Apr 19, 2024 at 9:25
I think Erwin has the most efficient answer because there are no function calls when GROUPing
and the UNION
of the two queries but I must admit it took me a while to understand the query. Some additional analysis also showed that the CROSS JOIN
initially creates a large set before grouping. It's definitely better to create a single range of months.
One snag that I hit with Edwin's solution was that a derived query led to divide by zero errors when days were 0. Of course, this can be solved using CASE clauses. But I'd like to keep my queries as simple as possible.
I tried various approaches and finally settled on a combination of views that gave me the flexibility, expressibility and reliability I was looking for.
I created three views for the temperature range, the cross join of this with the range of months and my original aggregate. This makes simple set logic: subset + (superset - subset) easy to implement and filtering on dates can be implemented using joins.
For completeness
CREATE VIEW temperature_range AS
SELECT upper_limit
FROM generate_series(7, 12) temps(upper_limit);
CREATE VIEW months AS
WITH months AS (
SELECT (generate_series((month_range.a)::timestamp with time zone, (month_range.z)::timestamp with time zone, '1 mon'::interval))::date AS month
FROM ( SELECT min(daily.dom) AS a,
max(daily.dom) AS z
FROM daily) month_range
)
SELECT months.month,
temperature_range.upper_limit
FROM months,
temperature_range;
CREATE VIEW cold_days AS
SELECT (date_trunc('month'::text, (daily.dom)::timestamp with time zone))::date AS month,
temperature_range.upper_limit,
count(*) AS days
FROM daily,
temperature_range
WHERE (daily.tavg <= (temperature_range.upper_limit)::numeric)
GROUP BY ((date_trunc('month'::text, (daily.dom)::timestamp with time zone))::date), temperature_range.upper_limit
ORDER BY ((date_trunc('month'::text, (daily.dom)::timestamp with time zone))::date), temperature_range.upper_limit;
CREATE VIEW no_days AS
SELECT months.month,
months.upper_limit
FROM months
EXCEPT
SELECT cold_days.month,
cold_days.upper_limit
FROM cold_days
ORDER BY 1, 2;
-- Query for the years 2022 and 2023
WITH months as (
SELECT month from months
WHERE month between '2022-01-01' and '2023-12-31'
)
select * from cold_days
JOIN months using (month)
union
select month, upper_limit, 0
from no_days
JOIN months using (month)
order by 1, 2
I have some additional views and queries that use this data and the same approach which makes it easier to make changes to things like the range of temperatures and dates.
INSERT
script) and the desired result. And always your version of Postgres.