I have the following table:
-- id, location_id, status, posted_year, posted_quarter
CREATE TABLE foo AS
SELECT * FROM ( VALUES
(1 ,12,'active' ,2014,3),
(2 ,12,'inactive',2014,3),
(3 ,12,'active' ,2014,3),
(4 ,12,'active' ,2014,4),
(5 ,12,'inactive',2014,4),
(6 ,13,'active' ,2015,1),
(7 ,13,'active' ,2015,1),
(8 ,13,'inactive',2015,1),
(9 ,13,'active' ,2015,2),
(10,13,'active' ,2015,2),
(11,13,'inactive',2015,3),
(12,13,'active' ,2015,4),
(13,13,'active' ,2015,4),
(14,13,'inactive',2015,4),
(15,12,'active' ,2015,1),
(16,13,'active' ,2015,1),
(17,12,'inactive',2015,1),
(18,12,'active' ,2015,2)
) AS t(id,location_id,status,posted_year,posted_quarter);
I want to recreate this table but have only one quarter per year from each location.
We might have more than one record in a year for a quarter, and in this case we need to check the status
based on:
If one status is active
for a year in selected quarter, the status
will be active
, otherwise the status
will be inactive
.
Examples:
location_id 12 for year 2014 and quarter 3 will have one record in the new table with status active.
location_id 12 for year 2015 and quarter 1 will have one record in the new table with status inactive.
How to write this query?
2 Answers 2
The only trick here is to create something you can group by that involves the year, and the quarter. This isn't the only way to do it, but you can do this...
make_timestamp(posted_year,1,1,0,0,0)::date
+ posted_quarter*3*'1 month'::interval
Remember, three months in a quarter-year. You can also do posted_year + posted_quater*0.25
but all the same.
From there, it's pretty basic. I'll use a CTE to separate formatting from calculation (feel free to ditch it for speed).
DISTINCT ON()
WITH t AS (
SELECT (make_timestamp(posted_year,1,1,0,0,0)::date + posted_quarter*3*'1 month'::interval)::date AS qtr, *
FROM foo
)
SELECT DISTINCT ON ( location_id, qtr ) location_id, qtr, status
FROM t
ORDER BY location_id, qtr, status='active' DESC;
GROUP BY
... bool_or()
Or, alternatively (and maybe faster),
WITH t AS (
SELECT (make_timestamp(posted_year,1,1,0,0,0)::date + posted_quarter*3*'1 month'::interval)::date AS qtr, *
FROM foo
)
SELECT location_id,
qtr,
CASE WHEN bool_or(status='active') THEN 'active' ELSE 'inactive' END
FROM t
GROUP BY location_id, qtr
ORDER BY location_id, qtr;
Assuming the id
column is just a meaningless serial column, all you need is a simple aggregation.
Create the table first, with an actual serial
column to auto-assign defaults.
CREATE TABLE foo (
foo_id serial -- serial!
, location_id int NOT NULL -- REFERENCES locations(location_id)
, posted_year int NOT NULL -- might be smallint
, posted_quarter int NOT NULL -- might be smallint
, status_active boolean NOT NULL -- boolean!
);
Then insert aggregated data, without id
:
INSERT INTO foo(location_id, posted_year, posted_quarter, status_active)
SELECT location_id
, posted_year
, posted_quarter
, CASE WHEN min(status) = 'active' THEN true ELSE false END
FROM (
VALUES
(1 ,12,'active' ,2014,3),
(2 ,12,'inactive',2014,3),
(3 ,12,'active' ,2014,3),
(4 ,12,'active' ,2014,4),
(5 ,12,'inactive',2014,4),
(6 ,13,'active' ,2015,1),
(7 ,13,'active' ,2015,1),
(8 ,13,'inactive',2015,1),
(9 ,13,'active' ,2015,2),
(10,13,'active' ,2015,2),
(11,13,'inactive',2015,3),
(12,13,'active' ,2015,4),
(13,13,'active' ,2015,4),
(14,13,'inactive',2015,4),
(15,12,'active' ,2015,1),
(16,13,'active' ,2015,1),
(17,12,'inactive',2015,1),
(18,12,'active' ,2015,2)
) t(id, location_id, status, posted_year, posted_quarter)
GROUP BY posted_year, posted_quarter, location_id
ORDER BY posted_year, posted_quarter, location_id;
ALTER TABLE foo
ADD PRIMARY KEY (foo_id)
, ADD UNIQUE (location_id, posted_year, posted_quarter);
Result:
foo_id | location_id | posted_year | posted_quarter | status_active --------+-------------+-------------+----------------+--------------- 1 | 12 | 2014 | 3 | t 2 | 12 | 2014 | 4 | t 3 | 12 | 2015 | 1 | t 4 | 13 | 2015 | 1 | t 5 | 12 | 2015 | 2 | t 6 | 13 | 2015 | 2 | t 7 | 13 | 2015 | 3 | f 8 | 13 | 2015 | 4 | t
Since 'active' sorts before 'inactive', min(status)
will return 'active' if any row in the group is active. Convert to boolean
right away to fit the boolean
column in the table.
The added UNIQUE
constraint disables dupes in the future. You might make (location_id, posted_year, posted_quarter)
the multi-column PRIMARY KEY
instead and drop foo_id
altogether. That's a matter of taste and other requirements.
Either way, it's cheaper to add theses constraints after you fill the table.
Related:
Explore related questions
See similar questions with these tags.
UNIQUE(location_id, id, posted_quater)
?bool
:is_active
(if those are the only states)