1

I have the following table:

-- id, location_id, status, posted_year, posted_quarter
CREATE TABLE foo AS
SELECT * FROM ( VALUES
(1 ,12,'active' ,2014,3), 
(2 ,12,'inactive',2014,3),
(3 ,12,'active' ,2014,3),
(4 ,12,'active' ,2014,4),
(5 ,12,'inactive',2014,4),
(6 ,13,'active' ,2015,1),
(7 ,13,'active' ,2015,1),
(8 ,13,'inactive',2015,1),
(9 ,13,'active' ,2015,2),
(10,13,'active' ,2015,2),
(11,13,'inactive',2015,3),
(12,13,'active' ,2015,4),
(13,13,'active' ,2015,4),
(14,13,'inactive',2015,4),
(15,12,'active' ,2015,1),
(16,13,'active' ,2015,1),
(17,12,'inactive',2015,1),
(18,12,'active' ,2015,2)
) AS t(id,location_id,status,posted_year,posted_quarter);

I want to recreate this table but have only one quarter per year from each location.

We might have more than one record in a year for a quarter, and in this case we need to check the status based on:

If one status is active for a year in selected quarter, the status will be active, otherwise the status will be inactive.

Examples:

  • location_id 12 for year 2014 and quarter 3 will have one record in the new table with status active.

  • location_id 12 for year 2015 and quarter 1 will have one record in the new table with status inactive.

How to write this query?

Erwin Brandstetter
186k28 gold badges464 silver badges636 bronze badges
asked Dec 16, 2016 at 22:31
3
  • Is this table UNIQUE(location_id, id, posted_quater)? Commented Dec 16, 2016 at 22:49
  • I updated the table as requested and it is has unique id and repeated location_id, posted_quarter Commented Dec 16, 2016 at 23:11
  • @Eyla as some other general points of suggestion INACTIVE/ACTIVE should likely be a bool: is_active (if those are the only states) Commented Dec 17, 2016 at 0:11

2 Answers 2

2

The only trick here is to create something you can group by that involves the year, and the quarter. This isn't the only way to do it, but you can do this...

make_timestamp(posted_year,1,1,0,0,0)::date
+ posted_quarter*3*'1 month'::interval

Remember, three months in a quarter-year. You can also do posted_year + posted_quater*0.25 but all the same.

From there, it's pretty basic. I'll use a CTE to separate formatting from calculation (feel free to ditch it for speed).

DISTINCT ON()

WITH t AS (
 SELECT (make_timestamp(posted_year,1,1,0,0,0)::date + posted_quarter*3*'1 month'::interval)::date AS qtr, *
 FROM foo
)
SELECT DISTINCT ON ( location_id, qtr ) location_id, qtr, status
FROM t
ORDER BY location_id, qtr, status='active' DESC;

GROUP BY ... bool_or()

Or, alternatively (and maybe faster),

WITH t AS (
 SELECT (make_timestamp(posted_year,1,1,0,0,0)::date + posted_quarter*3*'1 month'::interval)::date AS qtr, *
 FROM foo
)
SELECT location_id,
 qtr,
 CASE WHEN bool_or(status='active') THEN 'active' ELSE 'inactive' END
FROM t
GROUP BY location_id, qtr
ORDER BY location_id, qtr;
answered Dec 16, 2016 at 22:55
2

Assuming the id column is just a meaningless serial column, all you need is a simple aggregation.

Create the table first, with an actual serial column to auto-assign defaults.

CREATE TABLE foo (
 foo_id serial -- serial!
 , location_id int NOT NULL -- REFERENCES locations(location_id)
 , posted_year int NOT NULL -- might be smallint
 , posted_quarter int NOT NULL -- might be smallint
 , status_active boolean NOT NULL -- boolean!
);

Then insert aggregated data, without id:

INSERT INTO foo(location_id, posted_year, posted_quarter, status_active)
SELECT location_id
 , posted_year
 , posted_quarter
 , CASE WHEN min(status) = 'active' THEN true ELSE false END
FROM (
 VALUES
 (1 ,12,'active' ,2014,3), 
 (2 ,12,'inactive',2014,3),
 (3 ,12,'active' ,2014,3),
 (4 ,12,'active' ,2014,4),
 (5 ,12,'inactive',2014,4),
 (6 ,13,'active' ,2015,1),
 (7 ,13,'active' ,2015,1),
 (8 ,13,'inactive',2015,1),
 (9 ,13,'active' ,2015,2),
 (10,13,'active' ,2015,2),
 (11,13,'inactive',2015,3),
 (12,13,'active' ,2015,4),
 (13,13,'active' ,2015,4),
 (14,13,'inactive',2015,4),
 (15,12,'active' ,2015,1),
 (16,13,'active' ,2015,1),
 (17,12,'inactive',2015,1),
 (18,12,'active' ,2015,2)
 ) t(id, location_id, status, posted_year, posted_quarter)
GROUP BY posted_year, posted_quarter, location_id
ORDER BY posted_year, posted_quarter, location_id;
ALTER TABLE foo
 ADD PRIMARY KEY (foo_id)
, ADD UNIQUE (location_id, posted_year, posted_quarter);

Result:

 foo_id | location_id | posted_year | posted_quarter | status_active
--------+-------------+-------------+----------------+---------------
 1 | 12 | 2014 | 3 | t
 2 | 12 | 2014 | 4 | t
 3 | 12 | 2015 | 1 | t
 4 | 13 | 2015 | 1 | t
 5 | 12 | 2015 | 2 | t
 6 | 13 | 2015 | 2 | t
 7 | 13 | 2015 | 3 | f
 8 | 13 | 2015 | 4 | t

Since 'active' sorts before 'inactive', min(status) will return 'active' if any row in the group is active. Convert to boolean right away to fit the boolean column in the table.

The added UNIQUE constraint disables dupes in the future. You might make (location_id, posted_year, posted_quarter) the multi-column PRIMARY KEY instead and drop foo_id altogether. That's a matter of taste and other requirements.

Either way, it's cheaper to add theses constraints after you fill the table.

Related:

answered Dec 18, 2016 at 12:09

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.