Select query for returning different data based on creation time

Question 1

I have a view, which will select data from table reports. The select will filter the data based on multiple dates.

reports table: id, key_id, position, created_at

My select:

pr.key_id,
COALESCE(CASE WHEN date(pr.created_at) = date(current_date - interval '1' day) THEN array_agg(DISTINCT pr.position) ELSE '{0}' END ) AS yesterday,
COALESCE(CASE WHEN date(pr.created_at) = date(current_date - interval '2' day) THEN array_agg(DISTINCT pr.position) ELSE '{0}' END ) AS last_week,
COALESCE(CASE WHEN date(pr.created_at) = date(current_date - interval '3' month) THEN array_agg(DISTINCT pr.position) ELSE '{0}' END ) AS last_month,

Even if I group by key_id and date the result is not grouped.

Example of result I get now:

key_id: 1 yesterday: {1} last_week: {0} last_month: {0} created_at: '2014-04-11'
key_id: 1 yesterday: {0} last_week: {1} last_month: {0} created_at: '2014-04-10'
key_id: 1 yesterday: {0} last_week: {0} last_month: {1} created_at: '2014-04-09'

What I expect:

key_id: 1 yesterday: {1} last_week: {1} last_month: {1} created_at: '2014-04-11'

With this query I get what I want. Is there a better way of doing it?

SELECT
 k.name,
 array_agg(DISTINCT position),
 (SELECT array_agg(DISTINCT position) FROM project_reports prk WHERE prk.project_id = pr.project_id AND prk.keyword_id = pr.keyword_id AND date(prk.created_at) = date(current_date - interval '1' day)) as y_position,
 (SELECT array_agg(DISTINCT position) FROM project_reports prk WHERE prk.project_id = pr.project_id AND prk.keyword_id = pr.keyword_id AND date(prk.created_at) = date(current_date - interval '7' day)) as last_week,
 (SELECT array_agg(DISTINCT position) FROM project_reports prk WHERE prk.project_id = pr.project_id AND prk.keyword_id = pr.keyword_id AND date(prk.created_at) = date(current_date - interval '1' month)) as last_month,
 (SELECT array_agg(DISTINCT position) FROM project_reports prk WHERE prk.project_id = pr.project_id AND prk.keyword_id = pr.keyword_id AND date(prk.created_at) = date(current_date - interval '3' month)) as three_month,
 (SELECT array_agg(DISTINCT position) FROM project_reports prk WHERE prk.project_id = pr.project_id AND prk.keyword_id = pr.keyword_id AND date(prk.created_at) = date(current_date - interval '6' month)) as six_month,
 (SELECT array_agg(DISTINCT position) FROM project_reports prk WHERE prk.project_id = pr.project_id AND prk.keyword_id = pr.keyword_id AND date(prk.created_at) = date(current_date - interval '12' month)) as last_year,
 date(pr.created_at)
FROM project_keywords pk
INNER JOIN keywords k ON pk.keyword_id = k.id
INNER JOIN project_reports pr USING (project_id, keyword_id)
GROUP BY k.name, date(pr.created_at), pr.project_id, pr.keyword_id

Question 2

So what COALESCE is supposed to do there? Anyway, what you included is not a query, just a piece of it. Also, without example data it is hard to get what you are asking about. Still, I guess you want to do something like SELECT (SELECT {yesterday}), (SELECT {last_week}), (SELECT {last_month})

Question 3

Add the full query you used, not a crippled version.

Question 4

@dezso Sorry, for not printing all the query, also with your answer I got what I wanted, is there any other way to get the same data without so many selects?

Question 5

@alex It seems you are trying to squeeze 2 reports into one, with this query.

Question 6

@ypercube: I don't think windows function would get us far here, since the correlated subqueries return constant values per (keyword_id, project_id). I think I squeezed the maximum out of this one by joining to a single, cross-tabulated subquery ...

Question 7

If I guessed the missing pieces correctly, this query using a crosstab() function should do the job (and much faster than the original - except for small cardinalities where not much can be gained to begin with):

It requires that you install the additional module tablefunc first (once per database. Detailed instructions in this related answer on SO:
PostgreSQL Crosstab Query

SELECT name, project_id, positions, created_day
 ,yesterday, last_week, last_month, three_month, six_month, last_year
FROM (
 SELECT k.name, pr.keyword_id, pr.project_id, pr.created_at::date AS created_day
 , array_agg(DISTINCT pr.position) AS positions
 FROM project_reports pr
 JOIN keywords k ON k.id = pr.keyword_id
 GROUP BY k.id, pr.project_id, created_day -- k.id being the pk
 ) sub
LEFT JOIN crosstab(
 $$SELECT ARRAY[keyword_id, project_id], created_at::date, array_agg(DISTINCT position)
 FROM project_reports
 WHERE created_at::date IN (
 current_date - 1
 , current_date - 7
 , (now() - interval '1 mon')::date
 , (now() - interval '3 mon')::date
 , (now() - interval '6 mon')::date
 , (now() - interval '12 mon')::date
 )
 GROUP BY 1,2
 ORDER BY 1,2$$
 ,$$VALUES 
 (current_date - 1)
 ,(current_date - 7)
 ,((now() - interval '1 mon')::date)
 ,((now() - interval '3 mon')::date)
 ,((now() - interval '6 mon')::date)
 ,((now() - interval '12 mon')::date)$$
 ) AS t (
 kp int[]
 , yesterday int[]
 , last_week int[]
 , last_month int[]
 , three_month int[]
 , six_month int[]
 , last_year int[]
 ) ON t.kp[1] = keyword_id
 AND t.kp[2] = project_id;

To break it down, a simple demo of what I am doing in the crosstab:

SELECT * FROM crosstab(
 $$SELECT *
 FROM (
 VALUES
 ('{1,2}'::int[], current_date - 1, '{1,2}'::int[])
 , ('{1,3}'::int[], current_date - 1, '{1,2}'::int[])
 , ('{1,2}', current_date - 7, '{5,6,7}'::int[])
 , ('{1,3}', current_date - 7, '{5,6,7}'::int[])
 , ('{1,2}', (now() - interval '1 mon')::date, '{9,10}'::int[])
 -- 3 mon missing
 , ('{1,2}', (now() - interval '6 mon')::date, '{6,6,6}'::int[])
 -- 12 mon missing
 ) sub
 ORDER BY 1,2$$
 
,$$VALUES 
 (current_date - 1)
 ,(current_date - 7)
 ,((now() - interval '1 mon')::date)
 ,((now() - interval '3 mon')::date)
 ,((now() - interval '6 mon')::date)
 ,((now() - interval '12 mon')::date)$$
) AS t (
 kp int[]
 , yesterday int[]
 , last_week int[]
 , last_month int[]
 , three_month int[]
 , six_month int[]
 , last_year int[]
 );

Result:

 kp | yesterday | last_week | last_month | three_month | six_month | last_year
-------+-----------+-----------+------------+-------------+-----------+-----------
 {1,2} | {1,2} | {5,6,7} | {9,10} | | {6,6,6} |
 {1,3} | {1,2} | {5,6,7} | | | |

The rest should be obvious.

Comments

I de-coupled the two parts and fused the plethora of correlated subqueries into a single subquery, which should be much faster. Then I use cross tabulation pull individual columns from this query. The syntax for crosstab() is the tricky part.
I am not joining to project_keywords, it seems to be just noise. I am suspecting problems in your relational model.
I seems you should include the project_id in the result to make sense. Else you should not GROUP BY it ...
You can just subtract integer from date.

score 2 · Accepted Answer · 2014-04-11 21:07:26Z

If I guessed the missing pieces correctly, this query using a crosstab() function should do the job (and much faster than the original - except for small cardinalities where not much can be gained to begin with):

It requires that you install the additional module tablefunc first (once per database. Detailed instructions in this related answer on SO:
PostgreSQL Crosstab Query

SELECT name, project_id, positions, created_day
 ,yesterday, last_week, last_month, three_month, six_month, last_year
FROM (
 SELECT k.name, pr.keyword_id, pr.project_id, pr.created_at::date AS created_day
 , array_agg(DISTINCT pr.position) AS positions
 FROM project_reports pr
 JOIN keywords k ON k.id = pr.keyword_id
 GROUP BY k.id, pr.project_id, created_day -- k.id being the pk
 ) sub
LEFT JOIN crosstab(
 $$SELECT ARRAY[keyword_id, project_id], created_at::date, array_agg(DISTINCT position)
 FROM project_reports
 WHERE created_at::date IN (
 current_date - 1
 , current_date - 7
 , (now() - interval '1 mon')::date
 , (now() - interval '3 mon')::date
 , (now() - interval '6 mon')::date
 , (now() - interval '12 mon')::date
 )
 GROUP BY 1,2
 ORDER BY 1,2$$
 ,$$VALUES 
 (current_date - 1)
 ,(current_date - 7)
 ,((now() - interval '1 mon')::date)
 ,((now() - interval '3 mon')::date)
 ,((now() - interval '6 mon')::date)
 ,((now() - interval '12 mon')::date)$$
 ) AS t (
 kp int[]
 , yesterday int[]
 , last_week int[]
 , last_month int[]
 , three_month int[]
 , six_month int[]
 , last_year int[]
 ) ON t.kp[1] = keyword_id
 AND t.kp[2] = project_id;

To break it down, a simple demo of what I am doing in the crosstab:

SELECT * FROM crosstab(
 $$SELECT *
 FROM (
 VALUES
 ('{1,2}'::int[], current_date - 1, '{1,2}'::int[])
 , ('{1,3}'::int[], current_date - 1, '{1,2}'::int[])
 , ('{1,2}', current_date - 7, '{5,6,7}'::int[])
 , ('{1,3}', current_date - 7, '{5,6,7}'::int[])
 , ('{1,2}', (now() - interval '1 mon')::date, '{9,10}'::int[])
 -- 3 mon missing
 , ('{1,2}', (now() - interval '6 mon')::date, '{6,6,6}'::int[])
 -- 12 mon missing
 ) sub
 ORDER BY 1,2$$
 
,$$VALUES 
 (current_date - 1)
 ,(current_date - 7)
 ,((now() - interval '1 mon')::date)
 ,((now() - interval '3 mon')::date)
 ,((now() - interval '6 mon')::date)
 ,((now() - interval '12 mon')::date)$$
) AS t (
 kp int[]
 , yesterday int[]
 , last_week int[]
 , last_month int[]
 , three_month int[]
 , six_month int[]
 , last_year int[]
 );

Result:

 kp | yesterday | last_week | last_month | three_month | six_month | last_year
-------+-----------+-----------+------------+-------------+-----------+-----------
 {1,2} | {1,2} | {5,6,7} | {9,10} | | {6,6,6} |
 {1,3} | {1,2} | {5,6,7} | | | |

The rest should be obvious.

Comments

I de-coupled the two parts and fused the plethora of correlated subqueries into a single subquery, which should be much faster. Then I use cross tabulation pull individual columns from this query. The syntax for crosstab() is the tricky part.
I am not joining to project_keywords, it seems to be just noise. I am suspecting problems in your relational model.
I seems you should include the project_id in the result to make sense. Else you should not GROUP BY it ...
You can just subtract integer from date.

Stack Exchange Network

Select query for returning different data based on creation time

1 Answer 1

Comments

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Select query for returning different data based on creation time

1 Answer 1

Comments

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions