2

I have a view, which will select data from table reports. The select will filter the data based on multiple dates.

reports table: id, key_id, position, created_at

My select:

pr.key_id,
COALESCE(CASE WHEN date(pr.created_at) = date(current_date - interval '1' day) THEN array_agg(DISTINCT pr.position) ELSE '{0}' END ) AS yesterday,
COALESCE(CASE WHEN date(pr.created_at) = date(current_date - interval '2' day) THEN array_agg(DISTINCT pr.position) ELSE '{0}' END ) AS last_week,
COALESCE(CASE WHEN date(pr.created_at) = date(current_date - interval '3' month) THEN array_agg(DISTINCT pr.position) ELSE '{0}' END ) AS last_month,

Even if I group by key_id and date the result is not grouped.

Example of result I get now:

  1. key_id: 1 yesterday: {1} last_week: {0} last_month: {0} created_at: '2014-04-11'
  2. key_id: 1 yesterday: {0} last_week: {1} last_month: {0} created_at: '2014-04-10'
  3. key_id: 1 yesterday: {0} last_week: {0} last_month: {1} created_at: '2014-04-09'

What I expect:

  1. key_id: 1 yesterday: {1} last_week: {1} last_month: {1} created_at: '2014-04-11'

With this query I get what I want. Is there a better way of doing it?

SELECT
 k.name,
 array_agg(DISTINCT position),
 (SELECT array_agg(DISTINCT position) FROM project_reports prk WHERE prk.project_id = pr.project_id AND prk.keyword_id = pr.keyword_id AND date(prk.created_at) = date(current_date - interval '1' day)) as y_position,
 (SELECT array_agg(DISTINCT position) FROM project_reports prk WHERE prk.project_id = pr.project_id AND prk.keyword_id = pr.keyword_id AND date(prk.created_at) = date(current_date - interval '7' day)) as last_week,
 (SELECT array_agg(DISTINCT position) FROM project_reports prk WHERE prk.project_id = pr.project_id AND prk.keyword_id = pr.keyword_id AND date(prk.created_at) = date(current_date - interval '1' month)) as last_month,
 (SELECT array_agg(DISTINCT position) FROM project_reports prk WHERE prk.project_id = pr.project_id AND prk.keyword_id = pr.keyword_id AND date(prk.created_at) = date(current_date - interval '3' month)) as three_month,
 (SELECT array_agg(DISTINCT position) FROM project_reports prk WHERE prk.project_id = pr.project_id AND prk.keyword_id = pr.keyword_id AND date(prk.created_at) = date(current_date - interval '6' month)) as six_month,
 (SELECT array_agg(DISTINCT position) FROM project_reports prk WHERE prk.project_id = pr.project_id AND prk.keyword_id = pr.keyword_id AND date(prk.created_at) = date(current_date - interval '12' month)) as last_year,
 date(pr.created_at)
FROM project_keywords pk
INNER JOIN keywords k ON pk.keyword_id = k.id
INNER JOIN project_reports pr USING (project_id, keyword_id)
GROUP BY k.name, date(pr.created_at), pr.project_id, pr.keyword_id
Michael Green
25.3k13 gold badges54 silver badges100 bronze badges
asked Apr 11, 2014 at 11:00
7
  • 1
    So what COALESCE is supposed to do there? Anyway, what you included is not a query, just a piece of it. Also, without example data it is hard to get what you are asking about. Still, I guess you want to do something like SELECT (SELECT {yesterday}), (SELECT {last_week}), (SELECT {last_month}) Commented Apr 11, 2014 at 11:12
  • Add the full query you used, not a crippled version. Commented Apr 11, 2014 at 11:34
  • @dezso Sorry, for not printing all the query, also with your answer I got what I wanted, is there any other way to get the same data without so many selects? Commented Apr 11, 2014 at 11:36
  • @alex It seems you are trying to squeeze 2 reports into one, with this query. Commented Apr 11, 2014 at 11:47
  • 1
    @ypercube: I don't think windows function would get us far here, since the correlated subqueries return constant values per (keyword_id, project_id). I think I squeezed the maximum out of this one by joining to a single, cross-tabulated subquery ... Commented Apr 11, 2014 at 21:12

1 Answer 1

2

If I guessed the missing pieces correctly, this query using a crosstab() function should do the job (and much faster than the original - except for small cardinalities where not much can be gained to begin with):

It requires that you install the additional module tablefunc first (once per database. Detailed instructions in this related answer on SO:
PostgreSQL Crosstab Query

SELECT name, project_id, positions, created_day
 ,yesterday, last_week, last_month, three_month, six_month, last_year
FROM (
 SELECT k.name, pr.keyword_id, pr.project_id, pr.created_at::date AS created_day
 , array_agg(DISTINCT pr.position) AS positions
 FROM project_reports pr
 JOIN keywords k ON k.id = pr.keyword_id
 GROUP BY k.id, pr.project_id, created_day -- k.id being the pk
 ) sub
LEFT JOIN crosstab(
 $$SELECT ARRAY[keyword_id, project_id], created_at::date, array_agg(DISTINCT position)
 FROM project_reports
 WHERE created_at::date IN (
 current_date - 1
 , current_date - 7
 , (now() - interval '1 mon')::date
 , (now() - interval '3 mon')::date
 , (now() - interval '6 mon')::date
 , (now() - interval '12 mon')::date
 )
 GROUP BY 1,2
 ORDER BY 1,2$$
 ,$$VALUES 
 (current_date - 1)
 ,(current_date - 7)
 ,((now() - interval '1 mon')::date)
 ,((now() - interval '3 mon')::date)
 ,((now() - interval '6 mon')::date)
 ,((now() - interval '12 mon')::date)$$
 ) AS t (
 kp int[]
 , yesterday int[]
 , last_week int[]
 , last_month int[]
 , three_month int[]
 , six_month int[]
 , last_year int[]
 ) ON t.kp[1] = keyword_id
 AND t.kp[2] = project_id;

To break it down, a simple demo of what I am doing in the crosstab:

SELECT * FROM crosstab(
 $$SELECT *
 FROM (
 VALUES
 ('{1,2}'::int[], current_date - 1, '{1,2}'::int[])
 , ('{1,3}'::int[], current_date - 1, '{1,2}'::int[])
 , ('{1,2}', current_date - 7, '{5,6,7}'::int[])
 , ('{1,3}', current_date - 7, '{5,6,7}'::int[])
 , ('{1,2}', (now() - interval '1 mon')::date, '{9,10}'::int[])
 -- 3 mon missing
 , ('{1,2}', (now() - interval '6 mon')::date, '{6,6,6}'::int[])
 -- 12 mon missing
 ) sub
 ORDER BY 1,2$$
 
,$$VALUES 
 (current_date - 1)
 ,(current_date - 7)
 ,((now() - interval '1 mon')::date)
 ,((now() - interval '3 mon')::date)
 ,((now() - interval '6 mon')::date)
 ,((now() - interval '12 mon')::date)$$
) AS t (
 kp int[]
 , yesterday int[]
 , last_week int[]
 , last_month int[]
 , three_month int[]
 , six_month int[]
 , last_year int[]
 );

Result:

 kp | yesterday | last_week | last_month | three_month | six_month | last_year
-------+-----------+-----------+------------+-------------+-----------+-----------
 {1,2} | {1,2} | {5,6,7} | {9,10} | | {6,6,6} |
 {1,3} | {1,2} | {5,6,7} | | | |

The rest should be obvious.

Comments

  • I de-coupled the two parts and fused the plethora of correlated subqueries into a single subquery, which should be much faster. Then I use cross tabulation pull individual columns from this query. The syntax for crosstab() is the tricky part.

  • I am not joining to project_keywords, it seems to be just noise. I am suspecting problems in your relational model.

  • I seems you should include the project_id in the result to make sense. Else you should not GROUP BY it ...

  • You can just subtract integer from date.

answered Apr 11, 2014 at 21:07
0

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.