I have a view, which will select data from table reports
. The select will filter the data based on multiple dates.
reports table
: id, key_id, position, created_at
My select:
pr.key_id,
COALESCE(CASE WHEN date(pr.created_at) = date(current_date - interval '1' day) THEN array_agg(DISTINCT pr.position) ELSE '{0}' END ) AS yesterday,
COALESCE(CASE WHEN date(pr.created_at) = date(current_date - interval '2' day) THEN array_agg(DISTINCT pr.position) ELSE '{0}' END ) AS last_week,
COALESCE(CASE WHEN date(pr.created_at) = date(current_date - interval '3' month) THEN array_agg(DISTINCT pr.position) ELSE '{0}' END ) AS last_month,
Even if I group by key_id and date the result is not grouped.
Example of result I get now:
- key_id: 1 yesterday: {1} last_week: {0} last_month: {0} created_at: '2014-04-11'
- key_id: 1 yesterday: {0} last_week: {1} last_month: {0} created_at: '2014-04-10'
- key_id: 1 yesterday: {0} last_week: {0} last_month: {1} created_at: '2014-04-09'
What I expect:
- key_id: 1 yesterday: {1} last_week: {1} last_month: {1} created_at: '2014-04-11'
With this query I get what I want. Is there a better way of doing it?
SELECT
k.name,
array_agg(DISTINCT position),
(SELECT array_agg(DISTINCT position) FROM project_reports prk WHERE prk.project_id = pr.project_id AND prk.keyword_id = pr.keyword_id AND date(prk.created_at) = date(current_date - interval '1' day)) as y_position,
(SELECT array_agg(DISTINCT position) FROM project_reports prk WHERE prk.project_id = pr.project_id AND prk.keyword_id = pr.keyword_id AND date(prk.created_at) = date(current_date - interval '7' day)) as last_week,
(SELECT array_agg(DISTINCT position) FROM project_reports prk WHERE prk.project_id = pr.project_id AND prk.keyword_id = pr.keyword_id AND date(prk.created_at) = date(current_date - interval '1' month)) as last_month,
(SELECT array_agg(DISTINCT position) FROM project_reports prk WHERE prk.project_id = pr.project_id AND prk.keyword_id = pr.keyword_id AND date(prk.created_at) = date(current_date - interval '3' month)) as three_month,
(SELECT array_agg(DISTINCT position) FROM project_reports prk WHERE prk.project_id = pr.project_id AND prk.keyword_id = pr.keyword_id AND date(prk.created_at) = date(current_date - interval '6' month)) as six_month,
(SELECT array_agg(DISTINCT position) FROM project_reports prk WHERE prk.project_id = pr.project_id AND prk.keyword_id = pr.keyword_id AND date(prk.created_at) = date(current_date - interval '12' month)) as last_year,
date(pr.created_at)
FROM project_keywords pk
INNER JOIN keywords k ON pk.keyword_id = k.id
INNER JOIN project_reports pr USING (project_id, keyword_id)
GROUP BY k.name, date(pr.created_at), pr.project_id, pr.keyword_id
1 Answer 1
If I guessed the missing pieces correctly, this query using a crosstab()
function should do the job (and much faster than the original - except for small cardinalities where not much can be gained to begin with):
It requires that you install the additional module tablefunc
first (once per database. Detailed instructions in this related answer on SO:
PostgreSQL Crosstab Query
SELECT name, project_id, positions, created_day
,yesterday, last_week, last_month, three_month, six_month, last_year
FROM (
SELECT k.name, pr.keyword_id, pr.project_id, pr.created_at::date AS created_day
, array_agg(DISTINCT pr.position) AS positions
FROM project_reports pr
JOIN keywords k ON k.id = pr.keyword_id
GROUP BY k.id, pr.project_id, created_day -- k.id being the pk
) sub
LEFT JOIN crosstab(
$$SELECT ARRAY[keyword_id, project_id], created_at::date, array_agg(DISTINCT position)
FROM project_reports
WHERE created_at::date IN (
current_date - 1
, current_date - 7
, (now() - interval '1 mon')::date
, (now() - interval '3 mon')::date
, (now() - interval '6 mon')::date
, (now() - interval '12 mon')::date
)
GROUP BY 1,2
ORDER BY 1,2$$
,$$VALUES
(current_date - 1)
,(current_date - 7)
,((now() - interval '1 mon')::date)
,((now() - interval '3 mon')::date)
,((now() - interval '6 mon')::date)
,((now() - interval '12 mon')::date)$$
) AS t (
kp int[]
, yesterday int[]
, last_week int[]
, last_month int[]
, three_month int[]
, six_month int[]
, last_year int[]
) ON t.kp[1] = keyword_id
AND t.kp[2] = project_id;
To break it down, a simple demo of what I am doing in the crosstab:
SELECT * FROM crosstab(
$$SELECT *
FROM (
VALUES
('{1,2}'::int[], current_date - 1, '{1,2}'::int[])
, ('{1,3}'::int[], current_date - 1, '{1,2}'::int[])
, ('{1,2}', current_date - 7, '{5,6,7}'::int[])
, ('{1,3}', current_date - 7, '{5,6,7}'::int[])
, ('{1,2}', (now() - interval '1 mon')::date, '{9,10}'::int[])
-- 3 mon missing
, ('{1,2}', (now() - interval '6 mon')::date, '{6,6,6}'::int[])
-- 12 mon missing
) sub
ORDER BY 1,2$$
,$$VALUES
(current_date - 1)
,(current_date - 7)
,((now() - interval '1 mon')::date)
,((now() - interval '3 mon')::date)
,((now() - interval '6 mon')::date)
,((now() - interval '12 mon')::date)$$
) AS t (
kp int[]
, yesterday int[]
, last_week int[]
, last_month int[]
, three_month int[]
, six_month int[]
, last_year int[]
);
Result:
kp | yesterday | last_week | last_month | three_month | six_month | last_year
-------+-----------+-----------+------------+-------------+-----------+-----------
{1,2} | {1,2} | {5,6,7} | {9,10} | | {6,6,6} |
{1,3} | {1,2} | {5,6,7} | | | |
The rest should be obvious.
Comments
I de-coupled the two parts and fused the plethora of correlated subqueries into a single subquery, which should be much faster. Then I use cross tabulation pull individual columns from this query. The syntax for
crosstab()
is the tricky part.I am not joining to
project_keywords
, it seems to be just noise. I am suspecting problems in your relational model.I seems you should include the
project_id
in the result to make sense. Else you should not GROUP BY it ...You can just subtract
integer
fromdate
.
Explore related questions
See similar questions with these tags.
COALESCE
is supposed to do there? Anyway, what you included is not a query, just a piece of it. Also, without example data it is hard to get what you are asking about. Still, I guess you want to do something likeSELECT (SELECT {yesterday}), (SELECT {last_week}), (SELECT {last_month})
(keyword_id, project_id)
. I think I squeezed the maximum out of this one by joining to a single, cross-tabulated subquery ...