How do I calculate the percentile rank of the nth element of a field relative to the elements that precede it in a time series, for each subset (partition) of my data?

For example, this is exactly what I want to do, except it is a cumulative sum, not a percentile:

SELECT *
FROM  (
   SELECT trade_date, daily_val, id, company_id
        , sum(daily_val) OVER (PARTITION BY company_id ORDER BY trade_date
                               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS sum
   FROM   daily_data
   WHERE  (company_id = 858 OR company_id = 726)
   AND    wh_calc_id = 156
   ) AS x
ORDER BY company_id, trade_date;

This is what I want to do:

SELECT *
FROM  (
   SELECT trade_date, daily_val, id, company_id
        , percent_rank(daily_val) OVER (PARTITION BY company_id ORDER BY trade_date
                                        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS pctl
   FROM   daily_data
   WHERE  (company_id = 858 OR company_id = 726)
   AND    wh_calc_id = 156
   ) AS x
ORDER BY company_id, trade_date;

(Note: the only difference between the two is sum() vs percent_rank())

It seems I need a percentile rank function that takes a sorted expression argument. I know it exists in more recent versions, such as PostgreSQL 9.5:
https://www.postgresql.org/docs/9.5/static/functions-aggregate.html

And the 9.3 version doesn't seem to take an argument:
https://www.postgresql.org/docs/9.3/static/functions-window.html
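
To illustrate (my own sketch against the same table): the 9.3 window function ranks each row against its whole partition, not just the rows before it, so it doesn't do what I need:

SELECT trade_date, daily_val, company_id
     , percent_rank() OVER (PARTITION BY company_id ORDER BY daily_val) AS pctl  -- no argument allowed
FROM   daily_data
WHERE  (company_id = 858 OR company_id = 726)
AND    wh_calc_id = 156;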

I really don't want to upgrade PostgreSQL as I already have a lot going on right now.
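
For reference, a minimal table definition consistent with the queries above (the column types are my guesses, not the actual schema):

CREATE TABLE daily_data (
  id         serial PRIMARY KEY  -- assumed PK
, company_id integer NOT NULL
, wh_calc_id integer NOT NULL
, trade_date date    NOT NULL
, daily_val  numeric             -- may allow NULL (see the answer below)
);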

Here is some simplified example output of what I want (created in Excel):

trade_date   daily_val     pctl      company_id
1/1/1900     1484.881621   1.000000  1
1/2/1900     1465.465315   0.000000  1
1/3/1900     1471.244094   0.500000  1
1/4/1900     1475.109578   0.666000  1
1/5/1900     1484.881621   0.750000  1
1/6/1900     1508.863507   1.000000  1
1/7/1900     1519.089431   1.000000  1
1/8/1900     1504.819339   0.714000  1
1/9/1900     1523.210769   1.000000  1
1/10/1900    1514.990335   0.777000  1
1/11/1900    1550.595515   1.000000  1
1/12/1900    1529.454907   0.909000  1
1/13/1900    1546.324863   0.916000  1
1/14/1900    1542.077672   0.846000  1
1/15/1900    1561.396941   1.000000  1
1/16/1900    1519.089431   0.533000  1
1/17/1900    1498.774337   0.312000  1
1/18/1900    1486.847801   0.294000  1
1/19/1900    1488.819195   0.333000  1
1/20/1900    1506.83871    0.473000  1
1/21/1900    1500.777505   0.400000  1
1/1/1900     149.113683    1.000000  5
1/2/1900     155.505015    1.000000  5
1/3/1900     154.15655     0.500000  5
1/4/1900     160.425942    1.000000  5
1/5/1900     156.29799     0.750000  5
1/6/1900     164.058078    1.000000  5
1/7/1900     159.935225    0.666000  5
1/8/1900     158.537455    0.571000  5
1/9/1900     154.147639    0.125000  5
1/10/1900    155.722962    0.444000  5
1/11/1900    155.505015    0.300000  5
1/12/1900    162.766282    0.909000  5
1/13/1900    169.264949    1.000000  5
1/14/1900    167.73179     0.923000  5
1/15/1900    162.280915    0.714000  5
1/16/1900    159.906452    0.533000  5
1/17/1900    161.04602     0.687000  5
1/18/1900    157.656379    0.411000  5
1/19/1900    154.478044    0.166000  5
1/20/1900    154.576546    0.210000  5
1/21/1900    154.147639    0.050000  5

ntile() doesn't work either.

asked Feb 10, 2017 at 21:37
  • I assume that by available in PostgreSQL 9.3.5 you really meant PostgreSQL 9.5 (as the manual link indicates)? Either way, hypothetical-set aggregate functions were introduced with 9.4. Commented Feb 11, 2017 at 8:57
  • 9.3 is what I have. I provided the link to 9.5 because I thought it would be useful to people who have 9.5, so they don't have to do what I am doing. I didn't know it was available in 9.4. Commented Feb 11, 2017 at 13:24
  • Not yet. I am working on some infrastructure around this, so I can test this. If you look at my most recent question you will see how it relates. I will credit the answer as soon as I can test it. I look at this every day. Commented Feb 16, 2017 at 13:49

1 Answer

In Postgres 9.4+, you can use the hypothetical-set aggregate function percent_rank() in a LATERAL join like this:

SELECT *
FROM   daily_data d, LATERAL (
   SELECT round(percent_rank(d.daily_val) WITHIN GROUP (ORDER BY daily_val)::numeric, 6) AS pctl_calc
   FROM   daily_data
   WHERE  company_id = d.company_id
   AND    trade_date < d.trade_date
   ) x
ORDER BY company_id, trade_date;

The CROSS JOIN LATERAL (abbreviated to just , LATERAL above) is safe, because a subquery with an aggregate function always returns a row. (Otherwise you might want LEFT JOIN LATERAL (...) x ON true.)

Minor difference: The first row in each group returns 0, not 1 like in your example. Easily fixed, if you need it.
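
One way (a sketch, not from the original answer - it reuses the same look-back subquery and alias names as above): return the look-back row count alongside the rank and override the empty case:

SELECT d.*
     , CASE WHEN x.ct = 0 THEN numeric '1'  -- 1 to match the example output
            ELSE x.pctl_calc END AS pctl_calc
FROM   daily_data d, LATERAL (
   SELECT count(daily_val) AS ct  -- how many prior rows exist
        , round(percent_rank(d.daily_val) WITHIN GROUP (ORDER BY daily_val)::numeric, 6) AS pctl_calc
   FROM   daily_data
   WHERE  company_id = d.company_id
   AND    trade_date < d.trade_date
   ) x
ORDER BY company_id, trade_date;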

This would do the same job in Postgres 9.3:

SELECT d.*
     , round(CASE WHEN x.ct = 0 THEN numeric '1'  -- or 0 ?
             ELSE x.ct_lt / x.ct END, 6) AS pctl_calc
FROM   daily_data d, LATERAL (
   SELECT count(daily_val) AS ct
        , count(daily_val < d.daily_val OR NULL)::numeric AS ct_lt
   FROM   daily_data
   WHERE  company_id = d.company_id
   AND    trade_date < d.trade_date
   ) x
ORDER BY company_id, trade_date;

I use count(daily_val) for the full count, not count(*), to ignore NULL values - necessary if daily_val can be NULL.

SQL Fiddle.
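
As an aside: from Postgres 9.4 (so not an option for the 9.3 case at hand), the count(condition OR NULL) trick can be spelled with the aggregate FILTER clause, which reads more plainly. The LATERAL subquery above would become:

   SELECT count(daily_val) AS ct
        , count(*) FILTER (WHERE daily_val < d.daily_val)::numeric AS ct_lt  -- 9.4+ FILTER clause
   FROM   daily_data
   WHERE  company_id = d.company_id
   AND    trade_date < d.trade_date

Rows with a NULL daily_val fail the filter condition, so NULL handling matches the OR NULL variant.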

Or, without a LATERAL join, for even older versions. It may be faster, too:

SELECT company_id, trade_date, daily_val, pctl
     , round(CASE WHEN ct = 0 THEN numeric '1'  -- or 0 ?
             ELSE ct_lt / ct END, 6) AS pctl_calc
FROM  (
   SELECT d.*
        , count(x.daily_val) AS ct
        , count(x.daily_val < d.daily_val OR NULL)::numeric AS ct_lt
   FROM   daily_data d
   LEFT   JOIN daily_data x ON x.company_id = d.company_id
                           AND x.trade_date < d.trade_date
   GROUP  BY d.id  -- pg 9.1+
   ) sub
ORDER BY company_id, trade_date;

This assumes id is the PK, which allows the simplified GROUP BY d.id. That shortcut requires pg 9.1, though.
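
Before 9.1, you would spell out every ungrouped column instead of relying on the PK - a sketch, assuming the columns shown in the question:

   GROUP BY d.id, d.company_id, d.trade_date, d.daily_val  -- pre-9.1: list all columns of d you select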

answered Feb 11, 2017 at 8:53 by Erwin Brandstetter
  • @mountainclimber: The condition on company_id = d.company_id was missing. Also, check whether your values can be NULL ... Commented Feb 16, 2017 at 18:34
  • 1
    I am sorry, my update was in a hurry and wrong. The answer was correct to begin with. Reverted changes and made code null-safe instead of just warning against possible null values now. See updated fiddle (if the site isn't overloaded). The only error I see are rounding errors on your side. Commented Feb 16, 2017 at 22:10
  • 1
    @mountainclimber: You might add a TABLESPACE on another disk and move some objects there. See: dba.stackexchange.com/q/62969/3684 Commented Feb 17, 2017 at 20:43
  • 1
    for the 9.4+ version, to limit lookback (make it rolling) put something like and trade_date >= (d.trade_date - interval '20 years') in the where clause Commented May 24, 2017 at 18:16
  • Erwin - Do you think this query could be sped up via SET max_parallel_workers_per_gather = 2, assuming I have enough resources to accommodate it? Commented Aug 10, 2017 at 16:37
