
Is there a hypothetical-set aggregate function equivalent to ntile() (or some other good solution) in Postgres?

I have this query:

 select
     frctl
     ,*
 from
     (
         select *
         from d_al
         where rtn is not null
           and fund_val is not null
     ) dx
     ,lateral (
         select round(
                    (percent_rank(dx.fund_val) WITHIN GROUP (ORDER BY fund_val))::numeric
                    , 6) AS frctl
         from d_al
         where gen_qtr_end_dt <= dx.gen_qtr_end_dt
           and rtn is not null
           and fund_val is not null
     ) x
 order by gen_qtr_end_dt_id, frctl

The query produces periodic historical percentile ranks: each value is ranked relative to its own period/date plus all the historical periods/dates before it, i.e. chronologically/time-series-wise.

It works perfectly, except that I want fractiles (i.e. the option to create deciles, quartiles, etc.) the way ntile(n) does naturally. Do I have to build a CASE statement to fit the fractiles I want? For example, if I want ntile(4) (quartiles), do I have to build a CASE statement based on the breakpoints 0, 0.25, 0.5, 0.75, 1? And if I want ntile(10) (deciles), one based on 0, 0.1, 0.2, 0.3, 0.4, etc.? Or is there an ntile-type hypothetical-set aggregate function I am missing?
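One alternative to a hand-built CASE ladder (a sketch, not from the original question) is width_bucket(), which maps a 0-to-1 percentile directly onto n equal-width buckets. Note that width_bucket(x, 0, 1, n) returns the overflow bucket n + 1 when x equals the upper bound, so the top edge needs a least() cap:

```sql
-- Sketch: deciles from a percent_rank-style value in [0, 1].
-- width_bucket(x, 0, 1, 10) assigns buckets 1..10, except x = 1
-- lands in overflow bucket 11, hence the least() cap.
select p
       ,least(width_bucket(p, 0, 1, 10), 10) as decile
from (values (0.0), (0.142857), (0.95), (1.0)) as t(p);
```

The values list here just stands in for the pctl values the query above produces: 0 lands in decile 1, and 1.0 would land in bucket 11 without the cap.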

Helpful links: https://www.postgresql.org/docs/current/static/functions-aggregate.html#FUNCTIONS-HYPOTHETICAL-TABLE

Percentile rank that takes sorted argument (or same functionality) in PostgreSQL 9.3.5 (The problem in the link directly above is a bit different, but very related.)

The data:

  • Big - efficiency is important, but not the focus of my question.
  • Table d_al has three columns; two matter here:
      • gen_qtr_end_dt - not unique, not null, type date
      • fund_val - can be null, type numeric
      • rtn - can be null, type numeric, not important here
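For reference, the schema described by the bullets above can be sketched as DDL (a minimal sketch; the real table may well have additional columns, such as the gen_qtr_end_dt_id used for ordering):

```sql
-- Minimal sketch of d_al as described above; not the actual DDL.
create table d_al (
    gen_qtr_end_dt date    not null,  -- quarter end date; not unique
    fund_val       numeric,           -- nullable
    rtn            numeric            -- nullable; not important here
);
```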

I have Postgres 9.6.

PS - this query uses all of the history, but my next step is a rolling look-back period of some number of days (rather than all of the history).

edit 1: Here is how I am solving it now (with a CASE statement, as mentioned):

I put the above query in a CTE, then...

 with pl as (
     select
         x.pctl
         ,dx.fund_val
         ,dx.rtn
         ,dx.gen_qtr_end_dt
     from
         (
             select *
             from d_al
             where rtn is not null
               and fund_val is not null
         ) dx
         ,lateral (
             select round(
                        (percent_rank(dx.fund_val) WITHIN GROUP (ORDER BY fund_val))::numeric
                        , 6) AS pctl
             from d_al
             where gen_qtr_end_dt <= dx.gen_qtr_end_dt
               and rtn is not null
               and fund_val is not null
         ) x
 )
 select
     gen_qtr_end_dt
     ,case when pl.pctl < 0.1 then 1
           when pl.pctl < 0.2 then 2
           when pl.pctl < 0.3 then 3
           when pl.pctl < 0.4 then 4
           when pl.pctl < 0.5 then 5
           when pl.pctl < 0.6 then 6
           when pl.pctl < 0.7 then 7
           when pl.pctl < 0.8 then 8
           when pl.pctl < 0.9 then 9
           else 10
      end as frctl
     ,rtn
     ,fund_val
     ,pctl
 from pl
 order by gen_qtr_end_dt, frctl

...which is a bit cumbersome/rigid but doable if need be.

edit 2: And here is a sample of the output from edit 1 above:

frctl fund_val pctl gen_qtr_end_dt
1 -14.514 0 3/31/2001
2 -8.618 0.142857 3/31/2001
3 1.707 0.285714 3/31/2001
5 26.162 0.428571 3/31/2001
6 141.873 0.571429 3/31/2001
8 216 0.714286 3/31/2001
9 254 0.857143 3/31/2001
1 -15.237 0.071429 6/30/2001
1 -32 0 6/30/2001
3 -6.949 0.285714 6/30/2001
5 6.307 0.428571 6/30/2001
6 28.542 0.571429 6/30/2001
7 140.816 0.642857 6/30/2001
9 239 0.857143 6/30/2001
1 -47 0.043478 9/30/2001
1 -63.367 0 9/30/2001
2 -16.599 0.130435 9/30/2001
4 -6.087 0.347826 9/30/2001
6 31.425 0.565217 9/30/2001
7 47.137 0.608696 9/30/2001
8 150.678 0.73913 9/30/2001
8 200 0.782609 9/30/2001
10 1902.684 0.956522 9/30/2001
1 -246.545 0 12/31/2001
2 -18.731 0.125 12/31/2001
4 -0.043 0.375 12/31/2001
4 -6 0.34375 12/31/2001
5 9.285 0.46875 12/31/2001
6 43.519 0.59375 12/31/2001
7 111 0.65625 12/31/2001
8 154.573 0.78125 12/31/2001
10 1017.514 0.9375 12/31/2001
1 -23.678 0.095238 3/31/2002
4 2.229 0.357143 3/31/2002
5 14 0.428571 3/31/2002
5 17.689 0.452381 3/31/2002
6 67.245 0.595238 3/31/2002
7 130.604 0.642857 3/31/2002
8 156 0.761905 3/31/2002
8 179.399 0.785714 3/31/2002
9 213.756 0.833333 3/31/2002
10 855.2 0.928571 3/31/2002
1 -26.536 0.076923 6/30/2002
3 1.295 0.288462 6/30/2002
4 9 0.365385 6/30/2002
5 16.714 0.423077 6/30/2002
6 64.547 0.557692 6/30/2002
6 103.539 0.596154 6/30/2002
8 181.284 0.769231 6/30/2002
9 203 0.807692 6/30/2002
10 600.194 0.923077 6/30/2002
10 284.306 0.903846 6/30/2002
1 -85 0.016129 9/30/2002
1 -25.475 0.096774 9/30/2002
2 -20.394 0.129032 9/30/2002
4 2.551 0.33871 9/30/2002
6 102.395 0.564516 9/30/2002
7 113.453 0.612903 9/30/2002
8 168.205 0.725806 9/30/2002
9 248 0.854839 9/30/2002
10 800.551 0.935484 9/30/2002
10 460.067 0.903226 9/30/2002

edit 3: As it stands, this approach is so slow it is unusable. The slow part is the query containing percent_rank().

asked Jun 29, 2017 at 18:05
  • So what's wrong with the ntile() function? Commented Jun 29, 2017 at 20:10
  • I don't see how it could be implemented in this context, but I hope I am wrong. I clearly don't know a ton about this, but I don't know how to use ntile() with two different sets of data (dx.fund_val and fund_val) in the same function the way percent_rank() can. And I assume that is necessary... Commented Jun 29, 2017 at 20:27
  • @mountainclimber your application requirements are not obvious from the question. It would really help if you provided a short sample input and sample output for it Commented Jun 29, 2017 at 21:52
  • @filiprem doesn't edit 2 have all of that? frctl is my desired output and fund_val and gen_qtr_end_dt are my applicable inputs. Is there something else you need? Commented Jun 29, 2017 at 22:01
  • @mountainclimber what we want is basic schema, basic data, and a desired output. We prefer it to be separate from the complexities of your real world example. But, at the very least, we need sample data. Commented Jun 29, 2017 at 22:04

1 Answer

Try trunc(10 * pl.pctl) + 1, but since percent_rank returns 0 <= n <= 1, the maximum value will be 11 instead of 10.

CUME_DIST is quite similar to PERCENT_RANK but returns 0 < n <= 1, so you might switch to 1 - cume_dist... (ORDER BY fund_val DESC) for the pctl calculation.
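Combining the trunc() suggestion above with a cap for the 11 edge case gives a sketch (untested; pl and pctl are the CTE and column from edit 1 in the question):

```sql
-- trunc-based deciles: percent_rank() can return exactly 1, which would
-- give bucket 11, so least() caps the result at 10.
select
    least(trunc(10 * pctl) + 1, 10) as frctl
    ,fund_val
    ,pctl
    ,gen_qtr_end_dt
from pl
order by gen_qtr_end_dt, frctl;
```

This replaces the ten-branch CASE from edit 1 with a single expression; swapping in cume_dist as suggested would only change how pctl is computed, not this bucketing step.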

answered Jun 30, 2017 at 9:30
  • I think in my case since I am always comparing the value to itself, as well as all its current period and previous periods, (gen_qtr_end_dt <= dx.gen_qtr_end_dt, note the =) I won't get a 1 out of percent_rank(); therefore, the 11 issue isn't really an issue for me. At least that is what I am seeing in my results (no 1s). Commented Jun 30, 2017 at 14:18
