
I have a pandas dataframe (called base_mortality) with 1 column and n rows, which is of the following form:

 age | death_prob 
---------------------------
 60 | 0.005925
 61 | 0.006656
 62 | 0.007474
 63 | 0.008387
 64 | 0.009405
 65 | 0.010539
 66 | 0.0118
 67 | 0.013201
 68 | 0.014756
 69 | 0.016477

age is the index and death_prob is the probability that a person of a given age will die within the next year. I want to use these death probabilities to project the expected annuity payments to an annuitant over the next t years.
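To reproduce the examples without the CSV file, the excerpt above can be built by hand. This is only a sketch: it assumes the index is named age and the column death_prob, as in the table, and it covers only the ten ages shown (the real table evidently extends further).

```python
import pandas as pd

# Values copied from the excerpt above; the real CSV covers more ages.
base_mortality = pd.DataFrame(
    {'death_prob': [0.005925, 0.006656, 0.007474, 0.008387, 0.009405,
                    0.010539, 0.0118, 0.013201, 0.014756, 0.016477]},
    index=pd.Index(range(60, 70), name='age'),
)

# Label-based slicing on the age index is inclusive of the start age.
from_65 = base_mortality.loc[65:]
print(from_65)
```

Slicing with .loc[65:] keeps ages 65 to 69 here, which is the building block the loop further down relies on.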

Suppose I have 3 annuitants, whose names and ages are contained in a dictionary:

policy_holders = {'John' : 65, 'Mike': 67, 'Alan': 71}

Then I would want to construct a new dataframe whose index is time (rather than age) which has 3 columns (one for each annuitant) and t rows (one for each time step). Each column should specify the probability of death for each policy holder at that time step. For example:

         John      Mike      Alan
0    0.010539  0.013201  0.020486
1    0.011800  0.014756  0.022807
2    0.013201  0.016477  0.025365
3    0.014756  0.018382  0.028179
4    0.016477  0.020486  0.031269
..        ...       ...       ...
96   1.000000  1.000000  1.000000
97   1.000000  1.000000  1.000000
98   1.000000  1.000000  1.000000
99   1.000000  1.000000  1.000000
100  1.000000  1.000000  1.000000

At present, my code for doing this is as follows:

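import pandas as pd

# Assumes the CSV columns are named as in the table above: 'age' (index) and 'death_prob'.
base_mortality = pd.read_csv('/Users/joshchapman/PycharmProjects/VectorisedAnnuityModel/venv/assumptions/base_mortality.csv', index_col='age')
policy_holders = {'John': 65, 'Mike': 67, 'Alan': 71}
out = pd.DataFrame(index=range(0, 101))
for name, age in policy_holders.items():
    # Take the probabilities from the current age onwards and re-index them by time step.
    out[name] = base_mortality.loc[age:].reset_index()['death_prob']
out = out.fillna(1)  # beyond the end of the table, death is treated as certain
print(out)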

However, my aim is to remove this loop and achieve this using vector operations (i.e. pandas and/or numpy functions). Any suggestions on how I might improve my code to work in this way would be great!

asked Jul 9, 2020 at 12:10

1 Answer


Enter pandas.cut. It returns the bin into which each value falls, and you can pass the labels for the bins directly, so each age maps straight to its death probability. This reduces the work to a single Python loop over the policy holders:

import pandas as pd
import numpy as np

age_bins = range(59, 70)  # one more than the probabilities
death_prob = [0.005925, 0.006656, 0.007474, 0.008387, 0.009405, 0.010539, 0.0118,
              0.013201, 0.014756, 0.016477]
policy_holders = {'John': 65, 'Mike': 67, 'Alan': 71}
values = {name: pd.cut(range(age, age + 101), age_bins, labels=death_prob)
          for name, age in policy_holders.items()}
out = pd.DataFrame(values, dtype=np.float64).fillna(1)
print(out)
#          John      Mike  Alan
# 0    0.010539  0.013201   1.0
# 1    0.011800  0.014756   1.0
# 2    0.013201  0.016477   1.0
# 3    0.014756  1.000000   1.0
# 4    0.016477  1.000000   1.0
# ..        ...       ...   ...
# 96   1.000000  1.000000   1.0
# 97   1.000000  1.000000   1.0
# 98   1.000000  1.000000   1.0
# 99   1.000000  1.000000   1.0
# 100  1.000000  1.000000   1.0
#
# [101 rows x 3 columns]

Note that the bin edges start one below the first age and number one more than the labels, because the bins are interpreted as the right-closed intervals (59, 60], (60, 61], ..., i.e. each bin includes its right edge.
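The right-closed behaviour can be checked on a handful of ages. The sketch below uses an illustrative list of ages (not from the question) and calls pandas.cut twice: once with default labels to see the intervals, and once with labels=False to see the bin positions.

```python
import pandas as pd

age_bins = list(range(59, 70))  # edges 59, 60, ..., 69
ages = [59, 60, 61, 69, 70]     # illustrative ages around both ends of the table

intervals = pd.cut(ages, age_bins)            # Categorical of Interval objects
codes = pd.cut(ages, age_bins, labels=False)  # bin positions, NaN outside all bins

for age, interval, code in zip(ages, intervals, codes):
    print(age, interval, code)
```

Age 60 lands in the first bin because that bin includes its right edge, while 59 and 70 fall outside every bin and come back as NaN, which is what fillna(1) later turns into a probability of 1.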

answered Jul 9, 2020 at 12:51
  • Thanks for your help on this one! Quick question though: what if the probabilities are not unique? I've tried replacing the last probability with the second to last and this gives the error "Categorical categories must be unique" from pd.cut. Commented Jul 13, 2020 at 10:18
  • @JRChapman In that case you will have to pass labels=False (or None, not quite sure atm) and use the resulting indices to index into pd.Series(death_prob). See also the first revision of my answer for that. Commented Jul 13, 2020 at 10:27
  • @JRChapman: It is False, and here is the direct link to that revision: codereview.stackexchange.com/revisions/245225/1 Commented Jul 13, 2020 at 13:30
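A minimal sketch of the labels=False route from the comments, assuming the last probability duplicates the one before it. It is not the code from the linked revision: it indexes a NumPy array rather than pd.Series(death_prob), and, since the bin positions are plain numbers, it cuts a flattened grid of all ages at once, which also avoids the Python loop over policy holders. The names start_ages, age_grid, pos and in_table are made up for this illustration.

```python
import numpy as np
import pandas as pd

age_bins = np.arange(59, 70)  # 11 edges -> 10 right-closed bins
# Same table as in the answer, except the last value repeats the previous one.
death_prob = np.array([0.005925, 0.006656, 0.007474, 0.008387, 0.009405,
                       0.010539, 0.0118, 0.013201, 0.014756, 0.014756])
policy_holders = {'John': 65, 'Mike': 67, 'Alan': 71}

start_ages = np.array(list(policy_holders.values()))
# Row t, column j holds the age of policy holder j after t years.
age_grid = start_ages + np.arange(101)[:, None]  # shape (101, 3)

# pd.cut expects 1-D input, so flatten, cut, and restore the shape afterwards.
pos = pd.cut(age_grid.ravel(), age_bins, labels=False)  # NaN outside the table
in_table = ~pd.isna(pos)

probs = np.ones(age_grid.size)  # default of 1 for ages beyond the table
probs[in_table] = death_prob[pos[in_table].astype(int)]

out = pd.DataFrame(probs.reshape(age_grid.shape), columns=list(policy_holders))
print(out)
```

Because the lookup goes through integer positions rather than categorical labels, repeated probabilities are no longer a problem.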
