I created a quick Monte Carlo simulation which seems to do what I want (simple version below). The code first draws from a Poisson distribution; say this results in a draw of 10. It would then simulate 10 values from a lognormal distribution. It then applies two parameters, lim and xs, to the results of that simulation, sums up the results, and stores them in a results matrix. Finally, the code takes the mean of each column of the results matrix.
import numpy as np
import os
import scipy.stats as sp

sigma = 0.5
u = 10
mu = 100
no_sim = int(10e3)
no_col = 2
mat_res = np.zeros((no_sim, no_col))  # results matrix holder
lim = 40e3  # parameter 1 - needs to vary
xs = 2000   # parameter 2 - needs to vary

for i in range(1, no_sim):
    no_clm = sp.poisson.rvs(mu=mu, size=1)
    clm_sev = sp.lognorm.rvs(s=sigma, scale=np.exp(u), size=no_clm)
    temp_mat = np.zeros((np.size(clm_sev), 2))  # temp matrix for calculations
    temp_mat[:, 0] = clm_sev
    temp_mat[:, 1] = np.minimum(lim, np.maximum(0, temp_mat[:, 0] - xs))  # want to expand this step for various values of lim and xs, e.g. 5 different options
    mat_res[i, 0] = np.sum(temp_mat[:, 0])
    mat_res[i, 1] = np.sum(temp_mat[:, 1])

# want these mean values for various different lim and xs values which are predefined
print("mean 1 is %s" % np.mean(mat_res[:, 0]),
      "mean 2 is %s" % np.mean(mat_res[:, 1]))
I am trying to do two things:

1. Speed up the code. In reality I will need to run over a million simulations, so I want to do this in the best way possible - currently it takes about five minutes to run a million simulations.
2. Expand the code efficiently so that I can vary the parameters lim and xs and get a new result - i.e. right now I have lim = 40e3 and xs = 2000, but I would also want to run it with, say, lim = 50e3 and xs = 1000, and then print that mean value as well (along with the mean for the original lim and xs parameters). One solution is to wrap the for loop in a function taking the parameters I require; however, since lim and xs only appear on one line of the Monte Carlo simulation, I don't think it's efficient to rerun the whole simulation from scratch, and I can't think of a good way to build this into the for loop as it stands.
1 Answer
sigma, mu and u should be capitalised because they're constant (if they remain as constant globals; in my suggested code I show them instead as lower-case function parameters).
Move your simulation out of the global namespace into a function.
Consider adding numerical tests. I have only shown a regression test and cannot vouch for its accuracy. For this to work you need to pass a constant seed to NumPy's random generator.
Don't over-abbreviate simulation to sim or column to clm. I do not know what sev stands for.
Don't create a temp_mat - you don't need it.
Your main, outer for loop is tricky to vectorise because - due to no_clm - your inner arrays are jagged. You can work around this by populating a fully square matrix and then zeroing out partial rows parametrically.
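A minimal sketch of that square-matrix-plus-masking idea, using made-up per-row counts instead of Poisson draws:

```python
import numpy as np

# Hypothetical example: three "simulations" with per-row counts 2, 4, 1.
counts = np.array([[2], [4], [1]])    # column vector of row widths
width = counts.max()                  # widest row determines the matrix width
values = np.arange(1, 3 * width + 1).reshape(3, width).astype(float)

# Row vector of column indices; broadcasting compares it against each count.
col_index = np.arange(width)[np.newaxis, :]
values[col_index >= counts] = 0       # zero out entries past each row's width
```

Because the masked entries are zero, a subsequent sum along axis 1 gives each row's true (jagged) total.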
Fundamentally, you cannot represent mat_res as a matrix with two columns, because the first result variable is non-parametric and the second is parametric, so they will have different shapes.
Don't use zeros - everywhere that you've used it, empty is more appropriate.
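For example, empty() skips the zero-fill and is safe whenever every element is written before being read:

```python
import numpy as np

# Allocation without initialisation; contents are garbage until overwritten.
out = np.empty((3, 2))
for i in range(3):
    out[i, :] = [i, i * 2]  # every slot is written before any read
```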
To parametrise the way you want, make lim and xs three-dimensional arrays where two of the dimensions are size-1, and have clm_sev be a three-dimensional array where the non-1 dimension is in a different position. Then most NumPy operations that combine the two will use broadcasting. The first part of a Monte Carlo function could be to assert that these parameter arrays are of the same length, and to reshape them to be suitable for broadcasting.
Don't use min and max; use clip.
Your range(1, ...) doesn't seem correct to me - you were skipping your first row and leaving it zero, executing one fewer case than you specified, and skewing your mean. Just let it start from 0.
Suggested
import numpy as np
import scipy.stats as sp


def monte_carlo(
    lim: np.ndarray,
    xs: np.ndarray,
    sigma: float = 0.5,
    u: float = 10,
    mu: float = 100,
    n_simulations: int = 1000,
) -> tuple[np.float64, np.ndarray]:
    # Put the parameters into shapes usable by our broadcasting
    lim = lim.reshape((1, 1, -1))
    xs = xs.reshape((1, 1, -1))
    n_parameters = lim.shape[-1]

    # Assert that the parameter lengths are the same
    if xs.shape[-1] != n_parameters:
        raise ValueError('Parameter shape mismatch')

    # Column-vector, for each simulation, of simulation row widths
    n_columns = sp.poisson.rvs(mu=mu, size=(n_simulations, 1, 1))
    # The maximum width of any simulation
    width = n_columns.max()
    # Row-vector of simple indices over the width of the simulation matrix
    x_index = np.arange(width)[np.newaxis, :, np.newaxis]
    # "Sev" (whatever that is) of random values, one row per simulation
    sev = sp.lognorm.rvs(s=sigma, scale=np.exp(u), size=(n_simulations, width, 1))
    # Zero out any simulation value beyond the Poisson-determined width
    sev[x_index >= n_columns] = 0

    # Parametric clipping will yield a three-dimensional matrix:
    # n_simulations * width * n_parameters
    clipped = np.clip(a=sev - xs, a_min=0, a_max=lim)

    result_1 = sev.sum(axis=1).mean()
    result_2 = clipped.sum(axis=1).mean(axis=0)
    return result_1, result_2


def test() -> None:
    np.random.seed(0)  # to reproduce the same results
    lim = np.linspace(start=38e3, stop=42e3, num=5)
    xs = np.linspace(start=1800, stop=2200, num=5)
    mean_1, mean_2 = monte_carlo(lim, xs)

    tol = {'rtol': 0, 'atol': 1e-8}
    assert np.isclose(mean_1, 2505180.2094949023, **tol)
    assert np.all(
        np.isclose(
            mean_2,
            (
                2177820.61316294,
                2180250.94121704,
                2181574.66242172,
                2181868.23243192,
                2181257.68169412,
            ),
            **tol,
        )
    )


if __name__ == '__main__':
    test()
- thank you very much for this. I did not know about broadcasting, so that was useful. What is parameter fixing? Your link in the post goes to a Poisson distribution. Speed-wise, I assume there is no other way to speed it up for a high number of sims (1m+)? Thank you again (lmkir99, Jan 1, 2022 at 17:21)
- From that page: "Alternatively, the distribution object can be called (as a function) to fix the shape and location. This returns a 'frozen' RV object holding the given parameters fixed. Freeze the distribution and display the frozen pmf: [...]" (Reinderien, Jan 1, 2022 at 17:23)
- Sorry, missed that. What's the logic behind freezing it - it seems to return the same result; is it speed-related? (lmkir99, Jan 1, 2022 at 17:26)
- I don't know whether the speedup is going to be significant, but it's the most logical thing to define your parameters once instead of passing them every time (Reinderien, Jan 1, 2022 at 17:28)
- this is brilliant, thanks. Do you have any resources on learning more about broadcasting and multidimensional arrays? Learned a lot just working through your code, but most of it was new to me! Thanks again (lmkir99, Jan 1, 2022 at 20:20)
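The "freezing" discussed in the comments can be sketched as follows (distribution parameters here are arbitrary placeholders, not the question's values):

```python
import scipy.stats as sp

# Calling the distribution object fixes the shape/scale once; the returned
# "frozen" object's methods then need no distribution parameters.
frozen = sp.lognorm(s=0.5, scale=2.0)
sample = frozen.rvs(size=100)  # equivalent to sp.lognorm.rvs(s=0.5, scale=2.0, size=100)
```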