I created a quick Monte Carlo simulation which seems to do what I want (simple version below). The code first draws from a Poisson distribution; say this results in a draw of 10. It would then simulate 10 values from a lognormal distribution. It then applies two parameters, lim and xs, to the results of that simulation, sums up the results, and stores them in a results matrix. Finally, the code takes the mean of each column of the results matrix.
import numpy as np
import os
import scipy.stats as sp

sigma = 0.5
u = 10
mu = 100
no_sim = int(10e3)
no_col = 2
mat_res = np.zeros((no_sim, no_col))  # results matrix holder
lim = 40e3  # parameter 1 - needs to vary
xs = 2000   # parameter 2 - needs to vary

for i in range(1, no_sim):
    no_clm = sp.poisson.rvs(mu=mu, size=1)
    clm_sev = sp.lognorm.rvs(s=sigma, scale=np.exp(u), size=no_clm)
    temp_mat = np.zeros((np.size(clm_sev), 2))  # temp matrix for calculations
    temp_mat[:, 0] = clm_sev
    temp_mat[:, 1] = np.minimum(lim, np.maximum(0, temp_mat[:, 0] - xs))  # want to expand this step for various values of lim and xs, e.g. 5 different options
    mat_res[i, 0] = np.sum(temp_mat[:, 0])
    mat_res[i, 1] = np.sum(temp_mat[:, 1])

# want these mean values for various different lim and xs values which are predefined
print("mean 1 is %s" % np.mean(mat_res[:, 0]),
      "mean 2 is %s" % np.mean(mat_res[:, 1]))
I am trying to do two things:

1. Speed up the code. In reality I will need to run over a million simulations, so I want to do this in the best way possible - currently it takes about five minutes to run a million simulations.
2. Expand the code efficiently so that I can vary the parameters lim and xs and get a new result - i.e. right now I have lim = 40e3 and xs = 2000, but I would also want to run it with, say, lim = 50e3 and xs = 1000, and then print that mean value as well (along with the mean for the original lim and xs parameters). One solution is to wrap the for loop in a function taking the parameters I require; however, since lim and xs only appear on one line of the Monte Carlo simulation, I don't think it's efficient to rerun the whole simulation from scratch, and I can't think of a good way to build this into the for loop as it stands.
1 Answer
sigma, mu and u should be capitalised because they're constant (if they remain as constant globals; in my suggested code I show them instead as lower-case function parameters).
Move your simulation out of the global namespace into a function.
Consider adding numerical tests. I have only shown a regression test and cannot vouch for its accuracy. For this to work you need to pass a constant seed to NumPy's random generator.
Don't over-abbreviate simulation to sim or column to clm. I do not know what sev stands for.
Don't create a temp_mat - you don't need it.
Your main, outer for loop is tricky to vectorise because - due to no_clm - your inner arrays are jagged. You can work around this by populating a fully square matrix and then zeroing out partial rows parametrically.
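A minimal sketch of that square-matrix-plus-masking idea, using made-up per-row counts instead of Poisson draws:

```python
import numpy as np

# Hypothetical example: three "simulations" with per-row counts 2, 4, 1.
counts = np.array([[2], [4], [1]])    # column vector of row widths
width = counts.max()                  # widest row determines the matrix width
values = np.arange(1, 3 * width + 1).reshape(3, width).astype(float)

# Row vector of column indices; broadcasting compares it against each count.
col_index = np.arange(width)[np.newaxis, :]
values[col_index >= counts] = 0       # zero out entries past each row's width
```

Because the masked entries are zero, a subsequent sum along axis 1 gives each row's true (jagged) total.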
Fundamentally, you cannot represent mat_res as a matrix with two columns, because the first result variable is non-parametric and the second is parametric, so they will have different shapes.
Don't use zeros - everywhere that you've used it, empty is more appropriate.
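For example, empty() skips the zero-fill and is safe whenever every element is written before being read:

```python
import numpy as np

# Allocation without initialisation; contents are garbage until overwritten.
out = np.empty((3, 2))
for i in range(3):
    out[i, :] = [i, i * 2]  # every slot is written before any read
```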
To parametrise the way you want, make lim and xs three-dimensional arrays where two of the dimensions are size-1, and have clm_sev be a three-dimensional array where the non-1 dimension is in a different position. Then most NumPy operations that combine the two will use broadcasting. The first part of a Monte Carlo function could be to assert that these parameter arrays are of the same length, and to reshape them to be suitable for broadcasting.
Don't use min and max; use clip.
Your range(1, ...) doesn't seem correct to me - you were skipping your first row and leaving it zero, executing one fewer case than you specified, and skewing your mean. Just let it start from 0.
Suggested
import numpy as np
import scipy.stats as sp


def monte_carlo(
    lim: np.ndarray,
    xs: np.ndarray,
    sigma: float = 0.5,
    u: float = 10,
    mu: float = 100,
    n_simulations: int = 1000,
) -> tuple[np.float64, np.ndarray]:
    # Put the parameters into shapes usable by our broadcasting
    lim = lim.reshape((1, 1, -1))
    xs = xs.reshape((1, 1, -1))
    n_parameters = lim.shape[-1]

    # Assert that the parameter lengths are the same
    if xs.shape[-1] != n_parameters:
        raise ValueError('Parameter shape mismatch')

    # Column-vector, for each simulation, of simulation row widths
    n_columns = sp.poisson.rvs(mu=mu, size=(n_simulations, 1, 1))
    # The maximum width of any simulation
    width = n_columns.max()
    # Row-vector of simple indices over the width of the simulation matrix
    x_index = np.arange(width)[np.newaxis, :, np.newaxis]
    # "Sev" (whatever that is) of random values, one row per simulation
    sev = sp.lognorm.rvs(s=sigma, scale=np.exp(u), size=(n_simulations, width, 1))
    # Zero out any simulation value beyond the Poisson-determined width
    sev[x_index >= n_columns] = 0

    # Parametric clipping will yield a three-dimensional matrix:
    # n_simulations * width * n_parameters
    clipped = np.clip(a=sev - xs, a_min=0, a_max=lim)

    result_1 = sev.sum(axis=1).mean()
    result_2 = clipped.sum(axis=1).mean(axis=0)
    return result_1, result_2


def test() -> None:
    np.random.seed(0)  # to reproduce the same results
    lim = np.linspace(start=38e3, stop=42e3, num=5)
    xs = np.linspace(start=1800, stop=2200, num=5)
    mean_1, mean_2 = monte_carlo(lim, xs)

    tol = {'rtol': 0, 'atol': 1e-8}
    assert np.isclose(mean_1, 2505180.2094949023, **tol)
    assert np.all(
        np.isclose(
            mean_2,
            (
                2177820.61316294,
                2180250.94121704,
                2181574.66242172,
                2181868.23243192,
                2181257.68169412,
            ),
            **tol,
        )
    )


if __name__ == '__main__':
    test()
- thank you very much for this. I did not know about broadcasting, so that was useful. What is parameter fixing? Your link in the post goes to a Poisson distribution. Speed-wise, I assume there is no other way to speed it up for a high number of sims (1m+)? Thank you again (lmkir99, Jan 1, 2022 at 17:21)
- From that page: "Alternatively, the distribution object can be called (as a function) to fix the shape and location. This returns a 'frozen' RV object holding the given parameters fixed. Freeze the distribution and display the frozen pmf: [...]" (Reinderien, Jan 1, 2022 at 17:23)
- Sorry, missed that. What's the logic behind freezing it - it seems to return the same result; is it speed-related? (lmkir99, Jan 1, 2022 at 17:26)
- I don't know whether the speedup is going to be significant, but it's the most logical thing to define your parameters once instead of passing them every time (Reinderien, Jan 1, 2022 at 17:28)
- this is brilliant, thanks. Do you have any resources on learning more about broadcasting and multidimensional arrays? Learned a lot just working through your code, but most of it was new to me! Thanks again (lmkir99, Jan 1, 2022 at 20:20)
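The "freezing" discussed in the comments can be sketched as follows (distribution parameters here are arbitrary placeholders, not the question's values):

```python
import scipy.stats as sp

# Calling the distribution object fixes the shape/scale once; the returned
# "frozen" object's methods then need no distribution parameters.
frozen = sp.lognorm(s=0.5, scale=2.0)
sample = frozen.rvs(size=100)  # equivalent to sp.lognorm.rvs(s=0.5, scale=2.0, size=100)
```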