I'm trying to create two random variables which are correlated with one another, and I believe the best way is to draw from a bivariate normal distribution with given parameters (open to other ideas). The uncorrelated version looks like this:
import numpy as np
sigma = np.random.uniform(.2, .3, 80)
theta = np.random.uniform( 0, .5, 80)
However, for each one of the 80 draws, I want the sigma value to be related to the theta value. Any thoughts?
-
what do you want the covariance matrix (rho) to be?Gregg Lind– Gregg Lind2011年12月29日 23:50:52 +00:00Commented Dec 29, 2011 at 23:50
-
2Correct me if I am wrong, but shouldn't you be using normal instead of uniform for normal distribution?Pavan Yalamanchili– Pavan Yalamanchili2011年12月30日 00:00:59 +00:00Commented Dec 30, 2011 at 0:00
3 Answers 3
Use the built-in: http://docs.scipy.org/doc/numpy/reference/generated/numpy.random.multivariate_normal.html
>>> import numpy as np
>>> mymeans = [13,5]
>>> # stdevs = sqrt(5),sqrt(2)
>>> # corr = .3 / (sqrt(5)*sqrt(2) = .134
>>> mycov = [[5,.3], [.3,2]]
>>> np.cov(np.random.multivariate_normal(mymeans,mycov,500000).T)
array([[ 4.99449936, 0.30506976],
[ 0.30506976, 2.00213264]])
>>> np.corrcoef(np.random.multivariate_normal(mymeans,mycov,500000).T)
array([[ 1. , 0.09629313],
[ 0.09629313, 1. ]])
- As shown, things get a little hairier if you have to adjust for not-unit variances)
- more reference: http://www.riskglossary.com/link/correlation.htm
- To be real-world meaningful, the covariance matrix must be symmetric and must also be positive definite or positive semidefinite (it must be invertable). Particular anti-correlation structures might not be possible.
3 Comments
import multivariate_normal from scipy can be used. Suppose we create random variables x and y:
from scipy.stats import multivariate_normal
rv_mean = [0, 1] # mean of x and y
rv_cov = [[1.0,0.5], [0.5,2.0]] # covariance matrix of x and y
rv = multivariate_normal.rvs(rv_mean, rv_cov, size=10000)
You have x from rv[:,0] and y from rv[:,1]. Correlation coefficients can be obtained from
import numpy as np
np.corrcoef(rv.T)
Comments
The two normal distributions are defined by a mean and a variance:
means = [0, 0] # respective means
var_xx = 1 ** 2 # var x = std x squared
var_yy = 1 ** 2
The covariance between the two distributions is defined by a covariance matrix made of the variances and the two covariances. The two covariances x/y and y/x are equal:
import numpy as np
cov_xy = 0.5
cov = np.array([[var_xx, cov_xy],
[cov_xy, var_yy]])
N pairs are drawn from the distributions using a random generator and the function multivariate_normal. Optional check_valid='raise' is used to check the covariance matrix is actually symmetric and positive semi-definite:
g = np.random.default_rng()
N = 100
pairs = g.multivariate_normal(means, cov, size=N, check_valid='raise')
As an example, let's plot these pairs:
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.scatter(pairs[:,0], pairs[:,1])