Sampling from bivariate normal in python

Question 1

I'm trying to create two random variables which are correlated with one another, and I believe the best way is to draw from a bivariate normal distribution with given parameters (open to other ideas). The uncorrelated version looks like this:

import numpy as np
sigma = np.random.uniform(.2, .3, 80)
theta = np.random.uniform( 0, .5, 80)

However, for each one of the 80 draws, I want the sigma value to be related to the theta value. Any thoughts?

Question 2

What do you want the covariance matrix (rho) to be?

Question 3

Correct me if I am wrong, but shouldn't you be drawing from normal rather than uniform if you want a normal distribution?

Question 4

Use the built-in: http://docs.scipy.org/doc/numpy/reference/generated/numpy.random.multivariate_normal.html

>>> import numpy as np
>>> mymeans = [13,5] 
>>> # stdevs = sqrt(5), sqrt(2)
>>> # corr = .3 / (sqrt(5)*sqrt(2)) = .095
>>> mycov = [[5,.3], [.3,2]] 
>>> np.cov(np.random.multivariate_normal(mymeans,mycov,500000).T)
array([[ 4.99449936, 0.30506976],
 [ 0.30506976, 2.00213264]])
>>> np.corrcoef(np.random.multivariate_normal(mymeans,mycov,500000).T)
array([[ 1. , 0.09629313],
 [ 0.09629313, 1. ]])

As shown, a covariance of .3 does not give a correlation of .3: things get a little hairier if you have to adjust for non-unit variances.
More reference: http://www.riskglossary.com/link/correlation.htm
To be real-world meaningful, the covariance matrix must be symmetric and positive semidefinite. It is invertible only if it is strictly positive definite. Particular anti-correlation structures might not be possible.
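
The validity conditions above can be checked before sampling. This is a minimal sketch, not part of the original answer: the helper name is_valid_covariance and the tolerance are assumptions made for illustration. It tests symmetry and then checks that no eigenvalue is negative.

```python
import numpy as np

def is_valid_covariance(cov, tol=1e-10):
    """Hypothetical helper: True if cov is square, symmetric and positive semidefinite."""
    cov = np.asarray(cov, dtype=float)
    if cov.ndim != 2 or cov.shape[0] != cov.shape[1]:
        return False
    if not np.allclose(cov, cov.T):
        return False
    # eigvalsh is intended for symmetric matrices and returns real eigenvalues
    eigenvalues = np.linalg.eigvalsh(cov)
    return bool(eigenvalues.min() >= -tol)

# The matrix from the answer is fine.
print(is_valid_covariance([[5, 0.3], [0.3, 2]]))

# Three unit-variance variables cannot all be pairwise correlated at -0.9:
# the matrix is symmetric but has a negative eigenvalue.
impossible = [[1.0, -0.9, -0.9],
              [-0.9, 1.0, -0.9],
              [-0.9, -0.9, 1.0]]
print(is_valid_covariance(impossible))
```

The second matrix is an example of an anti-correlation structure that is not possible.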

Question 5

Perfect -- given my mediocre statistical background, could you explain what the values in mycov are related to? I assume that the "5" and the "2" are the variances which correspond to each vector of interest? Thanks again.

Question 6

Yes indeed! 5 and 2 are the variances, and .3 is the covariance. If you want to specify a correlation instead, you have to jigger it a bit more, as described.
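
If the goal is a given correlation rather than a given covariance, the off-diagonal entry has to be scaled by both standard deviations, since cov_xy = rho * std_x * std_y. A minimal sketch follows; the helper name cov_from_corr, the target correlation of 0.3, the seed and the sample size are all assumptions chosen for illustration.

```python
import numpy as np

def cov_from_corr(std_x, std_y, rho):
    """Hypothetical helper: 2x2 covariance matrix from two stdevs and a correlation."""
    cov_xy = rho * std_x * std_y
    return np.array([[std_x ** 2, cov_xy],
                     [cov_xy, std_y ** 2]])

# Same variances as in the answer (5 and 2), but now targeting corr = 0.3
cov = cov_from_corr(np.sqrt(5), np.sqrt(2), 0.3)

rng = np.random.default_rng(0)  # seed is arbitrary
samples = rng.multivariate_normal([13, 5], cov, size=200_000)

# The sample correlation should land close to the 0.3 target
print(np.corrcoef(samples.T)[0, 1])
```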

Question 7

Great answer! This is much easier than premultiplying a (Gaussian random) vector with a matrix to induce some covariances.
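
For comparison, the manual approach this comment refers to can be sketched as follows. The comment does not say which matrix it has in mind, so this assumes a Cholesky factor of the covariance matrix; the means and covariance reuse the values from the answer, and the seed and sample size are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed is arbitrary
means = np.array([13.0, 5.0])
cov = np.array([[5.0, 0.3],
                [0.3, 2.0]])

# Lower-triangular factor with L @ L.T == cov
L = np.linalg.cholesky(cov)

# Independent standard normal draws, one row per sample
z = rng.standard_normal(size=(200_000, 2))

# Each row z_i becomes means + L @ z_i, which has covariance L @ L.T
samples = means + z @ L.T

print(np.cov(samples.T))
```

The printed sample covariance should be close to cov, which is what multivariate_normal gives in a single call.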

Question 8

multivariate_normal from scipy.stats can also be used. Suppose we create random variables x and y:

from scipy.stats import multivariate_normal
rv_mean = [0, 1] # mean of x and y 
rv_cov = [[1.0,0.5], [0.5,2.0]] # covariance matrix of x and y
rv = multivariate_normal.rvs(rv_mean, rv_cov, size=10000)

You have x from rv[:,0] and y from rv[:,1]. Correlation coefficients can be obtained from

import numpy as np
np.corrcoef(rv.T)

Question 9

Each of the two normal distributions is defined by a mean and a variance:

means = [0, 0] # respective means
var_xx = 1 ** 2 # var x = std x squared
var_yy = 1 ** 2

The dependence between the two variables is defined by a covariance matrix made of the two variances and the two covariances. The two covariances x/y and y/x are equal:

import numpy as np
cov_xy = 0.5
cov = np.array([[var_xx, cov_xy],
 [cov_xy, var_yy]])

N pairs are drawn from the distributions using a random generator and the function multivariate_normal. Optional check_valid='raise' is used to check the covariance matrix is actually symmetric and positive semi-definite:

g = np.random.default_rng()
N = 100
pairs = g.multivariate_normal(means, cov, size=N, check_valid='raise')

As an example, let's plot these pairs:

import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.scatter(pairs[:,0], pairs[:,1])


Gregg Lind 21.4k15 gold badges70 silver badges81 bronze badges · Accepted Answer · 2011-12-30 00:23:35Z

