I'd like to generate correlated arrays of x and y coordinates in order to test various matplotlib plotting approaches, but I'm failing somewhere, because I can't get numpy.random.multivariate_normal to give me the samples I want. Ideally, I want my x values between -0.51 and 51.2, and my y values between 0.33 and 51.6 (though I suppose equal ranges would be OK, since I can constrain the plot afterwards), but I'm not sure what mean (0, 0?) and covariance values I should be using to get these samples from the function.
2 Answers
As the name implies, numpy.random.multivariate_normal generates normal distributions; this means there is a non-zero probability of finding points outside of any given interval. You can generate correlated uniform distributions, but this is a little more convoluted. Take a look here for two possible methods.
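For illustration, here is a minimal sketch of one standard trick along those lines, a Gaussian copula: draw correlated normals, push them through the normal CDF to get correlated uniforms, then rescale to the asker's intervals. This is not necessarily one of the linked methods, and the use of scipy.stats is my assumption:

import numpy as np
from scipy.stats import norm  # assumption: SciPy is available

# Gaussian copula sketch: correlated normals -> correlated uniforms -> rescale
corr = 0.8
cov = [[1.0, corr], [corr, 1.0]]
z = np.random.multivariate_normal([0, 0], cov, 1000)  # correlated N(0, 1) pairs
u = norm.cdf(z)                                       # correlated Uniform(0, 1) pairs
x = -0.51 + u[:, 0] * (51.2 - (-0.51))                # x in [-0.51, 51.2]
y = 0.33 + u[:, 1] * (51.6 - 0.33)                    # y in [0.33, 51.6]

The rank correlation of (x, y) is preserved exactly; the Pearson correlation of the uniforms ends up slightly below 0.8.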
If you want to go with the normal distribution, you can set up the sigmas so that your half-interval corresponds to 3 standard deviations (you can also filter out the bad points if needed; see the sketch after the code below). In this way you will have ~99.7% of your points inside your interval, e.g.:
import numpy as np
from matplotlib.pyplot import scatter

xx = np.array([-0.51, 51.2])  # desired x range
yy = np.array([0.33, 51.6])   # desired y range
means = [xx.mean(), yy.mean()]       # centers of the intervals
stds = [xx.std() / 3, yy.std() / 3]  # half-width / 3, i.e. a 3-sigma fit
corr = 0.8  # correlation
covs = [[stds[0]**2,           stds[0]*stds[1]*corr],
        [stds[0]*stds[1]*corr, stds[1]**2]]

m = np.random.multivariate_normal(means, covs, 1000).T
scatter(m[0], m[1])
[scatter plot of the 1000 correlated samples]
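As a minimal sketch of the "filter out the bad points" remark above (my addition, assuming the m, xx, and yy arrays from the snippet), one could mask out the few samples that land outside the target box:

# optional: drop the ~0.3% of samples that fall outside the target ranges
mask = ((m[0] >= xx[0]) & (m[0] <= xx[1]) &
        (m[1] >= yy[0]) & (m[1] <= yy[1]))
m = m[:, mask]  # keep only in-range points; m stays shape (2, n_kept)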
Comments
.T after the function call: multivariate_normal generates an array with a row per observation (1000 rows) and a column per variable (2 columns); .T transposes it so the shape becomes (2, 1000).

means[0] = (xx[0]+xx[1])/2 and stds[0] = (xx[1]-xx[0])/6. I was confused at first about why you need the sample standard deviations of xx and yy. Then I realized that for two data points it simplifies to the half-width of the given range (provided .std() uses the biased sample variance definition with the 1/N factor instead of 1/(N-1), which it does by default).

You can use a Cholesky decomposition to simulate correlated random variables given a correlation matrix - like here.
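As an aside on the second comment above, the simplification is easy to check numerically (an illustrative snippet, not part of either answer):

import numpy as np

xx = np.array([-0.51, 51.2])
# np.std defaults to the biased (1/N) definition, so for two points it
# equals half the range; the mean is the midpoint of the interval
assert np.isclose(xx.std(), (xx[1] - xx[0]) / 2)
assert np.isclose(xx.mean(), (xx[0] + xx[1]) / 2)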
import numpy as np
import matplotlib.pyplot as plt

# desired correlation matrix
cor_matrix = np.array([[1.0, 0.8],
                       [0.8, 1.0]])
L = np.linalg.cholesky(cor_matrix)

# build signals that will have (approximately) the desired correlation matrix
X = L.dot(np.random.normal(0, 1, (2, 1000)))  # the larger the sample (here 1000), the closer the match

# estimate their correlation matrix
print(np.corrcoef(X))
print(X.shape)

plt.scatter(X[0], X[1])
plt.show()
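Note that X here has zero mean and unit variance per row; to match the ranges in the question one could rescale using the same 3-sigma convention as the first answer (my assumption, not part of this answer):

# map unit normals to the question's ranges via midpoint + (half-width / 3) * z
x = X[0] * (51.2 - (-0.51)) / 6 + (51.2 + (-0.51)) / 2
y = X[1] * (51.6 - 0.33) / 6 + (51.6 + 0.33) / 2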