jfenger/correlation


correlation

Calculate confidence intervals for correlation coefficients, including Pearson's r, Kendall's tau, Spearman's rho, and customized correlation measures.

Methodology

Two approaches are offered for calculating the confidence intervals: a parametric approach based on a normal approximation, and a nonparametric approach based on bootstrapping.

Parametric Approach

Let r_hat be the correlation coefficient computed from the sample. Under the Fisher transformation

z(r) = ln((1+r)/(1-r))/2,

the transformed value z approximately follows a normal distribution
with mean z(r_hat)
and variance sigma^2 equal to 1/(n-3) for Pearson's r, 0.437/(n-4) for Kendall's tau, and (1+r_hat^2/2)/(n-3) for Spearman's rho (see Refs. [1, 2] for details). Here n is the length of each input array, i.e. the number of paired observations.

The (1-alpha) CI for r is then

(T(z_lower), T(z_upper))

where T is the inverse of the transformation above,

T(x) = (exp(2x) - 1) / (exp(2x) + 1),

z_lower = z(r_hat) - z_(1-alpha/2) sigma,

z_upper = z(r_hat) + z_(1-alpha/2) sigma,

and z_(1-alpha/2) is the (1-alpha/2) quantile of the standard normal distribution.

The normal approximation is adequate when the absolute value of the coefficient is less than 1 for Pearson's r, 0.8 for Kendall's tau, and 0.95 for Spearman's rho.
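The parametric interval can be sketched in a few lines of standard-library Python. This is only an illustration of the formulas in this section, not the package's actual implementation; the function names `fisher_variance` and `parametric_ci` and the method labels `"pearson"`, `"kendall"`, and `"spearman"` are assumptions made for this example.

```python
import math
from statistics import NormalDist


def fisher_variance(r_hat, n, method):
    """Variance sigma^2 of the transformed coefficient (hypothetical helper)."""
    if method == "pearson":
        return 1.0 / (n - 3)
    if method == "kendall":
        return 0.437 / (n - 4)
    if method == "spearman":
        return (1.0 + r_hat ** 2 / 2.0) / (n - 3)
    raise ValueError(f"unknown method: {method!r}")


def parametric_ci(r_hat, n, method="pearson", alpha=0.05):
    """Return (lower, upper) endpoints of the (1 - alpha) CI for r."""
    # math.atanh(r) equals ln((1 + r) / (1 - r)) / 2, the transformation z(r).
    z = math.atanh(r_hat)
    sigma = math.sqrt(fisher_variance(r_hat, n, method))
    # (1 - alpha/2) quantile of the standard normal distribution.
    q = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    # math.tanh(x) equals (exp(2x) - 1) / (exp(2x) + 1), the inverse T(x).
    return math.tanh(z - q * sigma), math.tanh(z + q * sigma)


if __name__ == "__main__":
    print(parametric_ci(0.5, n=100, method="kendall"))
```

Because the interval is built on the transformed scale and mapped back through T, its endpoints always lie strictly between -1 and 1 and are generally not symmetric around r_hat.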

Nonparametric Approach

The nonparametric approach uses a naive bootstrap.

We draw pairs (x_i, y_i) with replacement from the original paired samples until the resample has size n, and calculate a correlation coefficient from the resample.
This process is repeated a large number of times (5000 by default).
The (1-alpha) CI for r is then given by the alpha/2 and (1-alpha/2) quantiles of the resulting correlation coefficients.
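The resampling procedure can be sketched as follows, again using only the standard library. The names `pearson_r` and `bootstrap_ci`, the `seed` parameter, and the simple rank-based quantile rule are assumptions made for this example and may differ from what the package does internally. Any function that takes two equal-length sequences can be passed as `corr`, which is how a customized correlation measure would fit in.

```python
import math
import random


def pearson_r(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)  # ZeroDivisionError if x or y is constant


def bootstrap_ci(x, y, corr=pearson_r, alpha=0.05, n_boot=5000, seed=None):
    """Percentile bootstrap CI: resample pairs, recompute, take quantiles."""
    rng = random.Random(seed)
    n = len(x)
    stats = []
    for _ in range(n_boot):
        # Resample indices so that each (x_i, y_i) pair stays together.
        idx = [rng.randrange(n) for _ in range(n)]
        try:
            stats.append(corr([x[i] for i in idx], [y[i] for i in idx]))
        except ZeroDivisionError:
            continue  # degenerate resample (constant x or y); skip it
    if not stats:
        raise ValueError("no valid bootstrap resamples")
    stats.sort()
    m = len(stats)
    # Simple rank-based quantiles (an assumption of this sketch).
    lower = stats[int(alpha / 2.0 * (m - 1))]
    upper = stats[int(math.ceil((1.0 - alpha / 2.0) * (m - 1)))]
    return lower, upper


if __name__ == "__main__":
    xs = list(range(50))
    ys = [2 * v + (v * 7) % 5 for v in xs]
    print(bootstrap_ci(xs, ys, n_boot=1000, seed=0))
```

Resampling index positions rather than x and y separately is what preserves the pairing; shuffling the two arrays independently would destroy the dependence being measured.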

References

[1] Bonett, Douglas G., and Thomas A. Wright. "Sample size requirements for estimating Pearson, Kendall and Spearman correlations." Psychometrika 65, no. 1 (2000): 23-28.
[2] Bishara, Anthony J., and James B. Hittner. "Confidence intervals for correlations when data are not normal." Behavior Research Methods 49, no. 1 (2017): 294-309.

Installation:

pip install correlation

conda install -c wangxiangwen correlation

Example Usage:

>>> import correlation
>>> a, b = list(range(2000)), list(range(200, 0, -1)) * 10
>>> correlation.corr(a, b, method='spearman_rho')
(-0.0999987624920335, # correlation coefficient
 -0.14330929583811683, # lower endpoint of CI
 -0.056305939127336606, # upper endpoint of CI
 7.446171861744971e-06) # p-value
