0

I have an array of shape (n, timesteps), where n is the number of trials and timesteps is the length of each trial. Each value of this array denotes a stochastic measurement.
I would like to implement a generic function that computes a confidence interval for a given statistic (mean, median, ...) assuming 1) the underlying distribution is normal, 2) is Student's.

Something like

def normal_ci(
 data: np.array,
 axis: int = 0,
 statistic: Callable = np.mean,
 confidence: float = 0.95
):

and a similar function student_ci().

My problem is that, by default, scipy functions compute intervals for the mean, am I right? Like in this answer.

Nick ODell
28.2k7 gold badges53 silver badges93 bronze badges
asked Nov 10, 2024 at 2:12

1 Answer 1

2

This is on Stack Overflow, so the computational answer is to use the bootstrap.

from typing import Callable
import numpy as np
from scipy import stats
rng = np.random.default_rng(84354894298246)
def normal_ci(
 data: np.array,
 axis: int = 0,
 statistic: Callable = np.mean,
 confidence: float = 0.95
):
 res = stats.bootstrap((data,), statistic, axis=axis,
 confidence_level=confidence)
 return tuple(res.confidence_interval)
# generate 1000 samples, each of length 100
data = rng.standard_normal(size=(1000, 100))
# compute the confidence interval for each
low, high = normal_ci(data, axis=-1)
# the confidence interval contains the population value
# of the statistic in 95% of cases
np.mean((low < 0) & (0 < high)) # 0.953

Since you know the families from which your data are drawn, you could look into the parametric bootstrap.

def normal_ci(
 data: np.array,
 axis: int = 0,
 statistic: Callable = np.mean,
 confidence: float = 0.95
):
 # fit a normal distribution to the data
 mean = np.mean(data, axis=axis)
 std = np.std(data, ddof=1, axis=axis)
 # resample data from the fitted normal distribution
 n_resamples = 999
 m = data.shape[axis] # assuming axis is an integer
 resample = rng.normal(loc=mean, scale=std, size=(m, n_resamples) + mean.shape)
 # compute the statistic for each of the resamples to estimate
 # the distribution
 statistic_distribution = statistic(resample, axis=0)
 # Generate the confidence interval
 # percentile bootstrap (very crude)
 alpha = 1 - confidence
 low, high = np.quantile(statistic_distribution, [alpha/2, 1-alpha/2], axis=0)
 return low, high
low, high = normal_ci(data, axis=-1)
np.mean((low < 0) & (0 < high)) # 0.954

SciPy's bootstrap function uses the BCa method by default, which is reasonably sophisticated; this parametric bootstrap is what I could write in a few minutes. So for parametric bootstrap, you really should research what other libraries are out there.

If you're not happy with the bootstrap, you'd need to do some research: for each of the statistics and population families you're interested in, you need to know the distribution of the statistic of a sample drawn from your population. For a sample from a normal distribution, the sample mean is t-distributed, the variance is chi-squared distributed, etc... and that is not a question for Stack Overflow.

answered Nov 10, 2024 at 4:33
Sign up to request clarification or add additional context in comments.

3 Comments

Thanks, I have already implemented bootstrapping but I didn't think about the parametric version. Thanks! I tried yours but when I pass statistic = np.median to your parametric implementation, I get an interval that is almost the same to the mean's. This does not happen if I do non-parametric bootstrapping.
Both are approximate and stochastic. I explained that the second was a crude sketch of parametric bootstrapping. So yeah, they may give different results. Check whether the coverage seems to be about right over many trials; that is the true test. Thanks are appreciated, but please remember Stack Overflow's recommendations on how to say it: "Comments are not recommended for any of the following... Compliments which do not add new information ("+1, great answer!"); instead, upvote it and pay it forward;" (stackoverflow.com/help/privileges/comment).
Sure, I always upvote and accept an answer if it's right :) Sometime I wait a bit because when I do and ask for something with a comment, the person who answered just ignores my comments. Thanks again for replying!

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.