
I'm trying to generate prediction bands for an exponential fit to some 2-dimensional data (available here).

The data (blue points), best fit found by scipy.optimize.curve_fit (red curve), and lower & upper 95% prediction bands (green curves) can be seen in the image below.

[Image: scatter of the data (blue) with the best-fit exponential (red) and 95% prediction bands (green)]

I'd love some confirmation that the code is actually doing things correctly and I haven't missed some step or simply used the wrong statistical tools.

The references I used are basically this post and this video, both adapted to work with a general function rather than being hardcoded for a linear function of the form \$y = a*x+b\$.

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
from scipy import stats


def make_plot(x, xd, yd, popt, upb, lpb):
    # Plot data.
    plt.scatter(xd, yd)
    # Plot best-fit curve.
    plt.plot(x, func(x, *popt), c='r')
    # Prediction band (upper).
    plt.plot(x, upb, c='g')
    # Prediction band (lower).
    plt.plot(x, lpb, c='g')
    plt.show()


def func(x, a, b, c):
    '''Exponential 3-param function.'''
    return a * np.exp(b * x) + c


def predband(x, xd, yd, f_vars, conf=0.95):
    """
    Code adapted from Rodrigo Nemmen's post:
    http://astropython.blogspot.com.ar/2011/12/calculating-prediction-band-of-linear.html

    Calculates the prediction band of the regression model at the
    desired confidence level.

    Clarification of the difference between confidence and prediction bands:

    "The prediction bands are further from the best-fit line than the
    confidence bands, a lot further if you have many data points. The 95%
    prediction band is the area in which you expect 95% of all data points
    to fall. In contrast, the 95% confidence band is the area that has a
    95% chance of containing the true regression line."
    (from http://www.graphpad.com/guides/prism/6/curve-fitting/index.htm?reg_graphing_tips_linear_regressio.htm)

    Arguments:
    - x: array with x values where the prediction band is evaluated.
    - xd, yd: data arrays.
    - f_vars: best-fit parameters of the fitted function.
    - conf: desired confidence level, by default 0.95.

    References:
    1. http://www.JerryDallal.com/LHSP/slr.htm, Introduction to Simple
       Linear Regression, Gerard E. Dallal, Ph.D.
    """
    alpha = 1. - conf    # Significance level.
    N = xd.size          # Data sample size.
    var_n = len(f_vars)  # Number of parameters in the fitted function.
    # Quantile of Student's t distribution for p = 1 - alpha/2.
    q = stats.t.ppf(1. - alpha / 2., N - var_n)
    # Std. deviation of an individual measurement (Bevington, eq. 6.15).
    se = np.sqrt(1. / (N - var_n) * np.sum((yd - func(xd, *f_vars)) ** 2))
    # Auxiliary definitions.
    sx = (x - xd.mean()) ** 2
    sxd = np.sum((xd - xd.mean()) ** 2)
    # Predicted values (best-fit model).
    yp = func(x, *f_vars)
    # Prediction band half-width.
    dy = q * se * np.sqrt(1. + (1. / N) + (sx / sxd))
    # Upper & lower prediction bands.
    lpb, upb = yp - dy, yp + dy
    return lpb, upb


# Read data from file.
xd, yd = np.loadtxt('exponential_data.dat', unpack=True)
# Find best fit of data with 3-parameter exponential function.
popt, pcov = curve_fit(func, xd, yd)
# Generate equi-spaced x values.
x = np.linspace(xd.min(), xd.max(), 100)
# Call function to generate lower and upper prediction bands.
lpb, upb = predband(x, xd, yd, popt, conf=0.95)
# Plot.
make_plot(x, xd, yd, popt, upb, lpb)
asked Mar 18, 2015 at 21:20

1 Answer


There is a nice discussion of confidence and prediction bands in the documentation for the excellent kmpfit module. You need to include the covariance matrix from the fit in the calculation of the bands, which I don't see in your code. There is an example in the link that walks through it.

Here is their implementation:

import numpy

def confpred_band(x, dfdp, prob, fitobj, f, prediction, abswei=False, err=None):
    #----------------------------------------------------------
    # Return values for a confidence or a prediction band.
    # See documentation for methods confidence_band and
    # prediction_band.
    #----------------------------------------------------------
    from scipy.stats import t
    # Given the confidence or prediction probability prob = 1 - alpha
    # we derive alpha = 1 - prob.
    alpha = 1 - prob
    prb = 1.0 - alpha / 2
    tval = t.ppf(prb, fitobj.dof)
    C = fitobj.covar
    n = len(fitobj.params)  # Number of parameters from covariance matrix.
    p = fitobj.params
    N = len(x)
    if abswei:
        covscale = 1.0  # Do not apply correction with red. chi^2.
    else:
        covscale = fitobj.rchi2_min
    df2 = numpy.zeros(N)
    for j in range(n):
        for k in range(n):
            df2 += dfdp[j] * dfdp[k] * C[j, k]
    if prediction:
        df = numpy.sqrt(err * err + covscale * df2)
    else:
        df = numpy.sqrt(covscale * df2)
    y = f(p, x)
    delta = tval * df
    upperband = y + delta
    lowerband = y - delta
    return y, upperband, lowerband
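For comparison, here is a minimal self-contained sketch of the same delta-method prediction band built directly from scipy.optimize.curve_fit output, without kmpfit. The synthetic data, the starting guess p0, and the analytic dfdp derivatives below are my own illustrative assumptions, not taken from the linked docs:

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import t

def func(x, a, b, c):
    '''Exponential 3-param function, as in the question.'''
    return a * np.exp(b * x) + c

# Synthetic data standing in for the question's data file.
rng = np.random.default_rng(1)
xd = np.linspace(0.0, 2.0, 60)
yd = func(xd, 2.0, 1.0, 0.5) + rng.normal(0.0, 0.3, xd.size)

popt, pcov = curve_fit(func, xd, yd, p0=(1.0, 1.0, 0.0))
dof = xd.size - popt.size
resid = yd - func(xd, *popt)
s_err = np.sqrt(np.sum(resid ** 2) / dof)  # Std. dev. of one measurement.

x = np.linspace(xd.min(), xd.max(), 100)
a, b, c = popt
# Analytic partial derivatives of a*exp(b*x) + c w.r.t. (a, b, c).
dfdp = [np.exp(b * x), a * x * np.exp(b * x), np.ones_like(x)]

# Delta-method variance of the fitted curve: grad^T . C . grad.
# Note: curve_fit's pcov is already scaled by the reduced chi-square
# by default (absolute_sigma=False), so no extra covscale factor here.
df2 = np.zeros_like(x)
for j in range(len(dfdp)):
    for k in range(len(dfdp)):
        df2 += dfdp[j] * dfdp[k] * pcov[j, k]

tval = t.ppf(0.975, dof)
dy = tval * np.sqrt(s_err ** 2 + df2)  # Prediction band half-width.
lpb, upb = func(x, *popt) - dy, func(x, *popt) + dy
```

Unlike the question's `predband`, the `sx / sxd` term (valid only for straight-line regression) is replaced by the parameter-covariance term `df2`, which is what the kmpfit code computes.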
answered Mar 18, 2015 at 21:39
Comments:

  • Please don't only include the code from the link. Add some explanation to it to help the author of the original post. – Commented Mar 18, 2015 at 22:11
  • The example in the page linked is filled with errors and won't run. The code given here is better, but it doesn't really match that example, mainly because it receives 8 params when the code in the example sends 5. Furthermore, I believe the example talks about confidence intervals and not prediction intervals (graphpad.com/support/faqid/1506). – Commented Mar 19, 2015 at 11:42
