NCL Home> Documentation> Functions> General applied math, unknown

regline_stats

Performs simple linear regression including confidence estimates, an ANOVA table and 95% mean response estimates.

Available in version 6.2.0 and later.

Prototype

load "$NCARG_ROOT/lib/ncarg/nclscripts/csm/contributed.ncl" ; This library is automatically loaded
 ; from NCL V6.2.0 onward.
 ; No need for user to explicitly load.
	function regline_stats (
		x [*] : numeric, 
		y [*] : numeric 
	)
	return_val [1] : float or double with many attributes

Arguments

x

A one dimensional array of size N containing the independent variable. Missing values, indicated by _FillValue, are allowed but are ignored.

y

A one-dimensional array of length N containing the corresponding dependent variable. Missing values, indicated by _FillValue, are allowed but are ignored.

Return value

A scalar containing the calculated linear regression coefficient. In addition, many statistical quantities are returned as attributes of b. The 95% limits were introduced in version 6.4.0

The return value will be of type double if x or y is double, and float otherwise.

Description

regline_stats performs a simple linear regression. The calculated regression coefficient represents the rate of change in the dependent variable for a unit change in the independent variable. This coefficient is sometimes referred to as the slope or trend.

The user may wish to take into account the uncertainty associated with each estimated parameter and the overall uncertainty. This information is returned as the attributes stderr, tval and pval. In addition, an ANalysis Of VAriance [ANOVA] table is returned.

Any missing values are ignored.
Missing values should be indicated by the _FillValue attribute.

Chad Hermann created a function linreg which derived many of the same statistics. It is written entirely in NCL and is here.

Debasish Mazumder (NCAR/RAL) submitted code for the 95% mean response about the regression line. This was included within the regline_stats in the 6.4.0 release of NCL. A reference for this is Chapter 11 of

 Applied Statistics and Probability for Engineers
 D. C. Montgomery and G. C. Runge
 ISBN: 0471204544

From the WWW:

  • the confidence intervals tells you about the likely location of the true population parameter
  • the prediction intervals must account for both the uncertainty in knowing the value of the population mean, plus data scatter. So a prediction interval is always wider than a confidence interval.

GRAPHICS NOTE: For the computations, the x and y pairs can be in any order. However, subsequent use for plotting lines often requires that the x and the associated y be in monotonically {in/de}creasing order. See Example 2 below.

See Also

regline, reg_multlin, reg_multlin_stats

Examples

See examples regress_1a.ncl and regress_1b.ncl for plot and usage of returned information. Example 1

The following example was taken from:

 Brownlee
 Statistical Theory and Methodology
 J Wiley 1965 pgs: 342-346 QA276 .B77
See example regress_1a.ncl for plot and usage of returned information.
 x = (/ 1190.,1455.,1550.,1730.,1745.,1770. \
 , 1900.,1920.,1960.,2295.,2335.,2490. \
 , 2720.,2710.,2530.,2900.,2760.,3010. /)
 
 y = (/ 1115.,1425.,1515.,1795.,1715.,1710. \
 , 1830.,1920.,1970.,2300.,2280.,2520. \
 , 2630.,2740.,2390.,2800.,2630.,2970. /)
 
 rc = regline_stats(x,y) ; linear regression coef
 print(rc)
The print(rc) yielded the following (edited) output. All the comments were manually added.
 Variable: rc
 Type: float
 Total Size: 4 bytes
 1 values
 Number of Dimensions: 1
 Dimensions and sizes:	[1]
 Coordinates: 
 Number Of Attributes: 38
 _FillValue :	9.96921e+36
 long_name :	simple linear regression
 model :	Yest = b(0) + b(1)*X
 N :	18 ; # of observations
 NP :	1 ; # of predictor variables
 M :	2 ; # of returned coefficients
 bstd :	( 0, 0.9947125 ) ; standardized regression coefficient
 ; [ignore 1st element]
 ; ANOVA information: SS=>Sum of Squares
 SST :	4823324 ; Total SS: sum((Y-Yavg)^2) 
 SSE :	50871.91 ; Residual SS: sum((Yest-Y)^2)
 SSR :	4772452 ; sum((Yest-Yavg)^2)
 MST :	283724.9
 MSE :	3179.494
 MSE_dof :	16
 MSR :	4772452
 RSE :	56.387 ; residual standard error; sqrt(MSE)
 RSE_dof :	15
 F :	1501.01 ; MSR/MSE
 F_dof :	( 1, 16 )
 F_pval :	1.51066e-17
 r2 :	0.989453 ; square of the Pearson correlation coefficient 
 r :	0.9947125 ; multiple (overall) correlation: sqrt(r2)
 r2a :	0.9887938 ; adjusted r2... better for small N
 fuv :	0.01054698 ; (1-r2): fraction of variance of the regressand
 ; (dependent variable) Y which cannot be explained
 ; i.e., which is not correctly predicted, by the 
 ; explanatory variables X.
 Yest :	[ARRAY of 18 elements] ; Yest = b(0) + b(1)*x 
 Yavg :	2125.278 ; avg(y)
 Ystd :	532.6583 ; stddev(y)
 Xavg :	2165 ; avg(x)
 Xstd :	543.6721 ; stddev(x)
 stderr :	( 56.05802, 0.02515461 ) ; std. error of each b
 tval :	( 0.2738642, 38.74285 ) ; t-value
 pval :	( 0.7874895, 5.017699e-18 ) ; p-value
 ; following added in version 6.4.0
 b95 : ( 0.9212361, 1.027887 ) ; 2.5% and 97.5% regression coef. confidence intervals
 y95 : ( -459.9983, 490.703 ) ; 2.5% and 97.5% y-intercept confidence intervals
 YMR025 : [ARRAY of 18 elements] ; 2.5% mean response
 YMR975 : [ARRAY of 18 elements] ; 97.5% mean response
 YPI025 :	[ARRAY of 18 elements] ; 2.5% predition interval
 YPI975 :	[ARRAY of 18 elements] ; 97.5% predition interval
 ; Original regline attributes 
 ; provided for backward cmpatibility
 nptxy :	18
 xave :	2165
 yave :	2125.278
 rstd :	56.387
 yintercept :	15.35228
 b :	( 15.35228, 0.9745615 ) ; b(0) + b(1)*x
 (0)	0.9745615 ; regression coefficient 

Example 2

The following example was taken from: http://www.stat.ucla.edu/~hqxu/stat105/pdf/ch11.pdf. It is a commonly used WWW example. AS noted above, regline_stats does not require any special ordering. However for subs Here the independent values (x) and associated dependent variables (y) are placed into ascending order for subsequent plotting.

See example regress_1b.ncl for plot and usage of returned information.

; Hydrocarbon level (%) 
 x = (/ 0.99, 1.02, 1.15, 1.29, 1.46, 1.36, 0.87, 1.23, 1.55, 1.40 \
 , 1.19, 1.15, 0.98, 1.01, 1.11, 1.20, 1.26, 1.32, 1.43, 0.95 /)
; Purity (%) 
 y = (/ 90.01, 89.05, 91.43, 93.74, 96.73, 94.45, 87.59, 91.77, 99.42, 93.65 \
 , 93.54, 92.52, 90.56, 89.54, 89.85, 90.39, 93.25, 93.41, 94.98, 87.33 /) 
; perform regression on the variables
 rc = regline_stats(xx,yy) ; linear regression coef
 print(rc)
; For graphical purposes, it is recommended that the independent variable be 
; monotonically {in/de}creasing
 ii = dim_pqsort_n(x,1,0)
 xx = x(ii)
 yy = y(ii)
 ...
 plot = gsn_csm_xy(wks,xx,yy, res)
yielded
 Variable: rc
 Type: float
 Total Size: 4 bytes
 1 values
 Number of Dimensions: 1
 Dimensions and sizes:	[1]
 Coordinates: 
 Number Of Attributes: 41
 _FillValue :	9.96921e+36
 long_name :	simple linear regression
 model :	Yest = b(0) + b(1)*X
 N :	20
 NP :	1
 M :	2
 bstd :	( 0, 0.9367155 )
 SST :	173.3769
 SSE :	21.24981
 SSR :	152.1272
 MST :	9.125101
 MSE :	1.180545
 MSE_dof :	18
 MSR :	152.1272
 RSE :	1.086529
 RSE_dof :	17
 F :	128.8618
 F_dof :	( 1, 18 )
 F_pval :	1.227305e-09
 r2 :	0.8774363
 r :	0.9367157
 r2a :	0.8706272
 fuv :	0.1225637
 Yest :	[ARRAY of 20 elements]
 Yavg :	92.1605
 Ystd :	3.020778
 Xavg :	1.196
 Xstd :	0.1893034
 stderr :	( 1.593479, 1.316763 )
 tval :	( 46.61706, 11.35169 )
 pval :	( 4.660713e-21, 6.580665e-10 )
 nptxy :	20
 xave :	1.196
 yave :	92.1605
 rstd :	1.086529
 yintercept :	74.28331
 
 b :	( 74.28331, 14.94748 )
 b95 :	( 12.18108, 17.71389 )
 y95 :	( 60.07995, 88.48667 )
 YMR025 : [ARRAY of 20 elements] ; 2.5% mean response
 YMR975 : [ARRAY of 20 elements] ; 97.5% mean response
 YPI025 :	[ARRAY of 20 elements] ; 2.5% predition interval
 YPI975 :	[ARRAY of 20 elements] ; 97.5% predition interval
 (0)	14.94748
Example 3

Use the 'Old Faithful' data set used by several R regression examples Here, we use NCL's regline and the ANOVA version regline_stats.

See regress_8.ncl for a plot of the data and the regression line.

Simplefied R code:
 eruption.lm = lm(eruptions ~ waiting, data=faithful)
 summary(eruption.lm) 
Summary output:
 Call: 
 lm(formula = eruptions ~ waiting, data = faithful) 
 
 Residuals: 
 Min 1Q Median 3Q Max 
 -1.2992 -0.3769 0.0351 0.3491 1.1933 
 
 Coefficients: 
 Estimate Std. Error t value Pr(>|t|) 
  (Intercept) -1.87402 0.16014 -11.7 <2e-16 *** waiting 0.07563 0.00222 34.1 <2e-16 *** 
 --- 
 Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 
 
 Residual standard error: 0.497 on 270 degrees of freedom 
 Multiple R-squared: 0.811, Adjusted R-squared: 0.811 
  
The NCL code is:
 diri = "./"
 fili = "faithful_R.dat"
 pthi = diri + fili
 ncol = 3
 data = readAsciiTable(pthi, ncol, "float", 1)
 x = data(:,2) ; eruption interval (minutes); independent
 y = data(:,1) ; eruption duration (minutes); dependent
 nxy = dimsizes(y)
 x@long_name = "Eruption Interval (min)"
 y@long_name = "Eruption Duration (min)"
 ;print("============ OLD Faithful================")
 ;print(sprintf("%4.0f", x)+" "+sprintf("%4.0f", y))
 ;print("=========================================")
;---Assorted statistics and regression line fit
 ;rxy = escorc(x,y) ; cross-correlation
 ;print(rxy)
 ;print("=========================================")
 rcl = regline(x, y) ; basic linear regression
 print(rcl)
 df = rcl@nptxy-2
 prob = (1 - betainc(df/(df+rcl@tval^2), df/2.0, 0.5) )
 print("=========================================")
 rcl_anova = regline_stats(x,y) ; linear regression: ANOVA
 print(rcl_anova)
 print("=========================================")
The (slightly) edited output is:
 Variable: rcl
 Type: float
 Total Size: 4 bytes
 1 values
 Number of Dimensions: 1
 Dimensions and sizes:	[1]
 Coordinates: 
 Number Of Attributes: 7
 _FillValue :	9.96921e+36
 yintercept :	-1.874016
 yave :	3.487783
 xave :	70.89706
 nptxy :	272
 rstd :	0.002218541
 tval :	34.08904
 (0)	0.07562795
 (0)	=========================================
 
 
 Variable: rcl_anova
 Type: float
 Total Size: 4 bytes
 1 values
 Number of Dimensions: 1
 Dimensions and sizes:	[1]
 Coordinates: 
 Number Of Attributes: 44
 _FillValue :	9.96921e+36
 long_name :	simple linear regression
 model :	Yest = b(0) + b(1)*X
 N :	272
 NP :	1
 M :	2
 bstd :	( 0, 0.9008112 )
 SST :	353.0395
 SSE :	66.56177
 SSR :	286.4776
 MST :	1.302729
 MSE :	0.2465251
 MSE_dof :	270
 MSR :	286.4776
 RSE :	0.4965129
 RSE_dof :	269
 F :	1162.063
 F_dof :	( 1, 270 )
 F_pval :	 0
 r2 :	0.8114605
 r :	0.900811
 r2a :	0.8107622
 fuv :	0.1885395
 Yest :	
 Yavg :	3.487783
 Ystd :	1.141371
 Xavg :	70.89706
 Xstd :	13.59497
 stderr :	( 0.1601433, 0.002218541 )
 tval :	( -11.70212, 34.08904 )
 pval :	( 7.089058e-26, 0 )
 nptxy :	272
 xave :	70.89706
 yave :	3.487783
 rstd :	0.4965129
 yintercept :	-1.874016
 b :	( -1.874016, 0.07562795 )
 tnull :	34.08905
 b95 :	( 0.07126011, 0.07999578 )
 y95 :	( -2.189304, -1.558728 )
 YMR025 :	
 YMR975 :	
 YPI025 :	
 YPI975 :	
 (0)	0.07562795


AltStyle によって変換されたページ (->オリジナル) /