Gretl is an open-source, cross-platform econometrics program. Its development is hosted by SourceForge.
The U.S. National Institute of Standards and Technology (NIST) publishes a set of statistical reference datasets. The object of this project is to "improve the accuracy of statistical software by providing reference datasets with certified computational results that enable the objective evaluation of statistical software".
As of September 2010 the website for the project can be found at:
http://itl.nist.gov/div898/strd/general/main.html
while the datasets are at
http://itl.nist.gov/div898/strd/general/dataarchive.html
For testing gretl I have made use of the datasets pertaining to Linear Regression and Univariate Summary Statistics (the others deal with ANOVA and nonlinear regression).
I quote from the NIST text "Certification Method & Definitions" regarding their certified computational results (emphasis added):
For all datasets, multiple precision calculations (accurate to 500 digits) were made using the preprocessor and FORTRAN subroutine package of Bailey (1995, available from NETLIB). Data were read in exactly as multiple precision numbers and all calculations were made with this very high precision. The results were output in multiple precision, and only then rounded to fifteen significant digits. These multiple precision results are an idealization. They represent what would be achieved if calculations were made without roundoff or other errors. Any typical numerical algorithm (i.e. not implemented in multiple precision) will introduce computational inaccuracies, and will produce results which differ slightly from these certified values.
It is not to be expected that results obtained from ordinary statistical packages will agree exactly with NIST's multiple precision benchmark figures. But the benchmark provides a very useful test for egregious errors and imprecision.
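To make the point about roundoff concrete, here is a small Python sketch (not gretl code). It uses three made-up observations near 10^9, not one of the NIST datasets, and computes the sample variance in three ways: exactly, with Python's `fractions.Fraction` standing in for multiple-precision arithmetic; with the textbook one-pass formula in double precision; and with the two-pass formula (mean first, then deviations) in double precision. The function names are hypothetical.

```python
from fractions import Fraction

# Hypothetical data: large values with a small spread (illustrative only,
# not one of the NIST reference datasets).
DATA = ["1000000001", "1000000002", "1000000003"]


def variance_exact(values):
    """Sample variance in exact rational arithmetic (no roundoff)."""
    xs = [Fraction(v) for v in values]  # read the data in exactly
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


def variance_one_pass(values):
    """Textbook one-pass formula in double precision: prone to cancellation."""
    xs = [float(v) for v in values]
    n = len(xs)
    s, ss = sum(xs), sum(x * x for x in xs)
    return (ss - s * s / n) / (n - 1)


def variance_two_pass(values):
    """Two-pass formula in double precision: subtract the mean first."""
    xs = [float(v) for v in values]
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


if __name__ == "__main__":
    print("exact:   ", variance_exact(DATA))
    print("one-pass:", variance_one_pass(DATA))
    print("two-pass:", variance_two_pass(DATA))
```

The exact and two-pass results agree (both equal 1), while the one-pass result does not: the squares of values near 10^9 exceed the 53-bit precision of a double, so their rounding errors swamp the true spread. This is the kind of computational inaccuracy the NIST text refers to, and it shows that the choice of algorithm matters even at fixed precision.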
Table 1 below shows the performance of both gretl's standard regression facility and the gretl plugin based on the GNU Multiple Precision (GMP) library. In the Gretl column, the "min. correct significant digits" figure shows, for each model, the smallest number of correct significant digits in the gretl results when the various statistics associated with the model (regression coefficients and standard errors, sum of squared residuals, standard error of residuals, F statistic and R²) are compared with the NIST certified values. The GMP plugin column simply records whether the gretl results were correct to at least 12 significant figures for all the statistics. For these tests gretl was compiled using gcc 2.95.3 with the -O2 optimization flag, linked against glibc-2.2.5, and run on an IBM ThinkPad with a Pentium III processor.
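The text does not spell out how "correct significant digits" were counted. A common convention in software-accuracy studies is the log relative error (LRE), −log10(|estimate − certified| / |certified|), capped at the number of digits carried by the certified value. The Python sketch below assumes that convention; the function names and statistic labels are hypothetical, and the numbers in the usage line are made up.

```python
import math

# NIST certified values are given to 15 significant digits, so cap there.
MAX_DIGITS = 15.0


def correct_digits(estimate: float, certified: float) -> float:
    """Log relative error (LRE): approx. number of correct significant digits."""
    if estimate == certified:
        return MAX_DIGITS
    if certified == 0.0:
        # Relative error is undefined; fall back to the absolute error.
        lre = -math.log10(abs(estimate))
    else:
        lre = -math.log10(abs(estimate - certified) / abs(certified))
    # Clamp to [0, MAX_DIGITS].
    return max(0.0, min(MAX_DIGITS, lre))


def min_correct_digits(results: dict[str, tuple[float, float]]) -> float:
    """Smallest LRE over all (estimate, certified) pairs for one model."""
    return min(correct_digits(est, cert) for est, cert in results.values())


if __name__ == "__main__":
    # Made-up (estimate, certified) pairs for illustration only.
    model = {
        "b0": (1.0001, 1.0),
        "ssr": (2.5, 2.5),
    }
    print(round(min_correct_digits(model), 6))  # prints 4.0
```

Taking the minimum over all statistics mirrors the "min. correct significant digits" column: a model is only as accurate as its least accurate reported figure.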
Table 1. NIST linear regression tests
Dataset | Model | Gretl (min. correct significant digits) | GMP plugin (correct to at least 12 digits?) |
---|---|---|---|
Norris | Simple linear regression | 9 | Yes |
Pontius | Quadratic | 8 | Yes |
NoInt1 | Simple regression, no intercept | 9 (but see text) | Yes |
NoInt2 | Simple regression, no intercept | 9 (but see text) | Yes |
Filip | 10th degree polynomial | 0 (see text) | Yes |
Longley | Multiple regression, six independent variables | 8 | Yes |
Wampler1 | 5th degree polynomial | 7 | Yes |
Wampler2 | 5th degree polynomial | 9 | Yes |
Wampler3 | 5th degree polynomial | 7 | Yes |
Wampler4 | 5th degree polynomial | 7 | Yes |
Wampler5 | 5th degree polynomial | 7 | Yes |
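As can be seen from the table, gretl does a good job of tracking the certified results. The exception is the Filip data set, where the model is a 10th-degree polynomial in a single regressor,

$$y = \beta_0 + \sum_{j=1}^{10} \beta_j x^j + \varepsilon,$$

and gretl's standard regression facility is recorded in Table 1 with 0 correct significant digits, whereas the GMP plugin matches the certified values to at least 12 digits.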
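Raising a single regressor to powers up to 10 makes the columns of the regressor matrix nearly collinear, so the normal equations X'Xb = X'y are badly ill-conditioned and double-precision rounding can wipe out the significant digits of the solution; extra precision removes that source of error. The Python sketch below solves the normal equations for a polynomial regression in exact rational arithmetic (`fractions.Fraction`) as a stand-in for the multiple-precision approach. It is not meant to reproduce the GMP plugin's implementation, the function names are hypothetical, and the demonstration data are made up.

```python
from fractions import Fraction


def solve_exact(a, b):
    """Solve a x = b by Gauss-Jordan elimination over the rationals."""
    n = len(a)
    # Augmented matrix [a | b], converted to Fraction so division stays exact.
    m = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            raise ValueError("normal equations are exactly singular")
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]  # scale pivot row so pivot is 1
        for r in range(n):
            if r != col and m[r][col] != 0:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] for i in range(n)]


def poly_ols_exact(xs, ys, degree):
    """OLS for y = b0 + b1*x + ... + b_degree*x**degree, with no roundoff.

    Pass data as strings (e.g. "0.1") or ints so they are read in exactly;
    Fraction(0.1) would instead capture the binary rounding of the float.
    """
    xs = [Fraction(x) for x in xs]
    ys = [Fraction(y) for y in ys]
    k = degree + 1
    rows = [[x ** j for j in range(k)] for x in xs]  # regressor matrix X
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve_exact(xtx, xty)


if __name__ == "__main__":
    # Made-up data lying exactly on y = 1 + 2x + 3x^2.
    xs = ["0", "0.5", "1", "1.5", "2", "2.5"]
    ys = [1 + 2 * Fraction(x) + 3 * Fraction(x) ** 2 for x in xs]
    coefs = poly_ols_exact(xs, ys, degree=2)
    print([str(c) for c in coefs])  # prints ['1', '2', '3']
```

Exact rational arithmetic is slow and its numbers grow quickly, which is why a practical implementation would use high but finite precision (as with the 500-digit NIST computations) rather than fractions.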
In the NoInt1 and NoInt2 datasets, where the regression has no intercept, there is a methodological disagreement over the calculation of the coefficient of determination, R². In this case gretl reports the square of the correlation coefficient between the fitted and actual values of the dependent variable, while the NIST figure is computed as

$$R^2 = 1 - \frac{\mathrm{ESS}}{\sum_t y_t^2},$$

where ESS is the error (residual) sum of squares, available in gretl as `$ess`. To check gretl against the certified values I computed this alternative measure after estimating the model:

```
genr r2alt = 1 - $ess/sum(y * y)
```

The numbers thus obtained agreed with the certified values, up to gretl's precision.
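To see how far apart the two definitions can be, here is a Python sketch (not gretl's code) that fits a regression through the origin and computes both the squared correlation between actual and fitted values, which the text says gretl reports, and the 1 − ESS/Σy² measure used by NIST. The function names are hypothetical and the three data points are made up for illustration.

```python
import math


def fit_no_intercept(x, y):
    """OLS slope for y = b*x (regression through the origin) and fitted values."""
    b = math.fsum(xi * yi for xi, yi in zip(x, y)) / math.fsum(xi * xi for xi in x)
    return b, [b * xi for xi in x]


def r2_uncentered(y, fitted):
    """NIST-style R^2 for no-intercept models: 1 - ESS / sum(y^2)."""
    ess = math.fsum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return 1.0 - ess / math.fsum(yi * yi for yi in y)


def r2_corr(y, fitted):
    """Squared correlation between actual and fitted values."""
    n = len(y)
    my, mf = math.fsum(y) / n, math.fsum(fitted) / n
    syf = math.fsum((yi - my) * (fi - mf) for yi, fi in zip(y, fitted))
    syy = math.fsum((yi - my) ** 2 for yi in y)
    sff = math.fsum((fi - mf) ** 2 for fi in fitted)
    return syf * syf / (syy * sff)


if __name__ == "__main__":
    # Made-up data, not NoInt1/NoInt2.
    x = [1.0, 2.0, 3.0]
    y = [2.0, 2.0, 4.0]
    _, fitted = fit_no_intercept(x, y)
    print(round(r2_uncentered(y, fitted), 4), round(r2_corr(y, fitted), 4))  # prints 0.9643 0.75
```

Because the NIST measure compares the residual sum of squares with the variation of y around zero rather than around its mean, it need not agree with the squared correlation; on these data the two differ substantially.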
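As for the univariate summary statistics, the certified values given by NIST are for the sample mean, sample standard deviation and sample lag-1 autocorrelation coefficient. NIST note that the latter statistic "may have several definitions". The certified value is computed as

$$r_1 = \frac{\sum_{t=2}^{n} (y_t - \bar{y})(y_{t-1} - \bar{y})}{\sum_{t=1}^{n} (y_t - \bar{y})^2},$$

where $\bar{y}$ is the mean over the full sample. In gretl this can be calculated as follows:

```
# lagged series and full-sample mean
genr y1 = y(-1)
genr ybar = mean(y)
# deviations from the full-sample mean
genr devy = y - ybar
genr devy1 = y1 - ybar
# denominator: sum over all n observations
genr ssy = sum(devy * devy)
# numerator: start the sample at observation 2, where the lag is defined
smpl 2 ;
genr ssyy1 = sum(devy * devy1)
genr rnist = ssyy1 / ssy
```

The figure rnist was then compared with the certified value.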
With this modification, all the summary statistics were in agreement (to the precision given by gretl) for all datasets (PiDigits, Lottery, Lew, Mavro, Michelso, NumAcc1, NumAcc2, NumAcc3 and NumAcc4).
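For readers who want to reproduce the check outside gretl, the following Python sketch mirrors the gretl script for the NIST lag-1 autocorrelation. It also computes one alternative definition, the Pearson correlation between y_t and y_{t−1} over t = 2..n, to show why the choice of definition matters; the alternative is offered only for comparison, not as the definition gretl uses, and the function names and series are hypothetical.

```python
import math


def lag1_autocorr_nist(y):
    """Lag-1 autocorrelation as in the certified-value formula:
    sum_{t=2..n} (y_t - ybar)(y_{t-1} - ybar) / sum_{t=1..n} (y_t - ybar)^2,
    with ybar the mean over the full sample.
    """
    n = len(y)
    ybar = math.fsum(y) / n
    dev = [v - ybar for v in y]
    ssy = math.fsum(d * d for d in dev)  # denominator over all n observations
    ssyy1 = math.fsum(dev[t] * dev[t - 1] for t in range(1, n))  # t = 2..n
    return ssyy1 / ssy


def lag1_pearson(y):
    """Alternative definition, for comparison: Pearson correlation between
    y_t and y_{t-1} over t = 2..n, each subseries using its own mean."""
    a, b = y[1:], y[:-1]
    ma, mb = math.fsum(a) / len(a), math.fsum(b) / len(b)
    sab = math.fsum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = math.fsum((u - ma) ** 2 for u in a)
    sbb = math.fsum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)


if __name__ == "__main__":
    y = [1.0, 2.0, 3.0, 4.0, 5.0]  # made-up series
    print(lag1_autocorr_nist(y), lag1_pearson(y))  # prints 0.4 1.0
```

On a short trending series the two definitions give very different answers, which is why a benchmark comparison has to use exactly the definition behind the certified value.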