Compute Effect Sizes for continuous or categorical data
Description
Psychologists' so-called "effect size" reveals the practical significance of only one regressor. This function generalizes their algorithm to two or more regressors (p >= 2). The generalization first converts the xi regressor into a categorical treatment variable with only two categories. Observations larger than the median (xit > median(xi)) are regarded as "treated," and those at or below the median as "untreated." The aim is to measure the size of the (treatment) effect of xi on y. Denote the other variables, with subscript "o", as xo. Since we have p regressors in our multiple regression, we need to remove the nonlinear kernel regression effect of the other variables xo on y while focusing on the effect of xi. There are two options for treating xo: (i) leave xo as they are in the data, or (ii) convert xo to binary variables split at the median. One chooses option (i) by setting the logical argument ane=TRUE when calling the function; ane=TRUE is the default. Set ane=FALSE for option (ii).
Usage
effSizCut(y, bigx, ane = TRUE)
Arguments
y
A (T x 1) vector of dependent variable data values.
bigx
A (T x p) data matrix of xi regressor variables associated with the regression.
ane
A logical variable controlling the treatment of the other regressors. If ane=TRUE (the default), the other regressors enter the kernel regression as they are, without being forced into binary variables. If ane=FALSE, the kernel regression removes the effect of the other regressors after they are also converted into binary categorical variables.
Value
out, a vector of t-statistics, one for each of the p regressors
Note
The aim is to answer the following question: which regressor has the largest effect on the dependent variable? We assume that the signs of the regressors are already adjusted such that a numerically larger effect size implies that the corresponding regressor is more important, i.e., has a larger effect in explaining the dependent variable y.
Author(s)
Prof. H. D. Vinod, Economics Dept., Fordham University, NY
See Also
Examples
set.seed(9)
y=sample(1:15,replace = TRUE)
x1=sample(2:16, replace = TRUE)
x2=sample(3:17, replace = TRUE)
effSizCut(y,bigx=cbind(x1,x2),ane=TRUE)
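Continuing the example above, the following is a conceptual sketch of the underlying idea only, not the package's internal code: binarize x1 at its median, remove a fitted effect of x2 from y (lm() is used here for brevity in place of the package's kernel regression), and compare "treated" versus "untreated" residuals with a two-sample t-statistic.
# conceptual sketch only, not the internal implementation of effSizCut()
treat <- as.numeric(x1 > median(x1))     # 1 = above the median ("treated")
residy <- residuals(lm(y ~ x2))          # y with a (linear) effect of x2 removed
unname(t.test(residy[treat == 1], residy[treat == 0])$statistic)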
fncut: auxiliary function converting continuous data into two categories
Description
This is an internal function of the R package 'practicalSigni'. Psychologists use effect size to evaluate the practical importance of a treatment on a dependent variable using a binary (0, 1) variable. Assuming numerical data, we can always compute the median and regard values less than or equal to the median as zero and all other values as unity.
Usage
fncut(x)
Arguments
x
numerical vector of data values
Value
x, a vector of zeros and ones obtained by splitting the data at the median.
Author(s)
Prof. H. D. Vinod, Fordham University, NY
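A minimal illustrative sketch of the median split described above (not the package's internal code): values at or below the median map to 0, and values above the median map to 1.
fncutSketch <- function(x) as.numeric(x > median(x))
fncutSketch(c(3, 7, 2, 9, 5))   # median is 5, so this returns 0 1 0 1 0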
Compute thirteen measures of practical significance
Description
Thirteen methods are denoted m1 to m13. Each yields p numbers when there are p regressors, denoted xi.
m1 = OLS slope coefficients.
m2 = t-statistic of each OLS slope.
m3 = OLS beta coefficients, i.e., slopes after all variables are standardized to have mean zero and sd = 1.
m4 = Pearson correlation coefficient between y and xi (only two variables at a time, assuming linearity).
m5 = depMeas, which allows nonlinearity. Let r*(y|xi) denote the generalized correlation coefficient allowing for nonlinearity from Vinod (2021, 2022); it does not equal the analogous r*(xi|y). The larger of the two, max(r*(y|xi), r*(xi|y)), is given by the function depMeas() from the 'generalCorr' package. m5 is not comprehensive because it measures only two variables, y and xi, at a time.
m6 = generalized partial correlation coefficient (GPCC). This is the first comprehensive measure of practical significance.
m7 = a generalization of psychologists' "effect size" after incorporating the nonlinear effect of other variables.
m8 = local linear partial derivative (dy/dxi) using the 'np' package for kernel regressions and local linear derivatives.
m9 = partial derivative (dy/dxi) using the 'NNS' package.
m10 = importance measure using the NNS.boost() function of 'NNS'.
m11 = Shapley value measure of importance (from cooperative game theory).
m12 and m13 = two versions of the random forest algorithm measuring the importance of regressors.
Usage
pracSig13(y, bigx, yes13 = rep(1, 13), verbo = FALSE)
Arguments
y
input dependent variable data as a vector
bigx
input matrix of p regressor variables
yes13
vector of ones and zeros indicating which of the 13 measures m1 to m13 to compute. The default is all ones, computing all measures; e.g., yes13[10]=0 means do not compute the m10 method.
verbo
logical to print results along the way; default=FALSE
Details
If m6 and m10 slow down the computations, we recommend setting yes13[6]=0 and yes13[10]=0 to turn off the slow computation of m6 and m10, at least initially, to get quick answers for the other m's; see the sketch at the end of this entry.
Value
output matrix (p x 13) containing m1 to m13 criteria (numerical measures of practical significance) along columns and a row for each regressor (excluding the intercept).
Note
Needs the function kern(), which requires the package 'np'. Also needs the 'NNS' and 'randomForest' packages.
The machine learning methods are subject to random seeds. For some seed values, m10 values from NNS.boost() become degenerate and are reported as NA or missing. In that case the average ranking output r613 from reportRank() needs manual adjustments.
Author(s)
Prof. H. D. Vinod, Economics Dept., Fordham University, NY
References
Vinod, H. D."Generalized Correlation and Kernel Causality with Applications in Development Economics" in Communications in Statistics -Simulation and Computation, 2015, doi:10.1080/03610918.2015.1122048
Vinod, H. D.", "Generalized Correlations and Instantaneous Causality for Data Pairs Benchmark," (March 8, 2015). https://www.ssrn.com/abstract=2574891
Vinod, H. D. "Generalized, Partial and Canonical Correlation Coefficients," Computational Economics (2021) SpringerLink vol. 59, pp.1-28. URL https://link.springer.com/article/10.1007/s10614-021-10190-x
Vinod, H. D. "Kernel regression coefficients for practical significance," Journal of Risk and Financial Management 15(1), 2022 pp.1-13. https://doi.org/10.3390/jrfm15010032
Vinod, H. D.", "Hands-On Intermediate Econometrics Using R" (2022) World Scientific Publishers: Hackensack, NJ. https://www.worldscientific.com/worldscibooks/10.1142/12831
See Also
effSizCut, reportRank
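A minimal usage sketch, following the pattern of the examples elsewhere in this manual, with the slower m6 (GPCC) and m10 (NNS.boost) measures switched off as recommended in the Details above (the data are simulated purely for illustration).
set.seed(9)
y <- sample(1:15, replace = TRUE)
x1 <- sample(2:16, replace = TRUE)
x2 <- sample(3:17, replace = TRUE)
options(np.messages = FALSE)
yes13 <- rep(1, 13)
yes13[c(6, 10)] <- 0    # skip the slow m6 and m10 computations
pracSig13(y, bigx = cbind(x1, x2), yes13 = yes13)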
Compute the p-value for exact correlation significance test using Taraldsen's exact methods.
Description
Compute the p-value for exact correlation significance test using Taraldsen's exact methods.
Usage
pvTarald(n, rho = 0, obsr)
Arguments
n
number of observations, n-1 is degrees of freedom
rho
True unknown population correlation coefficient in the interval [-1, 1]; default=0
obsr
observed r or correlation coefficient
Value
ans, the p-value, i.e., the probability under the sampling distribution of observing a correlation as extreme as, or more extreme than, the input obsr (observed r).
Note
Needs the function hypergeo() from the 'hypergeo' package.
Author(s)
Prof. H. D. Vinod, Economics Dept., Fordham University, NY
References
Taraldsen, G. "The Confidence Density for Correlation" Sankhya: The Indian Journal of Statistics 2023, Volume 85-A, Part 1, pp. 600-616.
See Also
qTarald
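A minimal usage sketch (the values are purely illustrative, and the call assumes the 'hypergeo' package is installed): the p-value for an observed correlation of 0.4 based on n = 30 observations, with the null value rho = 0.
pvTarald(n = 30, rho = 0, obsr = 0.4)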
Compute the quantile for exact t test density using Taraldsen's methods
Description
Compute the quantile for exact t test density using Taraldsen's methods
Usage
qTarald(n, rho = 0, cum)
Arguments
n
number of observations, n-1 is degrees of freedom
rho
True unknown population correlation coefficient, default=0
cum
cumulative probability for which quantile is needed
Value
r, the quantile of Taraldsen's density for the correlation coefficient.
Note
Needs the function hypergeo::hypergeo(). The quantiles are computed by numerical methods and rounded to 3 places.
Author(s)
Prof. H. D. Vinod, Economics Dept., Fordham University, NY
References
Taraldsen, G. "The Confidence Density for Correlation" Sankhya: The Indian Journal of Statistics 2023, Volume 85-A, Part 1, pp. 600-616.
See Also
pvTarald
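A minimal usage sketch (values purely illustrative; needs the 'hypergeo' package): the 0.95 quantile of Taraldsen's density for the correlation coefficient when rho = 0 and n = 30.
qTarald(n = 30, rho = 0, cum = 0.95)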
Function to report ranks of 13 criteria for practical significance
Description
This function generates a report based on the regression of y on bigx. It acknowledges that some methods for evaluating the importance of a regressor in explaining y may report the importance value with a wrong (unrealistic) sign. For example, m2 reports t-values. Imagine that, due to collinearity, the m2 value is negative when prior knowledge of the subject matter says the coefficient, and hence the t-statistic, should be positive. The wrong sign means the regressor should be regarded as relatively less important in explaining y; when the sign is wrong, the larger the absolute size of the t-statistic, the smaller its true importance. The ranking of coefficients computed here suitably deprecates the importance of a regressor when its coefficient has the wrong sign (perverse direction).
Usage
reportRank(
y,
bigx,
yesLatex = 1,
yes13 = rep(1, 13),
bsign = 0,
dig = 3,
verbo = FALSE
)
Arguments
y
A (T x 1) vector of dependent variable data y
bigx
a (T x p) data matrix of xi regressor variables associated with the regression
yesLatex
Default 1 means print LaTeX-ready tables.
yes13
default vector of ones to compute all 13 measures.
bsign
A (p x 1) vector of the right signs of the regression coefficients. The default bsign=0 means the right sign is taken to be the sign of the covariance cov(y, xi).
dig
digits to be printed in LaTeX tables; default dig=3
verbo
logical to print results by pracSig13, default=FALSE
Value
v15
practical significance index values (sign adjusted) for m1 to m5 using older linear and/or bivariate methods
v613
practical significance index values for m6 to m13 using newer comprehensive and nonlinear methods
r15
ranks and the average rank for m1 to m5 using older linear and/or bivariate methods
r613
ranks and the average rank for m6 to m13 using newer comprehensive and nonlinear methods
Note
The machine learning methods are subject to random seeds. For some seed values, the m10 values from NNS.boost() may (rarely) become degenerate and are reported as NA or missing. In that case, the average ranking output r613 here needs adjustment.
Author(s)
Prof. H. D. Vinod, Economics Dept., Fordham University, NY
See Also
Examples
set.seed(9)
y=sample(1:15,replace = TRUE)
x0=sample(2:16, replace = TRUE)
x2=sample(3:17, replace = TRUE)
x3=sample(4:18,replace = TRUE)
options(np.messages=FALSE)
yes13=rep(1,13)
yes13[10]=0
reportRank(y,bigx=cbind(x0,x2,x3),yes13=yes13)
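A hedged variation of the example above: supply the believed-correct coefficient signs explicitly through bsign (here all three regressors are assumed, for illustration only, to have a positive effect on y) instead of relying on the default covariance-based signs.
reportRank(y, bigx = cbind(x0, x2, x3), yes13 = yes13, bsign = c(1, 1, 1))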