
Christopher Manning's writeup on logistic regression in R fits a logistic regression as follows:

ced.logr <- glm(ced.del ~ cat + follows + factor(class), 
 family=binomial)

Some output:

> summary(ced.logr)
Call:
glm(formula = ced.del ~ cat + follows + factor(class),
 family = binomial("logit"))
Deviance Residuals:
     Min        1Q    Median        3Q       Max
-3.24384  -1.34325   0.04954   1.01488   6.40094

Coefficients:
                Estimate Std. Error z value Pr(>|z|)
(Intercept)     -1.31827    0.12221 -10.787  < 2e-16
catd            -0.16931    0.10032  -1.688 0.091459
catm             0.17858    0.08952   1.995 0.046053
catn             0.66672    0.09651   6.908 4.91e-12
catv            -0.76754    0.21844  -3.514 0.000442
followsP         0.95255    0.07400  12.872  < 2e-16
followsV         0.53408    0.05660   9.436  < 2e-16
factor(class)2   1.27045    0.10320  12.310  < 2e-16
factor(class)3   1.04805    0.10355  10.122  < 2e-16
factor(class)4   1.37425    0.10155  13.532  < 2e-16

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 958.66  on 51  degrees of freedom
Residual deviance: 198.63  on 42  degrees of freedom
AIC: 446.10

Number of Fisher Scoring iterations: 4

He then goes into some detail about how to interpret coefficients, compare different models, and so on. Quite useful.

However, how much variance does the model account for? A Stata page on logistic regression says:

Technically, $R^2$ cannot be computed the same way in logistic regression as it is in OLS regression. The pseudo-$R^2,ドル in logistic regression, is defined as 1ドル - \frac{L_1}{L_0},ドル where $L_0$ represents the log likelihood for the "constant-only" model and $L_1$ is the log likelihood for the full model with constant and predictors.

I understand this at a high level. The constant-only model has none of the predictors (only the intercept term), and the log likelihood measures how closely the fitted parameters match the data. In fact, Manning sort of hints that the deviance might be $-2 \log L$. Perhaps the null deviance is $-2 \log L$ of the constant-only model and the residual deviance is $-2 \log L$ of the full model? However, I'm not crystal clear on it.
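
If that guess is right, I suppose the computation (reusing the fitted ced.logr object above) would be something like the following, but I can't tell whether this is the intended calculation:

ll.null <- -ced.logr$null.deviance / 2 # log likelihood of the constant-only model?
ll.full <- -ced.logr$deviance / 2      # log likelihood of the full model?
1 - ll.full / ll.null                  # pseudo-R^2?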

Can someone verify how one actually computes the pseudo-$R^2$ in R using this example?

asked Mar 19, 2011 at 22:44
  • The usually excellent UCLA statistical computing pages have made a rare error here -- there shouldn't be any parentheses in the expression for pseudo-$R^2,ドル i.e. it should be 1ドル-L_1/L_0$. (Sorry for not answering your queries as I'm about to head for bed -- I'm sure someone else will have answered this before I'm awake enough to do so.) Commented Mar 19, 2011 at 23:08
  • A somewhat related question was asked here: Logistic Regression: Which pseudo R-squared measure is the one to report (Cox & Snell or Nagelkerke)? Commented Mar 20, 2011 at 10:52
  • This page discusses several pseudo-$R^2$s. Commented Jul 9, 2011 at 2:19
  • Note: the related question doesn't like any pseudo-$R^2$s, but prefers cross-validation or holdout test prediction. Commented Jul 9, 2011 at 2:39

5 Answers


Don't forget the rms package, by Frank Harrell. You'll find everything you need for fitting and validating GLMs.

Here is a toy example (with only one predictor):

set.seed(101)
n <- 200
x <- rnorm(n)
a <- 1
b <- -2
p <- exp(a+b*x)/(1+exp(a+b*x))
y <- factor(ifelse(runif(n)<p, 1, 0), levels=0:1)
mod1 <- glm(y ~ x, family=binomial)
summary(mod1)

This yields:

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   0.8959     0.1969    4.55 5.36e-06 ***
x            -1.8720     0.2807   -6.67 2.56e-11 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 258.98  on 199  degrees of freedom
Residual deviance: 181.02  on 198  degrees of freedom
AIC: 185.02

Now, using the lrm function,

require(rms)
mod1b <- lrm(y ~ x)

You soon get a lot of model fit indices, including Nagelkerke $R^2,ドル with print(mod1b):

Logistic Regression Model

lrm(formula = y ~ x)

                      Model Likelihood     Discrimination    Rank Discrim.
                         Ratio Test            Indexes          Indexes
Obs            200    LR chi2     77.96    R2       0.445    C       0.852
 0              70    d.f.            1    g        2.054    Dxy     0.705
 1             130    Pr(> chi2) <0.0001   gr       7.801    gamma   0.705
max |deriv| 2e-08                          gp       0.319    tau-a   0.322
                                           Brier    0.150

          Coef    S.E.   Wald Z Pr(>|Z|)
Intercept  0.8959 0.1969  4.55  <0.0001
x         -1.8720 0.2807 -6.67  <0.0001

Here $R^2=0.445,ドル and it is computed as $\bigl(1-\exp(-\text{LR}/n)\bigr)/\bigl(1-\exp(2L_0/n)\bigr),ドル where LR is the likelihood-ratio $\chi^2$ statistic comparing the two nested models you described, $L_0$ is the log likelihood of the intercept-only model (so $-2L_0$ is the null deviance), and the denominator is just the maximum attainable value of $R^2$. For a perfect model we would expect $\text{LR}=-2L_0,ドル that is, $R^2=1$.

By hand,

> mod0 <- update(mod1, .~.-x)
> lr.stat <- lrtest(mod0, mod1)
> (1-exp(-as.numeric(lr.stat$stats[1])/n))/(1-exp(2*as.numeric(logLik(mod0)/n)))
[1] 0.4445742
> mod1b$stats["R2"]
 R2 
0.4445742 

Ewout W. Steyerberg discussed the use of $R^2$ with GLMs in his book Clinical Prediction Models (Springer, 2009, §4.2.2, pp. 58-60). Basically, the relationship between the LR statistic and Nagelkerke's $R^2$ is approximately linear (and closer to linear at low incidence). As discussed in the earlier thread I linked to in my comment, you can also use other measures, like the $c$ statistic, which is equivalent to the AUC (there is also a nice illustration in the above reference; see Figure 4.6).
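
If you want the $c$ statistic on its own, one option (just a sketch, using the Hmisc package that rms builds on) is:

require(Hmisc)
# somers2() takes predicted probabilities and a 0/1 numeric response;
# the C it reports should match the C printed by lrm above (0.852)
somers2(fitted(mod1), as.numeric(y) - 1)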

answered Apr 20, 2011 at 23:21
  • Can you please explain how you obtained .445? I used 1-exp(-77.96/200) but I got .323. What am I doing wrong? Thanks. Commented Sep 11, 2014 at 1:01
  • Which one is the Nagelkerke $R^2$? Commented Nov 10, 2017 at 7:24
  • @JetLag Under Discrimination Indexes, the Nagelkerke is abbreviated as R2 (i.e. 0.445). You can check this using the function NagelkerkeR2() from the package fmsb. Commented Feb 28, 2018 at 15:50

To easily get McFadden's pseudo-$R^2$ for a fitted model in R, use the pR2 command from the "pscl" package by Simon Jackman: http://cran.r-project.org/web/packages/pscl/index.html
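
For instance, with the model from the question (a minimal sketch; pR2 reports several pseudo-$R^2$ measures at once):

library(pscl)
pR2(ced.logr) # McFadden's pseudo-R^2 is reported alongside the log-likelihoods and other measures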

answered Jun 20, 2014 at 19:23

Before calculating a pseudo-$R^2$ for your logistic regression, I want to ask: do you consider McFadden's or McKelvey-Zavoina's pseudo-$R^2$ measures good enough? Our paper on an alternative, the surrogate R-squared measure, has now been published.

Cannot find a suitable R-squared measure for binary or ordinal data regression models? Here comes our new research product: the surrogate R-squared.

An R package is available here: SurrogateRsq.


Old answer:

Firstly, McFadden's pseudo-$R^2$ for a logistic model does not describe the proportion of the variance of the response explained by the explanatory variables at all, yet that property is one of the purposes of developing a goodness-of-fit measure. So I want to go deeper and broader into what could make a good goodness-of-fit measure for logit/probit models.

Before answering this question, I want to emphasize "comparability" first. This is the reason the OLS $R^2$ is such an extensively used measure of goodness of fit: we can compare the $R^2$ of different models to get an idea of how well each model fits the data and how adequate each one is.

Another aspect of "comparability" arises when we compare models across different samples. Consider a study of job satisfaction in which the data may be collected in three different fashions: (a) a quantitative score on a scale of 0-100; (b) a dichotomous Yes/No indicator; or (c) a five-category rating ranging from extremely unsatisfied to extremely satisfied. Although neither the samples nor the empirical models used to draw inferences are the same, "most empirical researchers are explicitly or implicitly making rough comparisons of ‘goodness of fit’ across" these models and samples, because they address similar domain questions (Veall and Zimmermann, 1996). In social studies, "the research experience in the area is far more important than any statistical criteria" such as which specific method is used to select variables (Veall and Zimmermann, 1996).

We believe it is vital to have a goodness-of-fit measure for logit/probit models that is analogous to the OLS $R^2,ドル as it ensures the comparability of different models across different samples for similar research questions. No existing pseudo-$R^2$ meets all of these needs.

To solve this issue, we have developed a new goodness-of-fit measure that resembles the OLS R-squared and can be interpreted as the proportion of the variance of the surrogate response explained by the explanatory variables. Our paper "A new goodness-of-fit measure for probit models: surrogate $R^2$" has been published with all the details. Please check the package webpage as well: https://xiaoruizhu.github.io/SurrogateRsq/.


Old answer:

Be careful with the calculation of Pseudo-$R^2$:

McFadden's pseudo-$R^2$ is calculated as $R^2_M = 1 - \frac{\ln\hat{L}_{full}}{\ln\hat{L}_{null}}$, where $\ln\hat{L}_{full}$ is the log-likelihood of the full model and $\ln\hat{L}_{null}$ is the log-likelihood of the model with only an intercept.

Two approaches to calculate Pseudo-$R^2$:

  1. Use the deviances: since $\text{deviance} = -2\ln(L_{full})$ and $\text{null.deviance} = -2\ln(L_{null})$,

    pR2 = 1 - mod$deviance / mod$null.deviance # works for glm

    But this approach doesn't work for an out-of-sample pseudo-$R^2$.

  2. Use the logLik function in R and the definition (this also works in-sample):

    mod_null <- glm(y ~ 1, family = binomial, data = insample)
    1 - logLik(mod) / logLik(mod_null)

This can be slightly modified to compute an out-of-sample pseudo-$R^2$.

Example:

Out-of-sample pseudo-$R^2$

Usually, the out-of-sample pseudo-$R^2$ is calculated as $$R_p^2 = 1 - \frac{L_{est.out}}{L_{null.out}},$$ where $L_{est.out}$ is the log-likelihood of the out-of-sample data under the coefficients estimated on the in-sample period, and $L_{null.out}$ is the log-likelihood of an intercept-only model fitted to the out-of-sample period.

Code:

# linear predictor for the out-of-sample data, using the in-sample fit 'mod'
pred.out.link <- predict(mod, outSample, type = "link")
# intercept-only model fitted to the out-of-sample data
mod.out.null <- glm(y ~ 1, family = binomial, data = outSample)
# out-of-sample log-likelihood of 'mod' relative to that of the intercept-only model
pR2.out <- 1 - sum(outSample$y * pred.out.link - log(1 + exp(pred.out.link))) / logLik(mod.out.null)
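
For concreteness, here is a self-contained toy run of that recipe (my own simulated data and train/test split, not part of the original answer):

set.seed(1)
n <- 500
dat <- data.frame(x = rnorm(n))
dat$y <- rbinom(n, 1, plogis(-0.5 + dat$x)) # simulate a binary outcome
insample <- dat[1:400, ]   # in-sample period
outSample <- dat[401:500, ] # out-of-sample period

mod <- glm(y ~ x, family = binomial, data = insample)
pred.out.link <- predict(mod, outSample, type = "link")
mod.out.null <- glm(y ~ 1, family = binomial, data = outSample)
1 - sum(outSample$y * pred.out.link - log(1 + exp(pred.out.link))) / as.numeric(logLik(mod.out.null))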
answered Apr 11, 2017 at 22:04
  • $\text{deviance} = -2\ln(L_{full})$ does not hold for binomial; just see model1 <- glm(cbind(ncases, ncontrols) ~ agegp + tobgp * alcgp, data = esoph, family = binomial) and compare model1$deviance with -2*logLik(model1). Commented Nov 2, 2019 at 11:22
  • @Tomas I'm not familiar with the model you wrote, but for logistic regression they appear equal: model1 <- glm(am ~ mpg + disp + hp, data = mtcars, family = binomial); compare model1$deviance with -2*logLik(model1). Commented Jul 29, 2021 at 21:38
  • @Xiaorui In the 2nd line of your last code block, do you mean mod.out.null <- glm(y~1, family = binomial, data = outSample)? If not, could you explain what "gam" and the "Default" variable are? Commented Jan 14, 2022 at 22:21
  • Sorry, @Phil, I used gam from the mgcv package to run generalized additive models, so I had that line. But you can easily change it to glm for your purpose. Commented Feb 18, 2022 at 15:30

If the deviance were proportional to the log likelihood, and one uses the definition (see for example McFadden's here)

pseudo R^2 = 1 - L(model) / L(intercept)

then the pseudo-$R^2$ above would be $1 - \frac{198.63}{958.66} = 0.7928$.

The question is: is the reported deviance proportional to the log likelihood?
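
One quick way to probe this in R (a sketch using data sets that ship with R, since Manning's data aren't at hand): for an ungrouped 0/1 response the deviance of a binomial glm equals $-2\log L,ドル but for grouped binomial data it is measured against the saturated model and differs from $-2\log L$. Manning's output looks like the grouped case, since AIC $= -2\log L + 2k$ gives $-2\log L \approx 426.1,ドル not 198.63.

# ungrouped 0/1 response: deviance and -2*logLik coincide
mod.bin <- glm(am ~ mpg + disp + hp, data = mtcars, family = binomial)
c(deviance(mod.bin), -2 * as.numeric(logLik(mod.bin)))
# grouped binomial counts: they differ by the saturated-model term
mod.grp <- glm(cbind(ncases, ncontrols) ~ agegp + tobgp * alcgp, data = esoph, family = binomial)
c(deviance(mod.grp), -2 * as.numeric(logLik(mod.grp)))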

  • This pseudo-$R^2$ does not agree at all with the Nagelkerke $R^2$ of @chl's answer. Commented Jul 9, 2011 at 2:13
  • Deviance was defined as -2*LL when I was in school. Commented Jul 14, 2016 at 19:11
  • @dfrankow It does not agree because Nagelkerke's is a normalization of the Cox and Snell $R^2,ドル which is different from McFadden's $R^2$. Commented Aug 23, 2016 at 20:11
  • It doesn't agree when I do 1-logLik(Mod)/logLik(NullMod). What could be the reason? Commented Jan 1, 2020 at 5:01

If it is out of sample, then I believe the $R^2$ must be computed with the corresponding log-likelihoods as $R^2 = 1 - \frac{ll_{full}}{ll_{constant}}$, where $ll_{full}$ is the log-likelihood of the test data under the predictive model calibrated on the training set, and $ll_{constant}$ is the log-likelihood of the test data under a model with just a constant fitted on the training set; that fitted constant is then used to predict probabilities on the test set, from which the log-likelihood is computed.

Note that linear regression is analogous: the out-of-sample $R^2$ is computed as $R^2 = 1 - \frac{\sum_{i}(y_{i}-\hat{y}_i)^2}{\sum_{i}(y_{i}-\overline{y}_{train})^2}$, where the denominator $\sum_{i}(y_{i}-\overline{y}_{train})^2$ uses the average over the training set, $\overline{y}_{train},ドル as the prediction. This is as if we fit a model on the training data with just a constant, minimizing $\sum_{i}(y_i-\beta_0)^2,ドル which yields $\hat{\beta}_0=\overline{y}_{train}$; this plain constant model is then used as the benchmark (i.e. in the denominator of the out-of-sample $R^2$).
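
A minimal sketch of this recipe for a logistic regression (my own simulated illustration; note that, as described above, the constant-only benchmark is fitted on the training set and then evaluated on the test set, in contrast with the earlier answer, which fits it on the out-of-sample data):

set.seed(2)
n <- 1000
x <- rnorm(n)
y <- rbinom(n, 1, plogis(0.5 - x))
train <- data.frame(x = x[1:800], y = y[1:800])
test <- data.frame(x = x[801:n], y = y[801:n])

fit <- glm(y ~ x, family = binomial, data = train)  # full model, training data
fit0 <- glm(y ~ 1, family = binomial, data = train) # constant-only model, training data

# Bernoulli log-likelihood of the test data under each model's predicted probabilities
ll <- function(p, y) sum(dbinom(y, 1, p, log = TRUE))
ll.full <- ll(predict(fit, test, type = "response"), test$y)
ll.constant <- ll(predict(fit0, test, type = "response"), test$y)

1 - ll.full / ll.constant # out-of-sample pseudo-R^2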

answered Apr 1, 2019 at 18:27