An ordered logit model is given:
[image: ordered logit model specification]
Consider the following log-likelihood function of the ordered logit model:
[image: log-likelihood function]
It is also given that k = {0,1,2,3,4}, where alpha_0 = -Inf and alpha_4 = +Inf. My question is: how can I code a function that maximizes this log-likelihood, i.e. finds the MLE of beta = beta_1 and of alpha_k for k = {1,2,3}?
Here is my attempt:
J <- length(Poss)
log_likelihood <- function(params, Poss = Poss){
beta <- params[1]
alpha <- c(-Inf, params[2:4], +Inf)
for (k in 2:4) {
for (j in J) {
delta[k] <- alpha[k] - Poss[j] * beta
ll <- delta[k] - log(1 + exp(delta[k]) - delta[k] + log(1 + exp(delta[k])))
return(ll)
}
}
return(sum(ll))
}
guess_optim <- c(0,0,0,0) #for beta1 and alpha1,2,3
optim(guess_optim, log_likelihood, Poss = Poss)
Only, this does not work; how can I fix this issue?
1 Answer
I think the very first issue is that you return(ll) inside the inner loop, so the function returns the contribution of the very first category of the very first record and never reaches return(sum(ll)). The second issue is that each observation should only contribute the likelihood of the category it actually falls in, and you don't make that distinction. Finally, optim minimizes by default, so you should return the negative of the value you want to maximize, not the value itself.
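To make those fixes concrete, here is a minimal sketch of a corrected loop-based version. It assumes `Poss` is your numeric predictor and that a vector of observed categories coded 1..4 exists; I call it `Y` here, which is an assumption since it is not shown in your question.
## Sketch only: corrected loop-based negative log-likelihood
## `Poss` = predictor from the question; `Y` (assumed name) = observed categories 1..4
neg_ll <- function(params, Poss, Y) {
  beta  <- params[1]
  alpha <- c(-Inf, params[2:4], Inf)          ## alpha_0 .. alpha_4
  ll <- numeric(length(Poss))
  for (j in seq_along(Poss)) {
    k    <- Y[j]                              ## only the observed category counts
    d_hi <- alpha[k + 1] - Poss[j] * beta     ## alpha_k     - x_j * beta
    d_lo <- alpha[k]     - Poss[j] * beta     ## alpha_{k-1} - x_j * beta
    ## P(Y = k) = expit(d_hi) - expit(d_lo); clamp to avoid log(0)
    ll[j] <- log(max(plogis(d_hi) - plogis(d_lo), .Machine$double.eps))
  }
  -sum(ll)                                    ## negate because optim() minimizes
}
## Hypothetical call, with distinct, increasing starting intercepts
## optim(c(0, -1, 0, 1), neg_ll, Poss = Poss, Y = Y)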
To test the implementation, let's obtain some example data first (from the UCLA IDRE ologit example):
## Example data
dat <- foreign::read.dta("https://stats.idre.ucla.edu/stat/data/ologit.dta")
Y <- as.integer(cut(dat$gpa, 4)) ## Cut gpa into 4 ordered categories (equal-width bins)
X <- dat$public ## Binary predictor; a continuous predictor works too
## This is the result we'll be reproducing
MASS::polr(as.factor(Y) ~ X)
#> Coefficients:
#> X
#> 1.178314
#>
#> Intercepts:
#> 1|2 2|3 3|4
#> -2.43862931 -0.06816316 2.05272249
Now, here's an implementation that will get you roughly the same parameter estimates:
## Logit link inversion
expit <- function(x) 1/(1+exp(-x))
## -log-likelihood function
ll <- function(params, X, Y) {
alpha <- params[-1]
beta <- params[1]
ll <- unlist(mapply(\(x, y) {
eta <- alpha - x*beta
mu <- expit(eta)
## The next line computes expit(delta_k) - expit(delta_{k-1}) on the probability scale
probs <- pmin(pmax(diff(c(0, mu, 1)), .Machine$double.eps), 1)
## Only use the probability of the observed category!
log(probs[y])
}, x=X, y=Y))
## Return the negative log-likelihood so that minimizing it maximizes the likelihood
-sum(ll)
}
guess <- c(1, -2, 0, 2)
optim(guess, ll, X=X, Y=Y)
#> $par
#> [1] 1.17991554 -2.43853589 -0.06775819 2.05298068
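If you also want rough standard errors, one possibility (a quick sketch, assuming the optimum is an interior point and the Hessian is invertible) is to have optim return the Hessian of the negative log-likelihood and invert it:
## Approximate standard errors from the observed information (sketch)
fit <- optim(guess, ll, X = X, Y = Y, hessian = TRUE)
se  <- sqrt(diag(solve(fit$hessian)))
round(cbind(estimate = fit$par, std.error = se), 3)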
Some notes:
- I calculated the likelihood on the probability scale and then took the log, rather than using the formula you were given directly (simple algebra will show they are the same). Also, rather than setting the boundary log-odds to (negative) infinity, I set the corresponding probabilities to 0 and 1.
- You need to provide decent starting values for this to work. In particular, the intercepts should already be distinct and in the correct order (always increasing or decreasing, depending on how you want to order your categories); starting with all of them equal will cause problems.
- This relies on `Y` being numeric and sequential starting from 1, and it only handles a single numeric `X` predictor; a proper fitting routine should allow much more flexibility (e.g. converting the levels of `Y` to this ordered sequence internally, as in the small example after these notes).
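For example, a hypothetical ordered factor outcome could be mapped to the 1..K integer coding this function expects like so:
## Hypothetical example: recode an ordered factor to the 1..K integers used above
grade <- factor(c("low", "mid", "high", "mid"),
                levels = c("low", "mid", "high"), ordered = TRUE)
Y_int <- as.integer(grade)   ## 1 2 3 2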
Please show how `Poss` is defined/created. And what do you mean by "only"?