
I am calculating the log-likelihood of data in which some observations are censored. For uncensored "a" clusters, the likelihood uses P(Y = y). For censored "b" clusters, it uses the probability that the cluster is at least size y, P(Y ≥ y). So the likelihood is the product over clusters, $L = \prod_{a} P(Y = y_a) \times \prod_{b} P(Y \ge y_b)$.
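
On the log scale, this is the quantity the code below computes:

$$\log L \;=\; \sum_{a\,\text{uncensored}} \log P(Y = y_a) \;+\; \sum_{b\,\text{censored}} \log\!\Big(1 - \sum_{j=1}^{y_b - 1} P(Y = j)\Big).$$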

The code below is my working, in-use code that calculates this log-likelihood based on censoring status. However, when combined with several other functions it is extremely slow, because it has to iterate through each row of the dataset (datasets are usually several thousand rows long, and the maximization runs through several thousand iterations of R and k values).

My hope is that someone will know of an alternative way to approach this that will increase performance. I am (obviously) not an expert coder.

```
new_likelihood <- function(Y, R, k) {
  p_function <- function(y, n) { # helper giving P(Y = y); vectorized over y, used in the loop below
    exp(log(n) - log(y) + lgamma(k*y + y - n) - (lgamma(k*y) + lgamma(y - n + 1)) +
          (y - n)*log(R/k) - (k*y + y - n)*log(1 + R/k))
  }
  liks_a <- numeric(nrow(Y)) # log-likelihood contributions of uncensored ("a") clusters
  liks_b <- numeric(nrow(Y)) # log-likelihood contributions of censored ("b") clusters
  # Loop through each row of the data
  for (i in 1:nrow(Y)) {
    y <- Y[i, 1]
    n <- Y[i, 2]
    c <- Y[i, 3]
    if (c == 0) { # uncensored: apply the regular PMF
      liks_a[i] <- log(p_function(y, n))
    } # liks_a[i] stays 0 when c == 1, so it doesn't affect the final sum
    if (c == 1) { # censored: P(Y >= y) = 1 - sum of P(Y = 1), ..., P(Y = y - 1)
      liks_b[i] <- log(1 - sum(p_function(1:(y - 1), n)))
    } # liks_b[i] stays 0 when c == 0, so it doesn't affect the final sum
  }
  sumliks <- sum(liks_a, liks_b)
  return(sumliks)
}
# Example data
Y_censor0 <- cbind(c(3,2,2,10,1,1),c(1,1,1,1,1,1),c(0,0,0,0,0,0)) # 3-column data w/ no one censored
Y_censor1 <- cbind(c(3,2,2,10,1,1),c(1,1,1,1,1,1),c(0,0,1,1,0,0)) # 3-column data with some censored
# Test
new_likelihood(Y_censor0,0.9,0.25) 
new_likelihood(Y_censor1,0.9,0.25) 
```
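
For context, here is a minimal sketch of the kind of repeated evaluation over R and k that the maximization involves; the grid of parameter values below is purely illustrative, not my actual estimation routine:

```
# Illustrative only: evaluate the log-likelihood over a small grid of R and k.
# The real maximization makes thousands of such calls, which is why the
# per-row loop inside new_likelihood() becomes the bottleneck.
grid <- expand.grid(R = seq(0.5, 1.5, by = 0.1), k = seq(0.1, 1, by = 0.1))
grid$loglik <- mapply(function(R, k) new_likelihood(Y_censor1, R, k), grid$R, grid$k)
grid[which.max(grid$loglik), ]  # parameter pair with the highest log-likelihood on the grid
```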
asked May 9, 2020 at 17:17

1 Answer


Your function `p_function` is already vectorized, which is nice. The only awkward part is the sum over `1:(y-1)` needed for the censored rows.

Below is improved code that takes advantage of that vectorization. I also replaced `log(1 + x)` with `log1p(x)`, which is more accurate for small `|x|`, and rewrote the `lgamma` terms in terms of the `lbeta` function.
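
For reference, the `lbeta` rewrite follows from $B(a, b) = \Gamma(a)\Gamma(b)/\Gamma(a + b)$ together with $\Gamma(z + 1) = z\,\Gamma(z)$, which with $a = ky$ and $b = y - n + 1$ gives

$$\log\Gamma(ky + y - n) - \log\Gamma(ky) - \log\Gamma(y - n + 1) \;=\; -\log B(ky,\, y - n + 1) - \log(ky + y - n).$$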

```
new_likelihood2 <- function(Y, R, k) {
  p_function <- function(y, n) { # P(Y = y), vectorized over y and n
    exp(log(n) - log(y) - lbeta(k*y, y - n + 1) - log(k*y + y - n) +
          (y - n)*log(R/k) - (k*y + y - n)*log1p(R/k))
  }

  p_function_sum <- function(y, n) { # sum of P(Y = 1), ..., P(Y = y - 1) for each row
    vapply(
      seq_along(y), function(i) sum(p_function(1:(y[i] - 1), n[i])), numeric(1)
    )
  }

  uncensored <- Y[, 3] == 0
  # drop = FALSE keeps a one-row subset as a matrix rather than a vector
  Yuncensored <- Y[uncensored, 1:2, drop = FALSE]
  Ycensored <- Y[!uncensored, 1:2, drop = FALSE]
  liks_a <- log(p_function(Yuncensored[, 1], Yuncensored[, 2]))
  liks_b <- log1p(-p_function_sum(Ycensored[, 1], Ycensored[, 2]))

  sum(liks_a, liks_b)
}
```
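
As a quick check (assuming the question's `new_likelihood` and the `Y_censor*` example matrices are still defined), both versions should return the same value up to floating-point tolerance:

```
# The lbeta/log1p rewrite is algebraically identical to the original formula,
# so the two functions should agree on the question's example data.
all.equal(new_likelihood(Y_censor0, 0.9, 0.25), new_likelihood2(Y_censor0, 0.9, 0.25))
all.equal(new_likelihood(Y_censor1, 0.9, 0.25), new_likelihood2(Y_censor1, 0.9, 0.25))
```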
answered Nov 25, 2020 at 12:43
