
I am calculating the log-likelihood of data in which some observations are censored. For uncensored "a" clusters, the likelihood uses P(Y = y). For censored "b" clusters, it uses the probability that the cluster is at least size y, P(Y ≥ y). So the likelihood is the product over clusters, $L = \prod_{a} P(Y = y_a) \times \prod_{b} P(Y \ge y_b)$.
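
On the log scale, this is the quantity the code below computes:

$$\log L \;=\; \sum_{a\,\text{uncensored}} \log P(Y = y_a) \;+\; \sum_{b\,\text{censored}} \log\!\Big(1 - \sum_{j=1}^{y_b - 1} P(Y = j)\Big).$$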

The code below is my working, in-use code that calculates this log-likelihood based on censoring status. However, when combined with several other functions it is extremely slow, because it has to iterate through each row of the dataset (datasets are usually several thousand rows long, and the maximization runs through several thousand iterations of R and k values).

My hope is that someone will know of an alternative way to approach this that will increase performance. I am (obviously) not an expert coder.

```
new_likelihood <- function(Y, R, k) {
  p_function <- function(y, n) { # helper giving P(Y = y); vectorized over y, used in the loop below
    exp(log(n) - log(y) + lgamma(k*y + y - n) - (lgamma(k*y) + lgamma(y - n + 1)) +
          (y - n)*log(R/k) - (k*y + y - n)*log(1 + R/k))
  }
  liks_a <- numeric(nrow(Y)) # log-likelihood contributions of uncensored ("a") clusters
  liks_b <- numeric(nrow(Y)) # log-likelihood contributions of censored ("b") clusters
  # Loop through each row of the data
  for (i in 1:nrow(Y)) {
    y <- Y[i, 1]
    n <- Y[i, 2]
    c <- Y[i, 3]
    if (c == 0) { # uncensored: apply the regular PMF
      liks_a[i] <- log(p_function(y, n))
    } # liks_a[i] stays 0 when c == 1, so it doesn't affect the final sum
    if (c == 1) { # censored: P(Y >= y) = 1 - sum of P(Y = 1), ..., P(Y = y - 1)
      liks_b[i] <- log(1 - sum(p_function(1:(y - 1), n)))
    } # liks_b[i] stays 0 when c == 0, so it doesn't affect the final sum
  }
  sumliks <- sum(liks_a, liks_b)
  return(sumliks)
}
# Example data
Y_censor0 <- cbind(c(3,2,2,10,1,1),c(1,1,1,1,1,1),c(0,0,0,0,0,0)) # 3-column data w/ no one censored
Y_censor1 <- cbind(c(3,2,2,10,1,1),c(1,1,1,1,1,1),c(0,0,1,1,0,0)) # 3-column data with some censored
# Test
new_likelihood(Y_censor0,0.9,0.25) 
new_likelihood(Y_censor1,0.9,0.25) 
```
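
For context, here is a minimal sketch of the kind of repeated evaluation over R and k that the maximization involves; the grid of parameter values below is purely illustrative, not my actual estimation routine:

```
# Illustrative only: evaluate the log-likelihood over a small grid of R and k.
# The real maximization makes thousands of such calls, which is why the
# per-row loop inside new_likelihood() becomes the bottleneck.
grid <- expand.grid(R = seq(0.5, 1.5, by = 0.1), k = seq(0.1, 1, by = 0.1))
grid$loglik <- mapply(function(R, k) new_likelihood(Y_censor1, R, k), grid$R, grid$k)
grid[which.max(grid$loglik), ]  # parameter pair with the highest log-likelihood on the grid
```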
asked May 9, 2020 at 17:17

1 Answer


Your function `p_function` is already vectorized, which is nice. The only awkward part is the sum over `1:(y-1)` needed for the censored rows.

Below is improved code that takes advantage of that vectorization. I also replaced `log(1 + x)` with `log1p(x)`, which is more accurate for small `|x|`, and rewrote the `lgamma` terms in terms of the `lbeta` function.
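
For reference, the `lbeta` rewrite follows from $B(a, b) = \Gamma(a)\Gamma(b)/\Gamma(a + b)$ together with $\Gamma(z + 1) = z\,\Gamma(z)$, which with $a = ky$ and $b = y - n + 1$ gives

$$\log\Gamma(ky + y - n) - \log\Gamma(ky) - \log\Gamma(y - n + 1) \;=\; -\log B(ky,\, y - n + 1) - \log(ky + y - n).$$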

```
new_likelihood2 <- function(Y, R, k) {
  p_function <- function(y, n) { # P(Y = y), vectorized over y and n
    exp(log(n) - log(y) - lbeta(k*y, y - n + 1) - log(k*y + y - n) +
          (y - n)*log(R/k) - (k*y + y - n)*log1p(R/k))
  }

  p_function_sum <- function(y, n) { # sum of P(Y = 1), ..., P(Y = y - 1) for each row
    vapply(
      seq_along(y), function(i) sum(p_function(1:(y[i] - 1), n[i])), numeric(1)
    )
  }

  uncensored <- Y[, 3] == 0
  # drop = FALSE keeps a one-row subset as a matrix rather than a vector
  Yuncensored <- Y[uncensored, 1:2, drop = FALSE]
  Ycensored <- Y[!uncensored, 1:2, drop = FALSE]
  liks_a <- log(p_function(Yuncensored[, 1], Yuncensored[, 2]))
  liks_b <- log1p(-p_function_sum(Ycensored[, 1], Ycensored[, 2]))

  sum(liks_a, liks_b)
}
```
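
As a quick check (assuming the question's `new_likelihood` and the `Y_censor*` example matrices are still defined), both versions should return the same value up to floating-point tolerance:

```
# The lbeta/log1p rewrite is algebraically identical to the original formula,
# so the two functions should agree on the question's example data.
all.equal(new_likelihood(Y_censor0, 0.9, 0.25), new_likelihood2(Y_censor0, 0.9, 0.25))
all.equal(new_likelihood(Y_censor1, 0.9, 0.25), new_likelihood2(Y_censor1, 0.9, 0.25))
```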
answered Nov 25, 2020 at 12:43
