Expectation Maximization using a Poisson likelihood function

Question 1

I am trying to apply the expectation-maximization algorithm to estimate missing count data but all the packages in R, such as missMethods, assume a multivariate Gaussian distribution. How would I apply the expectation-maximization algorithm to estimate missing count data assuming a Poisson distribution?

Say we have data that look like this:

x <- c(100, 96, 79, 109, 111, NA, 93, 95, 119, 90, 121, 96, NA, 
 NA, 85, 95, 110, 97, 87, 104, 101, 87, 87, NA, 89, NA, 
 113, NA, 95, NA, 119, 115, NA, 105, NA, 80, 90, 108, 90, 
 99, 111, 93, 99, NA, 87, 89, 87, 126, 101, 106)

Applying impute_EM using missMethods (missMethods::impute_EM(x, stochastic = FALSE)) gives an answer but the data are not continuous but discrete.

I understand that questions like these require a minimum, reproducible example, but I honestly do not know where to start. Even suggested reading to point me in the right direction would be helpful.

Question 2

Wouldn't you expect that the result of an imputation for count data would be "discrete"? After all you would in many cases be submitting that output to functions which will at the very least be giving you warnings if you have non-integer inputs and at worst simply erroring out.

Question 3

I apologize but I am a little confused with your comment. Imputation based on the mean or some other statistic is not doing the same thing as expectation maximization. I wanted to use a more rigorous method to estimate missing values before submitting it to another analysis.

Question 4

Defining x0:

x0 <- x[!is.na(x)]

The Jeffreys/reference prior for a Poisson distribution with mean lambda is 1/sqrt(lambda). From the observed values, this results in lambda having a gamma reference posterior with a shape parameter sum(x0) + 0.5 and a rate parameter 1/length(x0). You could take n samples of lambda with:

lambda <- rgamma(n, sum(x0) + 0.5, length(x0))

Then sample n missing values (xm) with

xm <- rpois(n, lambda)

Alternatively, since a Gamma-Poisson compound distribution can be formulated as a negative binomial (after integrating out lambda):

xm <- rnbinom(n, sum(x0) + 0.5, length(x0)/(length(x0) + 1L))

As a function:

MI_poisson <- function(x, n) {
 x0 <- x[!is.na(x)]
 rbind(matrix(x0, ncol = n, nrow = length(x0)),
 matrix(rnbinom(n*(length(x) - length(x0)), sum(x0) + 0.5, length(x0)/(length(x0) + 1L)), ncol = n))
}

This will return a matrix with n columns where each column contains the original vector x with all NA values imputed. Each column could be used separately in further analysis, then the results can be aggregated.

Question 5

You do realize this is not EM imputation yet the question clearly talks of EM imputation.

Question 6

@Onyambu wouldn't EM result in the mean of the non-NA values since it's a single sample space and the ML is the mean?

Question 7

The issue is that this method is not capturing the inherent stochasticity present in the missing data. This is why I wanted to use EM. Is it that difficult to adapt the EM algorithm to an exponential distribution that is not normal?

Question 8

Are you looking to get a single value to replace all the NA values (single imputation), or do you want each NA to be replaced independently with a random value (multiple imputation)?

Question 9

My understanding is that EM is used for single imputation. See the modified answer for a multiple imputation solution.

jblood94 17.6k2 gold badges12 silver badges26 bronze badges · Accepted Answer · 2022-02-09 17:28:03Z

Defining x0:

x0 <- x[!is.na(x)]

The Jeffreys/reference prior for a Poisson distribution with mean lambda is 1/sqrt(lambda). From the observed values, this results in lambda having a gamma reference posterior with a shape parameter sum(x0) + 0.5 and a rate parameter 1/length(x0). You could take n samples of lambda with:

lambda <- rgamma(n, sum(x0) + 0.5, length(x0))

Then sample n missing values (xm) with

xm <- rpois(n, lambda)

Alternatively, since a Gamma-Poisson compound distribution can be formulated as a negative binomial (after integrating out lambda):

xm <- rnbinom(n, sum(x0) + 0.5, length(x0)/(length(x0) + 1L))

As a function:

MI_poisson <- function(x, n) {
 x0 <- x[!is.na(x)]
 rbind(matrix(x0, ncol = n, nrow = length(x0)),
 matrix(rnbinom(n*(length(x) - length(x0)), sum(x0) + 0.5, length(x0)/(length(x0) + 1L)), ncol = n))
}

This will return a matrix with n columns where each column contains the original vector x with all NA values imputed. Each column could be used separately in further analysis, then the results can be aggregated.

You do realize this is not EM imputation yet the question clearly talks of EM imputation.
@Onyambu wouldn't EM result in the mean of the non-NA values since it's a single sample space and the ML is the mean?
The issue is that this method is not capturing the inherent stochasticity present in the missing data. This is why I wanted to use EM. Is it that difficult to adapt the EM algorithm to an exponential distribution that is not normal?
Are you looking to get a single value to replace all the NA values (single imputation), or do you want each NA to be replaced independently with a random value (multiple imputation)?
My understanding is that EM is used for single imputation. See the modified answer for a multiple imputation solution.

CollectivesTM on Stack Overflow

Expectation Maximization using a Poisson likelihood function

1 Answer 1

10 Comments

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

CollectivesTM on Stack Overflow

1 Answer 1

10 Comments

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related