SAS and R: Poisson distribution


Tuesday, March 15, 2011

Example 8.30: Compare Poisson and negative binomial count models

How similar can a negative binomial distribution get to a Poisson distribution?

When confronted with modeling count data, our first instinct is to use Poisson regression. But in practice, count data are often overdispersed. We can account for the overdispersion in the Poisson model (section 4.1) using quasi-likelihood methods, but a better alternative might be to use a negative binomial regression (section 4.1.5). Nick has a paper exploring these models (and others) in an application.

One concern with this approach is how well the negative binomial can approximate the Poisson when the data are in fact Poisson.
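As a rough illustration of the two approaches just mentioned, the sketch below fits both a quasi-Poisson model and a negative binomial regression. The data are simulated here purely for illustration (they are not from the post or the paper), the variable names are hypothetical, and the negative binomial fit assumes the MASS package is installed.

```r
# Simulated (not real) overdispersed counts: negative binomial with size = 2
set.seed(42)
n_obs <- 2000
x <- rnorm(n_obs)
mu <- exp(0.5 + 0.3 * x)
y <- rnbinom(n_obs, mu = mu, size = 2)

# Quasi-likelihood: Poisson mean model plus an estimated dispersion factor
quasi_fit <- glm(y ~ x, family = quasipoisson)
dispersion <- summary(quasi_fit)$dispersion

# Negative binomial regression: estimates the size parameter (theta in MASS)
nb_fit <- MASS::glm.nb(y ~ x)
theta <- nb_fit$theta

dispersion   # values well above 1 indicate overdispersion
theta        # estimate of the size used in the simulation
```

Both fits share the same log-linear mean model; they differ in how the extra variance is described.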

We present here a function and a macro to explore how similar the negative binomial can get to the Poisson, if we keep the means of the distributions equal. But before doing so, it will be helpful to review their definitions:

The Poisson is defined as P(Y=y | lambda) = [e^(-lambda) lambda^y] / y!
and the negative binomial as: P(X=x | n,p) = [(n + x - 1)! / (x! (n-1)!)] p^n (1-p)^x

In the Poisson, the mean is lambda. The negative binomial counts the number of failures x before n successes, where the probability of success is p; the mean of X is n(1-p)/p. There are several characterizations of the negative binomial.

R

In R, the pnbinom() function (section 1.10) can be called either with the parameters n and p given above (the size and prob arguments), or by specifying the mean mu and a dispersion parameter (denoted size), where mu = n(1-p)/p, or equivalently p = size/(size + mu). It's convenient to parameterize via the mean, to keep the negative binomial mean equal to the Poisson mean.
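A quick check that the two ways of calling the negative binomial functions agree may be helpful. This is only a sketch: the values n = 5 and p = 0.25 are arbitrary choices for illustration, and dnbinom() is used in place of pnbinom() so the result can also be compared with the probability function written out by hand.

```r
n <- 5                       # number of successes (the size argument)
p <- 0.25                    # probability of success (the prob argument)
mu <- n * (1 - p) / p        # implied mean number of failures

x <- 0:30
by_prob <- dnbinom(x, size = n, prob = p)                # (n, p) parameterization
by_mean <- dnbinom(x, size = n, mu = mu)                 # (mu, size) parameterization
by_hand <- choose(n + x - 1, x) * p^n * (1 - p)^x        # formula written out

all.equal(by_prob, by_mean)
all.equal(by_prob, by_hand)

# The mean, summed numerically over a long range of x
big_x <- 0:2000
numeric_mean <- sum(big_x * dnbinom(big_x, size = n, prob = p))
```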

Our function will accept a series of integers and a mean value as input, and plot the Poisson cumulative probabilities and the negative binomial cumulative probabilities for three values of n. We make use of the type="n" option in the plot() function (section 5.1.1) and add the negative binomial values with the lines() function (section 5.2.1).


poissonvsnb = function(values,mean) {
 probs = ppois(values,mean)
 plot(y=probs, x=values, type="n", ylim=c(0,1))
 lines(y=probs, x=values, col="red")
 readline("Poisson shown. Press Enter to continue...")
 nbprobs1 = pnbinom(values, mu=mean, size=1)
 nbprobs5 = pnbinom(values, mu=mean, size=5)
 nbprobs40 = pnbinom(values, mu=mean, size=40)
 lines(y=nbprobs1, x=values, col="black")
 lines(y=nbprobs5, x=values, col="blue")
 lines(y=nbprobs40, x=values, col="green")
}
poissonvsnb(0:10,1)

The result is shown above. The red line representing the Poisson is completely overplotted by the negative binomial with size=40. This can be seen when running live, due to the readline() statement, which waits for input before continuing.
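The overplotting can also be summarized numerically. The sketch below, which is an addition rather than part of the original function, computes the largest absolute gap between each negative binomial CDF and the Poisson CDF over the same values. It also lists the negative binomial variance, which in the mean parameterization is mu + mu^2/size and so shrinks toward the Poisson variance mu as size grows.

```r
values <- 0:10
mu <- 1
sizes <- c(1, 5, 40)

# Largest absolute difference between the two CDFs for each size
max_gap <- sapply(sizes, function(s) {
  max(abs(pnbinom(values, mu = mu, size = s) - ppois(values, mu)))
})
names(max_gap) <- paste0("size=", sizes)

# Negative binomial variance for each size; the Poisson variance is mu
nb_var <- mu + mu^2 / sizes

max_gap
nb_var
```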

SAS

In SAS, the cdf function (section 1.10) does not have the flexibility of parameterizing directly via the mean. It takes the success probability p and the number of successes n, and returns the probability of at most x failures, so the mean is n(1-p)/p. Thus if we want to hold the mean constant, we need to solve for p and find probabilities from the distribution where p = n/(n + mu).
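For readers who want the algebra behind p = n/(n + mu), here is the short derivation, starting from the mean n(1-p)/p and writing mu for the target mean:

```latex
\mu = \frac{n(1-p)}{p}
\;\Longrightarrow\; \mu p = n - np
\;\Longrightarrow\; p(\mu + n) = n
\;\Longrightarrow\; p = \frac{n}{n + \mu}
```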

To make this process a little less cumbersome to type, we'll also demonstrate the use of proc fcmp, which allows you to compile functions that can be used in data steps and some other procedures. In general, it works as you might hope, with a function statement and a return statement. The only hassle is telling SAS where to store the functions and where to find them when they're needed.


proc fcmp outlib=sasuser.funcs.test;
function poismean_nb(mean, size);
 return(size/(mean+size));
 endsub;
run;

options cmplib = sasuser.funcs;
run;

Now we're ready to write a macro to replicate the R function. Note how the new function is nested within the call to the cdf function, with the appropriate size parameter. The overlay option allows plotting several y values on the same x axis; the r option to the symbol statement (section 5.1.19) keeps the symbol in effect for several y values. SAS generates a legend easily; this allows us to see the (mostly overplotted) Poisson. Using readline() to pause the output (as in R) is not available.

As a suggestion about how to write macros in SAS, I left this one a little messy. I first wrote the code to make the plot once, with the number of X values and the mean specified in the code with fixed values. This makes two extra lines of code, but when I converted to a macro, I only needed to change the fixed values to the macro parameters. For elegance, I would omit the first two lines and replace the later occurrences of n and mean with the macro parameters.


%macro nbptest(maxn, mean);
data nbp;
n = &maxn;
mean = &mean;
do i = 0 to n;
 probpois = cdf("POISSON", i, mean);
 probnb1 = CDF("NEGBINOMIAL", i, poismean_nb(mean, 1), 1);
 probnb5 = CDF("NEGBINOMIAL", i, poismean_nb(mean, 5), 5);
 probnb40 = CDF("NEGBINOMIAL", i, poismean_nb(mean, 40), 40);
 output;
 end;
run;

axis1 order = (0 to 1 by .2) minor=none ;
symbol1 v=none i=j r=4;
proc gplot data=nbp;
plot (probpois probnb1 probnb5 probnb40)*i / 
 overlay vaxis=axis1 legend;
run; quit;
%mend;

%nbptest(10,2);

The results are shown below. The negative binomial approaches the Poisson very closely as size increases, holding the mean constant.

Posted by Ken Kleinman



Monday, February 28, 2011

Example 8.28: should we buy snowstorm insurance?


It's been a long winter so far in New England, with many a snow storm. In this entry, we consider a simulation to complement the analytic solution for a probability problem concerning snow.

Consider a company that buys a policy to insure its revenue in the event of major snowstorms that shut down business. The policy pays nothing for the first such snowstorm of the year and $10,000 for each one thereafter, until the end of the year. The number of major snowstorms per year that shut down business is assumed to have a Poisson distribution with mean 1.5. What is the expected amount paid to the company under this policy during a one-year period?

Let SNOW be the number of snowstorms, and PAY the amount paid out by the insurance. The following chart may be useful in discerning the pattern:


SNOW    PAY      10000*(SNOW-1)
0       0        -10000
1       0        0
2       10000    10000
3       20000    20000

The analytic solution is straightforward, but involves a truncation of the first snowstorm. Since we can assume that the random variable SNOW ~ Poisson(1.5) we know that E[SNOW] = 1.5 and E[10000*(SNOW-1)] = 10000*E[SNOW] - 10000 = 15000 - 10000 = 5000.

The policy pays 0 rather than -10000 when SNOW = 0, so E[PAY] is equal to E[10000*(SNOW-1)] + 10000*P(SNOW=0) and the exact answer is


10000*P(SNOW=0) + 15000 - 10000 =
10000*exp(-1.5) + 15000 - 10000 = $7231
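The same expectation can be checked by summing the payout over the Poisson probabilities directly. This is a sketch that truncates the infinite sum at 100 storms, an arbitrary cutoff chosen here because the probability beyond it is negligible for a mean of 1.5.

```r
lambda <- 1.5
k <- 0:100                                  # truncation point is an assumption
payout <- 10000 * pmax(k - 1, 0)            # nothing for the first storm
expected_pay <- sum(payout * dpois(k, lambda))

closed_form <- 10000 * (exp(-lambda) + lambda - 1)

expected_pay
closed_form
```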

Here the advantage of simulation is that it may provide a useful check on the results, as well as a ready measure of variability. In this situation, the code is quite simple, but the approach is powerful.

R


numsim = 1000000
snow = rpois(numsim, 1.5)
pay = snow - 1 # subtract one
pay[snow==0] = 0 # deal with the pesky P(snow=0)
sim = mean(pay*10000)
analytic = 10000*(dpois(0, 3/2) + 3/2 - 1)

Yielding the following:


> sim
[1] 7249.55
> analytic
[1] 7231.302
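To get the measure of variability mentioned above, the simulation can also report a standard error and an approximate 95% interval for the simulated mean. The sketch below is a variation on the code above, not the original: the seed is an arbitrary choice added for reproducibility, and pmax() replaces the two-step adjustment for snow == 0.

```r
set.seed(2011)                              # arbitrary seed, for reproducibility
numsim <- 1000000
snow <- rpois(numsim, 1.5)
pay <- 10000 * pmax(snow - 1, 0)            # payout for each simulated year

sim <- mean(pay)
se <- sd(pay) / sqrt(numsim)                # standard error of the simulated mean
ci <- sim + c(-1, 1) * qnorm(0.975) * se    # approximate 95% interval

analytic <- 10000 * (dpois(0, 1.5) + 1.5 - 1)
```

With a million simulated years the standard error is roughly $10, so differences of the size seen between the simulated and analytic values above are in line with simulation noise.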

SAS
The simulation and analytic solutions are also straightforward in SAS. Here the analytic result is only calculated once, after the loop.


data snow_insurance;
do i = 1 to 1000000;
 nsnow = ranpoi(0, 1.5);
 payout = max(nsnow -1, 0) * 10000;
 output;
end;
analytic = 10000 * (cdf("POISSON", 0, 1.5) + 1.5 -1);
output;
run;

proc means data=snow_insurance mean;
 var payout analytic;
run;

This results in the following output:


Variable Mean
------------------------
payout 7236.96
analytic 7231.30
------------------------

Posted by Nick Horton




About the authors

Nicholas Horton is a Professor of Statistics at Amherst College. He is a biostatistician with expertise in missing data methods, longitudinal regression, statistical computing and statistical education. Nick's home page; Nick's Google Scholar author page

Ken Kleinman is an Associate Professor with the Department of Biostatistics and Epidemiology at the University of Massachusetts, Amherst. He is a consulting biostatistician with expertise in group-randomized trials and disease surveillance; he also offers R training courses. Ken's home page; Ken's Google Scholar author page.
