Emerging thoughts on the Poisson as a limit of the Binomial

I am trying to set down some thoughts on the relationship between the Binomial and the Poisson. The problem emerges from modelling data where I assume Y_{i1} \sim Poisson(\lambda_i) and Y_{i2} | n_i \sim Binomial(n_i, p_i). I assume that Y_{i1} and Y_{i2} are correlated in some way, and have essentially been modelling this by assuming that \log(\lambda_i) = \eta_{i1} + u_{i1} and \mbox{logit}(p_i) = \eta_{i2} + u_{i2}, with the correlation induced by assuming that U_{i1}, U_{i2} are bivariate Normal random variables. However, the models don’t work the way I think they should.

So the first thing to look at is the variance of estimators of p and \lambda for simple univariate samples. For the Binomial, E[X] = np and Var(X) = np(1-p), so if we have \hat{p} = \frac{x}{n} then Var(\hat{p}) = Var(\frac{X}{n}) = \frac{p(1-p)}{n}. For the Poisson, we have E[X] = \lambda and Var(X) = \lambda, and given that \hat{\lambda} = \frac{\sum_{i=1}^n x_i}{n} for a sample of n independent observations, we have Var(\hat{\lambda}) = \frac{\lambda}{n}.
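As a quick check on these two formulas, here is a minimal sketch that computes both variances exactly from the probability mass functions rather than by simulation. The function names and the truncation point for the Poisson sum are my own choices for illustration; the Poisson part relies on the standard fact that the sum of n independent Poisson(\lambda) counts is Poisson(n\lambda).

```python
import math

def binomial_pmf(k, n, p):
    # P(X = k) for X ~ Binomial(n, p)
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    # P(X = k) for X ~ Poisson(lam), computed on the log scale
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def variance(values, probs):
    # Variance of a discrete distribution given its support and probabilities
    mean = sum(v * q for v, q in zip(values, probs))
    return sum((v - mean) ** 2 * q for v, q in zip(values, probs))

def var_p_hat(n, p):
    # Exact variance of p_hat = X / n for a single Binomial(n, p) count
    ks = range(n + 1)
    return variance([k / n for k in ks], [binomial_pmf(k, n, p) for k in ks])

def var_lambda_hat(n, lam):
    # Exact variance of the mean of n iid Poisson(lam) counts.
    # Their sum is Poisson(n * lam); the infinite support is truncated
    # far into the upper tail (an assumed, generous cutoff).
    total = n * lam
    cutoff = int(total + 20 * math.sqrt(total) + 50)
    ks = range(cutoff + 1)
    return variance([k / n for k in ks], [poisson_pmf(k, total) for k in ks])

n, p, lam = 20, 0.3, 2.5
print(var_p_hat(n, p), p * (1 - p) / n)      # the two values should agree
print(var_lambda_hat(n, lam), lam / n)       # the two values should agree
```

Note that n plays a different role in the two cases: the number of trials for the Binomial, and the number of observations for the Poisson.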

The interesting thing about this is what happens when you think about taking limits. One explanation for the Poisson is that it arises as the limit of the Binomial if you let \lambda = np and take n \to \infty holding \lambda constant. If you do this you find that:

  • Poisson: Var(X) = \lambda
  • Binomial: Var(X) = np(1-p) = \lambda (1-p)

However, as n \to \infty with \lambda held constant, clearly p = \frac{\lambda}{n} \to 0, so 1-p \to 1 and in the limit the two variances are identical. Given that I’m usually rather dismissive of asymptotics, it’s interesting to see just how quickly these two converge.
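To get a feel for the speed of convergence, here is a rough sketch that fixes \lambda and lets n grow. It prints the Binomial variance \lambda(1-p) and, as one possible measure of closeness that I have picked for illustration, half the summed absolute difference between the Binomial(n, \lambda/n) and Poisson(\lambda) probability mass functions. The function names are hypothetical and the code assumes 0 < \lambda < n.

```python
import math

def log_binomial_pmf(k, n, p):
    # log P(X = k) for X ~ Binomial(n, p), valid for 0 < p < 1
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))

def log_poisson_pmf(k, lam):
    # log P(X = k) for X ~ Poisson(lam)
    return k * math.log(lam) - lam - math.lgamma(k + 1)

def binomial_variance(n, lam):
    # Var(X) = n p (1 - p) with p = lam / n
    p = lam / n
    return n * p * (1 - p)

def pmf_distance(n, lam):
    # Half the sum over k of |Binomial pmf - Poisson pmf|.
    # The Poisson tail beyond the upper limit is ignored (assumed negligible).
    p = lam / n
    upper = max(n, int(lam + 20 * math.sqrt(lam) + 50))
    total = 0.0
    for k in range(upper + 1):
        b = math.exp(log_binomial_pmf(k, n, p)) if k <= n else 0.0
        q = math.exp(log_poisson_pmf(k, lam))
        total += abs(b - q)
    return total / 2

lam = 2.0
for n in (5, 10, 50, 100, 1000):
    print(n, binomial_variance(n, lam), pmf_distance(n, lam))
```

The Poisson variance is \lambda throughout, so the second column can be read directly against lam.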

[Figure: Behaviour of estimate of lambda]

But look what happens if you consider the variance of the estimators.

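  • Direct: Var(\hat{\lambda}) = \frac{\hat{\lambda}}{n}
  • Derived from the Binomial: Var(\hat{\lambda}) = \hat{p} (1-\hat{p}) (note that p = \frac{\lambda}{n}, so the direct value is just \hat{p} and the Binomial-derived value is smaller by a factor of 1-\hat{p})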

[Figure: Variance of lambda]

Well, I suppose one thing to say is that the limiting behaviour only applies when np is held constant and p \to 0, so perhaps it’s no surprise that there is a large discrepancy where p \geq 0.5.
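The size of the discrepancy is easy to tabulate. This is only a sketch using the two expressions as I have written them above, with p = \frac{\lambda}{n}; the function names and the value n = 50 are made up for illustration.

```python
def poisson_based(lam_hat, n):
    # "Direct" expression from the text: lambda_hat / n
    return lam_hat / n

def binomial_based(lam_hat, n):
    # Expression derived from the Binomial: p_hat * (1 - p_hat), p_hat = lambda_hat / n
    p_hat = lam_hat / n
    return p_hat * (1 - p_hat)

n = 50  # assumed value, chosen only for the illustration
for lam_hat in (0.5, 5, 25, 45):
    direct = poisson_based(lam_hat, n)
    derived = binomial_based(lam_hat, n)
    # The ratio derived / direct equals 1 - p_hat
    print(lam_hat, lam_hat / n, direct, derived, derived / direct)
```

The ratio of the two is 1-\hat{p}, so the expressions agree closely for small \hat{p} and differ by half or more once \hat{p} \geq 0.5.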

What I am speculating about is the way the inferred Binomial has lower variance than the equivalent Poisson. As there is no such thing in real life as a Poisson or Binomial random variable, it looks to me as if the choice between them has a bearing on the assumed precision of the estimators. In my modelling situation, the Binomial component, with its smaller assumed variance, will have a stronger influence on the model fit than the Poisson component.
