## Emerging thoughts on the Poisson as a limit of the Binomial

I am trying to set down some thoughts on the relationship between the Binomial and the Poisson. The problem emerges from modelling data where I assume $Y_{i1} \sim Poisson(\lambda_i)$ and $Y_{i2} | n_i \sim Binomial(n_i, p_i)$. I also assume that $Y_{i1}$ and $Y_{i2}$ are correlated in some way, and have essentially been modelling this by assuming that $\log(\lambda_i)=\eta_{i1}+u_{i1}$ and $\mbox{logit} (p_i) = \eta_{i2} + u_{i2}$, with the correlation induced by assuming $U_{i1},U_{i2}$ are bivariate Normal random variables. However, the models don’t work the way I think they should.
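To make the setup concrete, here is a minimal simulation sketch of that kind of model using only the Python standard library. Everything specific in it is an assumption for illustration: the linear predictors are constants (`eta_pois`, `eta_binom`), both random effects share one standard deviation `sigma`, every unit has the same number of trials, and the helper names are made up. It is not the fitting code, just a way of generating data with the assumed structure.

```python
import math
import random


def rpois(lam, rng):
    """One Poisson(lam) draw via Knuth's multiplication method (fine for modest lam)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def rbinom(n, p, rng):
    """One Binomial(n, p) draw as a sum of n Bernoulli trials."""
    return sum(rng.random() < p for _ in range(n))


def corr(xs, ys):
    """Sample Pearson correlation."""
    m = len(xs)
    mx, my = sum(xs) / m, sum(ys) / m
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate(N=2000, rho=0.8, sigma=0.5, eta_pois=1.0, eta_binom=-1.0,
             n_trials=20, seed=1):
    """Return rows (u_pois, u_binom, y_pois, y_binom) with correlated random effects."""
    rng = random.Random(seed)
    rows = []
    for _ in range(N):
        # Bivariate Normal random effects with correlation rho (Cholesky by hand)
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        u_pois = sigma * z1
        u_binom = sigma * (rho * z1 + math.sqrt(1 - rho ** 2) * z2)
        lam = math.exp(eta_pois + u_pois)                    # log link
        p = 1 / (1 + math.exp(-(eta_binom + u_binom)))       # logit link
        rows.append((u_pois, u_binom, rpois(lam, rng), rbinom(n_trials, p, rng)))
    return rows


rows = simulate()
print("corr of random effects:", corr([r[0] for r in rows], [r[1] for r in rows]))
print("corr of observed counts:", corr([r[2] for r in rows], [r[3] for r in rows]))
```

The correlation between the observed counts will generally be weaker than the correlation between the random effects, because the Poisson and Binomial sampling noise sits on top of them.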

So the first thing to look at is the variance of estimators of $p$ and $\lambda$ for simple univariate samples. For the Binomial, $E[X] = np$ and $Var(X)=np(1-p)$, so if we have $\hat{p}=\frac{x}{n}$ then $Var(\hat{p})=Var(\frac{X}{n}) = \frac{p(1-p)}{n}$. For the Poisson, we have $E[X]=\lambda$ and $Var(X)=\lambda$, and given that $\hat{\lambda}=\frac{\sum_{i=1}^n x_i}{n}$ we have $Var(\hat{\lambda})=\frac{\lambda}{n}$.
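These two variance formulas can be checked numerically straight from the probability mass functions. The sketch below is only a sanity check under values I have picked arbitrarily ($n = 50$, $p = 0.3$, $\lambda = 4$); it uses the fact that a sum of $n$ independent $Poisson(\lambda)$ counts is $Poisson(n\lambda)$, and truncates that distribution far out in the tail.

```python
import math


def binom_pmf(k, n, p):
    """Binomial(n, p) pmf at k."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def pois_pmf(k, lam):
    """Poisson(lam) pmf at k, computed on the log scale to avoid overflow in k!."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def variance(values, probs):
    """Variance of a discrete distribution given its support and probabilities."""
    mean = sum(v * q for v, q in zip(values, probs))
    return sum((v - mean) ** 2 * q for v, q in zip(values, probs))


n, p, lam = 50, 0.3, 4.0  # arbitrary illustrative values

# p_hat = X / n with X ~ Binomial(n, p)
ks = range(n + 1)
var_p_hat = variance([k / n for k in ks], [binom_pmf(k, n, p) for k in ks])

# lam_hat = S / n with S ~ Poisson(n * lam); support truncated at 1000
ss = range(1000)
var_lam_hat = variance([s / n for s in ss], [pois_pmf(s, n * lam) for s in ss])

print("Var(p_hat):  ", var_p_hat, "formula:", p * (1 - p) / n)
print("Var(lam_hat):", var_lam_hat, "formula:", lam / n)
```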

The interesting thing about this is what happens when you think about taking limits. One derivation of the Poisson obtains it from the Binomial by setting $\lambda = np$ and letting $n \to \infty$ while holding $\lambda$ constant. If you do this you find that:

• Poisson: $Var(X) = \lambda$
• Binomial: $Var(X) = np(1-p) = \lambda (1-p)$

However, as $n \to \infty$, holding $\lambda$ constant forces $p = \frac{\lambda}{n} \to 0$, and so in the limit the two expressions for $Var(X)$ are identical. Given that I’m usually rather dismissive of asymptotics, it’s interesting to see just how quickly these two converge.
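One way to put numbers on that convergence is to fix $\lambda$ and compare $Binomial(n, \lambda/n)$ with $Poisson(\lambda)$ for increasing $n$. The sketch below is an illustration only: the choice $\lambda = 3$, the grid of $n$ values, and the use of total variation distance as the yardstick are all my assumptions.

```python
import math


def binom_pmf(k, n, p):
    """Binomial(n, p) pmf at k on the log scale (requires 0 < p < 1)."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log1p(-p))
    return math.exp(log_pmf)


def pois_pmf(k, lam):
    """Poisson(lam) pmf at k on the log scale."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def tv_distance(n, lam):
    """Total variation distance between Binomial(n, lam/n) and Poisson(lam)."""
    p = lam / n
    diff = sum(abs(binom_pmf(k, n, p) - pois_pmf(k, lam)) for k in range(n + 1))
    # Poisson mass above n has no Binomial counterpart, so it counts in full
    tail = 1.0 - sum(pois_pmf(k, lam) for k in range(n + 1))
    return 0.5 * (diff + max(tail, 0.0))


lam = 3.0
print("n, Binomial variance lam*(1-p), Poisson variance, TV distance")
for n in (5, 10, 50, 100, 1000):
    p = lam / n
    print(n, round(lam * (1 - p), 4), lam, round(tv_distance(n, lam), 4))
```

The Binomial variance column is just $\lambda(1-p)$ with $p = \lambda/n$, so it approaches $\lambda$ from below as $n$ grows.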

But look what happens if you consider the variance of the estimators.

• Direct: $Var(\hat{\lambda}) = \frac{\hat{\lambda}}{n}$
• Derived from the Binomial: $Var(\hat{\lambda}) = \hat{p} (1-\hat{p})$ (note that $p=\frac{\lambda}{n}$, so the direct expression is just $\hat{p}$ and the two differ by a factor of $(1-\hat{p})$)

Well, I suppose one thing to say is that the limiting behaviour only applies when $np$ is held constant and $p \to 0$, so perhaps it’s no surprise that there is a large discrepancy when $p \geq 0.5$.
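Taking the two expressions for $Var(\hat{\lambda})$ above at face value, with $\hat{p} = \hat{\lambda}/n$, the size of the discrepancy is easy to tabulate. This is a rough sketch only; the function names and the values of $n$ and $\hat{\lambda}$ are invented for illustration.

```python
def direct_var(lam_hat, n):
    """Poisson-style variance: lam_hat / n."""
    return lam_hat / n


def binomial_var(lam_hat, n):
    """Binomial-derived variance: p_hat * (1 - p_hat) with p_hat = lam_hat / n."""
    p_hat = lam_hat / n
    return p_hat * (1 - p_hat)


n = 100  # illustrative value
print("p_hat, direct, from Binomial, ratio")
for lam_hat in (1, 10, 50, 90):
    d, b = direct_var(lam_hat, n), binomial_var(lam_hat, n)
    print(lam_hat / n, d, b, b / d)
```

The ratio is $1-\hat{p}$, so the two agree closely for small $\hat{p}$ and the Binomial-derived figure is half the direct one by $\hat{p} = 0.5$.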

But my speculation concerns the way the inferred Binomial has lower variance than the equivalent Poisson. As there is no such thing in real life as a Poisson or Binomial random variable, it looks to me as if assuming one or the other has a bearing on the assumed precision of the resulting estimators. In my modelling situation, assuming a Binomial will have a stronger influence on the model fit than assuming a Poisson.