(Here is an attempt to explain why Bayesian methods are the inference method of choice)
O’Hagan (Kendall’s Library) suggests:
Some data are observed, and we wish to make a statement (inference) about one or more unknown features of the system which gave rise to the data. This statement is the posterior distribution.
O’Hagan gives the example of an opinion poll:
- There is a population
- There is a mechanism for selecting a sample from the population
- There is a mechanism for determining whether each individual will state yes/no
Our inference will use the responses in the sample to make a statement about the population (the proportion of people who will say yes). Robert suggests that we are interested in interpretation, not explanation (that might need a caveat).
So far, so clear. But what do Bayesian methods add?
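O'Hagan's poll example can be sketched numerically. A minimal sketch, assuming a conjugate Beta(a, b) prior on the population proportion θ and made-up poll numbers (n respondents, y saying yes); the function name and figures are illustrative, not from either book:

```python
# Hypothetical poll: n respondents sampled, y of them answered "yes".
# With a conjugate Beta(a, b) prior on the proportion theta, the posterior
# is Beta(a + y, b + n - y) -- the inference IS this distribution.

def poll_posterior(y, n, a=1.0, b=1.0):
    """Return the (alpha, beta) parameters of the posterior Beta distribution."""
    return a + y, b + (n - y)

alpha, beta = poll_posterior(y=60, n=100)     # flat Beta(1, 1) prior
posterior_mean = alpha / (alpha + beta)       # one possible summary of the posterior
print(alpha, beta, round(posterior_mean, 3))  # 61.0 41.0 0.598
```

The posterior mean here (0.598) differs slightly from the raw proportion 0.6 because the flat prior contributes the equivalent of two extra observations.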
- Conditioning on the data (Robert): statistical inference is an inversion process – we are trying to “derive effects from their causes taking into account both a probabilistic model and totally random factors”
- Prior information (O’H): what we know about the parameter “before” seeing the data. The counter-argument to people worrying about this prior is that in Frequentist statistics, the estimator of the Binomial parameter is x/n, regardless of what else we know about the problem at hand. In other words, although Bayesian methods avoid some of the ad-hockery one has with Frequentist statistics, they acknowledge that every problem is different. This has the makings of a nice argument for using priors?
- Subjective probability (O’H): (actually I quite like de Finetti’s statement that there is no such thing as probability, it saves a lot of worrying about different kinds of probability) anyway this amounts to saying that we have expressions of our degrees of belief in various values of parameters. As the investigator does not know the parameter’s value, it should be treated as a random variable.
- Self-consistency (O’H): Bayesian methods rely entirely on probability operations, so Posterior ∝ Prior × Likelihood is entirely coherent. Robert emphasises this point – we reduce what is unknown to something that is random (a probabilization of uncertainty). By taking unknown parameters to be random variables (which requires that they have prior distributions) we can combine these with the likelihood to form the posterior by simple use of Bayes Theorem. There is only one answer (no ad-hockery in the methods, so the posterior distribution is the answer – albeit one requiring different summaries). I think there is also an argument that the likelihood makes more sense as a Bayesian concept (certainly Robert makes this point). The likelihood is a function of θ for fixed x (it obtains individual values from the pdf of x for various values of θ). Frequentist methods take the likelihood to be a consequence of the investigator’s model of the process generating the data (a frequency probability, based on random sampling from an infinite population). Good suggests that this is tautological. Robert notes that it is not a density. Conversely, in Bayesian statistics, when one regards probability as subjective, it is a degree of belief of data taking certain values given hypothetical information that the parameters take certain values – i.e. not a hypothesis about data generation but a statement made by the investigator about the problem at hand. Robert also states that we implement the likelihood principle (as derived from Sufficiency and Conditionality) when using Bayesian methods.
- No ad hockery (O’H): Unlike Frequentist statistics there is no need to invent new estimators, new methods for constructing confidence intervals, new hypothesis tests. We have the posterior distribution and can manipulate that as we see fit.
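Two of the points above can be checked on a grid: that the likelihood, read as a function of θ, is not a density (for the Binomial it integrates to 1/(n+1), not 1), and that normalising prior × likelihood gives a proper posterior. A sketch with assumed numbers (x = 6 successes in n = 10 trials, flat prior), not anything from O'Hagan or Robert:

```python
from math import comb

# Assumed data: x = 6 successes in n = 10 Bernoulli trials.
n, x = 10, 6
grid = [i / 1000 for i in range(1, 1000)]  # grid over theta in (0, 1), step 0.001

def likelihood(theta):
    # Binomial likelihood, read as a function of theta for fixed data x
    return comb(n, x) * theta**x * (1 - theta)**(n - x)

# 1) The likelihood is not a density in theta: its integral is 1/(n+1), not 1.
integral = sum(likelihood(t) for t in grid) * 0.001
print(round(integral, 4))  # ~0.0909, i.e. 1/11

# 2) Posterior = prior x likelihood, renormalised. With a flat prior the
#    posterior is the likelihood rescaled so that it integrates to 1.
prior = [1.0 for _ in grid]
unnorm = [p * likelihood(t) for p, t in zip(prior, grid)]
norm = sum(unnorm) * 0.001
posterior = [u / norm for u in unnorm]
print(round(sum(posterior) * 0.001, 4))  # 1.0 -- a proper density over theta
```

The renormalisation step is exactly why the prior-times-likelihood recipe is coherent: whatever the prior, the result is always a genuine probability distribution.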
Strengths of Bayesian Approach (according to O’Hagan, needs expanding) versus weaknesses of Frequentist approaches
- Fundamentally sound / Philosophically flawed
- Very flexible / restrictive range of inference
- Clear and direct inference / very indirect meanings (probability of obtaining these data if the null were true)
- Uses all available information / ignores prior information that may be available
Hence we have a fundamentally superior approach, which can use more information. It can, however, be more difficult to interpret.