Bayesian model comparison
Overview
A common problem in statistical inference is deciding between two or more competing models on the basis of observed data. Frequentist statistics addresses this with hypothesis tests; Bayesian statistics offers several approaches, one of which is based on Bayes factors.
The posterior probability of a model given data, Pr(H|D), is given by Bayes' theorem:
- <math>\Pr(H|D) = \frac{\Pr(D|H)\Pr(H)}{\Pr(D)}</math>
The key data-dependent term Pr(D|H) is a marginal likelihood, sometimes called the evidence for model H; evaluating it correctly is the crux of Bayesian model comparison.
The evidence is usually the normalizing constant or partition function of another inference, namely the inference of the parameters of model H given the data D.
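As a concrete illustration of the formula above, the short Python sketch below converts evidences Pr(D|H) for two candidate models, together with equal prior model probabilities, into posterior model probabilities; Pr(D) appears as the normalizing constant over models. The evidence values are hypothetical placeholders, not results of any real analysis.

```python
# Minimal sketch: posterior model probabilities via Bayes' theorem.
# The evidence values below are illustrative placeholders, not real data.
evidences = {"H1": 0.8e-4, "H2": 2.3e-4}   # Pr(D|H), hypothetical
priors    = {"H1": 0.5,    "H2": 0.5}      # Pr(H), taken as equal here

# Pr(D) is the normalizing constant: the sum over models of Pr(D|H) Pr(H)
pr_d = sum(evidences[h] * priors[h] for h in evidences)

posteriors = {h: evidences[h] * priors[h] / pr_d for h in evidences}
print(posteriors)   # {'H1': 0.258..., 'H2': 0.741...}
```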
The plausibility of two competing models H1 and H2, parametrised by parameter vectors <math>\theta_1</math> and <math>\theta_2</math>, is assessed by the Bayes factor:
- <math> \frac{\Pr(D|H_2)}{\Pr(D|H_1)} = \frac{\int \Pr(\theta_2|H_2)\Pr(D|\theta_2,H_2)\,d\theta_2}{\int \Pr(\theta_1|H_1)\Pr(D|\theta_1,H_1)\,d\theta_1}. </math>
Thus Bayesian model comparison does not depend on any single setting of each model's parameters: the evidence averages the likelihood over all possible parameter values, weighted by the prior. Alternatively, the maximum likelihood estimate could be plugged in for each model's parameters, giving an ordinary likelihood ratio (contrasted with the Bayes factor in the sketch below).
An advantage of Bayes factors is that they automatically, and quite naturally, include a penalty for superfluous model structure, and thus guard against overfitting.
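Both points can be made concrete with a pair of models whose evidences have closed forms. In the hypothetical sketch below, H1 asserts that a coin is fair (no free parameters), while H2 gives the head probability p a uniform prior on [0, 1]; for a particular observed sequence of n tosses containing k heads, the evidence under H2 is the Beta-function integral shown in the comments.

```python
from math import comb

# Hypothetical sketch: Bayes factor for a coin, fair (H1) vs unknown bias (H2).
# A particular toss sequence is assumed, so no binomial coefficient appears
# (it would cancel between the two models in any case).

def evidence_h1(k, n):
    # Pr(D|H1) = (1/2)^n: a fair coin assigns this to every sequence
    return 0.5 ** n

def evidence_h2(k, n):
    # Pr(D|H2) = integral over [0,1] of p^k (1-p)^(n-k) dp = k!(n-k)!/(n+1)!
    # i.e. the Beta function B(k+1, n-k+1) in closed form
    return 1.0 / ((n + 1) * comb(n, k))

k, n = 8, 10          # hypothetical data: 8 heads in 10 tosses
bf = evidence_h2(k, n) / evidence_h1(k, n)
print(f"Bayes factor H2/H1 = {bf:.2f}")   # about 2.07: mild support for H2
```

The Occam penalty is visible here: the Bayes factor is only about 2.1, whereas plugging the maximum likelihood estimate p = 0.8 into H2 would give a likelihood ratio of about 6.9. Averaging over the prior, rather than maximizing, is what charges H2 for its extra parameter.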
Another approach is to treat model comparison as a decision problem, computing the expected value or cost of each model choice.
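As a minimal sketch of this view, assume a hypothetical loss table loss[i][j] giving the cost of choosing model i when model j is in fact correct; combining it with the posterior model probabilities from the first sketch gives an expected loss for each possible choice.

```python
# Hypothetical sketch: model choice by minimum expected loss.
posteriors = [0.26, 0.74]   # Pr(H1|D), Pr(H2|D), from the first sketch
loss = [[0.0, 5.0],         # cost of choosing H1 if H1 / H2 is true
        [1.0, 0.0]]         # cost of choosing H2 if H1 / H2 is true

expected_loss = [sum(loss[i][j] * posteriors[j] for j in range(2))
                 for i in range(2)]
best = min(range(2), key=lambda i: expected_loss[i])
print(expected_loss, f"-> choose H{best + 1}")   # [3.7, 0.26] -> choose H2
```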
Another approach is to use Minimum Message Length (MML).
See also
- Akaike information criterion
- Schwarz's Bayesian information criterion
- Conditional predictive ordinate
- Deviance information criterion
- Wallace's Minimum Message Length (MML)
- Model selection
References
- Gelman, A., Carlin, J., Stern, H., and Rubin, D. (1995). Bayesian Data Analysis. Chapman and Hall/CRC.
- Bernardo, J., and Smith, A. F. M. (1994). Bayesian Theory. John Wiley.
- Lee, P. M. (1989). Bayesian Statistics. Arnold.
- Denison, D. G. T., Holmes, C. C., Mallick, B. K., and Smith, A. F. M. (2002). Bayesian Methods for Nonlinear Classification and Regression. John Wiley.
- Duda, R. O., Hart, P. E., and Stork, D. G. (2000). Pattern Classification (2nd edition), Section 9.6.5, pp. 487-489. Wiley. ISBN 0-471-05669-3.
- Jaynes, E. T. (1994). Probability Theory: The Logic of Science, Chapter 24.
- MacKay, D. J. C. (2003). Information Theory, Inference and Learning Algorithms. Cambridge University Press. ISBN 0-521-64298-1 (also available online).
External links
- The on-line textbook Information Theory, Inference, and Learning Algorithms, by David J.C. MacKay, discusses Bayesian model comparison in Chapter 28, p. 343.