Bayes model averaging with selection of regressors. (English) Zbl 1073.62004

Summary: When a number of distinct models contend for use in prediction, the choice of a single model can offer rather unstable predictions. In regression, stochastic search variable selection with Bayesian model averaging offers a cure for this robustness issue but at the expense of requiring very many predictors. Here we look at Bayes model averaging incorporating variable selection for prediction. This offers similar mean-square errors of prediction but with a vastly reduced predictor space. This can greatly aid the interpretation of the model. It also reduces the cost if measured variables have costs.
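The averaging idea described above can be sketched in a few lines. This is a minimal illustration, not the authors' method: it assumes a single response, no intercept, a Zellner g-prior on the coefficients with prior density proportional to 1/σ² on the error variance, equal prior weight on every subset, and exhaustive enumeration of subsets in place of a stochastic search. The function name `bma_g_prior` and the value of `g` are hypothetical. It shows plain model averaging and posterior inclusion probabilities only; the decision-theoretic choice of a reduced predictor set is not implemented.

```python
import itertools
import numpy as np

def bma_g_prior(X, y, g):
    """Exhaustive Bayesian model averaging over all column subsets of X.

    Assumes a g-prior on coefficients and p(sigma^2) proportional to 1/sigma^2,
    with no intercept. Returns (weights, inclusion, beta_bma).
    """
    n, p = X.shape
    yty = float(y @ y)
    shrink = g / (1.0 + g)
    indicators, log_ml, coefs = [], [], []
    for k in range(p + 1):
        for subset in itertools.combinations(range(p), k):
            cols = list(subset)
            beta = np.zeros(p)
            z = np.zeros(p)
            fit = 0.0
            if cols:
                Xs = X[:, cols]
                b_ols, *_ = np.linalg.lstsq(Xs, y, rcond=None)
                fit = float(y @ (Xs @ b_ols))   # y'Py, P the projection onto Xs
                beta[cols] = shrink * b_ols     # posterior mean under the g-prior
                z[cols] = 1.0
            # Log marginal likelihood up to a constant shared by all models
            log_ml.append(-0.5 * k * np.log1p(g)
                          - 0.5 * n * np.log(yty - shrink * fit))
            indicators.append(z)
            coefs.append(beta)
    log_ml = np.array(log_ml)
    weights = np.exp(log_ml - log_ml.max())     # subtract max for stability
    weights /= weights.sum()                    # posterior model probabilities
    inclusion = weights @ np.array(indicators)  # P(variable j is in the model)
    beta_bma = weights @ np.array(coefs)        # model-averaged coefficients
    return weights, inclusion, beta_bma

# Simulated data (an assumption for illustration): only column 0 matters.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 4))
y = 2.0 * X[:, 0] + rng.normal(size=50)
weights, inclusion, beta_bma = bma_g_prior(X, y, g=50.0)

# A model-averaged prediction at a new point is a weighted sum over models.
x_new = np.ones(4)
prediction = x_new @ beta_bma
```

Enumeration costs 2^p model fits, so it is only feasible for a handful of predictors; with many variables a Markov chain Monte Carlo search over subsets takes its place.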
The development here uses decision theory in the context of the multivariate general linear model. In passing, this reduced predictor space Bayes model averaging is contrasted with single-model approximations. A fast algorithm for updating regressions in the Markov chain Monte Carlo searches for posterior inference is developed, allowing many more variables than observations to be contemplated. We discuss the merits of absolute rather than proportionate shrinkage in regression, especially when there are more variables than observations. The methodology is illustrated on a set of spectroscopic data used for measuring the amounts of different sugars in an aqueous solution.
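The contrast between absolute and proportionate shrinkage can be made concrete with a small sketch. It rests on one common reading, stated here as an assumption: "absolute" shrinkage is taken to be a ridge-type posterior mean, (X'X + λI)^{-1}X'y, and "proportionate" shrinkage a g-prior posterior mean, which multiplies the least-squares estimate by g/(1+g). The function names and the values of `lam` and `g` are hypothetical. The proportionate form needs a unique least-squares estimate, so it is unavailable when there are more variables than observations, while the ridge form remains well defined.

```python
import numpy as np

def absolute_shrinkage(X, y, lam):
    """Ridge-type posterior mean: (X'X + lam * I)^{-1} X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def proportionate_shrinkage(X, y, g):
    """g-prior posterior mean: g / (1 + g) times the least-squares estimate."""
    p = X.shape[1]
    if np.linalg.matrix_rank(X) < p:
        raise ValueError("X'X is singular; the least-squares estimate is not unique")
    beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
    return g / (1.0 + g) * beta_ols

# Simulated data (an assumption for illustration): 5 observations, 20 variables.
rng = np.random.default_rng(0)
X_wide = rng.normal(size=(5, 20))
y_wide = rng.normal(size=5)

# The ridge form is computable even though X'X has rank at most 5.
beta_ridge = absolute_shrinkage(X_wide, y_wide, lam=1.0)
```

Calling `proportionate_shrinkage(X_wide, y_wide, g=9.0)` on the same data raises `ValueError`, because the 20 columns cannot have full rank with only 5 rows.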

MSC:

62C10 Bayesian problems; characterization of Bayes procedures
65C40 Numerical analysis or methods applied to Markov chains
62J05 Linear regression; mixed models
