
The GIC for model selection: A hypothesis testing approach. (English) Zbl 0951.62056

Summary: We consider the model (subset) selection problem for linear regression. Although hypothesis testing and model selection are two different approaches, they have similarities. We combine the two approaches and propose a particular choice of the penalty parameter in the generalized information criterion (GIC). The resulting model selection procedure inherits good properties from both approaches: its overfitting and underfitting probabilities converge to 0 as the sample size \(n \to \infty\), and, for fixed \(n\), its overfitting probability is kept approximately below a pre-assigned significance level.
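The summary turns on the role of the GIC penalty parameter. The sketch below shows how a GIC with penalty \(\lambda\) selects a subset by exhaustive search. It assumes one common form of the criterion, \(\mathrm{GIC}_\lambda(A) = \mathrm{RSS}(A) + \lambda \hat\sigma^2 |A|\), with \(\hat\sigma^2 = \mathrm{RSS}(\text{full})/(n-p)\); the paper's exact normalisation may differ. Under this form, for nested subsets \(A \subset B\) with \(|B| = |A| + 1\), the larger model is preferred exactly when \((\mathrm{RSS}(A) - \mathrm{RSS}(B))/\hat\sigma^2 > \lambda\), so \(\lambda\) acts as the critical value of a test for one extra covariate. The helper `lam_from_level` is hypothetical and is not the paper's choice: it sets \(\lambda\) to the upper-\(\alpha\) quantile of \(\chi^2_1\), which matches a level-\(\alpha\) test of a single added covariate only approximately (exactly when \(\sigma^2\) is known and errors are normal). A fixed \(\lambda\) of this kind does not by itself give the large-sample consistency described above.

```python
"""Minimal GIC all-subsets selection sketch (pure Python, no NumPy).

Assumed criterion (one common GIC form, not necessarily the paper's):
    GIC_lam(A) = RSS(A) + lam * sigma2_hat * |A|,  sigma2_hat = RSS(full) / (n - p)
"""
import random
from itertools import combinations
from statistics import NormalDist


def _solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    m = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, m):
            f = aug[r][col] / aug[col][col]
            for c in range(col, m + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, m))
        x[r] = (aug[r][m] - s) / aug[r][r]
    return x


def rss(X, y, cols):
    """Residual sum of squares of the least-squares fit of y on columns `cols` of X."""
    if not cols:
        return sum(v * v for v in y)
    Z = [[row[j] for j in cols] for row in X]
    k = len(cols)
    ztz = [[sum(z[i] * z[j] for z in Z) for j in range(k)] for i in range(k)]
    zty = [sum(z[i] * yi for z, yi in zip(Z, y)) for i in range(k)]
    beta = _solve(ztz, zty)  # normal equations Z'Z beta = Z'y
    return sum((yi - sum(b * zi for b, zi in zip(beta, z))) ** 2 for z, yi in zip(Z, y))


def lam_from_level(alpha):
    """Hypothetical helper: upper-alpha quantile of chi-square(1), i.e. z_{1-alpha/2}**2."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * z


def gic_select(X, y, lam):
    """Return the tuple of column indices minimising GIC_lam over all subsets."""
    n, p = len(X), len(X[0])
    sigma2 = rss(X, y, list(range(p))) / (n - p)  # variance estimate from the full model
    best, best_val = (), float("inf")
    for k in range(p + 1):
        for cols in combinations(range(p), k):
            val = rss(X, y, list(cols)) + lam * sigma2 * k  # fit + penalty
            if val < best_val:
                best, best_val = cols, val
    return best


# Usage on simulated data: column 0 is an intercept, columns 1-3 are covariates,
# and only column 1 enters the true regression function.
random.seed(0)
n = 200
X = [[1.0] + [random.gauss(0.0, 1.0) for _ in range(3)] for _ in range(n)]
y = [2.0 + 3.0 * row[1] + random.gauss(0.0, 1.0) for row in X]
selected = gic_select(X, y, lam_from_level(0.05))
print(selected)
```

Setting `lam=0` reduces the criterion to the residual sum of squares and always picks the full model, while a very large `lam` shrinks the selection toward the empty model; the penalty parameter trades underfitting against overfitting, which is the choice the paper addresses.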

MSC:

62J05 Linear regression; mixed models
62H15 Hypothesis testing in multivariate analysis
65C60 Computational problems in statistics (MSC2010)
