High-dimensional Gaussian model selection on a Gaussian design. (English) Zbl 1191.62076

Summary: We consider the problem of estimating the conditional mean of a real Gaussian variable \(Y=\sum_{i=1}^p\theta_iX_i+\varepsilon\), where the vector of covariates \((X_i)_{1\leq i\leq p}\) follows a joint Gaussian distribution. This problem often arises when one aims at estimating the graph or the distribution of a Gaussian graphical model. We introduce a general model selection procedure based on the minimization of a penalized least squares type criterion. It handles a variety of problems such as ordered and complete variable selection, allows prior knowledge on the model to be incorporated, and applies when the number of covariates \(p\) is larger than the number of observations \(n\). Moreover, it is shown to achieve a non-asymptotic oracle inequality independently of the correlation structure of the covariates. We also exhibit various minimax rates of estimation in the considered framework and hence derive adaptivity properties of our procedure.
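
The ordered variable selection case mentioned in the summary can be illustrated with a minimal sketch. The summary does not state the penalty, so the code below assumes a criterion of the form \(\mathrm{RSS}_d/n\,(1+K\,d/(n-d))\), where model \(d\) uses the first \(d\) covariates. The constant `K = 2`, the cap on the model dimension, the function names `select_ordered_model` and `simulate_gaussian_design`, and the simulation settings are all hypothetical choices made for illustration, not the authors' procedure.

```python
import numpy as np


def select_ordered_model(X, y, K=2.0, max_dim=None):
    """Choose d minimising (RSS_d / n) * (1 + K * d / (n - d)).

    Model d regresses y on the first d columns of X (ordered selection).
    The penalty shape and the constant K are assumptions of this sketch.
    """
    n, p = X.shape
    if max_dim is None:
        max_dim = min(p, n // 2)  # keep n - d well away from zero
    best_d, best_crit = 0, np.inf
    for d in range(max_dim + 1):
        if d == 0:
            resid = y  # empty model: nothing is fitted
        else:
            coef, *_ = np.linalg.lstsq(X[:, :d], y, rcond=None)
            resid = y - X[:, :d] @ coef
        crit = np.mean(resid ** 2) * (1.0 + K * d / (n - d))
        if crit < best_crit:
            best_d, best_crit = d, crit
    return best_d, best_crit


def simulate_gaussian_design(n=50, p=60, d_true=3, rho=0.5, seed=0):
    """Draw correlated Gaussian covariates (p > n) and a sparse linear response."""
    rng = np.random.default_rng(seed)
    idx = np.arange(p)
    cov = rho ** np.abs(idx[:, None] - idx[None, :])  # AR(1)-type correlation
    X = rng.multivariate_normal(np.zeros(p), cov, size=n)
    theta = np.zeros(p)
    theta[:d_true] = 2.0  # only the first d_true covariates matter
    y = X @ theta + rng.normal(size=n)
    return X, y


if __name__ == "__main__":
    X, y = simulate_gaussian_design()
    d_hat, crit = select_ordered_model(X, y)
    print("selected dimension:", d_hat, "criterion:", crit)
```

The multiplicative form of the penalty means the noise variance never has to be estimated separately, which is one reason such criteria are convenient when \(p\) exceeds \(n\). Complete variable selection would require searching over subsets rather than nested models and is not covered by this sketch.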

MSC:

62G08 Nonparametric regression and quantile regression
62J05 Linear regression; mixed models
60E15 Inequalities; stochastic orderings

Software:

pcalg; GMRFLib

References:

[1] H. Akaike. Statistical predictor identification. Ann. Inst. Statist. Math. 22 (1970) 203-217. · Zbl 0259.62076 · doi:10.1007/BF02506337
[2] H. Akaike. A new look at the statistical model identification. IEEE Trans. Automat. Control 19 (1974) 716-723. · Zbl 0314.62039 · doi:10.1109/TAC.1974.1100705
[3] S. Arlot. Model selection by resampling penalization. Electron. J. Stat. 3 (2009) 557-624. · Zbl 1326.62097 · doi:10.1214/08-EJS196
[4] Y. Baraud, C. Giraud and S. Huet. Gaussian model selection with an unknown variance. Ann. Statist. 37 (2009) 630-672. · Zbl 1162.62051 · doi:10.1214/07-AOS573
[5] P. Bickel, Y. Ritov and A. Tsybakov. Simultaneous analysis of Lasso and Dantzig selector. Ann. Statist. 37 (2009) 1705-1732. · Zbl 1173.62022 · doi:10.1214/08-AOS620
[6] L. Birgé. A new lower bound for multiple hypothesis testing. IEEE Trans. Inform. Theory 51 (2005) 1611-1615. · Zbl 1283.62030 · doi:10.1109/TIT.2005.844101
[7] L. Birgé and P. Massart. Minimum contrast estimators on sieves: Exponential bounds and rates of convergence. Bernoulli 4 (1998) 329-375. · Zbl 0954.62033 · doi:10.2307/3318720
[8] L. Birgé and P. Massart. Gaussian model selection. J. Eur. Math. Soc. (JEMS) 3 (2001) 203-268. · Zbl 1037.62001 · doi:10.1007/s100970100031
[9] L. Birgé and P. Massart. Minimal penalties for Gaussian model selection. Probab. Theory Related Fields 138 (2007) 33-73. · Zbl 1112.62082 · doi:10.1007/s00440-006-0011-8
[10] F. Bunea, A. Tsybakov and M. Wegkamp. Aggregation for Gaussian regression. Ann. Statist. 35 (2007) 1674-1697. · Zbl 1209.62065 · doi:10.1214/009053606000001587
[11] F. Bunea, A. Tsybakov and M. Wegkamp. Sparsity oracle inequalities for the Lasso. Electron. J. Stat. 1 (2007) 169-194 (electronic). · Zbl 1146.62028 · doi:10.1214/07-EJS008
[12] E. J. Candès and T. Tao. Decoding by linear programming. IEEE Trans. Inform. Theory 51 (2005) 4203-4215. · Zbl 1264.94121 · doi:10.1109/TIT.2005.858979
[13] E. Candès and T. Tao. The Dantzig selector: Statistical estimation when \(p\) is much larger than \(n\). Ann. Statist. 35 (2007) 2313-2351. · Zbl 1139.62019 · doi:10.1214/009053606000001523
[14] E. Candès and Y. Plan. Near-ideal model selection by \(\ell_1\) minimization. Ann. Statist. To appear, 2009. · Zbl 1173.62053 · doi:10.1214/08-AOS653
[15] R. G. Cowell, A. P. Dawid, S. L. Lauritzen and D. J. Spiegelhalter. Probabilistic Networks and Expert Systems. Statistics for Engineering and Information Science. Springer, New York, 1999. · Zbl 0937.68121
[16] N. A. C. Cressie. Statistics for Spatial Data. Wiley Series in Probability and Mathematical Statistics: Applied Probability and Statistics. Wiley, New York, 1993. (Revised reprint of the 1991 edition, Wiley.) · Zbl 0799.62002
[17] K. R. Davidson and S. J. Szarek. Local operator theory, random matrices and Banach spaces. In Handbook of the Geometry of Banach Spaces, Vol. I 317-366. North-Holland, Amsterdam, 2001. · Zbl 1067.46008 · doi:10.1016/S1874-5849(01)80010-3
[18] C. Giraud. Estimation of Gaussian graphs by model selection. Electron. J. Stat. 2 (2008) 542-563. · Zbl 1320.62094 · doi:10.1214/08-EJS228
[19] T. Gneiting. Power-law correlations, related models for long-range dependence and their simulation. J. Appl. Probab. 37 (2000) 1104-1109. · Zbl 0972.62079 · doi:10.1239/jap/1014843088
[20] M. Kalisch and P. Bühlmann. Estimating high-dimensional directed acyclic graphs with the PC-algorithm. J. Mach. Learn. Res. 8 (2007) 613-636. · Zbl 1222.68229
[21] B. Laurent and P. Massart. Adaptive estimation of a quadratic functional by model selection. Ann. Statist. 28 (2000) 1302-1338. · Zbl 1105.62328 · doi:10.1214/aos/1015957395
[22] S. L. Lauritzen. Graphical Models. Oxford Statistical Science Series 17. The Clarendon Press, Oxford University Press, New York, 1996.
[23] C. L. Mallows. Some comments on \(C_p\). Technometrics 15 (1973) 661-675. · Zbl 0269.62061 · doi:10.2307/1267380
[24] P. Massart. Concentration Inequalities and Model Selection. Lecture Notes in Mathematics 1896. Springer, Berlin, 2007. (Lectures from the 33rd Summer School on Probability Theory held in Saint-Flour, July 6-23, 2003, with a foreword by Jean Picard.) · Zbl 1170.60006 · doi:10.1007/978-3-540-48503-2
[25] N. Meinshausen and P. Bühlmann. High-dimensional graphs and variable selection with the Lasso. Ann. Statist. 34 (2006) 1436-1462. · Zbl 1113.62082 · doi:10.1214/009053606000000281
[26] V. H. de la Peña and E. Giné. Decoupling. Probability and Its Applications. Springer, New York, 1999. (From dependence to independence, randomly stopped processes. U-statistics and processes. Martingales and beyond.)
[27] D. von Rosen. Moments for the inverted Wishart distribution. Scand. J. Statist. 15 (1988) 97-109. · Zbl 0663.62063
[28] H. Rue and L. Held. Gaussian Markov Random Fields: Theory and Applications. Monographs on Statistics and Applied Probability 104. Chapman & Hall/CRC, London, 2005. · Zbl 1093.60003
[29] K. Sachs, O. Perez, D. Pe’er, D. A. Lauffenburger and G. P. Nolan. Causal protein-signaling networks derived from multiparameter single-cell data. Science 308 (2005) 523-529.
[30] J. Schäfer and K. Strimmer. An empirical Bayes approach to inferring large-scale gene association network. Bioinformatics 21 (2005) 754-764.
[31] G. Schwarz. Estimating the dimension of a model. Ann. Statist. 6 (1978) 461-464. · Zbl 0379.62005 · doi:10.1214/aos/1176344136
[32] R. Shibata. An optimal selection of regression variables. Biometrika 68 (1981) 45-54. · Zbl 0464.62054 · doi:10.1093/biomet/68.1.45
[33] C. Stone. An asymptotically optimal histogram selection rule. In Proceedings of the Berkeley Conference in Honor of Jerzy Neyman and Jack Kiefer, Vol. II (Berkeley, Calif., 1983) 513-520. Wadsworth Statist./Probab. Ser. Wadsworth, Belmont, CA, 1985. · Zbl 1373.62213
[34] R. Tibshirani. Regression shrinkage and selection via the lasso. J. Roy. Statist. Soc. Ser. B 58 (1996) 267-288. · Zbl 0850.62538
[35] A. Tsybakov. Optimal rates of aggregation. In 16th Annual Conference on Learning Theory 2777 303-313. Springer, Heidelberg, 2003. · Zbl 1208.62073 · doi:10.1007/b12006
[36] N. Verzelen and F. Villers. Goodness-of-fit tests for high-dimensional Gaussian linear models. Ann. Statist. To appear, 2009. · Zbl 1453.62229
[37] M. J. Wainwright. Information-theoretic limits on sparsity recovery in the high-dimensional and noisy setting. Technical Report 725, Department of Statistics, UC Berkeley, 2007. · Zbl 1367.94106
[38] A. Wille, P. Zimmermann, E. Vranova, A. Fürholz, O. Laule, S. Bleuler, L. Hennig, A. Prelic, P. von Rohr, L. Thiele, E. Zitzler, W. Gruissem and P. Bühlmann. Sparse graphical Gaussian modelling of the isoprenoid gene network in Arabidopsis thaliana. Genome Biology 5 (2004), no. R92.
[39] P. Zhao and B. Yu. On model selection consistency of Lasso. J. Mach. Learn. Res. 7 (2006) 2541-2563. · Zbl 1222.62008
[40] H. Zou. The adaptive Lasso and its oracle properties. J. Amer. Statist. Assoc. 101 (2006) 1418-1429. · Zbl 1171.62326 · doi:10.1198/016214506000000735