The sparsity and bias of the LASSO selection in high-dimensional linear regression. (English) Zbl 1142.62044

Summary: N. Meinshausen and P. Bühlmann [Ann. Stat. 34, No. 3, 1436–1462 (2006; Zbl 1113.62082)] showed that, for neighborhood selection in Gaussian graphical models, the LASSO is consistent under a neighborhood stability condition even when the number of variables is of greater order than the sample size. P. Zhao and B. Yu [J. Machine Learning Res. 7, 2541–2567 (2006)] formalized the neighborhood stability condition in the context of linear regression as a strong irrepresentable condition, and showed that under this condition the LASSO selects exactly the set of nonzero regression coefficients, provided these coefficients are bounded away from zero at a certain rate.
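The strong irrepresentable condition can be checked numerically for a given design. The sketch below (illustrative only; the Gaussian design, support size, and sign vector are assumptions, not taken from the paper) computes the margin \(1 - \|C_{S^cS} C_{SS}^{-1}\operatorname{sign}(\beta_S)\|_\infty\) with \(C = X'X/n\); a positive value means the condition holds with that margin.

```python
# Minimal numerical check of the strong irrepresentable condition of
# Zhao and Yu (2006). All specifics below (Gaussian design, support,
# signs) are illustrative assumptions, not taken from the paper.
import numpy as np

def irrepresentable_margin(X, support, signs):
    """Return 1 - ||C_{S^c,S} C_{S,S}^{-1} sign(beta_S)||_inf for
    C = X'X / n; a positive value means the condition holds."""
    n, p = X.shape
    C = X.T @ X / n
    S = np.asarray(support)
    Sc = np.setdiff1d(np.arange(p), S)
    w = C[np.ix_(Sc, S)] @ np.linalg.solve(C[np.ix_(S, S)], signs)
    return 1.0 - np.max(np.abs(w))

rng = np.random.default_rng(0)
n, p, q = 200, 50, 5                    # q = size of the true support
X = rng.standard_normal((n, p))         # i.i.d. Gaussian design
margin = irrepresentable_margin(X, support=np.arange(q), signs=np.ones(q))
print(f"irrepresentable margin: {margin:.3f}")
```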
In this paper, the regression coefficients outside an ideal model are assumed to be small, but not necessarily zero. Under a sparse Riesz condition on the correlation of design variables, we prove that the LASSO selects a model of the correct order of dimensionality, controls the bias of the selected model at a level determined by the contributions of small regression coefficients and threshold bias, and selects all coefficients of greater order than the bias of the selected model. Moreover, as a consequence of this rate consistency of the LASSO in model selection, it is proved that the sum of error squares for the mean response and the \(\ell_{\alpha}\)-loss for the regression coefficients converge at the best possible rates under the given conditions. An interesting aspect of our results is that the logarithm of the number of variables can be of the same order as the sample size for certain random dependent designs.
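As a hedged illustration of this rate consistency (a toy simulation under assumed parameters, not the paper's own experiment), one can generate a design with a few large "ideal-model" coefficients and many small nonzero ones, run the LASSO at a threshold-level penalty of order \(\sigma\sqrt{2\log(p)/n}\), and check that the selected model is of the correct order of dimensionality and contains all large coefficients:

```python
# Toy simulation (assumed setup, not the paper's): small coefficients
# outside the ideal model should fall below the bias level, while the
# selected model stays comparable in size to the ideal one.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
n, p, q, sigma = 200, 400, 5, 1.0
beta = np.zeros(p)
beta[:q] = 3.0                 # large coefficients of the "ideal" model
beta[q:2 * q] = 0.02           # small but nonzero coefficients outside it
X = rng.standard_normal((n, p))
y = X @ beta + sigma * rng.standard_normal(n)

# Penalty at the usual threshold level; sklearn's alpha scales the
# l1 penalty against the mean squared error, so this is its analogue.
lam = sigma * np.sqrt(2 * np.log(p) / n)
fit = Lasso(alpha=lam, max_iter=50_000).fit(X, y)

selected = np.flatnonzero(fit.coef_)
print(f"selected model size: {selected.size} (ideal model size: {q})")
print("all large coefficients selected:",
      np.all(np.isin(np.arange(q), selected)))
```

In runs of this kind the coefficients of size 0.02 typically fall below the bias level of the selected model and are dropped, consistent with the dimensionality and bias controls described above.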

MSC:

62J05 Linear regression; mixed models
62H12 Estimation in multivariate analysis
62G08 Nonparametric regression and quantile regression
62J07 Ridge regression; shrinkage estimators (Lasso)

Citations:

Zbl 1113.62082

References:

[1] Bai, Z. D. (1999). Methodologies in spectral analysis of large dimensional random matrices, a review. Statist. Sinica 9 611-677. · Zbl 0949.60077
[2] Bunea, F., Tsybakov, A. and Wegkamp, M. (2006). Sparsity oracle inequalities for the lasso. Technical report M979, Dept. Statistics, Florida State Univ. · Zbl 1146.62028 · doi:10.1214/07-EJS008
[3] Candes, E. and Tao, T. (2007). The Dantzig selector: Statistical estimation when p is much larger than n (with discussion). Ann. Statist. 35 2313-2351. · Zbl 1139.62019 · doi:10.1214/009053606000001523
[4] Davidson, K. and Szarek, S. (2001). Local operator theory, random matrices and Banach spaces. In Handbook on the Geometry of Banach Spaces I (W. B. Johnson and J. Lindenstrauss, eds.) 317-366. North-Holland, Amsterdam. · Zbl 1067.46008 · doi:10.1016/S1874-5849(01)80010-3
[5] Donoho, D. L. (2006). For most large underdetermined systems of equations, the minimal \(\ell_1\)-norm near-solution approximates the sparsest near-solution. Comm. Pure Appl. Math. 59 907-934. · Zbl 1105.90068 · doi:10.1002/cpa.20131
[6] Donoho, D. L. and Johnstone, I. (1994). Minimax risk over \(\ell_p\)-balls for \(\ell_q\)-error. Probab. Theory Related Fields 99 277-303. · Zbl 0802.62006 · doi:10.1007/BF01199026
[7] Eaton, M. L. (1983). Multivariate Statistics : A Vector Space Approach . Wiley, New York. · Zbl 0587.62097
[8] Efron, B., Hastie, T., Johnstone, I. and Tibshirani, R. (2004). Least angle regression. Ann. Statist. 32 407-499. · Zbl 1091.62054 · doi:10.1214/009053604000000067
[9] Efron, B., Hastie, T. and Tibshirani, R. (2007). Discussion of: “The Dantzig selector: Statistical estimation when p is much larger than n”. Ann. Statist. 35 2358-2364. · doi:10.1214/009053607000000433
[10] Foster, D. P. and George, E. I. (1994). The risk inflation criterion for multiple regression. Ann. Statist. 22 1947-1975. · Zbl 0829.62066 · doi:10.1214/aos/1176325766
[11] Geman, S. (1980). A limit theorem for the norm of random matrices. Ann. Probab. 8 252-261. · Zbl 0428.60039 · doi:10.1214/aop/1176994775
[12] Greenshtein, E. and Ritov, Y. (2004). Persistence in high-dimensional linear predictor selection and the virtue of overparametrization. Bernoulli 10 971-988. · Zbl 1055.62078 · doi:10.3150/bj/1106314846
[13] Huang, J., Ma, S. and Zhang, C.-H. (2007). Adaptive LASSO for sparse high-dimensional regression models. Statist. Sinica. · Zbl 1255.62198
[14] Knight, K. and Fu, W. J. (2000). Asymptotics for lasso-type estimators. Ann. Statist. 28 1356-1378. · Zbl 1105.62357 · doi:10.1214/aos/1015957397
[15] Leng, C., Lin, Y. and Wahba, G. (2006). A note on the LASSO and related procedures in model selection. Statist. Sinica 16 1273-1284. · Zbl 1109.62056
[16] Meinshausen, N. and Bühlmann, P. (2006). High dimensional graphs and variable selection with the Lasso. Ann. Statist. 34 1436-1462. · Zbl 1113.62082 · doi:10.1214/009053606000000281
[17] Meinshausen, N. and Yu, B. (2006). Lasso-type recovery of sparse representations for high-dimensional data. Technical report, Dept. Statistics, Univ. California, Berkeley. · Zbl 1155.62050
[18] Osborne, M., Presnell, B. and Turlach, B. (2000a). A new approach to variable selection in least squares problems. IMA J. Numer. Anal. 20 389-404. · Zbl 0962.65036 · doi:10.1093/imanum/20.3.389
[19] Osborne, M., Presnell, B. and Turlach, B. (2000b). On the lasso and its dual. J. Comput. Graph. Statist. 9 319-337. · doi:10.2307/1390657
[20] Silverstein, J. W. (1985). The smallest eigenvalue of a large dimensional Wishart matrix. Ann. Probab. 13 1364-1368. · Zbl 0591.60025 · doi:10.1214/aop/1176992819
[21] Tibshirani, R. (1996). Regression shrinkage and selection via the Lasso. J. Roy. Statist. Soc. Ser. B 58 267-288. · Zbl 0850.62538
[22] van de Geer, S. (2007). High-dimensional generalized linear models and the Lasso. Ann. Statist. 36 614-645. · Zbl 1138.62323 · doi:10.1214/009053607000000929
[23] Wainwright, M. (2006). Sharp thresholds for high-dimensional and noisy recovery of sparsity. Available at http://www.arxiv.org/PS_cache/math/pdf/0605/0605740v1.pdf.
[24] Zhao, P. and Yu, B. (2006). On model selection consistency of LASSO. J. Machine Learning Res. 7 2541-2567. · Zbl 1222.62008
[25] Zhang, C.-H. and Huang, J. (2006). Model-selection consistency of the LASSO in high-dimensional linear regression. Technical Report No. 2006-003, Dept. Statistics, Rutgers Univ.
[26] Zou, H. (2006). The adaptive lasso and its oracle properties. J. Amer. Statist. Assoc. 101 1418-1429. · Zbl 1171.62326 · doi:10.1198/016214506000000735