
Model selection in nonparametric regression. (English) Zbl 1019.62037

From the paper: Model selection using a penalized data-splitting device is studied in the context of nonparametric regression. Finite sample bounds under mild conditions are obtained. The resulting estimates are adaptive for large classes of functions.
We study the additive regression problem \(Y=\eta(X) +\varepsilon\), where \(\eta:\mathbb{R}^d \to\mathbb{R}\) is the unknown regression function, \(\varepsilon\) is the random error and \(X\in\mathbb{R}^d\) is the random covariate. The available data consist of independent observations \((X_j, Y_j)\). Our aim is to find an estimator \(\widehat\eta\) in some class \({\mathcal G}\) such that the mean squared error \(\mathbb{E}(\widehat \eta-\eta)^2(X)\) is as small as possible. We consider the least squares estimator \(\widehat g\in{\mathcal G}\), which satisfies \[ \sum_j\bigl( Y_j-\widehat g(X_j)\bigr)^2 \leq\sum_j\bigl( Y_j-g(X_j) \bigr)^2\quad\text{for all }g\in{\mathcal G}. \]
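The two ingredients above — least squares fitting within a class and a penalized data-splitting device for choosing among classes — can be sketched numerically. The following is a minimal illustration, not the paper's procedure: the candidate classes \({\mathcal G}_m\) are taken to be polynomials of bounded degree, the true \(\eta\), the sample size, and the penalty constant are all hypothetical choices made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def eta(x):
    # Hypothetical true regression function for the simulation.
    return np.sin(2 * np.pi * x)

n = 200
X = rng.uniform(0.0, 1.0, n)
Y = eta(X) + 0.3 * rng.standard_normal(n)

# Data splitting: the first half is used for least squares fitting,
# the second half for selecting among the fitted candidates.
X_fit, Y_fit = X[: n // 2], Y[: n // 2]
X_val, Y_val = X[n // 2:], Y[n // 2:]

def lsq_poly(deg):
    """Least squares estimator over G = {polynomials of degree <= deg}:
    it minimizes sum_j (Y_j - g(X_j))^2 over that class."""
    coefs = np.polyfit(X_fit, Y_fit, deg)
    return lambda x: np.polyval(coefs, x)

degrees = range(1, 8)
fits = {d: lsq_poly(d) for d in degrees}

def penalized_risk(d):
    # Empirical risk on the held-out half plus a complexity penalty;
    # the penalty 0.01 * (d + 1) is an illustrative choice, not the
    # penalty from the paper.
    g = fits[d]
    return np.mean((Y_val - g(X_val)) ** 2) + 0.01 * (d + 1)

d_hat = min(degrees, key=penalized_risk)
g_hat = fits[d_hat]
```

Because the classes are nested, the fitting-sample residual sum of squares can only decrease with the degree; the held-out penalized risk is what prevents the selection from always picking the largest class.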

MSC:

62G08 Nonparametric regression and quantile regression
62G20 Asymptotic properties of nonparametric inference
60F05 Central limit and other weak theorems
60E15 Inequalities; stochastic orderings
