Bandwidth selection: Classical or plug-in? (English) Zbl 0938.62035

Summary: Bandwidth selection for procedures such as kernel density estimation and local regression has been widely studied over the past decade. Substantial “evidence” has been collected to establish superior performance of modern plug-in methods in comparison to methods such as cross validation; this has ranged from detailed analysis of rates of convergence, to simulations, to superior performance on real datasets.
We take a detailed look at some of this evidence, looking into the sources of differences. Our findings challenge the claimed superiority of plug-in methods on several fronts. First, plug-in methods are heavily dependent on arbitrary specification of pilot bandwidths and fail when this specification is wrong. Second, the often-quoted variability and undersmoothing of cross validation simply reflect the uncertainty of bandwidth selection; plug-in methods reflect this uncertainty by oversmoothing and missing important features when given difficult problems. Third, we look at asymptotic theory. Plug-in methods use available curvature information in an inefficient manner, resulting in inefficient estimates. Previous comparisons with classical approaches penalized the classical approaches for this inefficiency. Asymptotically, the plug-in based estimates are beaten by their own pilot estimates.
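
The contrast drawn in the summary can be illustrated with a minimal sketch, which is not the paper's own procedure. It assumes a Gaussian kernel and compares least-squares (unbiased) cross validation with the normal-reference rule h = 1.06 s n^(-1/5). That rule is used here only as the simplest stand-in for a plug-in selector, since it replaces the unknown curvature of the density with that of a fitted normal; it is not the Sheather-Jones method or any other selector examined in the paper. The function names, the bandwidth grid, and the bimodal test sample are all hypothetical choices made for illustration.

```python
import numpy as np


def normal_pdf(x, s):
    """Density of N(0, s^2) evaluated at x."""
    return np.exp(-0.5 * (x / s) ** 2) / (s * np.sqrt(2.0 * np.pi))


def lscv_score(x, h):
    """Least-squares cross-validation criterion for a Gaussian-kernel KDE.

    Computes integral(fhat^2) - (2/n) * sum_i fhat_{-i}(x_i), where
    fhat_{-i} is the leave-one-out estimate. For Gaussian kernels the
    integral has a closed form via the N(0, 2 h^2) density.
    """
    n = x.size
    d = x[:, None] - x[None, :]  # all pairwise differences
    int_f2 = normal_pdf(d, np.sqrt(2.0) * h).sum() / n**2
    k = normal_pdf(d, h)
    # Drop the diagonal (i == j) terms to get the leave-one-out average.
    loo = (k.sum() - np.trace(k)) / (n * (n - 1))
    return int_f2 - 2.0 * loo


def cv_bandwidth(x, grid):
    """Bandwidth on the (assumed) grid that minimizes the LSCV criterion."""
    scores = np.array([lscv_score(x, h) for h in grid])
    return grid[int(np.argmin(scores))]


def normal_reference_bandwidth(x):
    """Normal-reference rule, a stand-in for a plug-in selector."""
    return 1.06 * x.std(ddof=1) * x.size ** (-0.2)


if __name__ == "__main__":
    # Hypothetical "difficult" sample: two well-separated narrow modes.
    rng = np.random.default_rng(0)
    sample = np.concatenate([rng.normal(-3.0, 0.5, 200),
                             rng.normal(3.0, 0.5, 200)])
    grid = np.linspace(0.05, 2.0, 40)
    print("cross validation:", cv_bandwidth(sample, grid))
    print("normal reference:", normal_reference_bandwidth(sample))
```

On a sample with two well-separated modes, the normal-reference bandwidth is driven by the overall spread and tends to oversmooth, whereas the cross-validated bandwidth adapts to the width of the individual modes. This mirrors the summary's second point only qualitatively and says nothing about the more refined plug-in selectors the paper actually studies.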

MSC:

62G07 Density estimation
62G20 Asymptotic properties of nonparametric inference
62-07 Data analysis (statistics) (MSC2010)
62A09 Graphical methods in statistics
