×

Flexible smoothing with \(B\)-splines and penalties. With comments and a rejoinder by the authors. (English) Zbl 0955.62562

Summary: \(B\)-splines are attractive for nonparametric modelling, but choosing the optimal number and positions of knots is a complex task. Equidistant knots can be used, but their small and discrete number allows only limited control over smoothness and fit. We propose to use a relatively large number of knots and a difference penalty on coefficients of adjacent \(B\)-splines. We show connections to the familiar spline penalty on the integral of the squared second derivative. A short overview of \(B\)-splines, of their construction and of penalized likelihood is presented. We discuss properties of penalized \(B\)-splines and propose various criteria for the choice of an optimal penalty parameter. Nonparametric logistic regression, density estimation and scatterplot smoothing are used as examples. Some details of the computations are presented.

MSC:

62G05 Nonparametric estimation
62G07 Density estimation
62G08 Nonparametric regression and quantile regression

Software:

FITPACK; KernSmooth
PDFBibTeX XMLCite
Full Text: DOI

References:

[1] Ashford, R. and Walker, P. J. (1972). Quantal response analysis for a mixture of populations. Biometrics 28 981-988.
[2] Bishop, Y. M. M., Fienberg, S. E. and Holland, P. W. (1975). Discrete Multivariate Analy sis: Theory and Practice. MIT Press. · Zbl 0332.62039
[3] Cleveland, W. S. (1979). Robust locally weighted regression and smoothing scatter plots. J. Amer. Statist. Assoc. 74 829-836. · Zbl 0423.62029 · doi:10.2307/2286407
[4] Cox, M. G. (1981). Practical spline approximation. In Topics in Numerical Analy sis (P. R. Turner, ed.). Springer, Berlin. · Zbl 0492.65004 · doi:10.1007/BFb0063201
[5] de Boor, C. (1977). Package for calculating with B-splines. SIAM J. Numer. Anal. 14 441-472. JSTOR: · Zbl 0364.65008 · doi:10.1137/0714026
[6] de Boor, C. (1978). A Practical Guide to Splines. Springer, Berlin. · Zbl 0406.41003
[7] Dierckx, P. (1993). Curve and Surface Fitting with Splines. Clarendon, Oxford. · Zbl 0782.41016
[8] Diggle P. and Marron J. S. (1988). Equivalence of smoothing parameter selectors in density and intensity estimation. J. Amer. Statist. Assoc. 83 793-800. JSTOR: · Zbl 0662.62036 · doi:10.2307/2289308
[9] Eilers, P. H. C. (1990). Smoothing and interpolation with generalized linear models. Quaderni di Statistica e Matematica Applicata alle Scienze Economico-Sociali 12 21-32. Eilers, P. H. C. (1991a). Penalized regression in action: estimating pollution roses from daily averages. Environmetrics 2 25-48. Eilers, P. H. C. (1991b). Nonparametric density estimation with grouped observations. Statist. Neerlandica 45 255-270. · Zbl 04504322 · doi:10.1111/j.1467-9574.1991.tb01308.x
[10] Eilers, P. H. C. (1995). Indirect observations, composite link models and penalized likelihood. In Statistical Modelling (G. U. H. Seeber et al., eds.). Springer, New York.
[11] Eilers, P. H. C. and Marx, B. D. (1992). Generalized linear models with P-splines. In Advances in GLIM and Statistical Modelling (L. Fahrmeir et al., eds.). Springer, New York.
[12] Eubank, R. L. (1988). Spline Smoothing and Nonparametric Regression. Dekker, New York. · Zbl 0702.62036
[13] Friedman, J. and Silverman, B. W. (1989). Flexible parsimonious smoothing and additive modeling (with discussion). Technometrics 31 3-39. JSTOR: · Zbl 0672.65119 · doi:10.2307/1270359
[14] Green, P. J. and Silverman, B. W. (1994). Nonparametric Regression and Generalized Linear Models. Chapman and Hall, London. · Zbl 0832.62032
[15] Green, P. J. and Yandell, B. S. (1985). Semi-parametric generalized linear models. In Generalized Linear Models (B. Gilchrist et al., eds.). Springer, New York. Hand, D. J., Daly, F., Lunn, A. D., McConway, K. J. and Os trowski, E. (1994). A Handbook of Small Data Sets. Chapman and Hall, London.
[16] Härdle, W. (1990). Applied Nonparametric Regression. Cambridge Univ. Press. · Zbl 0714.62030
[17] Hastie, T. and Tibshirani, R. (1990). Generalized Additive Models. Chapman and Hall, London. · Zbl 0747.62061
[18] Kooperberg, C. and Stone, C. J. (1991). A study of logspline density estimation. Comput. Statist. Data Anal. 12 327-347. · Zbl 0825.62442 · doi:10.1016/0167-9473(91)90115-I
[19] Kooperberg, C. and Stone, C. J. (1992). Logspline density estimation for censored data. J. Comput. Graph. Statist. 1 301- 328.
[20] Marron, J. S. and Ruppert, D. (1994). Transformations to reduce boundary bias in kernel density estimation. J. Roy. Statist. Soc. Ser. B 56 653-671. JSTOR: · Zbl 0805.62046
[21] Marx, B. D. and Eilers, P. H. C. (1994). Direct generalized additive modelling with penalized likelihood. Paper presented at the 9th Workshop on Statistical Modelling, Exeter, 1994.
[22] Marx, B. D. and Eilers, P. H. C. (1996). Direct generalized additive modelling with penalized likelihood. Unpublished manuscript. · Zbl 1042.62580
[23] McCullagh, P. and Nelder, J. A. (1989). Generalized Linear Models, 2nd ed. Chapman and Hall, London. · Zbl 0744.62098
[24] O’Sullivan, F. (1986). A statistical perspective on ill-posed inverse problems (with discussion). Statist. Sci. 1 505-527. · Zbl 0625.62110 · doi:10.1214/ss/1177013525
[25] O’Sullivan, F. (1988). Fast computation of fully automated logdensity and log-hazard estimators. SIAM J. Sci. Statist. Comput. 9 363-379. · Zbl 0688.65083 · doi:10.1137/0909024
[26] Reinsch, C. (1967). Smoothing by spline functions. Numer. Math. 10 177-183. · Zbl 0161.36203 · doi:10.1007/BF02162161
[27] Sakamoto, Y., Ishiguro, M. and Kitagawa, G. (1986). Akaike Information Criterion Statistics. Reidel, Dordrecht. · Zbl 0608.62006
[28] Scott, D. W. (1992). Multivariate Density Estimation: Theory, Practice, and Visualization. Wiley, New York. · Zbl 0850.62006
[29] Silverman, B. W. (1985). Some aspects of the spline smoothing approach to nonparametric regression curve fitting (with discussion). J. Roy. Statist. Soc. Ser. B 47 1-52. JSTOR: · Zbl 0606.62038
[30] Silverman, B. W. (1986). Density Estimation for Statistics and Data Analy sis. Chapman and Hall, London. · Zbl 0617.62042
[31] Wahba, G. (1990). Spline Models for Observational Data. SIAM, Philadelphia. · Zbl 0813.62001
[32] Wand, M. P. and Jones, M. C. (1993). Kernel Smoothing. Chapman and Hall, London. · Zbl 0775.62105 · doi:10.2307/2290332
[33] Whittaker, E. T. (1923). On a new method of graduation. Proc. Edinburgh Math. Soc. 41 63-75.
[34] ron and Park (1992). In the following, I provide a brief review to explain the defects and some remedy to the classical selectors for kernel regression estimate. Let us assume the simplest model of a circular design with equally spaced design points. yt = \mu xt + t, where t are i.i.d. noise. For the kernel estimate \mu with a bandwidth, we often use the mean of sum of squared errors totically equivalent in Rice (1984). All of these procedures rely on the residual sum of squares RSS. Mallows (1973) proposed the procedure based on the observation that Chiu, S.-T. (1996). A comparative review of bandwidth selection for kernel density estimation. Statist. Sinica 6 129-145. · Zbl 0850.62359
[35] Hall, P. and Johnstone, I. (1992). Empirical functionals and efficient smoothing parameter selection. J. Roy. Statist. Soc. Ser. B 54 519-521. JSTOR: · Zbl 0786.62050
[36] Hall, P., Marron, J. S. and Park, B. U. (1992). Smoothed crossvalidation. Probab. Theory Related Fields 92 1-20. · Zbl 0742.62042 · doi:10.1007/BF01205233
[37] Mallows, C. (1973). Some comments on Cp. Technometrics 15 661-675. · Zbl 0269.62061 · doi:10.2307/1267380
[38] Rice, J. (1984). Bandwidth choice for nonparametric regression. Ann. Statist. 12 1215-1230. · Zbl 0554.62035 · doi:10.1214/aos/1176346788
[39] Scott, D. W. and Terrell, G. R. (1987). Biased and unbiased cross-validation in density estimation. J. Amer. Statist. Assoc. 82 1131-1146. JSTOR: · Zbl 0648.62037 · doi:10.2307/2289391
[40] U, Q2 = BTB U, = diag and U is an orthogonal matrix such that Q-1/2 2 DTDQ-1/2 2 = U UT. The columns of G can be identified with a new set of functions known as the Demmler- Reinsch (DR) basis. Specifically these are piecewise poly nomial functions, so that the elements of G satisfy xi = Gi. Besides having useful orthogonality properties the DR basis can be ordered by frequency and larger values of will exhibit more oscillations (in fact 1 zero crossings). Figure 1(a) plots several of the basis functions for m = 133 equally spaced x’s and 20 equally spaced interior knots. Figure 1(b) illustrates the expected poly nomial increase in the size of as a function of. The Demmler-Reinsch basis provides an informative interpretation of the spline estimate. Let f d and Xiang and Wahba (1996). For the density estimation problem in Section 8, I could not find the definition of the H matrix to understand the AIC proposed, but whatever it is, it should be subject to the same scrutiny before being recommended as ”optimal.” In ordinary Gaussian regression, the optimality of GCV is well established in the literature. For the AIC score presented in (27), however, I would like some empirical evidence to be convinced of its optimality. The skepticism is partly due to some empirical evidence suggesting that the trace of H may not be a consistent characterization of the effective dimension of the model. Such evidence can be found in Gu (1996), available online at http:// www.stat.lsa.umich.edu/ chong/ps/modl.ps. URL:
[41] Barry, D. (1993). Testing for additivity of a regression function. Ann. Statist. 21 235-254. · Zbl 0771.62033 · doi:10.1214/aos/1176349024
[42] Cox, D. D. and Chang, Y.-F. (1990). Iterated state space algorithms and cross validation for generalized smoothing splines. Technical Report 49, Dept. Statistics, Univ. Illinois.
[43] Cox, D. D., Koh, E., Wahba, G. and Yandell, B. S. (1988). Testing the (parametric) null model hy pothesis in (semiparametric) partial and generalized spline models. Ann. Statist. 16 113-119. · Zbl 0673.62017 · doi:10.1214/aos/1176350693
[44] Gu, C. (1992). Cross validating non Gaussian data. Journal of Computational and Graphical Statistics 1 169-179.
[45] Gu, C. (1996). Model indexing and smoothing parameter selection in nonparametric function estimation. Technical Report 93-55 (rev.), Dept. Statistics, Purdue Univ.
[46] Wahba, G. (1983). Bayesian ”confidence intervals” for the crossvalidated smoothing spline. J. Roy. Statist. Soc. Ser. B 45 133-150. JSTOR: · Zbl 0538.65006
[47] Xiang, D. and Wahba, G. (1996). A generalized approximate cross validation for smoothing splines with non-Gaussian date. Statist. Sinica. · Zbl 0854.62044
[48] Gijbels, 1996). Conservation of moments seems unimportant. In regression, I do not see the desirability. In density estimation, simple corrections of kernel density estimates for variance inflation exist, but make little difference away from the normal density (Jones,
[49] ). Indeed, getting means and variances right is a normality-based concept, so corrected kernel estimators act in a normal-driven semiparametric manner. Efron and Tibshirani (1996) propose more sophisticated moment conservation, but initial indications are that this is no better nor worse than alternative semiparametric density estimators (Hjort,
[50] ). ”The computations, including those for crossvalidation, are relatively inexpensive and easily incorporated into standard software.” Again, proponents of the two competing methods I have mentioned would claim the same for the first half of this and advocates of regression splines would claim the lot. The authors make no particularly novel contribution to automatic bandwidth selection. Crossvalidation and AIC are in a class of methods (e.g., Härdle, 1990, pages 166-167) which, while not being downright bad, allow scope for improvement.
[51] ). Binning is the major computational device of all kernel-ty pe estimators (Fan and Marron, 1994). The local likelihood approach is already deeply understood theoretically. Comparison of P-splines’s reasonable boundary performance with local poly nomials’s reasonable boundary performance is not yet available through theory or simulations. An interesting point mentioned in the paper is the apparent continuum between few-parameter parametric fits at one end and fully ”nonparametric” techniques at the other, with many-parameter par Ansley, C. F., Kohn, R. and Wong, C. M. (1993). Nonparametric spline regression with prior information. Biometrika 80 75- 88. JSTOR: · Zbl 0771.62027 · doi:10.1093/biomet/80.1.75
[52] Efron, B. and Tibshirani, R. (1996). Using specially designed exponential families for density estimation. Ann. Statist. 24 000-000. · Zbl 0878.62028 · doi:10.1214/aos/1032181161
[53] Fan, J. and Gijbels, I. (1996). Local Poly nomial Modelling and Its Applications. Chapman and Hall, London. · Zbl 0873.62037
[54] Fan, J. and Marron, J. S. (1994). Fast implementations of nonparametric curve estimators. J. Comput. Graph. Statist. 3 35-56.
[55] Hjort, N. L. (1996). Performance of Efron and Tibshirani’s semiparametric denisty estimator. Unpublished manuscript.
[56] Hjort, N. L. and Jones, M. C. (1996). Locally parametric nonparametric density estimation. Ann. Statist. 24 1619-1647. · Zbl 0867.62030 · doi:10.1214/aos/1032298288
[57] Jones, M. C. (1991). On correcting for variance inflation in kernel density estimation. Comput. Statist. Data Anal. 11 3-15. · Zbl 0850.62344 · doi:10.1016/0167-9473(91)90049-8
[58] Jones, M. C. (1996). On close relations of local likelihood density estimation. Unpublished manuscript. · Zbl 0882.62034 · doi:10.1007/BF02562622
[59] Loader, C. R. (1996). Local likelihood density estimation. Ann. Statist. 24 1602-1618 · Zbl 0867.62034 · doi:10.1214/aos/1032298287
[60] Marron, J. S. (1996). A personal view of smoothing and statistics (with discussion). Comput. Statist.
[61] Ruppert, D., Sheather, S. J. and Wand, M. P. (1995). An effective bandwidth selector for local least squares regression. J. Amer. Statist. Assoc. 90 1257-1270. JSTOR: · Zbl 0868.62034 · doi:10.2307/2291516
[62] Simonoff, J. S. (1996). Smoothing Methods in Statistics. Springer, New York. · Zbl 0859.62035
[63] timality and minimax properties (Fan, 1993). For density estimation Engel and Gasser (1995) show a minimax property of the fixed bandwith kernel method within a large class of estimators containing penalized likelihood estimators. The presented paper does not provide any argument, neither theoretical nor by simulations, supporting any superiority of P-splines over their many competitors. In the regression case, the theoretical properties of P-splines might be evaluated by combining arguments of de Boor (1978) on the asy mptotic bias and variance of B-splines in (dependence on m, the spline order k and the smoothness of the underlying function) with the well-known results on smoothing splines. The authors propose to use AIC or crossvalidation to select the smoothing parameter. However, a careful look at their method reveals that there are in fact two free parameters: and the number n of knots. If n m, then we essentially obtain a smoothing spline fit, while results might be very different if n m. Indeed, the estimate might crucially depend on n. Therefore, why not determine and n by cross-validation or a related method? The following theoretical arguments may suggest that such a procedure will work. Note that AIC and cross-validation are very close to unbiased risk estimation which consists of estimating the optimal values of and n by minimizing m et al. (1996). Software for the 1992 version, written in C and interfaced to S-PLUS, is publically available from Statlib. (The 1992 version of LOGSPLINE employ s only knot deletion; here, however, we focus on the 1996 version, which uses both knot addition and knot deletion.) LOGSPLINE can provide estimates on both finite and infinite intervals, and it can handle censored data. The results of LOGSPLINE on the Old Faithful data and the suicide data are very similar to the corresponding results of P-splines [the suicide data is an example in Kooperberg and Stone (1992)]. Here we consider a much more challenging data set. The solid line in Figure 1 shows the logspline density estimate based on a random sample of 7,125 annual net incomes in the United Kingdom [Family Expenditure Survey (1968-1983)]. (The data have been rescaled to have mean 1.) The nine knots that were selected by LOGSPLINE are indicated. Note that four of these knots are extremely close to the peak near 0 24. This peak is due to the UK old age pension, which caused many people to have nearly identical incomes. In Kooperberg and Stone (1992) we concluded that the height and location of this peak are accurately estimated by LOGSPLINE. There are several reasons why this data is more challenging than the Old Faithful and suicide data: the data set is much larger, so that it is more of a challenge to computing resources (the LOGSPLINE estimate took 9 seconds on a Sparc 10 workstation); the width of the peak is about 0.02, compared to the range 11.5 of the data; there is a severe outlier (the largest observation is 11.5, the second largest is 7.8); and the rise of the density to the left of the peak is very steep. To get an impression of what the P-splines procedure would yield for this data, I first removed the largest observation so that there would not be any long gaps in the data, reducing the maximum observation to 7.8. The dashed line in Figure 1 is the LOGSPLINE estimate to the data with fixed knots at i/20 \times 7 8, for i = 0 1 20 (using 20 intervals, as in most P-spline examples.) The resulting fit should be similar to a P-spline fit with = 0. In this estimate it appears that the narrow peak is completely missed and that, because of the steep rise of the density to the left of the peak and the lack of sufficiently many knots near the peak, two modes are estimated where only one mode exists.
[64] ors as in Wahba (1978). In particular, the usual priors are specified independently of sample size, whereas one would want to use more B-splines with a larger sample. Furthermore, the integral of the second derivative squared is easier to interpret from a non-Bayesian perspective than the sum of squares of second differences of B-spline coefficients. I take issue with the authors’s claim that their method does not have boundary problems. P-splines are approximately equivalent to smoothing splines which do have boundary effects (Speckman, 1983). To explain, consider minimizing from equation (5),
[65] Sheather (1996). We use AIC and get essentially the same amount of smoothing, ”a second generation result at a first generation price.” Again, we do not wish to imply that AIC is the final answer, but show that it is more useful than sometimes suggested.
[66] LR, local regression; LRB, local regression with binning; SS, smoothing splines; SSB, smoothing splines with band solver; RSF, regression splines with fixed knots; RSA, regression splines with adaptive knots; PS, P-splines. The row ”Adaptive flexibility available” means that a software implementation is readily available Cook, R. D. and Weisberg, S. (1994). Regression Graphics. Wiley, New York. · Zbl 0925.62287
[67] Eilers, P. H. C. (1988). Autoregressive models with latent variables. In COMPSTAT 1988 Proceedings (D. Edwards and N. E. Raun, eds.). physica-Verlag.
[68] Engel, J. and Gasser, T. (1995). A minimax result for a class of nonparametric density estimators. Nonparametric Statistics 4 327-334. · Zbl 1380.62145 · doi:10.1080/10485259508832624
[69] Family Expenditure Survey (1968-1983). Annual base tapes and reports (1968-1983). Dept. Employ ment, Statistics Division, Her Majesty’s Stationary Office, London.
[70] Fan, J. (1993). Local linear regression smoothers and their minimax efficiency. Ann. Statist. 21 196-216. · Zbl 0773.62029 · doi:10.1214/aos/1176349022
[71] Fan, J. and Gijbels, I. (1995). Data-driven bandwidth selection in local poly nomial fitting: variable bandwidth and spatial adaptation. J. Roy. Statist. Soc. Ser. B 57 371-394. JSTOR: · Zbl 0813.62033
[72] Fan, J., Hall, P., Martin, M. A. and Patil, P. (1996). On local smoothing of nonparametric curve estimators. J. Amer. Statist. Assoc. 91 258-266. Foley, J. D., van Dam, A., Feiner, S. K. and Hughes, J. F. JSTOR: · Zbl 0871.62036 · doi:10.2307/2291403
[73] . Computer Graphics: Principles and Practice. Addison-Wesley, Reading, MA. · Zbl 0875.68891
[74] Friedman, J. H. (1991). Multivariate adaptive regression splines (with discussion). Ann. Statist. 19 1-141. · Zbl 0765.62064 · doi:10.1214/aos/1176347963
[75] Jones, M. C., Marron, J. S. and Sheather, S. J. (1996). A brief survey of bandwidth selection for density estimation. J. Amer. Statist. Assoc. To appear. JSTOR: · Zbl 0873.62040 · doi:10.2307/2291420
[76] Kneip, A. (1994). Ordered linear smoothers. Ann. Statist. 22 835-866. · Zbl 0815.62022 · doi:10.1214/aos/1176325498
[77] Kooperberg, C., Bose, S. and Stone, C. J. (1997). Poly chotomous regression. J. Amer. Statist. Assoc. To appear. Kooperberg, C., Stone, C. J. and Truong, Y. K. (1995a). Hazard regression. J. Amer. Statist. Assoc. 90 78-94. Kooperberg, C., Stone, C. J. and Truong, Y. K. (1995b). Logspline estimation of a possibly mixed spectral distribution. J. Time Ser. Anal. 16 359-388. · Zbl 0832.62083 · doi:10.1111/j.1467-9892.1995.tb00240.x
[78] Speckman, P. L. (1983). Spline smoothing and optimal rates of convergence in nonparametric regression models. Ann. Statist. 13 970-983. Stone, C. J., Hansen, M., Kooperberg, C. and Truong, Y. K. · Zbl 0585.62074 · doi:10.1214/aos/1176349650
[79] . Poly nomial splines and their tensor products in extended linear modeling. Ann. Statist. · Zbl 0924.62036 · doi:10.1214/aos/1031594728
[80] Wahba, G. (1978). Improper priors, spline smoothing and the problem of guarding against model errors in regression. J. Roy. Statist. Soc. Ser. B 40 364-372. JSTOR: · Zbl 0407.62048
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.