Local polynomial regression with selection biased data. (English) Zbl 0969.62030

Summary: Let \(Y\) and \({\mathbf X}\) be real- and \(R^d\)-valued random variables. We consider the estimation of the nonparametric regression function \(m({\mathbf x})= E(Y\mid {\mathbf X}={\mathbf x})\) when \(s\geq 1\) independent selection-biased samples of \((Y, {\mathbf X})\) are observed. This sampling scheme, which arises naturally in biological and epidemiological studies and many other fields, includes stratified samples, length-biased samples and other weighted distributions. A class of local polynomial estimators of \(m({\mathbf x})\) is derived by smoothing the nonparametric maximum likelihood estimator of the underlying distribution function due to Y. Vardi [Ann. Stat. 10, 616-620 (1982; Zbl 0491.62034); ibid. 13, 178-205 (1985; Zbl 0578.62047)]. Large sample properties, such as asymptotic distributions and asymptotic mean squared risks, are derived explicitly.
Unlike local polynomial regression with i.i.d. direct samples, we show here that kernel choices are important and optimal kernel functions may be asymmetric and discontinuous when the weight functions of the biased samples have jumps. A cross-validation criterion is proposed for the selection of data-driven bandwidths. Through a simple comparison, we show that our estimators are superior to other intuitive estimators of \(m({\mathbf x})\).
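As a rough illustration of the idea of smoothing a bias-corrected empirical distribution, the following Python sketch treats only the simplest case of a single biased sample (\(s=1\)) with a scalar covariate. In that case the nonparametric maximum likelihood estimator puts mass proportional to \(1/w(Y_i, X_i)\) on each observation, so a local linear fit can be computed by weighted least squares with weights \(K_h(X_i - x)/w(Y_i, X_i)\). The function name `local_linear_biased`, the Epanechnikov kernel, and the argument layout are assumptions made for this example; this is not claimed to be the estimator studied in the paper, which covers \(s\geq 1\) samples and general local polynomial orders.

```python
import numpy as np

def local_linear_biased(x0, X, Y, w, h):
    """Local linear estimate of m(x0) from ONE biased sample (s = 1).

    X, Y : observed covariates and responses (1-D arrays)
    w    : known, strictly positive selection weights w(Y_i, X_i)
    h    : bandwidth (> 0)
    Assumption: each point carries mass proportional to 1 / w_i.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    w = np.asarray(w, dtype=float)

    # Epanechnikov kernel on the scaled distances (an illustrative choice).
    u = (X - x0) / h
    k = np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

    # Kernel weight times inverse selection weight.
    omega = k / w

    # Weighted least squares for the local model a + b * (X - x0).
    D = np.column_stack([np.ones_like(X), X - x0])
    A = D.T @ (omega[:, None] * D)
    b = D.T @ (omega * Y)
    beta = np.linalg.solve(A, b)

    # The intercept is the estimate of m(x0).
    return beta[0]
```

For a length-biased sample one would pass `w = Y` (with positive responses). The kernel here is symmetric and continuous purely for simplicity; the summary above notes that optimal kernels may be asymmetric and discontinuous when the weight functions have jumps.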

MSC:

62G08 Nonparametric regression and quantile regression
62G20 Asymptotic properties of nonparametric inference
62G07 Density estimation