On smoothing sparse multinomial data. (English) Zbl 0628.62039

Let \(p(i)\), \(i=1,\dots,m\), be multinomial cell probabilities for a given model, and \(\hat p(i)\) the corresponding (unsmoothed) cell proportion estimators. A smoothed kernel estimator can be defined by employing information from neighbouring cells. The multinomial is sparse if (in asymptotic arguments) \(\sup_{i} p(i)\leq C\delta\), and \[ \sup_{i}\Bigl| p(i+j)- \sum^{t-1}_{k=0} \binom{j}{k} \Delta^{k} p(i)\Bigr| \leq C\delta |j\delta|^{t}, \] for some constant \(C\) and all \(j\); these conditions are used to define a smoothness class \(\mathcal{P}_{t}\). Optimality of estimation procedures is judged by minimizing mean summed square error, e.g., \(\sum^{m}_{i=1}E\{\hat p(i)-p(i)\}^{2}\).
If the data is not too sparse (\(n^{1/(2t+1)}\delta\) is bounded away from 0), then the optimal rate of convergence is that achieved by the unsmoothed cell proportions, namely \(O(n^{-1})\). Otherwise, the rate can be improved by smoothing. Explicit results, including formulae for the optimal smoothing parameter, are presented for a kernel-type estimator. The smoothing parameter is estimated from the data by "least-squares cross-validation", in which one sample observation is omitted at a time, and the resulting procedure is shown to be asymptotically optimal.
Reviewer: R.Mentz
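
The procedure summarized in the review (smooth the cell proportions with a kernel, then choose the smoothing parameter by leave-one-observation-out least-squares cross-validation) can be illustrated with a minimal sketch. It is not taken from the paper: the quadratic kernel, the column normalization at the boundaries, the bandwidth grid, and the names `kernel_weights`, `smooth`, `lscv_score` and `select_bandwidth` are all assumptions made for illustration, and the cross-validation criterion is the usual textbook form, which may differ in detail from the one analysed by the author.

```python
import numpy as np

def kernel_weights(m, h):
    """Weight matrix W[i, j] for m cells and bandwidth h (measured in cells).

    Assumption: a quadratic (Epanechnikov-type) kernel. Columns are
    normalized so that the smoothed estimate still sums to 1.
    """
    idx = np.arange(m)
    u = (idx[:, None] - idx[None, :]) / h
    W = np.clip(1.0 - u**2, 0.0, None)   # zero weight beyond distance h
    return W / W.sum(axis=0, keepdims=True)

def smooth(counts, h):
    """Kernel-smoothed cell probabilities from observed cell counts."""
    counts = np.asarray(counts, dtype=float)
    W = kernel_weights(len(counts), h)
    return W @ (counts / counts.sum())

def lscv_score(counts, h):
    """Least-squares cross-validation score for bandwidth h.

    Estimates sum_i {p_tilde(i) - p(i)}^2 up to a term not depending on h:
    sum_i p_tilde(i)^2 - 2 * sum_i p_hat(i) * p_tilde_loo(i),
    where p_tilde_loo(i) is the smoothed estimate at cell i computed
    after removing one observation from cell i.
    """
    counts = np.asarray(counts, dtype=float)
    n = counts.sum()
    W = kernel_weights(len(counts), h)
    p_tilde = W @ (counts / n)
    loo = (W @ counts - np.diag(W)) / (n - 1.0)
    return np.sum(p_tilde**2) - 2.0 * np.sum((counts / n) * loo)

def select_bandwidth(counts, grid):
    """Return the bandwidth in `grid` with the smallest LSCV score."""
    scores = [lscv_score(counts, h) for h in grid]
    return grid[int(np.argmin(scores))]

# Hypothetical sparse example: 200 observations spread over 100 cells
# whose probabilities vary smoothly with the cell index.
rng = np.random.default_rng(0)
m, n = 100, 200
cells = np.arange(m)
p = np.exp(-0.5 * ((cells - 50) / 15.0) ** 2)
p /= p.sum()
counts = rng.multinomial(n, p)

grid = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0]
h_cv = select_bandwidth(counts, grid)
p_smooth = smooth(counts, h_cv)
```

With this kernel, any bandwidth `h <= 1` puts zero weight on neighbouring cells, so `smooth(counts, 1.0)` returns the unsmoothed cell proportions; including `1.0` in the grid lets cross-validation fall back to no smoothing when the data are not sparse.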

MSC:

62G05 Nonparametric estimation
62H99 Multivariate analysis