A consistent procedure for determining the number of clusters in regression clustering. (English) Zbl 1074.62042

Summary: An information-based criterion for determining the number of clusters in the problem of regression clustering is proposed. It is shown that, under a probabilistically structured population, the proposed criterion selects the true number of regression hyperplanes with probability one among all class-growing sequences of classifications, when the number of observations \(n\) from the population increases to infinity. Results from a simulation study are also presented.

MSC:

62H30 Classification and discrimination; cluster analysis (statistical aspects)
62J05 Linear regression; mixed models
62F12 Asymptotic properties of parametric estimators

Software:

Algorithm 39

References:

[1] Bai, Z. D.; Rao, C. R.; Wu, Y., Model selection with data-oriented penalty, J. Statist. Plann. Inference, 77, 103-117 (1999) · Zbl 0926.62045
[2] Bock, H.H., 1996. Probability models and hypotheses testing in partitioning cluster analysis. In: Arabie, P., Hubert, L.J., De Soete, G. (Eds.), Clustering and Classification. World Scientific Publishing, River Edge, NJ, pp. 377-453. · Zbl 1031.62504
[3] Bock, H.H., 1999. Regression-type models for Kohonen’s self-organizing networks. In: Decker, R., Gaul, W. (Eds.), Classification and Information Processing at the Turn of the Millennium. Springer, New York-Heidelberg-Berlin, pp. 18-31.
[4] DeSarbo, W. S.; Cron, W. L., A maximum likelihood methodology for clusterwise linear regression, J. Classification, 5, 249-282 (1988) · Zbl 0692.62052
[5] Hannan, E. J.; Quinn, B. G., The determination of the order of an autoregression, J. Roy. Statist. Soc. B, 41, 190-195 (1979) · Zbl 0408.62076
[6] Lou, X.; Jiang, J.; Keng, K., Clustering objects generated by linear regression models, J. Amer. Statist. Assoc., 88, 1356-1362 (1993) · Zbl 0792.62053
[7] McClelland, R. L.; Kronmal, R. A., Regression-based variable clustering for data reduction, Statist. Med., 21, 921-941 (2002)
[8] Petrov, V. V., Limit Theorems of Probability Theory (1995), Oxford Science Publications: Oxford · Zbl 0826.60001
[9] Rao, C. R.; Wu, Y., A strongly consistent procedure for model selection in a regression problem, Biometrika, 76, 369-374 (1989) · Zbl 0669.62051
[10] Shao, J., An asymptotic theory for linear model selection, Statist. Sinica, 7, 221-264 (1997) · Zbl 1003.62527
[11] Algorithm 48: a fast algorithm for clusterwise linear regression, Computing, 29, 175-181 (1982) · Zbl 0485.65030