×

A robust EM clustering algorithm for Gaussian mixture models. (English) Zbl 1242.68260

Summary: Clustering is a useful tool for finding structure in a data set. The mixture likelihood approach to clustering is a popular clustering method, in which the EM algorithm is the most used method. However, the EM algorithm for Gaussian mixture models is quite sensitive to initial values and the number of its components needs to be given a priori. To resolve these drawbacks of the EM, we develop a robust EM clustering algorithm for Gaussian mixture models, first creating a new way to solve these initialization problems. We then construct a schema to automatically obtain an optimal number of clusters. Therefore, the proposed robust EM algorithm is robust to initialization and also different cluster volumes with automatically obtaining an optimal number of clusters. Some experimental examples are used to compare our robust EM algorithm with existing clustering methods. The results demonstrate the superiority and usefulness of our proposed method.

MSC:

68T05 Learning and adaptive systems in artificial intelligence
68T10 Pattern recognition, speech recognition

Software:

clusfind
PDFBibTeX XMLCite
Full Text: DOI

References:

[1] Jain, A. K.; Dubes, R. C., Algorithms for Clustering Data (1988), Prentice Hall: Prentice Hall Englewood Cliffs, NJ · Zbl 0665.62061
[2] McLachlan, G. J.; Basford, K. E., Mixture Models: Inference and Applications to clustering (1988), Marcel Dekker: Marcel Dekker New York · Zbl 0697.62050
[3] Kaufman, L.; Rousseeuw, P. J., Finding Groups in Data: An Introduction to Cluster Analysis (1990), Wiley: Wiley New York · Zbl 1345.62009
[4] Duda, R.; Hart, P., Pattern Classification and Scene Analysis (1973), Wiley: Wiley New York · Zbl 0277.68056
[5] Dempster, A. P.; Laird, N. M.; Rubin, D. B., Maximum likelihood from incomplete data via the EM algorithm (with discussion), Journal of the Royal Statistical Society-Series B, 39, 1-38 (1977) · Zbl 0364.62022
[6] J. MacQueen, Some methods for classification and analysis of multivariate observations, Proceedings of 5th Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, pp. 281-297, University of California Press, 1967.; J. MacQueen, Some methods for classification and analysis of multivariate observations, Proceedings of 5th Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, pp. 281-297, University of California Press, 1967. · Zbl 0214.46201
[7] Pollard, D., Quantization and the method of \(k\)-means, IEEE Transactions on Information Theory, 28, 199-205 (1982) · Zbl 0476.94010
[8] Guesta-Albertos, J. A.; Gordaliza, A.; Matran, C., Trimmed \(k\)-means: an attempt to robustify quantizers, Annals of Statistics, 25, 553-576 (1997) · Zbl 0878.62045
[9] Garcia-Escudero, L. A.; Gordaliza, A., Robustness properties of \(k\)-means and trimmed k-means, Journal of the American Statistical Association, 94, 956-969 (1999) · Zbl 1072.62547
[10] Bezdek, J. C., Pattern Recognition with Fuzzy Objective Function Algorithms (1981), Plenum Press: Plenum Press New York · Zbl 0503.68069
[11] Wu, K. L.; Yang, M. S., Alternative \(c\)-means clustering algorithms, Pattern Recognition, 35, 2267-2278 (2002) · Zbl 1006.68876
[12] Cheng, Y., Mean shift, mode seeking, and clustering, IEEE Transactions on Pattern Analysis and Machine Intelligence, 17, 790-799 (1995)
[13] Wu, K. L.; Yang, M. S., Mean shift-based clustering, Pattern Recognition, 40, 3035-3052 (2007) · Zbl 1118.68645
[14] Biernacki, C.; Celeux, G.; Govaert, G., Choosing starting values for the EM algorithm for getting the highest likelihood in multivariate Gaussian mixture models, Computational Statistic & Data Analysis, 41, 561-575 (2003) · Zbl 1429.62235
[15] Reddy, C. K.; Chiang, H. D.; Rajaratnam, B., TRUST-TECH-based expectation maximization for learning finite mixture models, IEEE Transactions on, Pattern Analysis and Machine Intelligence, 30, 1146-1157 (2008)
[16] Richardson, S.; Green, P. J., On Bayesian analysis of mixtures with an unknown number of components, Journal of the Royal Statistical Society-Series B, 59, 731-758 (1997) · Zbl 0891.62020
[17] Wu, C. F.J., On the convergence properties of the EM algorithm, Annals of Statistics, 11, 95-103 (1983) · Zbl 0517.62035
[18] Xu, L.; Jordan, M. I., On convergence properties of the EM algorithm for Gaussian mixtures, Neural Computation, 8, 129-151 (1996)
[19] Ma, J.; Xu, L.; Jordan, M. I., Asymptotic convergence rate of the EM algorithm for gaussian mixtures, Neural Computation, 12, 2881-2907 (2000)
[20] Figueiredo, M. A.T.; Jain, A. K., Unsupervised learning of finite Mixture models, IEEE Transactions on, Pattern Analysis and Machine Intelligence, 24, 381-396 (2002)
[21] Celeux, G.; Chrétien, S.; Forbes, F.; Mkhadri, A., A component-wise EM algorithm for mixtures, Journal of Computational and Graphical Statistics, 10, 697-712 (2001)
[22] Ueda, N.; Nakano, R., Deterministic annealing EM algorithm, Neural Networks, 11, 271-282 (1998)
[23] Geva, A. B., Hierarchical unsupervised fuzzy clustering, IEEE Transactions on, Fuzzy Systems, 7, 723-733 (1999)
[24] Lubischew, A. A., On the use of discriminant functions in taxonomy, Biometrics, 18, 455-477 (1962) · Zbl 0112.11602
[25] Reaven, G. M.; Miller, R. G., An attempt to define the nature of chemical diabetes using a multidimensional analysis, Diabetologia, 16, 17-24 (1979)
[26] Peel, D.; MacLaren, G. J., Robust mixture modeling using the t-distribution, Statistics and Computing, 10, 339-348 (2000)
[27] Lo, K.; Gottardo, R., Flexible mixture modeling via the multivariate t distribution with the Box-Cox transformation: an alternative to the skew-\(t\) distribution, Statistics and Computing, 22, 33-52 (2012) · Zbl 1322.62173
[28] Basso, R. M.; Lachos, V. H.; Cabral, C. R.B.; Ghosh, P., Robust mixture modeling based on scale mixtures of skew-normal distributions, Computational Statistics and Data Analysis, 54, 2926-2941 (2010) · Zbl 1284.62193
[29] Sun, J.; Kaban, A.; Garibaldi, J. M., Robust mixture clustering using Pearson type VII distribution, Pattern Recognition Letters, 31, 2447-2454 (2010)
[30] Neykova, N.; Filzmoserb, P.; Dimovac, R.; Neytcheva, P., Robust fitting of mixtures using the trimmed likelihood estimator, Computational Statistics & Data Analysis, 52, 299-308 (2007) · Zbl 1328.62033
[31] Cuesta-Albertos, J. A.; Matrán, C.; Mayo-Iscar, A., Robust estimation in the normal mixture model based on robust clustering, Journal of the Royal Statistical Society B, 70, 779-802 (2008) · Zbl 05563369
[32] Andrews, J. L.; McNicholas, P. D., Mixtures of modified \(t\)-factor analyzers for model-based clustering, classification, and discriminant analysis, Journal of Statistical Planning and Inference, 141, 1479-1486 (2011) · Zbl 1204.62098
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.