×

Clustering of time series data – a survey. (English) Zbl 1077.68803

Summary: Time series clustering has been shown effective in providing useful information in various domains. There seems to be an increased interest in time series clustering as part of the effort in temporal data mining research. To provide an overview, this paper surveys and summarizes previous works that investigated the clustering of time series data in various application domains. The basics of time series clustering are presented, including general-purpose clustering algorithms commonly used in time series clustering studies, the criteria for evaluating the performance of the clustering results, and the measures to determine the similarity/dissimilarity between two time series being compared, either in the forms of raw data, extracted features, or some model parameters. The past researchs are organized into three groups depending upon whether they work directly with the raw data either in the time or frequency domain, indirectly with features extracted from the raw data, or indirectly with models built from the raw data. The uniqueness and limitation of previous research are discussed and several possible topics for future research are identified. Moreover, the areas that time series clustering have been applied to are also summarized, including the sources of data used. It is hoped that this review will serve as the steppingstone for those interested in advancing this area of research.

MSC:

68T10 Pattern recognition, speech recognition
68T05 Learning and adaptive systems in artificial intelligence

Software:

AutoClass; clusfind
PDFBibTeX XMLCite
Full Text: DOI

References:

[1] Han, J.; Kamber, M., Data Mining: Concepts and Techniques (2001), Morgan Kaufmann: Morgan Kaufmann San Francisco, pp. 346-389
[2] J. MacQueen, Some methods for classification and analysis of multivariate observations, in: L.M. LeCam, J. Neyman (Eds.), Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, pp. 281-297.; J. MacQueen, Some methods for classification and analysis of multivariate observations, in: L.M. LeCam, J. Neyman (Eds.), Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, pp. 281-297. · Zbl 0214.46201
[3] Kaufman, L.; Rousseeuw, P. J., Finding Groups in Data: An Introduction to Cluster Analysis (1990), Wiley: Wiley New York · Zbl 1345.62009
[4] Bezdek, J. C., Pattern Recognition with Fuzzy Objective Function Algorithms (1987), Plenum Press: Plenum Press New York and London
[5] Krishnapuram, R.; Joshi, A.; Nasraoui, O.; Yi, L., Low-complexity fuzzy relational clustering algorithms for web mining, IEEE Trans. Fuzzy Systems, 9, 4, 595-607 (2001)
[6] Krishnapuram, R.; Kim, J., A note on the Gustafson-Kessel and adaptive fuzzy clustering algorithms, IEEE Trans. Fuzzy Systems, 7, 4, 453-461 (1999)
[7] Krishna, K.; Murty, M. N., Genetic \(k\)-means algorithms, IEEE Trans. Syst. Man Cybernet.-BCybernet., 29, 3, 433-439 (1999)
[8] Meng, L.; Wu, Q. H.; Yong, Z. Z., A genetic hard \(c\)-means clustering algorithm, Dyn. Continuous Discrete Impulsive Syst. Ser. B: Appl. Algorithms, 9, 421-438 (2002) · Zbl 1001.68201
[9] V. Estivill-Castro, A.T. Murray, Spatial clustering for data mining with genetic algorithms, http://citeseer.nj.nec.com/estivill-castro97spatial.html; V. Estivill-Castro, A.T. Murray, Spatial clustering for data mining with genetic algorithms, http://citeseer.nj.nec.com/estivill-castro97spatial.html
[10] Hall, L. O.; Özyurt, B.; Bezdek, J. C., Clustering with a genetically optimized approach, IEEE Trans. Evolutionary Computat., 3, 2, 103-112 (1999)
[11] Karypis, G.; Han, E.-H.; Kumar, V., Chameleon: hierarchical clustering using dynamic modeling, Computer August, 68-75 (1999)
[12] S. Guha, R. Rastogi, K. Shim, CURE: an efficient clustering algorithm for large databases, Proceedings of the 1998 ACM-SIGMOD International Conference on Management of Data, Seattle, WA, June 1998, pp. 73-84.; S. Guha, R. Rastogi, K. Shim, CURE: an efficient clustering algorithm for large databases, Proceedings of the 1998 ACM-SIGMOD International Conference on Management of Data, Seattle, WA, June 1998, pp. 73-84. · Zbl 1006.68661
[13] T. Zhang, R. Ramakrishnan, M. Livny, BIRCH: an efficient data clustering method for very large databases, Proceedings of the 1996 ACM-SIGMOD International Conference on Management of Data, Montreal, Canada, June 1996, pp. 103-114.; T. Zhang, R. Ramakrishnan, M. Livny, BIRCH: an efficient data clustering method for very large databases, Proceedings of the 1996 ACM-SIGMOD International Conference on Management of Data, Montreal, Canada, June 1996, pp. 103-114.
[14] M. Ester, H.-P. Kriegel, J. Sander, X. Xu, A density-based algorithm for discovering clusters in large spatial databases, Proceedings of the 1996 International Conference on Knowledge Discovery and Data Mining (KDD’96), Portland, OR, 1996, pp. 226-231.; M. Ester, H.-P. Kriegel, J. Sander, X. Xu, A density-based algorithm for discovering clusters in large spatial databases, Proceedings of the 1996 International Conference on Knowledge Discovery and Data Mining (KDD’96), Portland, OR, 1996, pp. 226-231.
[15] M. Ankerst, M. Breunig, H.-P. Kriegel, J. Sander, OPTICS: ordering points to identify the clustering structure, Proceedings of the 1999 ACM-SIGMOD International Conference on Management of Data, Philadelphia, PA, June 1999, pp. 49-60.; M. Ankerst, M. Breunig, H.-P. Kriegel, J. Sander, OPTICS: ordering points to identify the clustering structure, Proceedings of the 1999 ACM-SIGMOD International Conference on Management of Data, Philadelphia, PA, June 1999, pp. 49-60.
[16] W. Wang, J. Yang, R. Muntz, R., STING: a statistical information grid approach to spatial data mining, Proceedings of the 1997 International Conference on Very Large Data Base (VLDB’97), Athens, Greek, 1997, pp. 186-195.; W. Wang, J. Yang, R. Muntz, R., STING: a statistical information grid approach to spatial data mining, Proceedings of the 1997 International Conference on Very Large Data Base (VLDB’97), Athens, Greek, 1997, pp. 186-195.
[17] Cheeseman, P.; Stutz, J., Bayesian classification (AutoClass): theory and results, (Fayyard, U. M.; Piatetsky-Shapiro, G.; Smyth, P.; Uthurusamy, R., Advances in Knowledge Discovery and Data Mining (1996), AAAI/MIT Press: AAAI/MIT Press Cambridge, MA) · Zbl 0925.62119
[18] Carpenter, G. A.; Grossberg, S., A massively parallel architecture for a self-organizing neural pattern recognition machine, Comput. Vision Graphics Image Process., 37, 54-115 (1987) · Zbl 0634.68089
[19] Kohonen, T., The self organizing maps, Proc. IEEE, 78, 9, 1464-1480 (1990)
[20] Dunn, J. C., A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters, J. Cybernet., 3, 32-57 (1974) · Zbl 0291.68033
[21] Golay, X.; Kollias, S.; Stoll, G.; Meier, D.; Valavanis, A.; Boesiger, P., A new correlation-based fuzzy logic clustering algorithm for fMRI, Mag. Resonance Med., 40, 249-260 (1998)
[22] C.S. Möller-Levet, F. Klawonn, K.-H. Cho, O. Wolkenhauer, Fuzzy clustering of short time series and unevenly distributed sampling points, Proceedings of the 5th International Symposium on Intelligent Data Analysis, Berlin, Germany, August 28-30, 2003.; C.S. Möller-Levet, F. Klawonn, K.-H. Cho, O. Wolkenhauer, Fuzzy clustering of short time series and unevenly distributed sampling points, Proceedings of the 5th International Symposium on Intelligent Data Analysis, Berlin, Germany, August 28-30, 2003.
[23] M. Kumar, N.R. Patel, J. Woo, Clustering seasonality patterns in the presence of errors, Proceedings of KDD ’02, Edmonton, Alberta, Canada.; M. Kumar, N.R. Patel, J. Woo, Clustering seasonality patterns in the presence of errors, Proceedings of KDD ’02, Edmonton, Alberta, Canada.
[24] Kakizawa, Y.; Shumway, R. H.; Taniguchi, N., Discrimination and clustering for multivariate time series, J. Amer. Stat. Assoc., 93, 441, 328-340 (1998) · Zbl 0906.62060
[25] Dahlhaus, R., On the Kullback-Leibler information divergence of locally stationary processes, Stochastic Process. Appl., 62, 139-168 (1996) · Zbl 0849.60032
[26] Shumway, R. H., Time-frequency clustering and discriminant analysis, Stat. Probab. Lett., 63, 307-314 (2003) · Zbl 1116.62364
[27] Bezdek, J. C.; Pal, N. R., Some new indexes of cluster validity, IEEE Trans. Syst. Man Cybernet. B: Cybernet., 28, 3, 301-315 (1998)
[28] Maulik, U.; Bandyopadhyay, S., Performance evaluation of some clustering algorithms and validity indices, IEEE Trans. Pattern Anal. Mach. Intell., 24, 12, 1650-1654 (2002)
[29] Košmelj, K.; Batagelj, V., Cross-sectional approach for clustering time varying data, J. Classification, 7, 99-109 (1990)
[30] Baragona, R., A simulation study on clustering time series with meta-heuristic methods, Quad. Stat., 3, 1-26 (2001)
[31] Akaike, H., A new look at the statistical model identification, IEEE Trans. Automat. Control, 19, 716-723 (1974) · Zbl 0314.62039
[32] Schwartz, G., Estimating the dimension of a model, Ann. Stat., 6, 461-464 (1978) · Zbl 0379.62005
[33] Biernacki, C.; Celeux, G.; Govaert, G., Assessing a mixture model for clustering with the integrated completed likelihood, IEEE Trans. Pattern Anal. Mach. Intell., 22, 719-725 (2000)
[34] T.W. Liao, B. Bolt, J. Forester, E. Hailman, C. Hansen, R.C. Kaste, J. O’May, Understanding and projecting the battle state, 23rd Army Science Conference, Orlando, FL, December 2-5, 2002.; T.W. Liao, B. Bolt, J. Forester, E. Hailman, C. Hansen, R.C. Kaste, J. O’May, Understanding and projecting the battle state, 23rd Army Science Conference, Orlando, FL, December 2-5, 2002.
[35] J.J. van Wijk, E.R. van Selow, Cluster and calendar based visualization of time series data, Proceedings of IEEE Symposium on Information Visualization, San Francisco, CA, October 25-26, 1999.; J.J. van Wijk, E.R. van Selow, Cluster and calendar based visualization of time series data, Proceedings of IEEE Symposium on Information Visualization, San Francisco, CA, October 25-26, 1999.
[36] Wismüller, A.; Lange, O.; Dersch, D. R.; Leinsinger, G. L.; Hahn, K.; Pütz, B.; Auer, D., Cluster analysis of biomedical image time series, Int. J. Comput. Vision, 46, 2, 103-128 (2002) · Zbl 1012.68723
[37] Policker, S.; Geva, A. B., Nonstationary time series analysis by temporal clustering, IEEE Trans. Syst. Man Cybernet.-B: Cybernet., 30, 2, 339-343 (2000)
[38] Gath, I.; Geva, A. B., Unsupervised optimal fuzzy clustering, IEEE Trans. Pattern Anal. Mach. Intell., 7, 773-781 (1989) · Zbl 0709.62592
[39] T.W. Liao, Mining of vector time series by clustering, Working paper, 2005.; T.W. Liao, Mining of vector time series by clustering, Working paper, 2005.
[40] Wilpon, J. G.; Rabiner, L. R., Modified \(k\)-means clustering algorithm for use in isolated word recognition, IEEE Trans. Acoust. Speech Signal Process., 33, 3, 587-594 (1985)
[41] Shaw, C. T.; King, G. P., Using cluster analysis to classify time series, Physica D, 58, 288-298 (1992)
[42] Goutte, C.; Toft, P.; Rostrup, E., On clustering fMRI time series, Neuroimage, 9, 3, 298-310 (1999)
[43] Goutte, C.; Hansen, L. K.; Liptrot, M. G.; Rostrup, E., Feature-space clustering for fMRI meta-analysis, Hum. Brain Mapping, 13, 165-183 (2001)
[44] T.-C. Fu, F.-L. Chung, V. Ng, R. Luk, Pattern discovery from stock time series using self-organizing maps, KDD 2001 Workshop on Temporal Data Mining, August 26-29, San Francisco, 2001, pp. 27-37.; T.-C. Fu, F.-L. Chung, V. Ng, R. Luk, Pattern discovery from stock time series using self-organizing maps, KDD 2001 Workshop on Temporal Data Mining, August 26-29, San Francisco, 2001, pp. 27-37.
[45] Owsley, L. M.D.; Atlas, L. E.; Bernard, G. D., Self-organizing feature maps and hidden Markov models for machine-tool monitoring, IEEE Trans. Signal Process., 45, 11, 2787-2798 (1997)
[46] M. Vlachos, J. Lin, E. Keogh, D. Gunopulos, A wavelet-based anytime algorithm for \(k\); M. Vlachos, J. Lin, E. Keogh, D. Gunopulos, A wavelet-based anytime algorithm for \(k\)
[47] Piccolo, D., A distance measure for classifying ARMA models, J. Time Ser. Anal., 11, 2, 153-163 (1990) · Zbl 0691.62083
[48] Beran, J.; Mazzola, G., Visualizing the relationship between time series by hierarchical smoothing models, J. Comput. Graph. Stat., 8, 2, 213-238 (1999)
[49] Maharaj, E. A., Clusters of time series, J. Classification, 17, 297-314 (2000) · Zbl 1017.62079
[50] Ramoni, M.; Sebastiani, P.; Cohen, P., Bayesian clustering by dynamics, Mach. Learning, 47, 1, 91-121 (2002) · Zbl 1012.68154
[51] M. Ramoni, P. Sebastiani, P. Cohen, Multivariate clustering by dynamics, Proceedings of the 2000 National Conference on Artificial Intelligence (AAAI-2000), San Francisco, CA, 2000, pp. 633-638.; M. Ramoni, P. Sebastiani, P. Cohen, Multivariate clustering by dynamics, Proceedings of the 2000 National Conference on Artificial Intelligence (AAAI-2000), San Francisco, CA, 2000, pp. 633-638.
[52] K. Kalpakis, D. Gada, V. Puttagunta, Distance measures for effective clustering of ARIMA time-series, Proceedings of the 2001 IEEE International Conference on Data Mining, San Jose, CA, November 29-December 2, 2001, pp. 273-280.; K. Kalpakis, D. Gada, V. Puttagunta, Distance measures for effective clustering of ARIMA time-series, Proceedings of the 2001 IEEE International Conference on Data Mining, San Jose, CA, November 29-December 2, 2001, pp. 273-280.
[53] Y. Xiong, D.-Y. Yeung, Mixtures of ARMA models for model-based time series clustering, Proceedings of the IEEE International Conference on Data Mining, Maebaghi City, Japan, 9-12 December, 2002.; Y. Xiong, D.-Y. Yeung, Mixtures of ARMA models for model-based time series clustering, Proceedings of the IEEE International Conference on Data Mining, Maebaghi City, Japan, 9-12 December, 2002.
[54] D. Tran, M. Wagner, Fuzzy c-means clustering-based speaker verification, in: N.R. Pal, M. Sugeno (Eds.), AFSS 2002, Lecture Notes in Artificial Intelligence, 2275, 2002, pp. 318-324.; D. Tran, M. Wagner, Fuzzy c-means clustering-based speaker verification, in: N.R. Pal, M. Sugeno (Eds.), AFSS 2002, Lecture Notes in Artificial Intelligence, 2275, 2002, pp. 318-324. · Zbl 1053.68724
[55] T. Oates, L. Firoiu, P.R. Cohen, Clustering time series with hidden Markov models and dynamic time warping, Proceedings of the IJCAI-99 Workshop on Neural, Symbolic, and Reinforcement Learning Methods for Sequence Learning.; T. Oates, L. Firoiu, P.R. Cohen, Clustering time series with hidden Markov models and dynamic time warping, Proceedings of the IJCAI-99 Workshop on Neural, Symbolic, and Reinforcement Learning Methods for Sequence Learning.
[56] C. Li, G. Biswas, Temporal pattern generation using hidden Markov model based unsupervised classification, in: D.J. Hand, J.N. Kok, M.R. Berthold (Eds.), Lecture Notes in Computer Science, vol. 164, IDA ’99, Springer, Berlin, 1999, pp. 245-256.; C. Li, G. Biswas, Temporal pattern generation using hidden Markov model based unsupervised classification, in: D.J. Hand, J.N. Kok, M.R. Berthold (Eds.), Lecture Notes in Computer Science, vol. 164, IDA ’99, Springer, Berlin, 1999, pp. 245-256.
[57] C. Li, G. Biswas, M. Dale, P. Dale, Building models of ecological dynamics using HMM based temporal data clustering—a preliminary study, in: F. Hoffmann et al. (Eds.), IDA 2001, Lecture Notes in Computer Science, vol. 2189, 2001, pp. 53-62.; C. Li, G. Biswas, M. Dale, P. Dale, Building models of ecological dynamics using HMM based temporal data clustering—a preliminary study, in: F. Hoffmann et al. (Eds.), IDA 2001, Lecture Notes in Computer Science, vol. 2189, 2001, pp. 53-62. · Zbl 1029.68794
[58] Wang, L.; Mehrabi, M. G.; Kannatey-Asibu, E., Hidden Markov model-based wear monitoring in turning, J. Manufacturing Sci. Eng., 124, 651-658 (2002)
[59] Josien, K.; Liao, T. W., Simultaneous grouping of parts and machines with an integrated fuzzy clustering method, Fuzzy Sets Syst., 126, 1, 1-21 (2002) · Zbl 0987.91507
[60] Ho, T. K.; Hull, J. J.; Srihari, S. N., Decision combination in multiple classifier systems, IEEE Trans. Pattern Anal. Mach. Intell., 16, 1, 66-75 (1994)
[61] Ng, G. S.; Singh, H., Democracy in pattern classifications: combinations of votes from various pattern classifiers, Artif. Intell. Eng., 12, 189-204 (1998)
[62] R. Ng, J. Han, Efficient and effective clustering method for spatial data mining, Proceedings of the 1994 International Conference on Very Large Data Bases (VLDB’94), Santiago, Chile, September 1994, pp. 144-155.; R. Ng, J. Han, Efficient and effective clustering method for spatial data mining, Proceedings of the 1994 International Conference on Very Large Data Bases (VLDB’94), Santiago, Chile, September 1994, pp. 144-155.
[63] Ananthanarayana, V. S.; Murty, M. N.; Subramanian, D. K., Efficient clustering of large data sets, Pattern Recognition, 34, 2561-2563 (2001) · Zbl 1012.68895
[64] Keogh, E.; Chu, S.; Hart, D.; Pazzani, M., Segmenting time series: a survey and novel approach, (Last, M.; Kandel, A.; Bunke, H., Data Mining in Time Series Databases (2004), World Scientific: World Scientific Singapore)
[65] Roddick, J. F.; Spiliopoulou, M., A survey of temporal knowledge discovery paradigms and methods, IEEE Trans. Knowledge Data Eng., 14, 4, 750-767 (2002)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.