On the similarity metric and the distance metric. (English) Zbl 1168.68033

Summary: Similarity and dissimilarity measures are widely used in many research areas and applications. A dissimilarity measure is normally required to be a distance metric, whereas no comparable formal requirement exists for a similarity measure. This article makes three contributions. First, we give a formal definition of a similarity metric. Second, we show the relationship between the similarity metric and the distance metric. Third, we present general solutions for normalizing a given similarity metric or distance metric.
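The requirement that a dissimilarity measure be a distance metric can be illustrated with a minimal sketch. It uses the standard textbook axioms (non-negativity, symmetry, identity of indiscernibles, triangle inequality), not the similarity-metric definition introduced in the article. The names `is_distance_metric` and `hamming`, the tolerance `tol`, and the sample data are hypothetical and chosen only for illustration. A brute-force check over a finite sample can refute the axioms but cannot prove them in general.

```python
from itertools import product

def is_distance_metric(points, d, tol=1e-12):
    """Check the standard distance-metric axioms on a finite sample (illustrative only)."""
    for x, y in product(points, repeat=2):
        if d(x, y) < -tol:                      # non-negativity
            return False
        if abs(d(x, y) - d(y, x)) > tol:        # symmetry
            return False
        if (d(x, y) <= tol) != (x == y):        # d(x, y) = 0 exactly when x = y
            return False
    for x, y, z in product(points, repeat=3):
        if d(x, z) > d(x, y) + d(y, z) + tol:   # triangle inequality
            return False
    return True

def hamming(a, b):
    """Number of differing positions between two equal-length strings."""
    return sum(c1 != c2 for c1, c2 in zip(a, b))

words = ["0000", "0011", "0101", "1111"]
print(is_distance_metric(words, hamming))                          # Hamming distance
print(is_distance_metric([0, 1, 3], lambda a, b: (a - b) ** 2))    # squared difference
```

On this sample the Hamming distance passes all four checks, while the squared difference fails the triangle inequality (9 > 1 + 4 for the points 0, 1, 3), so it is a dissimilarity measure that is not a distance metric.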

MSC:

68Q99 Theory of computing
68P10 Searching and sorting
68T10 Pattern recognition, speech recognition

References:

[1] Arslan, A. N.; Eğecioğlu, Ö.; Pevzner, P. A., A new approach to sequence alignment: Normalized sequence alignment, Bioinformatics, 17, 4, 327-337 (2001)
[2] Bunke, H.; Shearer, K., A graph distance metric based on the maximal common subgraph, Pattern Recognition Letters, 19, 255-259 (1998) · Zbl 0905.68128
[3] Calude, C. S.; Salomaa, K.; Yu, S., Additive distances and quasi-distances between words, Journal of Universal Computer Science, 8, 2, 141-152 (2002) · Zbl 1258.68074
[4] S. Chen, B. Ma, K. Zhang, The normalized similarity metric and its applications, in: Proceedings of 2007 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2007, 2007, pp. 172-180
[5] Horibe, Y., Entropy and correlation, IEEE Transactions on Systems, Man, and Cybernetics, 15, 641-642 (1985) · Zbl 0585.62008
[6] A.J. Knobbe, P.W. Adriaans, Analysing binary associations, in: E. Simoudis, J. Han, U. Fayyad (Eds.), Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, 1996, pp. 311-314
[7] Kvålseth, T. O., Entropy and correlation: Some comments, IEEE Transactions on Systems, Man, and Cybernetics, 17, 517-519 (1987)
[8] Li, M.; Chen, X.; Li, X.; Ma, B.; Vitányi, P. M.B., The similarity metric, IEEE Transactions on Information Theory, 50, 12, 3250-3264 (2004) · Zbl 1316.68052
[9] Linfoot, E. H., An informational measure of correlation, Information and Control, 1, 1, 85-89 (1957) · Zbl 0080.36001
[10] R. López de Mántaras, ID3 revisited: A distance-based criterion for attribute selection, in: Z. Ras (Ed.), Proceedings of the Fourth International Symposium on Methodologies for Intelligent Systems, 1989, pp. 342-350
[11] B. Ma, K. Zhang, The similarity metric and the distance metric, in: Proceedings of the 6th Atlantic Symposium on Computational Biology and Genome Informatics, 2005, pp. 1239-1242
[12] Malvestuto, F. M., Statistical treatment of the information content of a database, Information Systems, 11, 211-223 (1986) · Zbl 0617.68087
[13] Marzal, A.; Vidal, E., Computation of normalized edit distance and applications, IEEE Transactions on Pattern Analysis and Machine Intelligence, 15, 9, 926-932 (1993)
[14] Needleman, S. E.; Wunsch, C. D., A general method applicable to the search for similarities in the amino-acid sequences of two proteins, Journal of Molecular Biology, 48, 443-453 (1970)
[15] Oommen, B. J.; Zhang, K., The normalized string editing problem revisited, IEEE Transactions on Pattern Analysis and Machine Intelligence, 18, 6, 669-672 (1996)
[16] Quinlan, J. R., Induction of decision trees, Machine Learning, 1, 1, 81-106 (1986)
[17] Rajski, C., A metric space of discrete probability distributions, Information and Control, 4, 4, 371-377 (1961) · Zbl 0103.35805
[18] S.C. Sahinalp, M. Tasan, J. Macker, Z.M. Ozsoyoglu, Distance based indexing for string proximity search, in: Proceedings of the 19th International Conference on Data Engineering, 2003, pp. 125-136
[19] Saitou, N.; Nei, M., The neighbor-joining method: A new method for reconstructing phylogenetic trees, Molecular Biology and Evolution, 4, 406-425 (1987)
[20] Smith, T. F.; Waterman, M. S., Comparison of biosequences, Advances in Applied Mathematics, 2, 482-489 (1981) · Zbl 0489.92004
[21] Sokal, R. R.; Michener, C. D., A statistical method for evaluating systematic relationships, University of Kansas Scientific Bulletin, 28, 1409-1438 (1958)
[22] A. Stojmirovic, V. Pestov, Indexing schemes for similarity search in datasets of short protein fragments, ArXiv Computer Science e-prints (cs/0309005), September 2003
[23] Studier, J. A.; Keppler, K. J., A note on the neighbor-joining algorithm of Saitou and Nei, Molecular Biology and Evolution, 5, 729-731 (1988)
[24] Torsello, A.; Hidović-Rowe, D.; Pelillo, M., Polynomial-time metrics for attributed trees, IEEE Transactions on Pattern Analysis and Machine Intelligence, 27, 7, 1087-1099 (2005)
[25] S.J. Wan, S.K.M. Wong, A measure for concept dissimilarity and its application in machine learning, in: Proceedings of the First International Conference on Computing and Information, 1989, pp. 267-273
[26] Waterman, M. S.; Smith, T. F., Some biological sequence metrics, Advances in Mathematics, 20, 367-387 (1976) · Zbl 0342.92003
[27] Y.Y. Yao, S.K.M. Wong, C.J. Butz, On information-theoretic measures of attribute importance, in: N. Zhong (Ed.), Proceedings of the Third Pacific-Asia Conference on Knowledge Discovery and Data Mining, 1999, pp. 133-137
[28] Zhang, K.; Shasha, D., Simple fast algorithms for the editing distance between trees and related problems, SIAM Journal on Computing, 18, 6, 1245-1262 (1989) · Zbl 0692.68047