×

On the height of digital trees and related problems. (English) Zbl 0711.68035

Summary: This paper studies in a probabilistic framework some topics concerning the way words (strings) can overlap, and relationship of this is the height of digital trees associated with this set of words. A word is defined as a random sequence of (possibly infinite) symbols over a finite alphabet. A key notion of an alignment matrix \(\{C_{ij}\}^ n_{i,j=1}\) is introduced where \(C_{ij}\) is the length of the longest string that is a prefix of the ith and the jth word. It is proved that the height of an associated digital tree is simply related to the alignment matrix through some order statistics. In particular, using this observation and proving some inequalities for order statistics, we establish that the height of a digital trie under an independent model (i.e., all words are statistically independent) is asymptotically equal to 2 log\({}_{\alpha}n\) where n is the number of words stored in the trie and \(\alpha\) is a parameter of the probabilistic model. This result is generalized in three directions, namely we consider b-tries, Markovian model (i.e., dependency among letters in a word), and a dependent model (i.e., dependency among words). In particular, when consecutive letters in a word are Markov dependent (Markovian model), then we demonstrate that the height converges in probability to 2 log\({}_{\theta}n\) where \(\theta\) is a parameter of the underlying Markov chain. On the other hand, for suffix trees which fall into the dependent model, we show that the height does not exceed 2 log\({}_{\kappa}n\), where \(\kappa\) is a parameter of the probabilistic model. These results find plenty of applications in the analysis of data structures built over digital words.

MSC:

68P05 Data structures
68R05 Combinatorics in computer science
PDFBibTeX XMLCite
Full Text: DOI

References:

[1] Aho, A. V.; Hopcroft, J. E.; Ullman, J. D., The Design and Analysis of Computer Algorithms (1974), Reading, MA: Addison-Wesley, Reading, MA · Zbl 0326.68005
[2] Knuth, D., The Art of Computer Programming. Sorting and Searching, vol. III (1973), Reading, MA: Addison-Wesley, Reading, MA · Zbl 0302.68010
[3] Apostolico, A., The Myriad Virtues of Suffix Trees, Combinatorial Algorithms on Words, 85-96 (1985), New York: Springer-Verlag, New York · Zbl 0572.68067
[4] Fagin, R.; Nievergelt, J.; Pippenger, N.; Strong, H., Extendible Hashing: A Fast Access Method for Dynamic Files, ACM TODS, 4, 315-344 (1979) · doi:10.1145/320083.320092
[5] Flajolet, P., On the Performance Evaluation of Extendible Hashing and Trie Searching, Acta Inform., 20, 345-369 (1983) · Zbl 0515.68048 · doi:10.1007/BF00264279
[6] Gallager, R., Information Theory and Reliable Communications (1968), New York: Wiley, New York · Zbl 0198.52201
[7] Capetanakis, J., Tree Algorithms for Packet Broadcast Channels, IEEE Trans. Inform. Theory, 25, 505-525 (1979) · Zbl 0414.94019 · doi:10.1109/TIT.1979.1056093
[8] IEEE Transaction Inform. Theory,31, 2 (1985).
[9] Jacquet, Ph.; Regnier, M., Trie Partitioning Process: Limiting Distributions, Lecture Notes in Computer Science, vol. 214, 196-210 (1986), Berlin: Springer-Verlag, Berlin · Zbl 0605.68057
[10] Devroye, L., A Probabilistic Analysis of the Height of Tries and of the Complexity of Trie Sort, Acta Inform., 21, 229-232 (1984) · Zbl 0555.68037 · doi:10.1007/BF00264248
[11] Pittel, B., Asymptotic Growth of a Class of Random Trees, Ann. Probab., 13, 414-427 (1985) · Zbl 0563.60010 · doi:10.1214/aop/1176993000
[12] Pittel, B., Path in a Random Digital Tree: Limiting Distributions, Adv. in Appl. Probab., 18, 139-155 (1986) · Zbl 0588.60012 · doi:10.2307/1427240
[13] Regnier, M., On the Average Height of Trees in Digital Searching and Dynamic Hashing, Inform. Process. Lett., 13, 64-66 (1981) · Zbl 0472.68058 · doi:10.1016/0020-0190(81)90033-8
[14] Yao, A., A Note on the Analysis of Extendible Hashing, Inform. Process. Lett., 11, 84-86 (1980) · Zbl 0447.68077 · doi:10.1016/0020-0190(80)90008-3
[15] W. Szpankowski, On the Analysis of the Average Height of a Digital Trie: Another Approach, CSD TR-646, Purdue University (1986).
[16] A. Apostolico and W. Szpankowski, Self-Alignments in Words and Their Applications, CSD TR-732, Purdue University (1987), submitted to a journal. · Zbl 0769.68039
[17] Szpankowski, W., Some Results onV-ary Asymmetric Tries, J. Algorithms, 9, 224-244 (1988) · Zbl 0637.68072 · doi:10.1016/0196-6774(88)90039-9
[18] Kirschenhofer, P.; Prodinger, H.; Szpankowski, W., On the Variance of the External Path Length in a Symmetric Digital Trie, Discrete Appl. Math., 25, 129-143 (1989) · Zbl 0685.68059 · doi:10.1016/0166-218X(89)90050-4
[19] David, H., Order Statistics (1980), New York: Wiley, New York
[20] Galambos, J., The Asymptotic Theory of Extreme Order Statistics (1978), New York: Wiley, New York · Zbl 0381.62039
[21] Lai, T.; Robbins, H., A Class of Dependent Random Variables and Their Maxima, Z. Wahrsch. Vern. Gebiete, 42, 89-111 (1978) · Zbl 0377.60022 · doi:10.1007/BF00536046
[22] Billingsley, P., Probability and Measures (1986), New York: Wiley, New York · Zbl 0649.60001
[23] W. Szpankowski, (Probably) Optimal Solutions to Some Problems NOT Only on Graphs, CSD TR-780, Purdue University (1988); revision TR-872 (1989).
[24] Knuth, D., Art of Computer Programming. Fundamental Algorithms, vol. I (1973), Reading, MA: Addison-Wesley, Reading, MA · Zbl 0302.68010
[25] Sankoff, D.; Kruskal, J., Time Warps, String Edits, and Macromolecules: The Theory and Practice of Sequence Comparisons (1983), Reading, MA: Addison-Wesley, Reading, MA
[26] Bull. Math. Biol,46, No. 4 (1984) (a special commemorative issue honoring Margonet O. Dayhoff).
[27] Aldous, D., Probability Approximations via the Poisson Clumping Heristic (1989), New York: Springer-Verlag, New York · Zbl 0679.60013
[28] Kamarkar, N.; Karp, R. M.; Lueker, G. S.; Odlyzko, A. D., Probabilistic Analysis of Optimum Partitioning, J. Appl. Probab., 23, 626-645 (1986) · Zbl 0611.60011 · doi:10.2307/3214002
[29] Karlin, S.; Ost, F., Counts of Long Aligned Word Matches Among Random Letter Sequences, Adv. Appl. Probab., 19, 293-351 (1987) · Zbl 0621.60074 · doi:10.2307/1427422
[30] Kingman, J. F. C., Subadditive Processes, Ecole d’Eté de Probabilités de Saint-Flour V—1975 (1976), Berlin: Springer-Verlag, Berlin · Zbl 0367.60030
[31] L. Devroye, W. Szpankowski, and B. Rais, A Note on the Height of Suffix Trees, CSD TR-905, Purdue University (1989).
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.