Karlin, Samuel; Dembo, Amir; Kawabata, Tsutomu
Statistical composition of high-scoring segments from molecular sequences. (English) Zbl 0711.92013
Ann. Stat. 18, No. 2, 571-581 (1990).

New probabilistic formulas are presented that provide a benchmark for discerning distributional properties of various data statistics or letter sequences (e.g., DNA). These include the asymptotic extremal distribution of high aggregate segment scores and the limiting letter composition of high-scoring segments, as well as a number of associated conditional Gaussian central limit laws. The formulas are derived with respect to a general scoring scheme with i.i.d. letter values.

Reviewer: S. Karlin
Cited in 11 Documents

MSC:
92D20 Protein sequences, DNA sequences
60F05 Central limit and other weak theorems
60G50 Sums of independent random variables; random walks

Keywords: molecular sequences; DNA; asymptotical extremal distribution of high aggregate segment scores; limiting letter composition of high-scoring segments; conditional Gaussian central limit laws; general scoring scheme; i.i.d. letter values
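
To make the two statistics named in the abstract concrete, the following is a minimal sketch (not taken from the paper) of how a maximal aggregate segment score and the letter composition of that segment might be computed for one sequence. The score table, the example sequence, and the function names `max_scoring_segment` and `composition` are hypothetical and chosen only for illustration; the segment search is a standard running-sum scan, not a method attributed to the authors.

```python
from collections import Counter


def max_scoring_segment(seq, scores):
    """Return (best_score, start, end) such that seq[start:end] is a
    contiguous segment with the highest aggregate score.

    An empty segment (score 0) is returned when no letter scores positively.
    """
    best_score, best_start, best_end = 0, 0, 0
    run_score, run_start = 0, 0
    for i, letter in enumerate(seq):
        run_score += scores[letter]
        if run_score <= 0:
            # A non-positive running sum cannot begin a best segment: restart.
            run_score, run_start = 0, i + 1
        elif run_score > best_score:
            best_score, best_start, best_end = run_score, run_start, i + 1
    return best_score, best_start, best_end


def composition(segment):
    """Relative frequency of each letter in a non-empty segment."""
    counts = Counter(segment)
    total = len(segment)
    return {letter: counts[letter] / total for letter in sorted(counts)}


# Hypothetical scoring scheme and sequence, for illustration only.
scores = {"A": 1, "C": -1, "G": 2, "T": -2}
seq = "TCAGGATCGGAGTTC"

best_score, start, end = max_scoring_segment(seq, scores)
segment = seq[start:end]
print(best_score, segment, composition(segment))
```

In this toy run the high-scoring segment is enriched in the positively scored letters relative to the full sequence, which is the kind of compositional shift the limiting results in the paper describe for i.i.d. letters.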