
Statistical composition of high-scoring segments from molecular sequences. (English) Zbl 0711.92013

New probabilistic formulas are presented that provide a benchmark for discerning the distributional properties of various data statistics or letter sequences (e.g., DNA). These include the asymptotic extremal distribution of high aggregate segment scores, the limiting letter composition of high-scoring segments, and a number of associated conditional Gaussian central limit laws. The formulas are derived for a general scoring scheme with i.i.d. letter values.
Reviewer: S.Karlin
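The setting described above can be illustrated with a minimal sketch. It is not the paper's derivation, only the standard i.i.d. scoring setup such results are usually stated in. Letters are drawn independently with probabilities p_a and carry scores s_a with negative expected score and at least one positive score. Under these assumptions there is a unique lambda > 0 with sum_a p_a exp(lambda s_a) = 1, and in the Karlin-Altschul framework the letter composition of high-scoring segments tends to q_a = p_a exp(lambda s_a). The function names, the uniform four-letter alphabet and the +1/-1 scores are hypothetical choices made for illustration.

```python
import math
import random
from typing import Dict, Sequence, Tuple


def solve_lambda(p: Dict[str, float], s: Dict[str, float]) -> float:
    """Return the unique lambda > 0 with sum_a p[a] * exp(lambda * s[a]) == 1.

    Assumes i.i.d. letters with probabilities p, a negative expected score,
    and at least one letter (with p > 0) having a positive score.
    """
    if sum(p[a] * s[a] for a in p) >= 0:
        raise ValueError("expected letter score must be negative")
    if all(s[a] <= 0 for a in p if p[a] > 0):
        raise ValueError("some letter with p > 0 must have a positive score")

    def f(lam: float) -> float:
        return sum(p[a] * math.exp(lam * s[a]) for a in p) - 1.0

    # f(0) = 0, f < 0 just right of 0, and f grows without bound: bracket the root.
    lo, hi = 0.0, 1.0
    while f(hi) <= 0.0:
        lo, hi = hi, 2.0 * hi
    for _ in range(200):  # plain bisection; always terminates
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def target_frequencies(p: Dict[str, float], s: Dict[str, float], lam: float) -> Dict[str, float]:
    """q[a] = p[a] * exp(lam * s[a]); these sum to 1 when lam solves the equation above."""
    return {a: p[a] * math.exp(lam * s[a]) for a in p}


def max_segment(seq: Sequence[str], s: Dict[str, float]) -> Tuple[float, int, int]:
    """Maximal aggregate segment score via a linear (Kadane-style) scan.

    Returns (score, start, end) with seq[start:end] the best segment.
    """
    best, best_start, best_end = 0.0, 0, 0
    run, run_start = 0.0, 0
    for i, a in enumerate(seq):
        if run <= 0.0:  # a non-positive prefix never helps: restart here
            run, run_start = 0.0, i
        run += s[a]
        if run > best:
            best, best_start, best_end = run, run_start, i + 1
    return best, best_start, best_end


if __name__ == "__main__":
    p = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
    s = {"A": 1.0, "C": -1.0, "G": -1.0, "T": -1.0}  # hypothetical scoring scheme
    lam = solve_lambda(p, s)
    print("lambda =", lam)  # equals ln 3 for this scheme
    print("q =", target_frequencies(p, s, lam))

    # Empirical composition of the single best segment in a long random sequence.
    random.seed(0)
    seq = "".join(random.choices("ACGT", weights=[p[a] for a in "ACGT"], k=200_000))
    score, i, j = max_segment(seq, s)
    seg = seq[i:j]
    print("max score", score, "length", len(seg))
    print({a: round(seg.count(a) / len(seg), 3) for a in "ACGT"})
```

For this scheme the root is exactly lambda = ln 3, so q_A = 0.75 and q_C = q_G = q_T = 1/12. The composition of a single simulated maximal segment is noisy and only roughly approaches these values; the paper's results concern the limiting behaviour.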

MSC:

92D20 Protein sequences, DNA sequences
60F05 Central limit and other weak theorems
60G50 Sums of independent random variables; random walks