Finding words with unexpected frequencies in deoxyribonucleic acid sequences. (English) Zbl 0817.92012

Summary: Considering a Markov chain model for deoxyribonucleic acid sequences, this paper proposes two asymptotically normal statistics to test whether the frequency of a given word is consistent with the first-order Markov chain model. The problem is to choose estimates \(\widehat{\mu}(W)\) of the expectation of the frequency \(M_W\) of a word \(W\) in the observed sequence such that the asymptotic variance of \(M_W - \widehat{\mu}(W)\) is easily computable.
The first estimator is derived from the frequency of \(W^{[-1]}\), which is \(W\) with its last letter deleted. The second, following an idea of R. Cowan [J. Appl. Probab. 28, No. 4, 886-892 (1991; Zbl 0741.60071)], is the conditional expectation of \(M_W\) given the observed frequencies of all two-letter words. Two examples, on phage lambda and phage T7, are shown.
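
As a rough illustration of the first estimator, the sketch below computes a plug-in estimate of the expected count of \(W\) under a first-order Markov model. It assumes the estimate takes the form: count of \(W^{[-1]}\), times the count of the last two letters of \(W\), divided by the count of the second-to-last letter. This form is an assumption based on the summary, not a formula quoted from the paper, and the function names `count_overlapping` and `estimate_from_prefix` are hypothetical. The normalizing variance, which the paper's test statistics require, is not computed here.

```python
def count_overlapping(seq: str, word: str) -> int:
    """Count occurrences of `word` in `seq`, allowing overlapping matches."""
    last_start = len(seq) - len(word)
    return sum(1 for i in range(last_start + 1) if seq.startswith(word, i))


def estimate_from_prefix(seq: str, word: str) -> float:
    """Plug-in estimate of the expected count of `word` (assumed form).

    Assumption: N(W[-1]) * N(ab) / N(a), where W[-1] is `word` without its
    last letter and `ab` are the last two letters of `word`.
    """
    if len(word) < 3:
        raise ValueError("word must have at least three letters")
    prefix = word[:-1]        # W with its last letter deleted
    last_pair = word[-2:]     # final two-letter word of W
    pivot = word[-2]          # second-to-last letter of W
    # Count only pivot occurrences that are followed by another letter,
    # so the ratio below is an estimated transition probability.
    n_pivot = count_overlapping(seq[:-1], pivot)
    if n_pivot == 0:
        return 0.0
    n_prefix = count_overlapping(seq, prefix)
    n_pair = count_overlapping(seq, last_pair)
    return n_prefix * n_pair / n_pivot


if __name__ == "__main__":
    toy = "ACGTACGTACGA"  # toy sequence, not real phage data
    observed = count_overlapping(toy, "ACG")
    expected = estimate_from_prefix(toy, "ACG")
    print(observed, expected)
```

A word would be flagged as unexpectedly frequent or rare when the difference between the observed count and this estimate is large relative to its asymptotic standard deviation.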

MSC:

92D20 Protein sequences, DNA sequences
60J20 Applications of Markov chains and discrete-time Markov processes on general state spaces (social mobility, learning theory, industrial processes, etc.)
60G42 Martingales with discrete parameter
60F05 Central limit and other weak theorems
62M02 Markov processes: hypothesis testing
92C40 Biochemistry, molecular biology

Citations:

Zbl 0741.60071