Prum, Bernard; Rodolphe, François; De Turckheim, Élisabeth
Finding words with unexpected frequencies in deoxyribonucleic acid sequences. (English) Zbl 0817.92012
J. R. Stat. Soc., Ser. B 57, No. 1, 205-220 (1995).

Summary: Considering a Markov chain model for deoxyribonucleic acid sequences, this paper proposes two asymptotically normal statistics to test whether the frequency of a given word is consistent with the first-order Markov chain model. The problem is to choose an estimate \(\widehat{\mu}(W)\) of the expectation of the frequency \(M_W\) of a word \(W\) in the observed sequence such that the asymptotic variance of \(M_W - \widehat{\mu}(W)\) is easily computable. The first estimator is derived from the frequency of \(W^{[-1]}\), which is \(W\) with its last letter deleted. The second, following an idea of R. Cowan [J. Appl. Probab. 28, No. 4, 886-892 (1991; Zbl 0741.60071)], is the conditional expectation of \(M_W\) given the observed frequencies of all two-letter words. Two examples, on phage lambda and phage T7, are shown.

Cited in 1 Review. Cited in 13 Documents.

MSC:
92D20 Protein sequences, DNA sequences
60J20 Applications of Markov chains and discrete-time Markov processes on general state spaces (social mobility, learning theory, industrial processes, etc.)
60G42 Martingales with discrete parameter
60F05 Central limit and other weak theorems
62M02 Markov processes: hypothesis testing
92C40 Biochemistry, molecular biology

Keywords: central limit theorems; unexpected frequencies; words in deoxyribonucleic acid; DNA sequences; sequencing; tables; deoxyribonucleic acid sequences; asymptotically normal statistics; first-order Markov chain model; conditional expectation

Citations: Zbl 0741.60071
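
As a rough illustration of the first estimator described in the summary, the sketch below compares the observed count of a word with a plug-in estimate built from the count of \(W^{[-1]}\). It assumes, as one plausible reading of the summary rather than the paper's stated formula, that the expected count under a first-order chain is estimated by the count of \(W^{[-1]}\) multiplied by the empirical transition frequency from the second-to-last letter of \(W\) to its last letter. The function names `count_overlapping` and `plug_in_expected_count` are hypothetical, and the asymptotic variance needed to turn the difference into a normal test statistic is not reproduced here.

```python
def count_overlapping(seq: str, word: str) -> int:
    """Count occurrences of `word` in `seq`, allowing overlaps."""
    if not word or len(word) > len(seq):
        return 0
    last_start = len(seq) - len(word)
    return sum(1 for i in range(last_start + 1) if seq.startswith(word, i))


def plug_in_expected_count(seq: str, word: str) -> float:
    """Plug-in estimate of the expected count of `word` under a
    first-order Markov chain (an assumed form, not taken from the paper):

        N(W with last letter deleted) * N(last two letters) / N(second-to-last letter)
    """
    if len(word) < 2:
        raise ValueError("word must have at least two letters")
    prefix = word[:-1]       # W^{[-1]}: W with its last letter deleted
    last_pair = word[-2:]    # final transition of W
    # Only positions that have a following letter can start a transition,
    # so the prefix and the single letter are counted in seq[:-1].
    n_prefix = count_overlapping(seq[:-1], prefix)
    n_pair = count_overlapping(seq, last_pair)
    n_letter = count_overlapping(seq[:-1], word[-2])
    if n_letter == 0:
        return 0.0
    return n_prefix * n_pair / n_letter


if __name__ == "__main__":
    sequence = "ACGTACGTACGA"   # toy sequence, not real phage data
    word = "ACG"
    observed = count_overlapping(sequence, word)
    expected = plug_in_expected_count(sequence, word)
    print(observed, expected, observed - expected)
```

A word would be flagged as unexpectedly frequent or rare only after dividing the difference by an estimate of its standard deviation, which is the part the paper derives and this sketch leaves out.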