Multiple hypothesis testing in microarray experiments. (English) Zbl 1048.62099

Summary: DNA microarrays are part of a new and promising class of biotechnologies that allow the monitoring of expression levels in cells for thousands of genes simultaneously. An important and common question in DNA microarray experiments is the identification of differentially expressed genes, that is, genes whose expression levels are associated with a response or covariate of interest. The biological question of differential expression can be restated as a problem in multiple hypothesis testing: the simultaneous test for each gene of the null hypothesis of no association between the expression levels and the responses or covariates. As a typical microarray experiment measures expression levels for thousands of genes simultaneously, large multiplicity problems are generated. This article discusses different approaches to multiple hypothesis testing in the context of DNA microarray experiments and compares the procedures on microarray and simulated data sets.
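The multiplicity problem described in the summary can be illustrated with a small sketch. It assumes one raw p-value per gene and applies two standard adjustments, Bonferroni (family-wise error rate) and Benjamini-Hochberg (false discovery rate). These are chosen here only as familiar examples, and the summary does not state which procedures the article compares. The function names, the five p-values and the level 0.05 are hypothetical.

```python
def bonferroni(pvalues):
    """Bonferroni adjusted p-values: min(1, m * p) for m tests."""
    m = len(pvalues)
    return [min(1.0, m * p) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg step-up adjusted p-values."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def count_rejections(adjusted, alpha):
    """Number of hypotheses rejected at level alpha."""
    return sum(1 for p in adjusted if p <= alpha)


# Hypothetical raw p-values for five genes (illustrative numbers only).
raw = [0.001, 0.008, 0.02, 0.045, 0.6]
alpha = 0.05

print("unadjusted:", count_rejections(raw, alpha))
print("Bonferroni:", count_rejections(bonferroni(raw), alpha))
print("Benjamini-Hochberg:", count_rejections(benjamini_hochberg(raw), alpha))
```

With these made-up inputs, the unadjusted test rejects four hypotheses, Bonferroni two and Benjamini-Hochberg three, which shows how the choice of error rate changes the list of genes declared differentially expressed.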

MSC:

62P10 Applications of statistics to biology and medical sciences; meta analysis
92C40 Biochemistry, molecular biology
62F03 Parametric hypothesis testing

Software:

R; sma
