
Effect of example weights on prediction of protein-protein interactions. (English) Zbl 1119.92027

Summary: Prediction of protein-protein interactions (PPIs) is an important issue in biology. Many computational methods have recently been proposed to determine PPIs. However, there is currently no gold-standard dataset for these methods. Furthermore, the quality of training examples varies, yet current methods always ignore this variation. When examples are of low quality, the system should tolerate the data noise. In this paper, an example weighting strategy is used to build a robust system and to address the problem of data noise. The training examples are investigated, and a new strategy for selecting and using examples is proposed. Further, a confidence-based method for weighting training examples is proposed. Different weight-setting strategies are discussed, and the corresponding results are given in the experiments.
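The summary does not state how a confidence score is turned into a weight. The sketch below shows one way weighting can enter training. It assumes each example carries a confidence in [0, 1], and it uses three hypothetical weight-setting functions (`linear_weight`, `threshold_weight`, `power_weight`) together with a hypothetical weighted log-loss in which low-confidence (likely noisy) examples contribute less. None of these names or formulas come from the paper.

```python
import math
from typing import Callable, Iterable, Tuple

# Hypothetical weight-setting strategies (not from the paper): each maps a
# confidence score c in [0, 1] attached to a training example to a weight.
def linear_weight(c: float) -> float:
    return c

def threshold_weight(c: float, t: float = 0.5) -> float:
    # Keeps examples with confidence >= t at full weight, discards the rest.
    return 1.0 if c >= t else 0.0

def power_weight(c: float, p: float = 2.0) -> float:
    # Down-weights low-confidence examples more aggressively than linear_weight.
    return c ** p

Example = Tuple[float, int, float]  # (feature, label 0/1, confidence)

def weighted_log_loss(
    examples: Iterable[Example],
    predict: Callable[[float], float],
    weight_fn: Callable[[float], float],
) -> float:
    """Weighted mean negative log-likelihood; low-confidence examples count less."""
    total, norm = 0.0, 0.0
    for x, y, c in examples:
        w = weight_fn(c)
        prob = min(max(predict(x), 1e-12), 1.0 - 1e-12)  # clip to avoid log(0)
        total += -w * (y * math.log(prob) + (1 - y) * math.log(1.0 - prob))
        norm += w
    return total / norm if norm > 0 else 0.0
```

For example, `weighted_log_loss([(1.0, 1, 0.9), (1.0, 0, 0.2)], lambda x: 0.8, threshold_weight)` ignores the second, low-confidence example entirely, while `linear_weight` would let it contribute with weight 0.2.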
A new model that integrates the example weighting strategy, called the attraction-repulsion (AR) weight model, is proposed. Experimental results on Saccharomyces cerevisiae demonstrate that the new model outperforms the original AR model by over 8% in ROC score. Furthermore, the example weighting strategy is applied to another domain-based PPI prediction method, the maximum likelihood estimation (MLE) method, and the modified MLE method performs better than the original MLE method. In addition, our example weighting strategy can be applied to any other PPI prediction method that is based on training examples.
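To illustrate how example weights could enter the domain-based MLE method, here is a minimal sketch. It assumes the usual EM formulation: two proteins interact if at least one of their domain pairs interacts, and observed interactions have false-positive rate `fp` and false-negative rate `fn`. The per-pair weight `w`, the function names, and the default `fp`/`fn` values are hypothetical and purely illustrative. In this sketch, each protein pair's contribution to the M-step is simply scaled by its weight, which is not necessarily the paper's modification.

```python
from itertools import product
from typing import Dict, FrozenSet, List, Set, Tuple

DomainPair = FrozenSet[str]  # unordered domain pair; frozenset({"A"}) for A-A
Example = Tuple[Set[str], Set[str], int, float]  # (domains_i, domains_j, observed 0/1, weight)

def domain_pairs(di: Set[str], dj: Set[str]) -> Set[DomainPair]:
    return {frozenset((m, n)) for m, n in product(di, dj)}

def interaction_prob(pairs: Set[DomainPair], lam: Dict[DomainPair, float]) -> float:
    # Proteins interact if at least one of their domain pairs interacts.
    miss = 1.0
    for dp in pairs:
        miss *= 1.0 - lam[dp]
    return 1.0 - miss

def weighted_mle(examples: List[Example], fp: float = 1e-4, fn: float = 0.8,
                 iters: int = 50, init: float = 0.5) -> Dict[DomainPair, float]:
    """EM estimate of domain-pair interaction probabilities, with each
    protein pair's contribution scaled by its (hypothetical) weight."""
    cand = [domain_pairs(di, dj) for di, dj, _, _ in examples]
    lam = {dp: init for pairs in cand for dp in pairs}
    for _ in range(iters):
        num = dict.fromkeys(lam, 0.0)
        den = dict.fromkeys(lam, 0.0)
        for pairs, (_, _, obs, w) in zip(cand, examples):
            p = interaction_prob(pairs, lam)
            if obs:  # P(O = 1) and P(O = 1 | domain pair interacts)
                p_obs, p_obs_given_d = p * (1 - fn) + (1 - p) * fp, 1 - fn
            else:    # P(O = 0) and P(O = 0 | domain pair interacts)
                p_obs, p_obs_given_d = p * fn + (1 - p) * (1 - fp), fn
            for dp in pairs:
                # E-step: expected indicator that this domain pair interacts.
                num[dp] += w * lam[dp] * p_obs_given_d / p_obs
                den[dp] += w
        # M-step: weighted average of the expected indicators.
        lam = {dp: num[dp] / den[dp] if den[dp] > 0 else lam[dp] for dp in lam}
    return lam
```

Setting every weight to 1 recovers the unweighted EM update. Lowering the weight of a doubtful observation reduces its pull on the estimate for every domain pair it contains.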

MSC:

92C40 Biochemistry, molecular biology
62P10 Applications of statistics to biology and medical sciences; meta analysis

Software:

UniProt
