Improved binary PSO for feature selection using gene expression data. (English) Zbl 1142.92319

Summary: Gene expression profiles, which represent the state of a cell at a molecular level, have great potential as a medical diagnosis tool. In cancer type classification, available training data sets generally have a fairly small sample size compared to the number of genes involved. These training data limitations constitute a challenge to certain classification methodologies. A reliable method for selecting the genes relevant to sample classification is needed in order to speed up processing, decrease the predictive error rate, and avoid incomprehensibility due to the large number of genes investigated. Improved binary particle swarm optimization (IBPSO) is used in this study to implement feature selection, and the K-nearest neighbor (K-NN) method serves as an evaluator of the IBPSO for gene expression data classification problems. Experimental results show that this method effectively simplifies feature selection and reduces the total number of features needed. Compared to the best results previously published, the proposed method achieves the highest classification accuracy in nine of the 11 gene expression data test problems and is comparable in accuracy on the other two.
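
The following is a minimal sketch of the general idea in the summary, not the authors' implementation: a plain binary PSO searches over feature masks, and leave-one-out 1-NN accuracy scores each mask. The summary does not say what the "improved" step of IBPSO consists of, so none is reproduced here. The function names, parameter defaults (`w`, `c1`, `c2`, `v_max`, swarm size, iteration count) and the choice of K = 1 with leave-one-out scoring are all assumptions made for illustration.

```python
import math
import random


def loo_1nn_accuracy(X, y, mask):
    """Leave-one-out accuracy of a 1-NN classifier restricted to the
    features whose mask bit is 1 (hypothetical evaluator)."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0  # an empty feature subset cannot classify anything
    correct = 0
    for i in range(len(X)):
        best_d, best_label = float("inf"), None
        for k in range(len(X)):
            if k == i:
                continue  # hold out the sample being classified
            d = sum((X[i][j] - X[k][j]) ** 2 for j in cols)
            if d < best_d:
                best_d, best_label = d, y[k]
        correct += best_label == y[i]
    return correct / len(X)


def binary_pso_select(X, y, n_particles=10, n_iter=20,
                      w=0.9, c1=2.0, c2=2.0, v_max=6.0, seed=0):
    """Plain binary PSO over feature masks (assumed parameter values).
    Returns the best mask found and its leave-one-out 1-NN accuracy."""
    rng = random.Random(seed)
    n_feat = len(X[0])
    pos = [[rng.randint(0, 1) for _ in range(n_feat)]
           for _ in range(n_particles)]
    vel = [[0.0] * n_feat for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [loo_1nn_accuracy(X, y, p) for p in pos]
    g = max(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iter):
        for i in range(n_particles):
            for j in range(n_feat):
                # Velocity update pulled toward personal and global bests.
                v = (w * vel[i][j]
                     + c1 * rng.random() * (pbest[i][j] - pos[i][j])
                     + c2 * rng.random() * (gbest[j] - pos[i][j]))
                v = max(-v_max, min(v_max, v))  # clamp the velocity
                vel[i][j] = v
                # The sigmoid of the velocity is the probability of bit = 1.
                prob_one = 1.0 / (1.0 + math.exp(-v))
                pos[i][j] = 1 if rng.random() < prob_one else 0
            fit = loo_1nn_accuracy(X, y, pos[i])
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return gbest, gbest_fit
```

Calling `binary_pso_select(X, y)` with `X` as a list of per-sample expression vectors and `y` as the class labels returns a 0/1 mask over genes together with its score. For data with thousands of genes, a vectorized distance computation would be needed in place of the nested loops used here.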

MSC:

92C40 Biochemistry, molecular biology
92-08 Computational methods for problems pertaining to biology
92-04 Software, source code, etc. for problems pertaining to biology

Software:

GeneSrF; DistAl
