
A hybrid filter/wrapper approach of feature selection using information theory. (English) Zbl 0997.68115

Summary: We focus on a hybrid approach to feature selection. We begin our analysis with a filter model, exploiting the geometrical information contained in the Minimum Spanning Tree (MST) built on the learning set. This model applies a statistical test of relative certainty gain within a forward selection algorithm. In the second part of the paper, we show that the MST can be replaced by the 1-nearest-neighbor graph without invalidating the statistical framework. This leads to a feature selection algorithm belonging to a new category of hybrid models (filter-wrapper). Experimental results on readily available synthetic and natural domains are presented and discussed.
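The forward selection loop described in the summary can be illustrated in Python. The following is a minimal sketch under stated assumptions, not the authors' procedure: the score used here (the fraction of points whose 1-nearest neighbor carries the same class label, a simple purity measure on the 1-NN graph) and the fixed min_gain stopping threshold are stand-ins for the paper's relative certainty gain statistic and its associated statistical test, and all function and parameter names are invented for illustration.

# Minimal sketch (assumed names; the scoring statistic is a stand-in
# for the paper's relative certainty gain test).
import numpy as np

def nn1_agreement(X, y):
    """Fraction of samples whose 1-nearest neighbor shares their label."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    # Pairwise squared Euclidean distances; mask self-distances.
    d = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d, np.inf)
    return float((y[d.argmin(axis=1)] == y).mean())

def forward_select(X, y, min_gain=0.01):
    """Greedy forward selection driven by the 1-NN graph criterion."""
    X = np.asarray(X, dtype=float)
    selected, best = [], 0.0
    remaining = list(range(X.shape[1]))
    while remaining:
        # Score each candidate feature added to the current subset.
        scores = {f: nn1_agreement(X[:, selected + [f]], y) for f in remaining}
        f_star = max(scores, key=scores.get)
        if scores[f_star] - best < min_gain:
            break  # gain too small: stop (stand-in for the statistical test)
        selected.append(f_star)
        remaining.remove(f_star)
        best = scores[f_star]
    return selected, best

Replacing nn1_agreement with a criterion computed on the MST of the learning set would correspond to the filter variant described in the first part of the paper; the 1-NN graph version is what makes the hybrid filter-wrapper reading possible.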

MSC:

68T10 Pattern recognition, speech recognition
68T05 Learning and adaptive systems in artificial intelligence

Software:

bootstrap; C4.5; UCI-ml

References:

[1] R. Nock, M. Sebban, Advances in adaptive prototype weighting and selection, Artif. Intell. Tools 10 (1-2) (2001), to appear. · Zbl 1046.68555
[2] M. Sebban, R. Nock, Contribution of boosting in wrapper models, Proceedings of the 3rd European Conf. on Principles and Practice of KDD, 1999, pp. 214-222.
[3] M. Sebban, R. Nock, Prototype selection as an information-preserving problem, Proceedings of the 17th Int. Conf. on Machine Learning, 2000, pp. 855-862. · Zbl 1046.68555
[4] R. Kohavi, Feature subset selection as search with probabilistic estimates, AAAI Fall Symp. on Relevance, 1994.
[5] S.B. Thrun, J. Bala, E. Bloedorn, I. Bratko, B. Cestnik, J. Cheng, K. De Jong, S. Dzeroski, S.E. Fahlman, D. Fisher, R. Hamann, K. Kaufman, S. Keller, I. Kononenko, J. Kreuziger, R.S. Michalski, T. Mitchell, P. Pachowicz, Y. Reich, H. Vafaie, W. Van de Welde, W. Wenzel, J. Wnek, J. Zhang, The MONK's problems: a performance comparison of different learning algorithms, Technical Report CMU-CS-91-197, Carnegie Mellon University, 1991.
[6] Quinlan, J. R., C4.5: Programs for Machine Learning (1994), Morgan Kaufmann, Los Altos, CA
[7] Aha, D., Tolerating noisy, irrelevant and novel attributes in instance-based learning algorithms, Int. J. Man Mach. Studies, 36, 267-287 (1992)
[8] P. Langley, W. Iba, Average case analysis of a nearest-neighbor algorithm, Proceedings of the 13th Int. Joint Conf. on Artificial Intelligence, 1993, pp. 889-894.
[9] Sebban, M.; Nock, R.; Chauchat, J.-H.; Rakotomalala, R., Impact of learning set quality and size on decision tree performances, Int. J. Comput. Systems Signals, 1, 85-105 (2001)
[10] Blum, A.; Langley, P., Selection of relevant features and examples in machine learning, Artif. Intell., 97, 245-272 (1997) · Zbl 0904.68142
[11] G.H. John, R. Kohavi, K. Pfleger, Irrelevant features and the subset selection problem, Proceedings of the 11th Int. Conf. on Machine Learning, 1994, pp. 121-129.
[12] R. Nock, M. Sebban, Sharper bounds for the hardness of prototype and feature selection, Proceedings of the 11th Int. Conf. on Algorithmic Learning Theory, 2000, pp. 224-237. · Zbl 1046.68555
[13] R. Kohavi, A study of cross-validation and bootstrap for accuracy estimation and model selection, Proceedings of the 15th Int. Joint Conf. on Artif. Intell., 1995, pp. 1137-1143.
[14] Efron, B.; Tibshirani, R., An Introduction to the Bootstrap (1993), Chapman & Hall, New York · Zbl 0835.62038
[15] P. Langley, Selection of relevant features in machine learning, AAAI Fall Symp. on Relevance, 1994.
[16] D. Wettschereck, D. Aha, Weighting features, Proceedings of the 1st Int. Conf. on Case-Based Reasoning, 1995, pp. 347-358.
[17] K. Kira, L. Rendell, A practical approach to feature selection, Proceedings of the 9th Int. Conf. on Machine Learning, 1992, pp. 249-256.
[18] I. Kononenko, Estimating attributes: analysis and extension of RELIEF, Proceedings of the 10th European Conf. on Machine Learning, 1995, pp. 171-182.
[19] Rao, C., Linear Statistical Inference and Its Application (1965), Wiley, New York · Zbl 0137.36203
[20] Stanfill, C.; Waltz, D., Towards memory-based reasoning, Commun. ACM, 29, 1213-1228 (1986)
[21] Wettschereck, D.; Dietterich, T., An experimental comparison of the nearest neighbor and nearest hyperrectangle algorithms, Mach. Learning, 19, 5-28 (1995)
[22] D. Koller, M. Sahami, Toward optimal feature selection, Proceedings of the 13th Int. Conf. on Machine Learning, 1996, pp. 284-292.
[23] Wilson, D.; Martinez, T., Improved heterogeneous distance functions, J. Artif. Intell. Res., 6, 1-34 (1997) · Zbl 0894.68118
[24] Breiman, L.; Friedman, J. H.; Olshen, R. A.; Stone, C. J., Classification and Regression Trees (1984), Wadsworth, Belmont, CA
[25] R.E. Schapire, Y. Singer, Improved boosting algorithms using confidence-rated predictions, Proceedings of the 11th Int. Conf. on Computational Learning Theory, 1998, pp. 80-91. · Zbl 0945.68194
[26] M.J. Kearns, Y. Mansour, On the boosting ability of top-down decision tree learning algorithms, Proceedings of the 28th Ann. ACM Symp. on the Theory of Computing, 1996, pp. 459-468. · Zbl 0915.68142
[27] Light, R.; Margolin, B., An analysis of variance for categorical data, J. Amer. Stat. Assoc., 66, 534-544 (1971) · Zbl 0222.62035
[28] Langley, P., Relevance and insight in experimental studies, IEEE Expert, 11, 11-12 (1996)
[29] C.L. Blake, C.J. Merz, UCI repository of machine learning databases, 1998. http://www.ics.uci.edu/~mlearn/MLRepository.html
[30] Buntine, W.; Niblett, T., A further comparison of splitting rules for decision-tree induction, Mach. Learning, 8, 75-85 (1992)