
Wrappers for feature subset selection. (English) Zbl 0904.68143

Summary: In the feature subset selection problem, a learning algorithm is faced with the problem of selecting a relevant subset of features upon which to focus its attention, while ignoring the rest. To achieve the best possible performance with a particular learning algorithm on a particular training set, a feature subset selection method should consider how the algorithm and the training set interact. We explore the relation between optimal feature subset selection and relevance. Our wrapper method searches for an optimal feature subset tailored to a particular algorithm and a domain. We study the strengths and weaknesses of the wrapper approach and show a series of improved designs. We compare the wrapper approach to induction without feature subset selection and to Relief, a filter approach to feature subset selection. Significant improvement in accuracy is achieved for some datasets for the two families of induction algorithms used: decision trees and Naive-Bayes.
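
To make the wrapper idea concrete, here is a minimal sketch, not the authors' implementation: greedy forward selection is wrapped around a Naive-Bayes learner, and each candidate feature subset is scored by the cross-validated accuracy of that same learner. The scikit-learn estimator, the breast-cancer dataset, and the helper name wrapper_forward_selection are illustrative assumptions.

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

def wrapper_forward_selection(X, y, estimator, cv=5):
    # Illustrative sketch of the wrapper approach (not the authors' code):
    # greedily add the single feature that most improves the estimator's
    # cross-validated accuracy; stop when no addition helps.
    selected, best_score = [], -np.inf
    improved = True
    while improved:
        improved, best_candidate = False, None
        for f in range(X.shape[1]):
            if f in selected:
                continue
            score = cross_val_score(estimator, X[:, selected + [f]], y, cv=cv).mean()
            if score > best_score:
                best_score, best_candidate = score, f
        if best_candidate is not None:
            selected.append(best_candidate)
            improved = True
    return selected, best_score

# Hypothetical usage on a stand-in dataset; the paper's experiments differ.
X, y = load_breast_cancer(return_X_y=True)
subset, acc = wrapper_forward_selection(X, y, GaussianNB())
print("selected features:", subset, "cross-validated accuracy: %.3f" % acc)

By contrast, a filter such as Relief ranks features by a relevance weight computed from the data alone, independently of the induction algorithm; that is the distinction the paper examines.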

MSC:

68T05 Learning and adaptive systems in artificial intelligence
