
Selection of relevant features and examples in machine learning. (English) Zbl 0904.68142

Summary: In this survey, we review work in machine learning on methods for handling data sets containing large amounts of irrelevant information. We focus on two key issues: the problem of selecting relevant features, and the problem of selecting relevant examples. We describe the advances that have been made on these topics in both empirical and theoretical work in machine learning, and we present a general framework that we use to compare different methods. We close with some challenges for future work in this area.
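A concrete illustration of the feature-relevance problem the survey addresses is Littlestone's Winnow algorithm (reference [66] below), a linear-threshold learner whose mistake bound grows only logarithmically in the number of irrelevant Boolean attributes. The sketch below is not taken from the survey itself; the toy target (a two-literal disjunction among 20 attributes) and all variable names are illustrative assumptions.

```python
import random

def winnow_train(examples, n, threshold=None, alpha=2.0):
    """Run Winnow over (x, y) pairs, where x is a 0/1 tuple of
    length n and y is a 0/1 label; returns final weights and the
    number of on-line mistakes."""
    if threshold is None:
        threshold = float(n)  # classic choice of threshold
    w = [1.0] * n             # all weights start at 1
    mistakes = 0
    for x, y in examples:
        pred = 1 if sum(w[i] * x[i] for i in range(n)) >= threshold else 0
        if pred != y:
            mistakes += 1
            if y == 1:        # false negative: promote active weights
                for i in range(n):
                    if x[i]:
                        w[i] *= alpha
            else:             # false positive: demote active weights
                for i in range(n):
                    if x[i]:
                        w[i] /= alpha
    return w, mistakes

# Toy target: label is x[0] OR x[1]; the other 18 attributes are
# irrelevant noise.
random.seed(0)
n = 20
data = []
for _ in range(200):
    x = tuple(random.randint(0, 1) for _ in range(n))
    data.append((x, 1 if (x[0] or x[1]) else 0))

w, mistakes = winnow_train(data, n)
```

For a monotone disjunction the relevant weights are never demoted (a false positive implies both relevant attributes are 0), so `w[0]` and `w[1]` grow until the target is captured while irrelevant weights are repeatedly demoted; this is the logarithmic-in-irrelevant-attributes behavior contrasted with the Perceptron in reference [49].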

MSC:

68T05 Learning and adaptive systems in artificial intelligence

Software:

C4.5

References:

[1] Aha, D., A study of instance-based algorithms for supervised learning tasks: mathematical, empirical and psychological evaluations, (Doctoral Dissertation (1990), Department of Information and Computer Science, University of California: Department of Information and Computer Science, University of California Irvine, CA)
[2] Aha, D. W.; Bankert, R. L., A comparative evaluation of sequential feature selection algorithms, (Fisher, D.; Lenz, H.-J., Artificial Intelligence and Statistics V (1996), Springer: Springer New York)
[3] Almuallim, H.; Dietterich, T. G., Learning with many irrelevant features, (Proceedings AAAI-91. Proceedings AAAI-91, Anaheim, CA (1991), AAAI Press), 547-552
[4] Angluin, D., Learning regular sets from queries and counterexamples, Inform. and Comput., 75, 87-106 (1987) · Zbl 0636.68112
[5] Angluin, D.; Hellerstein, L.; Karpinski, M., Learning read-once formulas with queries, J. ACM, 40, 185-210 (1993) · Zbl 0764.68139
[6] Armstrong, R.; Freitag, D.; Joachims, T.; Mitchell, T., Webwatcher: a learning apprentice for the World Wide Web, (Proceedings AAAI Spring Symposium on Information Gathering from Heterogeneous Distributed Environments (1993))
[7] Baluja, S.; Pomerleau, D., Dynamic relevance: vision-based focus of attention using artificial neural networks (Technical Note), Artificial Intelligence, 97, 381-395 (1997), (this issue) · Zbl 0904.68140
[8] Blum, A., Learning Boolean functions in an infinite attribute space, Machine Learning, 9, 373-386 (1992) · Zbl 0766.68108
[9] Blum, A., Empirical support for Winnow and Weighted-Majority based algorithms: results on a calendar scheduling domain, (Proceedings 12th International Conference on Machine Learning. Proceedings 12th International Conference on Machine Learning, Lake Tahoe, CA (1995), Morgan Kaufmann: Morgan Kaufmann San Mateo, CA), 64-72
[10] Blum, A.; Furst, M.; Jackson, J.; Kearns, M.; Mansour, Y.; Rudich, S., Weakly learning DNF and characterizing statistical query learning using Fourier analysis, (Proceedings 26th Annual ACM Symposium on Theory of Computing. Proceedings 26th Annual ACM Symposium on Theory of Computing, Montreal, Que. (1994)), 253-262 · Zbl 1345.68186
[11] Blum, A.; Hellerstein, L.; Littlestone, N., Learning in the presence of finitely or infinitely many irrelevant attributes, J. Comput. System Sci., 50, 32-40 (1995) · Zbl 0826.68100
[12] Blum, A.; Kannan, R., Learning an intersection of \(k\) halfspaces over a uniform distribution, (Proceedings 34th Annual IEEE Symposium on Foundations of Computer Science. Proceedings 34th Annual IEEE Symposium on Foundations of Computer Science, Palo Alto, CA (1993), IEEE), 312-320 · Zbl 0832.68088
[13] Blumer, A.; Ehrenfeucht, A.; Haussler, D.; Warmuth, M. K., Occam’s razor, Inform. Process. Lett., 24, 377-380 (1987) · Zbl 0653.68084
[14] Blumer, A.; Ehrenfeucht, A.; Haussler, D.; Warmuth, M. K., Learnability and the Vapnik-Chervonenkis dimension, J. ACM, 36, 929-965 (1989) · Zbl 0697.68079
[15] Breiman, L.; Friedman, J. H.; Olshen, R. A.; Stone, C. J., (Classification and Regression Trees (1984), Wadsworth: Wadsworth Belmont, CA) · Zbl 0541.62042
[16] Bshouty, N. H., Exact learning via the monotone theory, (Proceedings IEEE Symposium on Foundations of Computer Science. Proceedings IEEE Symposium on Foundations of Computer Science, Palo Alto, CA (1993), IEEE), 302-311
[17] Cardie, C., Using decision trees to improve case-based learning, (Proceedings 10th International Conference on Machine Learning. Proceedings 10th International Conference on Machine Learning, Amherst, MA (1993), Morgan Kaufmann: Morgan Kaufmann San Mateo, CA), 25-32
[18] Caruana, R. A.; Freitag, D., Greedy attribute selection, (Proceedings 11th International Conference on Machine Learning. Proceedings 11th International Conference on Machine Learning, New Brunswick, NJ (1994), Morgan Kaufmann: Morgan Kaufmann San Mateo, CA), 28-36
[19] Caruana, R. A.; Freitag, D., How useful is relevance?, (Working Notes of the AAAI Fall Symposium on Relevance. Working Notes of the AAAI Fall Symposium on Relevance, New Orleans, LA (1994), AAAI Press), 25-29
[20] Catlett, J., Peepholing: choosing attributes efficiently for megainduction, (Proceedings 9th International Conference on Machine Learning. Proceedings 9th International Conference on Machine Learning, Aberdeen, Scotland (1992), Morgan Kaufmann: Morgan Kaufmann San Mateo, CA), 49-54
[21] Cesa-Bianchi, N.; Freund, Y.; Helmbold, D. P.; Haussler, D.; Schapire, R. E.; Warmuth, M. K., How to use expert advice, (Proceedings Annual ACM Symposium on the Theory of Computing (1993)), 382-391 · Zbl 1310.68177
[22] Clark, P.; Niblett, T., The CN2 induction algorithm, Machine Learning, 3, 261-284 (1989)
[23] Cohn, D. A.; Ghahramani, Z.; Jordan, M. I., Active learning with statistical models, J. Artif. Intell. Research, 4, 129-145 (1996) · Zbl 0900.68366
[24] Comon, P., Independent component analysis: a new concept, Signal Process., 36, 287-314 (1994) · Zbl 0791.62004
[25] Cover, T. M.; Hart, P. E., Nearest neighbor pattern classification, IEEE Trans. Inform. Theory, 13, 21-27 (1967) · Zbl 0154.44505
[26] Daelemans, W.; Gillis, S.; Durieux, G., The acquisition of stress: a data-oriented approach, Comput. Linguistics, 20, 3, 421-451 (1994)
[27] Devijver, P. A.; Kittler, J., (Pattern Recognition: A Statistical Approach (1982), Prentice-Hall: Prentice-Hall Englewood Cliffs, NJ) · Zbl 0542.68071
[28] Dhagat, A.; Hellerstein, L., PAC learning with irrelevant attributes, (Proceedings IEEE Symposium on Foundations of Computer Science (1994), IEEE), 64-74
[29] Doak, J., An evaluation of feature-selection methods and their application to computer security, (Tech. Rept. CSE-92-18 (1992), Department of Computer Science, University of California: Department of Computer Science, University of California Davis)
[30] Drucker, H.; Schapire, R.; Simard, P., Improving performance in neural networks using a boosting algorithm, (Advances in Neural Information Processing Systems, Vol. 4 (1992), Morgan Kaufmann: Morgan Kaufmann San Mateo, CA)
[31] Drucker, H.; Cortes, C.; Jackel, L. D.; LeCun, Y.; Vapnik, V., Boosting and other machine learning algorithms, (Proceedings 11th International Conference on Machine Learning. Proceedings 11th International Conference on Machine Learning, New Brunswick, NJ (1994), Morgan Kaufmann: Morgan Kaufmann San Mateo, CA), 53-61
[32] Dyer, M.; Frieze, A.; Kannan, R., A random polynomial time algorithm for approximating the volume of convex bodies, (Proceedings Annual ACM Symposium on the Theory of Computing (1989)), 375-381
[33] Freund, Y., Boosting a weak learning algorithm by majority, (Proceedings 3rd Annual Workshop on Computational Learning Theory. Proceedings 3rd Annual Workshop on Computational Learning Theory, San Francisco, CA (1990), Morgan Kaufmann: Morgan Kaufmann San Mateo, CA), 202-216
[34] Freund, Y., An improved boosting algorithm and its implications on learning complexity, (Proceedings 5th Annual ACM Workshop on Computational Learning Theory. Proceedings 5th Annual ACM Workshop on Computational Learning Theory, Pittsburgh, PA (1992), ACM Press), 391-398
[35] Garey, M.; Johnson, D., (Computers and Intractability: A Guide to the Theory of NP-Completeness (1979), W.H. Freeman: W.H. Freeman San Francisco, CA) · Zbl 0411.68039
[36] Gil, Y., Efficient domain-independent experimentation, (Proceedings 10th International Conference on Machine Learning. Proceedings 10th International Conference on Machine Learning, Amherst, MA (1993), Morgan Kaufmann: Morgan Kaufmann San Mateo, CA), 128-134
[37] Greiner, R.; Grove, A. J.; Kogan, A., Knowing what doesn’t matter: exploiting the omission of irrelevant data, Artificial Intelligence, 97, 345-380 (1997), (this issue) · Zbl 0904.68141
[38] Gross, K. P., Concept acquisition through attribute evolution and experiment selection, (Doctoral Dissertation (1991), School of Computer Science, Carnegie Mellon University: School of Computer Science, Carnegie Mellon University Pittsburgh, PA)
[39] Haussler, D., Quantifying the inductive bias in concept learning, (Proceedings AAAI-86. Proceedings AAAI-86, Philadelphia, PA (1986), AAAI Press), 485-489
[40] Holte, R., Very simple classification rules perform well on most commonly used domains, Machine Learning, 11, 63-91 (1993) · Zbl 0850.68278
[41] Jackson, J., An efficient membership-query algorithm for learning DNF with respect to the uniform distribution, (Proceedings IEEE Symposium on Foundations of Computer Science (1994), IEEE) · Zbl 0897.68051
[42] John, G. H.; Kohavi, R.; Pfleger, K., Irrelevant features and the subset selection problem, (Proceedings 11th International Conference on Machine Learning. Proceedings 11th International Conference on Machine Learning, New Brunswick, NJ (1994), Morgan Kaufmann: Morgan Kaufmann San Mateo, CA), 121-129
[43] John, G. H.; Langley, P., Static vs. dynamic sampling for data mining, (Proceedings 2nd International Conference on Knowledge Discovery and Data Mining. Proceedings 2nd International Conference on Knowledge Discovery and Data Mining, Portland, OR (1996), AAAI Press), 367-370
[44] Jolliffe, I. T., (Principal Component Analysis (1986), Springer: Springer New York)
[45] Johnson, D. S., Approximation algorithms for combinatorial problems, J. Comput. System Sci., 9, 256-278 (1974) · Zbl 0296.65036
[46] Kearns, M. J.; Vazirani, U. V., (An Introduction to Computational Learning Theory (1994), MIT Press: MIT Press Cambridge, MA)
[47] Kira, K.; Rendell, L., A practical approach to feature selection, (Proceedings 9th International Conference on Machine Learning. Proceedings 9th International Conference on Machine Learning, Aberdeen, Scotland (1992), Morgan Kaufmann: Morgan Kaufmann San Mateo, CA), 249-256
[48] Kivinen, J.; Warmuth, M. K., Additive versus exponentiated gradient updates for linear prediction, (Proceedings 27th Annual ACM Symposium on Theory of Computing. Proceedings 27th Annual ACM Symposium on Theory of Computing, New York (1995), ACM Press), 209-218 · Zbl 0978.68559
[49] Kivinen, J.; Warmuth, M. K.; Auer, P., The Perceptron algorithm versus Winnow: linear versus logarithmic mistake bounds when few input variables are relevant (Technical Note), Artificial Intelligence, 97, 325-343 (1997), (this issue) · Zbl 0904.68112
[50] Kohavi, R., The power of decision tables, (Proceedings 8th European Conference on Machine Learning (1995))
[51] Kohavi, R.; John, G. H., Wrappers for feature subset selection, Artificial Intelligence, 97, 273-324 (1997), (this issue) · Zbl 0904.68143
[52] Kohavi, R.; Langley, P.; Yun, Y., The utility of feature weighting in nearest-neighbor algorithms, (Proceedings 9th European Conference on Machine Learning. Proceedings 9th European Conference on Machine Learning, Prague (1997), Springer: Springer Berlin)
[53] Knobe, B.; Knobe, K., A method for inferring context-free grammars, Inform. and Control, 31, 129-146 (1977) · Zbl 0333.68065
[54] Koller, D.; Sahami, M., Toward optimal feature selection, (Proceedings 13th International Conference on Machine Learning. Proceedings 13th International Conference on Machine Learning, Bari, Italy (1996), Morgan Kaufmann: Morgan Kaufmann San Mateo, CA)
[55] Kononenko, I., Estimating attributes: analysis and extensions of RELIEF, (Proceedings 7th European Conference on Machine Learning (1994))
[56] Kubat, M.; Flotzinger, D.; Pfurtscheller, G., Discovering patterns in EEG signals: comparative study of a few methods, (Proceedings 1993 European Conference on Machine Learning. Proceedings 1993 European Conference on Machine Learning, Vienna (1993), Springer: Springer Berlin), 367-371
[57] Kulkarni, D.; Simon, H. A., Experimentation in machine discovery, (Shrager, J.; Langley, P., Computational Models of Scientific Discovery and Theory Formation (1990), Morgan Kaufmann: Morgan Kaufmann San Mateo, CA)
[58] Langley, P.; Iba, W., Average-case analysis of a nearest neighbor algorithm, (Proceedings IJCAI-93. Proceedings IJCAI-93, Chambéry, France (1993)), 889-894
[59] Langley, P.; Sage, S., Oblivious decision trees and abstract cases, (Working Notes of the AAAI-94 Workshop on Case-Based Reasoning. Working Notes of the AAAI-94 Workshop on Case-Based Reasoning, Seattle, WA (1994), AAAI Press), 113-117
[60] Langley, P.; Sage, S., Induction of selective Bayesian classifiers, (Proceedings 10th Conference on Uncertainty in Artificial Intelligence. Proceedings 10th Conference on Uncertainty in Artificial Intelligence, Seattle, WA (1994), Morgan Kaufmann: Morgan Kaufmann San Mateo, CA), 399-406
[61] Langley, P.; Sage, S., Scaling to domains with many irrelevant features, (Greiner, R., Computational Learning Theory and Natural Learning Systems, Vol. 4 (1997), MIT Press: MIT Press Cambridge, MA)
[62] Lewis, D. D., Representation and learning in information retrieval, (Doctoral Dissertation, Tech. Rept. UM-CS-1991-093 (1992), Department of Computer Science, University of Massachusetts: Department of Computer Science, University of Massachusetts Amherst, MA)
[63] Lewis, D. D., Feature selection and feature extraction for text categorization, (Proceedings Speech and Natural Language Workshop. Proceedings Speech and Natural Language Workshop, San Francisco (1992), Morgan Kaufmann: Morgan Kaufmann San Mateo, CA), 212-217
[64] Lewis, D. D.; Catlett, J., Heterogeneous uncertainty sampling, (Proceedings 11th International Conference on Machine Learning. Proceedings 11th International Conference on Machine Learning, New Brunswick, NJ (1994), Morgan Kaufmann: Morgan Kaufmann San Mateo, CA), 148-156
[65] Lin, L. J., Self-improving reactive agents based on reinforcement learning, planning and teaching, Machine Learning, 8, 293-321 (1992)
[66] Littlestone, N., Learning quickly when irrelevant attributes abound: a new linear threshold algorithm, Machine Learning, 2, 285-318 (1988)
[67] Littlestone, N.; Long, P. M.; Warmuth, M. K., On-line learning of linear functions, (Proceedings 23rd Annual ACM Symposium on Theory of Computing. Proceedings 23rd Annual ACM Symposium on Theory of Computing, New Orleans, LA (1991), ACM Press), 465-475
[68] Littlestone, N.; Mesterharm, C., An apobayesian relative of Winnow, (Mozer, M. C.; Jordan, M. I.; Petsche, T., Advances in Neural Information Processing Systems, Vol. 9 (1997), MIT Press: MIT Press Cambridge, MA)
[69] Littlestone, N.; Warmuth, M. K., The weighted majority algorithm, Inform. and Comput., 108, 212-261 (1994) · Zbl 0804.68121
[70] Lovász, L.; Simonovits, M., On the randomized complexity of volume and diameter, (Proceedings IEEE Symposium on Foundations of Computer Science (1992), IEEE), 482-491 · Zbl 0942.68583
[71] Lund, C.; Yannakakis, M., On the hardness of approximating minimization problems, (Proceedings Annual ACM Symposium on the Theory of Computing (1993)), 286-293 · Zbl 1310.68094
[72] Matheus, C. J.; Rendell, L. A., Constructive induction on decision trees, (Proceedings IJCAI-89. Proceedings IJCAI-89, Detroit, MI (1989), Morgan Kaufmann: Morgan Kaufmann San Mateo, CA), 645-650 · Zbl 0708.68050
[73] Michalski, R. S., Pattern recognition as rule-guided inductive inference, IEEE Trans. Pattern Anal. Machine Intell., 2, 349-361 (1980) · Zbl 0471.68068
[74] Minsky, M.; Papert, S., (Perceptrons: An Introduction to Computational Geometry (1969), MIT Press: MIT Press Cambridge, MA) · Zbl 0197.43702
[75] Mitchell, T. M., Generalization as search, Artificial Intelligence, 18, 203-226 (1982); reprinted in: (Shavlik, J. W.; Dietterich, T. G., Readings in Machine Learning (1990), Morgan Kaufmann: Morgan Kaufmann San Mateo, CA)
[76] Moore, A. W.; Lee, M. S., Efficient algorithms for minimizing cross validation error, (Proceedings 11th International Conference on Machine Learning. Proceedings 11th International Conference on Machine Learning, New Brunswick, NJ (1994), Morgan Kaufmann: Morgan Kaufmann San Mateo, CA), 190-198
[77] Norton, S. W., Generating better decision trees, (Proceedings IJCAI-89. Proceedings IJCAI-89, Detroit, MI (1989), Morgan Kaufmann: Morgan Kaufmann San Mateo, CA), 800-805 · Zbl 0709.68085
[78] Pazzani, M. J.; Sarrett, W., A framework for the average case analysis of conjunctive learning algorithms, Machine Learning, 9, 349-372 (1992)
[79] Pagallo, G.; Haussler, D., Boolean feature discovery in empirical learning, Machine Learning, 5, 71-99 (1990)
[80] Quinlan, J. R., Learning efficient classification procedures and their application to chess end games, (Michalski, R. S.; Carbonell, J. G.; Mitchell, T. M., Machine Learning: An Artificial Intelligence Approach (1983), Morgan Kaufmann: Morgan Kaufmann San Mateo, CA)
[81] Quinlan, J. R., (C4.5: Programs for Machine Learning (1993), Morgan Kaufmann: Morgan Kaufmann San Mateo, CA)
[82] Rajamoney, S., A computational approach to theory revision, (Shrager, J.; Langley, P., Computational Models of Scientific Discovery and Theory Formation (1990), Morgan Kaufmann: Morgan Kaufmann San Mateo, CA)
[83] Rivest, R. L.; Schapire, R. E., Inference of finite automata using homing sequences, Inform. and Comput., 103, 299-347 (1993) · Zbl 0786.68082
[84] Rumelhart, D. E.; Hinton, G.; Williams, R. J., Learning internal representations by error propagation, (Rumelhart, D. E.; McClelland, J. L.; the PDP Research Group, Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1 (1986), MIT Press: MIT Press Cambridge, MA)
[85] Sammut, C.; Banerji, R. B., Learning concepts by asking questions, (Michalski, R. S.; Carbonell, J. G.; Mitchell, T. M., Machine Learning: An Artificial Intelligence Approach, Vol. 2 (1986), Morgan Kaufmann: Morgan Kaufmann San Mateo, CA)
[86] Schapire, R. E., The strength of weak learnability, Machine Learning, 5, 197-227 (1990)
[87] Schlimmer, J. C., Efficiently inducing determinations: a complete and efficient search algorithm that uses optimal pruning, (Proceedings 10th International Conference on Machine Learning. Proceedings 10th International Conference on Machine Learning, Amherst, MA (1993), Morgan Kaufmann: Morgan Kaufmann San Mateo, CA), 284-290
[88] Scott, P. D.; Markovitz, S., Representation generation in an exploratory learning system, (Fisher, D. H.; Pazzani, M. J.; Langley, P., Concept Formation: Knowledge and Experience in Unsupervised Learning (1991), Morgan Kaufmann: Morgan Kaufmann San Mateo, CA)
[89] Seung, H. S.; Opper, M.; Sompolinsky, H., Query by committee, (Proceedings 5th Annual Workshop on Computational Learning Theory. Proceedings 5th Annual Workshop on Computational Learning Theory, Pittsburgh, PA (1992), ACM Press: ACM Press New York), 287-294
[90] Shen, W. M.; Simon, H. A., Rule creation and rule learning through environmental exploration, (Proceedings IJCAI-89. Proceedings IJCAI-89, Detroit, MI (1989), Morgan Kaufmann: Morgan Kaufmann San Mateo, CA), 675-680
[91] Sinclair, A.; Jerrum, M., Approximate counting, uniform generation and rapidly mixing Markov chains, Inform. and Comput., 82, 93-133 (1989) · Zbl 0668.05060
[92] Singh, M.; Provan, G. M., A comparison of induction algorithms for selective and non-selective Bayesian classifiers, (Proceedings 12th International Conference on Machine Learning. Proceedings 12th International Conference on Machine Learning, Lake Tahoe, CA (1995), Morgan Kaufmann: Morgan Kaufmann San Mateo, CA), 497-505
[93] Singh, M.; Provan, G. M., Efficient learning of selective Bayesian network classifiers, (Proceedings 13th International Conference on Machine Learning. Proceedings 13th International Conference on Machine Learning, Bari, Italy (1996), Morgan Kaufmann: Morgan Kaufmann San Mateo, CA)
[94] Skalak, D. B., Prototype and feature selection by sampling and random mutation hill-climbing algorithms, (Proceedings 11th International Conference on Machine Learning. Proceedings 11th International Conference on Machine Learning, New Brunswick, NJ (1994), Morgan Kaufmann: Morgan Kaufmann San Mateo, CA), 293-301
[95] Stanfill, C. W., Memory-based reasoning applied to English pronunciation, (Proceedings AAAI-87. Proceedings AAAI-87, Seattle, WA (1987), AAAI Press), 577-581
[96] Ting, K. M., Discretization of continuous-valued attributes and instance-based learning, (Tech. Rept. No. 491 (1994), Basser Department of Computer Science, University of Sydney)
[97] Townsend-Weber, T.; Kibler, D., Instance-based prediction of continuous values, (Working Notes of the AAAI-94 Workshop on Case-Based Reasoning. Working Notes of the AAAI-94 Workshop on Case-Based Reasoning, Seattle, WA (1994), AAAI Press), 30-35
[98] Verbeurgt, K., Learning DNF under the uniform distribution in polynomial time, (Proceedings 3rd Annual Workshop on Computational Learning Theory. Proceedings 3rd Annual Workshop on Computational Learning Theory, San Francisco, CA (1990), Morgan Kaufmann: Morgan Kaufmann San Mateo, CA), 314-325
[99] Vere, S. A., Induction of concepts in the predicate calculus, (Proceedings IJCAI-75. Proceedings IJCAI-75, Tbilisi, Georgia (1975), Morgan Kaufmann: Morgan Kaufmann San Mateo, CA), 281-287
[100] Vovk, V., Aggregating strategies, (Proceedings 3rd Annual Workshop on Computational Learning Theory. Proceedings 3rd Annual Workshop on Computational Learning Theory, San Francisco, CA (1990), Morgan Kaufmann: Morgan Kaufmann San Mateo, CA), 371-383
[101] Widrow, B.; Hoff, M. E., Adaptive switching circuits, (IRE WESCON Convention Record (1960)), 96-104
[102] Winston, P. H., Learning structural descriptions from examples, (Winston, P. H., The Psychology of Computer Vision (1975), McGraw-Hill: McGraw-Hill New York) · Zbl 0518.68050
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases, these data have been complemented or enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or perfect matching.