Feature selection for knowledge discovery and data mining. (English) Zbl 0908.68127

The Kluwer International Series in Engineering and Computer Science. 454. Dordrecht: Kluwer Academic Publishers. xx, 214 p. (1998).
Suppose a large number of data records (instances) is given, each consisting of a tuple of values for certain properties, called features, plus one further feature giving the class to which the instance belongs. The task is to find a function, called a classifier, that maps feature-value tuples to classes in such a way that it can predict the classes of new instances. Discovering knowledge from data in this way can be hard because of a large number of features, of instances, or of both. The objective of feature selection is to select a minimal subset of features, according to some criterion, such that this task can be achieved equally well or better. Irrelevant or redundant features should thereby be removed and noise ignored. This book offers an overview of the various methods and provides a framework for examining and categorizing them.
The first chapter is introductory and gives background on knowledge discovery, machine learning and feature selection. Chapter 2 gives a unified view of several models of feature selection. Although they differ in search strategy, selection measure and manner of feature selection, all of them consist of the same four parts: feature generation, feature evaluation, stopping criteria and testing. Chapter 3 studies search direction, search strategy and evaluation measure as the major aspects of feature selection; the last of these is accompanied by a detailed example.
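The four parts named above can be sketched in a few lines of Python. This is only an illustrative sketch and is not taken from the book: the greedy forward search, the inconsistency-rate measure, and the names `inconsistency_rate` and `forward_select` are all assumptions chosen to keep the example small.

```python
from collections import Counter, defaultdict


def inconsistency_rate(rows, labels, subset):
    """Evaluation measure (assumed for illustration): the fraction of
    instances that disagree with the majority class among all instances
    sharing the same values on the features in `subset`."""
    groups = defaultdict(Counter)
    for row, label in zip(rows, labels):
        key = tuple(row[i] for i in subset)
        groups[key][label] += 1
    inconsistent = sum(sum(c.values()) - max(c.values()) for c in groups.values())
    return inconsistent / len(rows)


def forward_select(rows, labels):
    """Greedy forward search: start from the empty subset and add one
    feature at a time while the evaluation measure keeps improving."""
    n_features = len(rows[0])
    selected = []
    best = inconsistency_rate(rows, labels, selected)
    while best > 0 and len(selected) < n_features:
        # Feature generation: every subset that is one feature larger.
        candidates = [f for f in range(n_features) if f not in selected]
        # Feature evaluation: score each candidate subset.
        scored = [(inconsistency_rate(rows, labels, selected + [f]), f)
                  for f in candidates]
        score, feature = min(scored)
        # Stopping criterion: no candidate improves on the current subset.
        if score >= best:
            break
        selected.append(feature)
        best = score
    return selected, best


# Toy data: feature 0 determines the class, feature 1 is irrelevant,
# feature 2 is a redundant copy of feature 0.
rows = [(0, 0, 0), (0, 1, 0), (1, 0, 1), (1, 1, 1)]
labels = ["a", "a", "b", "b"]

selected, rate = forward_select(rows, labels)
print(selected, rate)  # [0] 0.0
```

The fourth part, testing, would consist of training a classifier on the selected features and checking it on instances held out from the search. A greedy forward search is only one possible search strategy; it can stop too early when features are informative only in combination.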
With a categorization of feature selection algorithms thus established, Chapter 4 presents about ten typical example algorithms. Chapter 5 introduces ways to evaluate the methods and then tests the presented algorithms extensively on several data sets with differing degrees of noise, redundancy, correlation and size. Chapter 6 is devoted to feature transformation, dimensionality reduction and data without class information.
Throughout the book relatively little mathematical background is needed, and a quick course in statistics is even provided. Every method or algorithm is given as pseudo-code unless it can be downloaded from the internet. Each chapter has many references, and an appendix gives several dozen relevant links to internet sites. The book is written so that nearly everyone should be able to understand the essence of feature selection, while at the same time it serves as a useful reference.

MSC:

68T05 Learning and adaptive systems in artificial intelligence
68-01 Introductory exposition (textbooks, tutorial papers, etc.) pertaining to computer science