Cluster-Formation und -Analyse. Theorie, FORTRAN-Programme und Beispiele. (English) Zbl 0536.62048

München-Wien: R. Oldenbourg Verlag. 236 p. DM 84.00 (1983).
This is a book on numerical classification and clustering techniques. Given m (data) points \(x_1,\dots,x_m\) of \(\mathbb{R}^s\) which, in practice, represent the measurements of s variables obtained for m objects 1,...,m under investigation, the problem is to find a partition \(\mathcal{C}=\{C_1,\dots,C_n\}\) of the set of objects \(\{1,\dots,m\}\) comprising n classes such that all objects in the same class are 'similar' to one another, whilst 'dissimilar' to the members of other classes. The resulting classes ('clusters') will be interpreted as 'types', 'natural groups', or 'useful dissection parts', depending on the application.
As a rule, in this book, the problem is treated by minimizing various clustering criteria of the type \(W(\mathcal{C}):=\sum_{i=1}^{n}\sum_{k\in C_i}d(x_k,z_i)\) over all \(\mathcal{C}\), where \(d(x_k,z_i)\) is some quadratic distance between \(x_k\) and \(z_i\), a characteristic representative of the class \(C_i\) (e.g. a mean vector or a class-specific regression hyperplane), or by minimizing the determinantal criterion. A solution \(\mathcal{C}\) is obtained (approximated) either by minimum-distance (k-means) algorithms or by an iterative exchange of objects (thereby using updating formulas for matrix inversions, Cholesky and QR decompositions). Adaptive distance measures are considered, too, but neither hierarchical classification methods nor probabilistic models or investigations are included.
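To make the criterion concrete, here is a minimal Python sketch of a minimum-distance (k-means) iteration for the special case where \(d\) is the squared Euclidean distance and \(z_i\) is the mean vector of \(C_i\). It is not taken from the book's Fortran programs; the function names (`sq_dist`, `criterion`, `k_means`) and the choice of initial representatives are assumptions made for illustration only.

```python
def sq_dist(x, z):
    """Squared Euclidean distance d(x, z) between two points."""
    return sum((a - b) ** 2 for a, b in zip(x, z))


def mean(points):
    """Componentwise mean vector of a non-empty list of points."""
    m = len(points)
    return [sum(col) / m for col in zip(*points)]


def criterion(points, labels, centers):
    """W(C): sum over all objects of the distance to their class representative."""
    return sum(sq_dist(x, centers[i]) for x, i in zip(points, labels))


def k_means(points, centers, max_iter=100):
    """Alternate minimum-distance assignment and mean update until labels stop changing."""
    centers = [list(z) for z in centers]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each object goes to the nearest representative.
        new_labels = [
            min(range(len(centers)), key=lambda i: sq_dist(x, centers[i]))
            for x in points
        ]
        if new_labels == labels:
            break  # partition is stationary
        labels = new_labels
        # Update step: each representative becomes the mean of its class.
        for i in range(len(centers)):
            members = [x for x, lab in zip(points, labels) if lab == i]
            if members:  # an empty class keeps its previous representative
                centers[i] = mean(members)
    return labels, centers


# Hypothetical toy data: m = 4 points in R^2, n = 2 classes.
points = [(0, 0), (0, 1), (5, 5), (6, 5)]
labels, centers = k_means(points, centers=[(0, 0), (5, 5)])
print(labels, centers, criterion(points, labels, centers))
```

Neither step of this iteration can increase \(W(\mathcal{C})\), so it terminates at a stationary partition, which in general is only a local minimum and depends on the initial representatives.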
The book introduces the mathematical concepts and algorithms (Part I, 106 p.), presents a series of corresponding Fortran programs (Part II, 42 p.), and finally gives some illustrative numerical examples for comparing and evaluating the various methods (Part III, 70 p.). It concentrates on the mathematical and algorithmic aspects (i.e. without discussing real-life problems or the interpretation of results) and contains some exercises at the end of each chapter. Indeed, I know of no other book where the topic is presented with the same degree of clarity and internal consistency between the three parts I, II, and III.
Given that only matrix algebra is needed as a prerequisite, the book is highly recommended not only as an introductory text for students and research workers in statistics or data analysis, but also for practitioners from all fields of application who are concerned with clustering problems.
Reviewer: H.H.Bock

MSC:

62H30 Classification and discrimination; cluster analysis (statistical aspects)
62-01 Introductory exposition (textbooks, tutorial papers, etc.) pertaining to statistics
62-04 Software, source code, etc. for problems pertaining to statistics
62-02 Research exposition (monographs, survey articles) pertaining to statistics