
Efficient leave-one-out cross-validation of kernel Fisher discriminant classifiers. (English) Zbl 1059.68101

Summary: S. Mika, G. Rätsch, J. Weston, B. Schölkopf and K.-R. Müller [in: Neural Networks for Signal Processing, Vol. IX, IEEE Press, New York, 41–48 (1999)] apply the “kernel trick” to obtain a non-linear variant of Fisher’s linear discriminant analysis, demonstrating state-of-the-art performance on a range of benchmark data sets. We show that leave-one-out cross-validation of kernel Fisher discriminant classifiers can be implemented with a computational complexity of only \(O(\ell^3)\) operations, rather than the \(O(\ell^4)\) of a naïve implementation, where \(\ell\) is the number of training patterns. Leave-one-out cross-validation then becomes an attractive means of model selection in large-scale applications of kernel Fisher discriminant analysis, being significantly faster than the \(k\)-fold cross-validation procedures in common use.
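
The speed-up rests on the close relationship between the kernel Fisher discriminant and regularised least-squares (kernel ridge / LS-SVM) classification noted in references [20] and [21] below: after a single \(O(\ell^3)\) inversion of the regularised kernel matrix, every leave-one-out residual is available in closed form through a PRESS-style identity (cf. [29, 30]), so no model is ever retrained. The following NumPy sketch illustrates this idea under simplifying assumptions: it omits the bias term handled in the paper's full derivation, codes the two classes as +1/-1 targets, and the names rbf_kernel and loo_error_kfd are illustrative, not the authors' code.

import numpy as np

def rbf_kernel(X, Z, gamma=1.0):
    # Gaussian RBF kernel matrix between the rows of X and the rows of Z.
    sq_dists = (np.sum(X**2, axis=1)[:, None]
                + np.sum(Z**2, axis=1)[None, :]
                - 2.0 * X @ Z.T)
    return np.exp(-gamma * sq_dists)

def loo_error_kfd(K, y, lam):
    # Leave-one-out error of a bias-free kernel Fisher discriminant viewed as a
    # regularised least-squares classifier, in O(ell^3) time (one matrix inversion).
    # K   : (ell, ell) kernel matrix over the training patterns
    # y   : (ell,) labels coded as +1 / -1
    # lam : regularisation parameter
    ell = K.shape[0]
    C = np.linalg.inv(K + lam * np.eye(ell))   # single O(ell^3) inversion
    alpha = C @ y                              # dual coefficients of the full model
    loo_residuals = alpha / np.diag(C)         # closed-form leave-one-out residuals
    loo_outputs = y - loo_residuals            # decision value for pattern i when it is left out
    return np.mean(np.sign(loo_outputs) != y)  # leave-one-out misclassification rate

# Hypothetical model-selection loop over kernel and regularisation parameters:
# for g in (0.1, 1.0, 10.0):
#     for l in (1e-3, 1e-2, 1e-1):
#         err = loo_error_kfd(rbf_kernel(X, X, g), y, l)

A bias term, as used in the paper's full formulation, can in principle be accommodated by applying the same kind of inverse-update identities (cf. [23-27]) to the correspondingly bordered linear system.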

MSC:

68T10 Pattern recognition, speech recognition
68T05 Learning and adaptive systems in artificial intelligence

Software:

alr3

References:

[1] S. Mika, G. Rätsch, J. Weston, B. Schölkopf, K.-R. Müller, Fisher discriminant analysis with kernels, in: Y.H. Hu, J. Larsen, E. Wilson, S. Douglas (Eds.), Neural Networks for Signal Processing, Vol. IX, IEEE Press, New York, 1999, pp. 41-48.
[2] Cristianini, N.; Shawe-Taylor, J., An Introduction to Support Vector Machines (and other kernel-based learning methods) (2000), Cambridge University Press: Cambridge University Press Cambridge, UK
[3] Schölkopf, B.; Smola, A. J., Learning with Kernels—Support Vector Machines, Regularization, Optimization and Beyond (2002), MIT Press: MIT Press Cambridge, MA
[4] Saunders, C.; Gammerman, A.; Vovk, V., Ridge regression in dual variables, (Shavlik, J., Proceedings of the 15th International Conference on Machine Learning (ICML-1998) (1998), Morgan Kaufmann: Morgan Kaufmann Los Altos, CA)
[5] Schölkopf, B.; Smola, A. J.; Müller, K., Kernel principal component analysis, (Gerstner, W.; Germond, A.; Hasler, M.; Nicoud, J.-D., Proceedings of the International Conference on Artificial Neural Networks (ICANN-1997). Proceedings of the International Conference on Artificial Neural Networks (ICANN-1997), Lecture Notes in Computer Science (LNCS), Vol. 1327 (1997), Springer: Springer Berlin), 583-588
[6] Lai, P. L.; Fyfe, C., Kernel and nonlinear canonical correlation analysis, Int. J. Neural Systems, 10, 5, 365-377 (2000)
[7] B.E. Boser, I.M. Guyon, V. Vapnik, A training algorithm for optimal margin classifiers, in: D. Haussler (Ed.), Proceedings of the Fifth Annual ACM Workshop on Computational Learning Theory, Pittsburgh, PA, July 1992, pp. 144-152.
[8] Cortes, C.; Vapnik, V., Support vector networks, Mach. Learning, 20, 273-297 (1995) · Zbl 0831.68098
[9] Fisher, R. A., The use of multiple measurements in taxonomic problems, Ann. Eugenics, 7, 179-188 (1936)
[10] Mercer, J., Functions of positive and negative type and their connection with the theory of integral equations, Philos. Trans. R. Soc. London, A, 209, 415-446 (1909) · JFM 40.0408.02
[11] Tikhonov, A. N.; Arsenin, V. Y., Solutions of Ill-posed Problems (1977), Wiley: Wiley New York · Zbl 0354.65028
[12] Stone, M., Cross-validatory choice and assessment of statistical predictions, J. R. Stat. Soc. B, 36, 1, 111-147 (1974) · Zbl 0308.62063
[13] V. Vapnik, O. Chapelle, Bounds on error expectation for SVM, in: A.J. Smola, P.L. Bartlett, B. Schölkopf, D. Schuurmans (Eds.), Advances in Large Margin Classifiers, MIT Press, Cambridge, Massachusetts, USA, 2000, pp. 261-280 (Chapter 14).
[14] Chapelle, O.; Vapnik, V.; Bousquet, O.; Mukherjee, S., Choosing multiple parameters for support vector machines, Mach. Learning, 46, 1, 131-159 (2002) · Zbl 0998.68101
[15] Luntz, A.; Brailovsky, V., On estimation of characters obtained in statistical procedure of recognition, Tekhnicheskaya Kibernetika, 3 (1969), (in Russian)
[16] Vapnik, V., Statistical Learning Theory (1998), Wiley: Wiley New York · Zbl 0935.62007
[17] R. Kohavi, A study of cross-validation and bootstrap for accuracy estimation and model selection, in: Proceedings of the 14th International Joint Conference on Artificial Intelligence (IJCAI), Morgan Kaufmann, Los Altos, CA, 1995, pp. 1137-1143.
[18] Aronszajn, N., Theory of reproducing kernels, Trans. Amer. Math. Soc., 68, 337-404 (1950) · Zbl 0037.20701
[19] Kimeldorf, G. S.; Wahba, G., Some results on Tchebycheffian spline functions, J. Math. Anal. Appl., 33, 82-95 (1971) · Zbl 0201.39702
[20] J. Xu, X. Zhang, Y. Li, Kernel MSE algorithm: a unified framework for KFD, LS-SVM and KRR, in: Proceedings of the International Joint Conference on Neural Networks (IJCNN-2001), Washington, DC, July 2001, pp. 1486-1491.
[21] Suykens, J. A.K.; Vandewalle, J., Least squares support vector machine classifiers, Neural Process. Lett., 9, 3, 293-300 (1999)
[22] Weisberg, S., Applied Linear Regression (1985), Wiley: Wiley New York · Zbl 0646.62058
[23] Sherman, J.; Morrison, W. J., Adjustment of an inverse matrix corresponding to changes in the elements of a given column or a given row of the original matrix, Ann. Math. Stat., 20, 4, 621 (1949)
[24] Sherman, J.; Morrison, W. J., Adjustment of an inverse matrix corresponding to a change in one element of a given matrix, Ann. Math. Stat., 21, 1, 124-127 (1950) · Zbl 0037.00901
[25] M. Woodbury, Inverting modified matrices, Memorandum Report 42, Princeton University, Princeton, USA, 1950.
[26] Bartlett, M. S., An inverse matrix adjustment arising in discriminant analysis, Ann. Math. Stat., 22, 1, 107-111 (1951) · Zbl 0042.38203
[27] Hager, W. W., Updating the inverse of a matrix, SIAM Rev., 31, 2, 221-239 (1989) · Zbl 0671.65018
[28] Golub, G. H.; Van Loan, C. F., Matrix Computations (1996), The Johns Hopkins University Press: The Johns Hopkins University Press Baltimore · Zbl 0865.65009
[29] Allen, D. M., The relationship between variable selection and prediction, Technometrics, 16, 125-127 (1974) · Zbl 0286.62044
[30] Cook, R. D.; Weisberg, S., Residuals and Influence in Regression, Monographs on Statistics and Applied Probability (1982), Chapman & Hall: Chapman & Hall New York · Zbl 0564.62054
[31] S.D. Bay, The UCI KDD archive, University of California, Department of Information and Computer Science, Irvine, CA, 1999.
[32] Rätsch, G.; Onoda, T.; Müller, K.-R., Soft margins for AdaBoost, Mach. Learning, 42, 3, 287-320 (2001) · Zbl 0969.68128
[33] S. Mika, G. Rätsch, J. Weston, B. Schölkopf, A.J. Smola, K.-R. Müller, Invariant feature extraction and classification in feature spaces, in: S.A. Solla, T.K. Leen, K.-R. Müller (Eds.), Advances in Neural Information Processing Systems, Vol. 12, MIT Press, Cambridge, MA, 2000, pp. 526-532.
[34] Y. Freund, R.E. Schapire, Experiments with a new boosting algorithm, in: Proceedings of the 13th International Conference on Machine Learning, Morgan Kaufmann, Los Altos, CA, 1996, pp. 148-156.
[35] Bishop, C. M., Neural Networks for Pattern Recognition (1995), Oxford University Press: Oxford University Press Oxford