Model selection in neural networks: some difficulties. (English) Zbl 1085.90006

Summary: This paper considers two related issues regarding feedforward neural networks (NNs). The first is whether the network weights corresponding to the best-fitting network are unique. Our empirical tests suggest that they are not, whether using the standard backpropagation algorithm or our preferred direct (non-gradient-based) search procedure. We also offer a theoretical analysis which suggests that there will almost inevitably be functional relationships between network weights. The second issue concerns the use of standard statistical approaches to testing the significance of weights or groups of weights. Treating feedforward NNs as an interesting way to carry out nonlinear regression suggests that statistical tests should be employed. According to our results, however, statistical tests can in practice be indeterminate, so it is rather difficult to choose either the number of hidden layers or the number of nodes on this basis.
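
Illustration (not from the paper): the non-uniqueness question can be made concrete with a small sketch. It does not reproduce the authors' empirical tests or their theoretical analysis. It only shows two well-known, simpler sources of non-uniqueness in a single-hidden-layer network with tanh hidden units: permuting hidden nodes, and flipping the signs of one node's weights. The function name `forward`, the network shape, and all numerical weights are assumptions chosen for illustration.

```python
import math

def forward(x, hidden, output_bias, output_weights):
    """Single-hidden-layer network: tanh hidden units, linear output.

    hidden is a list of (weight, bias) pairs, one per hidden node.
    """
    return output_bias + sum(
        v * math.tanh(w * x + b) for (w, b), v in zip(hidden, output_weights)
    )

# Hypothetical weights, chosen only for illustration.
hidden = [(0.8, -0.3), (-1.5, 0.6)]
out_w = [1.2, -0.7]
out_b = 0.1

# Symmetry 1: swap the two hidden nodes together with their output weights.
hidden_perm = [hidden[1], hidden[0]]
out_w_perm = [out_w[1], out_w[0]]

# Symmetry 2: negate node 0's input weight, bias, and output weight.
# tanh is odd, so v * tanh(z) equals (-v) * tanh(-z).
hidden_flip = [(-hidden[0][0], -hidden[0][1]), hidden[1]]
out_w_flip = [-out_w[0], out_w[1]]

xs = [-2.0, -0.5, 0.0, 0.7, 3.0]
base = [forward(x, hidden, out_b, out_w) for x in xs]
perm = [forward(x, hidden_perm, out_b, out_w_perm) for x in xs]
flip = [forward(x, hidden_flip, out_b, out_w_flip) for x in xs]

# Largest disagreement between the original network and either variant.
max_gap = max(max(abs(a - p), abs(a - f)) for a, p, f in zip(base, perm, flip))
print("largest output difference:", max_gap)
```

All three weight vectors differ, yet they define the same input-output function up to floating-point rounding, so a fitting criterion based only on outputs cannot distinguish them.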

MSC:

90B10 Deterministic network models in operations research
92B20 Neural networks for/in biological studies, artificial life and related topics