
An analysis of temporal-difference learning with function approximation. (English) Zbl 0914.93075

The authors consider an irreducible aperiodic Markov chain with a cost-to-go function of the form \[ J^{*}(i) := E\left[ \sum_{t=0}^{\infty} \alpha^{t} g(i_{t}, i_{t+1}) \,\Big|\, i_{0} = i\right]. \] Here \( \{ i_{t}\mid t = 0, 1, \ldots \} \) is the sequence of states visited by the Markov chain, \(g(i, j)\) is a scalar function representing the cost of a transition from \(i\) to \(j\), and \(\alpha \in (0, 1)\) is a discount factor. The state space is \( S = \{1, 2, \ldots, n\}, \) where \(n\) may be finite or infinite.
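For a finite chain, conditioning on the first transition gives \(J^{*} = \bar g + \alpha P J^{*}\), where \(\bar g(i) = \sum_j P_{ij}\, g(i, j)\) is the expected one-step cost, so \(J^{*}\) can be obtained by solving a linear system. The following minimal sketch illustrates this for a hypothetical 3-state chain; the matrices `P` and `g` are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Sketch (illustrative, not from the paper): for a small finite chain,
# the cost-to-go solves J* = g_bar + alpha * P @ J*, where
# g_bar(i) = sum_j P[i, j] * g(i, j) is the expected one-step cost.
n = 3
alpha = 0.9  # discount factor in (0, 1)

# Hypothetical transition matrix P and transition costs g(i, j).
P = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.3, 0.5],
              [0.0, 0.4, 0.6]])
g = np.array([[1.0, 2.0, 0.0],
              [0.5, 1.0, 3.0],
              [0.0, 2.0, 1.0]])

g_bar = (P * g).sum(axis=1)          # expected immediate cost per state
J_star = np.linalg.solve(np.eye(n) - alpha * P, g_bar)
print("exact J*:", J_star)
```

Since \(\alpha \in (0, 1)\) and \(P\) is stochastic, the matrix \(I - \alpha P\) is invertible, so the system has a unique solution.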
The aim of the paper is to investigate approximations \(\overline{J}: S \times {\mathcal R}^{K} \rightarrow {\mathcal R}\) of the function \(J^{*}: S \rightarrow {\mathcal R}\), with the quality of an approximation measured in a suitable norm. Special attention is paid to convergence results for linear function approximators. The case of a finite state space, nonlinear approximations, and the importance of on-line sampling are discussed separately.
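To make the scheme concrete, the sketch below implements temporal-difference learning in the simplest \(\lambda = 0\) case with a linear approximator \(\overline{J}(i, r) = \phi(i)^{\top} r\), updated along the chain's own trajectory (on-line sampling). The chain, the feature matrix, and the step-size schedule are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Sketch of TD(0) with a linear approximator J_bar(i, r) = phi(i) @ r.
# The chain, features, and step sizes below are illustrative assumptions.
rng = np.random.default_rng(0)
n, K, alpha = 3, 2, 0.9
P = np.array([[0.5, 0.5, 0.0],       # hypothetical transition matrix
              [0.2, 0.3, 0.5],
              [0.0, 0.4, 0.6]])
g = np.array([[1.0, 2.0, 0.0],       # hypothetical transition costs g(i, j)
              [0.5, 1.0, 3.0],
              [0.0, 2.0, 1.0]])
Phi = rng.standard_normal((n, K))    # phi(i) is row i of Phi
r = np.zeros(K)                      # weight vector in R^K

i = 0                                # updates follow the chain's trajectory
for t in range(200_000):
    j = rng.choice(n, p=P[i])                          # sample next state
    delta = g[i, j] + alpha * Phi[j] @ r - Phi[i] @ r  # temporal difference
    r += (1.0 / (1 + t // 1000)) * delta * Phi[i]      # diminishing step size
    i = j

print("TD estimate of J*:", Phi @ r)
```

With on-line sampling, states are visited with the chain's steady-state frequencies, which is precisely the property the paper's convergence analysis for linear approximators exploits.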
Reviewer: V.Kankova (Praha)

MSC:

93E35 Stochastic learning and adaptive control
60J10 Markov chains (discrete-time Markov processes on discrete state spaces)