The authors study a generalization of Gittins’ bandit problem in which the one-step return is the product of a reward associated with the active arm and functions of the states of the other arms. This generalization was introduced by {\it P. Nash} [J. R. Stat. Soc., Ser. B 42, 165-169 (1980; Zbl 0459.90087)]. The expected total reward criterion is considered. The authors introduce the notion of a dual generalized bandit problem and use it to develop index-based suboptimality bounds for policies.
Reviewer:
Eugene A. Feinberg (Stony Brook)