The main purpose of this paper is to review methods for Language Model (LM) adaptation. The basic paradigms, methods and theories are introduced: maximum a posteriori estimation, mixture-based adaptation, Minimum Discrimination Information (MDI), and models that cope with long-distance dependencies. Approaches and experimental results are also surveyed. Some of the authors’ conclusions are the following: (1) Adapting the statistical parameters of an LM may be an important step towards improving system robustness. Dynamic LMs can be implemented by switching between predetermined, fixed models depending on whether certain preconditions hold, or by adapting the model parameters over time. The main problem with LM adaptation is the difficulty of predicting how the probabilities should change for words that do not appear in the adaptation data. (2) Linear combination models interpolate the probabilities obtained with different LMs. Among the nonlinear combination methods, backing-off models appear to be promising. (3) MDI or maximum entropy-based methods have the advantage of being intuitive and general, and the GIS (generalized iterative scaling) algorithm is guaranteed to converge for consistent constraints. However, this class of methods is computationally expensive. (4) The knowledge sources for an LM can be $n$-gram frequencies from different topics, word-trigger statistics, cache memory $n$-grams, etc. All sources can be used with both linear and nonlinear methods. In general, simple methods based on linear combination of models (some of them having time-varying statistical parameters) seem to be suitable for most application areas.
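To make conclusion (2) concrete, the following is a minimal sketch of linear interpolation, $P(w \mid h) = \sum_i \lambda_i P_i(w \mid h)$, combining a fixed background model with a cache model built from recent words. It is an illustration only and is not taken from the paper under review: the function names, the add-one smoothing, the closed three-word vocabulary and the weights 0.7/0.3 are all assumptions chosen for the example.

```python
from collections import Counter


def make_unigram(tokens, vocab):
    """Hypothetical background model: add-one smoothed unigram, ignores history."""
    counts = Counter(tokens)
    total = len(tokens) + len(vocab)
    return lambda word, history: (counts[word] + 1) / total


def make_cache(vocab, size=50):
    """Hypothetical cache model: add-one smoothed unigram over the last `size` words."""
    def prob(word, history):
        recent = history[-size:]
        counts = Counter(recent)
        return (counts[word] + 1) / (len(recent) + len(vocab))
    return prob


def interpolate(models, weights, word, history):
    """Linear interpolation: P(w|h) = sum_i lambda_i * P_i(w|h)."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("interpolation weights must sum to 1")
    return sum(lam * model(word, history) for lam, model in zip(weights, models))


# Toy data (assumed): a closed vocabulary and a short recent history.
vocab = {"a", "b", "c"}
background = make_unigram(["a", "a", "b"], vocab)
cache = make_cache(vocab)
history = ["c", "c", "c"]

p_background = background("c", history)
p_adapted = interpolate([background, cache], [0.7, 0.3], "c", history)
total_mass = sum(
    interpolate([background, cache], [0.7, 0.3], w, history) for w in vocab
)
```

Because each component is a proper distribution over the vocabulary and the weights sum to one, the interpolated model is again a proper distribution. In this toy setting the word "c", which is rare in the background data but frequent in the recent history, receives more probability after interpolation than under the background model alone.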
Reviewer:
Neculai Curteanu (Iaşi)