×

Monte Carlo methods in pagerank computation: When one iteration is sufficient. (English) Zbl 1146.60056

There are several advantages of the probabilistic Monte Carlo methods (MCM) over the deterministic power iteration method: MCM already provide good estimation of the PageRank for relatively important pages after one iteration; MCM have natural parallel implementation; MCM allow one to perform continuous update of PageRank as the structure of the Web changes. In the present paper Monte Carlo-type methods for the PageRank computation are proposed and analyzed.
In the Introduction the mathematical model of PageRank as a stationary distribution of a Markov chain whose state space is the set of all Web pages with given transition matrix is developed. The original Google approach is reminded. A survey of the different manners of interpretation of the PageRank through expectations is made. A short description of a simple power iteration method to compute the PageRank is realized. The principle advantages of the probabilistic MC-type methods over the deterministic methods are discussed.
In Section 2 the concepts of five algorithms for simulation of random walks are described. Algorithm 1 is MC end-point with random start. Algorithm 2 is MC end-point with cyclic start. Algorithm 3- MC complete path is a complete path version of the MCM. Algorithm 4- MC complete path stopping at dangling nodes is another version of the complete path method. Algorithm 5- MC complete path with random start is a complete version of the MCM which admits a random start.
In Section 3 the convergence of the considered algorithms in the terms of confidence intervals is analyzed and compared. A random variable \((W_{i,j})\) distributed as a number of visits to pages is considered. The probability, the expectation and the variance of \(W_{i,j}\) are studied. The empirical mean \(\overline{W}_{i,j}\) of \(W_{i,j}\) is considered. In Theorem 3.1 the relative error of the distribution \(\pi\) of a random walk of the end-point and the relative error of the empirical mean \(\overline{W}_{i,j}\) are given. The precision of the end-point version and the complete path version of the MCM are compared.
In Section 4 numerical experiments with Web site of INRIA Sophia Antipolis confirm the theoretical analysis of the paper.
In Section 5 the considered MC algorithms for PageRank computation are discussed.
In Appendix the proof of Theorem 3.1 is exposed.

MSC:

60J20 Applications of Markov chains and discrete-time Markov processes on general state spaces (social mobility, learning theory, industrial processes, etc.)
65C05 Monte Carlo methods
60J05 Discrete-time Markov processes on general state spaces
60J10 Markov chains (discrete-time Markov processes on discrete state spaces)
65C40 Numerical analysis or methods applied to Markov chains
PDFBibTeX XMLCite
Full Text: DOI