
Markov Chains and Principal Eigenvectors in Network Centrality: PageRank

10/3/2025

The pipeline for training a Graph Neural Network can be schematized as follows: Input graph → Structural features → Learning algorithm → Prediction. One of the most challenging aspects of this pipeline is the design of structural features, which aim to capture the essential patterns and relationships within the graph. This process is commonly referred to as feature engineering, and automating it is a key goal in graph machine learning.

These are a few of the possible node-level features:

  • node degree (treats all neighbours equally)
  • node centrality
  • clustering coefficient
  • graphlets

Different ways to compute these features from a graph are:

  • eigenvector centrality
  • betweenness centrality
  • closeness centrality (how close a node is to all other nodes)
  • and many others

Eigenvector centrality

A node $v$ is important if it is surrounded by important neighboring nodes:

$$c_v=\frac{1}{\lambda}\sum_{u \in N(v)} c_u$$

$\lambda$ is a positive constant used for normalization. The basic notion behind this feature is: “the more important my friends, the more important I am”.

The last equation can be expressed in a matrix form:

$$\lambda \cdot c = A \cdot c \quad (1)$$

This is an eigenvector/eigenvalue equation.

As we will see later, if the graph is undirected and connected, by the Perron–Frobenius theorem the largest eigenvalue is always positive and unique, and the corresponding eigenvector is strictly positive.

The leading eigenvector $c_{max}$ is used as a centrality score.

It’s not about how many nodes are connected to you, but how important the nodes connected to you are.
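To make this concrete, here is a minimal sketch of computing eigenvector centrality with numpy; the 4-node graph and variable names are my own illustrative choices, not part of the original example:

```python
import numpy as np

# Adjacency matrix of a small undirected example graph:
# edges 0-1, 0-2, 1-2, 2-3 (node 2 is the best connected).
A = np.array([
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)

# For a symmetric matrix, eigh returns eigenvalues in ascending order,
# so the last column holds the leading (Perron) eigenvector.
eigenvalues, eigenvectors = np.linalg.eigh(A)
centrality = np.abs(eigenvectors[:, -1])   # sign is arbitrary; take it positive
centrality /= centrality.sum()             # normalize scores to sum to 1

print(centrality)   # node 2 gets the highest score
```

Node 2 scores highest: it has the most neighbours, and those neighbours are themselves well connected.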

Eigenvector centrality measures a node’s importance based on the importance of its neighbors, assigning higher scores to nodes connected to other influential nodes. PageRank extends this idea by introducing a damping factor, which simulates random jumps across the network. This adjustment makes PageRank more robust to dangling nodes (nodes with no outgoing links) and spider traps (subgraphs that can trap random walks), which can distort standard eigenvector centrality. By computing PageRank, one effectively obtains a principal eigenvector of the Google matrix, and the resulting scores can be used as features that capture both the structural connectivity and influence propagation in the network. Compared to pure eigenvector centrality, PageRank provides more stable and meaningful importance values, making it a better choice for feature extraction in graph-based learning tasks.

The web is a directed graph

Link analysis algorithms:

  • PageRank
  • Personalized PageRank
  • Random walks with restarts

The PageRank idea:

  • All in-links are not equal: some are more important than others
  • Recursive nature: the importance of a node is derived from the importance of the nodes that point to it.
  • A “vote” from an important page is worth more:

If a page $i$ with importance $r_i$ has $d_i$ out-links, each link gets $\frac{r_i}{d_i}$ votes. Page $j$’s own importance $r_j$ is the sum of the votes on its in-links.

So we can define the “rank” $r_j$ of a node $j$ to be:

$$r_j = \sum_{i \to j} \frac{r_i}{d_i} \quad (2)$$

$d_i$ … out-degree of node $i$.

It’s possible to solve this as a linear system using Gaussian elimination, but that approach is not scalable.

To represent the last equation in matrix form, we introduce the Markov matrix, also called the stochastic adjacency matrix. More theory about the Markov stochastic matrix later.

Let page $j$ have $d_j$ out-links. If $j \to i$, then $M_{ij} = 1/d_j$.

$M$ is a column-stochastic matrix, which means that every column sums to 1. Now we can write equation (2) in matrix form as:

$$r = M \cdot r \quad (3)$$

Solving systems of linear equations is a frequent necessity for Web search applications, but the magnitude of computations is usually too large for direct solution methods based on Gaussian elimination to be effective. Consequently, iterative techniques are often the only choice, and, because of size, sparsity, and memory considerations, the preferred algorithms are the simpler methods based on matrix-vector products that require no additional storage beyond that of the original data. Linear stationary iterative methods are the most common.

Note the similarity between this equation and equation (1): the only difference is that $A$ is the adjacency matrix, while $M$ is the Markov (stochastic) matrix.

Connection to random walk:

Imagine a random surfer that at some time $t$ is on some page $i$. At time $t+1$ the surfer follows an out-link from $i$ uniformly at random.

Eventually the surfer ends up on some page $j$ linked from $i$.

Let $p(t)$ be the vector whose $i^{th}$ coordinate is the probability that the surfer is at page $i$ at time $t$. So $p(t)$ is a probability distribution over pages.

To compute where the surfer will be at time $t+1$, we multiply $p(t)$ on the left by the column-stochastic matrix:

$$p(t+1) = M \cdot p(t) \quad (4)$$

The entry $p_j(t+1)$ gives the probability that the surfer is on node $j$ at time $t+1$. It is computed by summing the contributions from all nodes that link to $j$. Formally:

$$p_j(t+1) = \sum_{i \to j} M_{ji} \, p_i(t) = \sum_{i \to j} \frac{p_i(t)}{d_i}$$

Suppose the random walk reaches a state where:

$$p(t+1) = M \cdot p(t) = p(t)$$

Then $p(t)$ is a stationary distribution of the random walk. Our original rank vector $r$ satisfies $r = M \cdot r$, which means that $r$ is a stationary distribution for the random walk!
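As a small sanity check, here is a sketch with a hypothetical 3-page, column-stochastic $M$ of my own choosing: repeatedly applying equation (4) drives any starting distribution to the stationary one.

```python
import numpy as np

# Column-stochastic matrix of a hypothetical 3-page web:
# column j holds the probabilities of following each out-link of page j.
M = np.array([
    [0.0, 0.5, 0.5],
    [0.5, 0.0, 0.5],
    [0.5, 0.5, 0.0],
])

p = np.array([1.0, 0.0, 0.0])   # the surfer starts on page 0 with certainty
for _ in range(100):
    p = M @ p                   # p(t+1) = M p(t)

print(p)   # converges to the uniform distribution [1/3, 1/3, 1/3]
```

Here the stationary distribution is uniform by symmetry, and at the limit $M \cdot p = p$ holds.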

How do we solve the equation:

$$1 \cdot r = M \cdot r$$

Starting from any vector $u$, the limit $M(M(\dots M(Mu)))$ is the long-term distribution of the surfers. Applying the linear map $u \to Mu$ again and again yields a discrete dynamical system. We want to understand what happens with the orbit $u_1 = Mu,\ u_2 = M^2 u,\ u_3 = M^3 u, \dots$ and find a closed formula for $M^n u$.

For example:

The one-dimensional discrete dynamical system $x \to ax$, or $x_{n+1} = a x_n$, has the solution $x_n = a^n x_0$. For example, the value $1.03^{20} \cdot 1000 = 1806.11$ is the balance of a bank account that held $1000$ dollars 20 years ago if the interest rate was a constant 3 percent.

If $u$ is an eigenvector with eigenvalue $\lambda$, then $Mu = \lambda u$, $M^2 u = M(Mu) = M \lambda u = \lambda M u = \lambda^2 u$, and more generally $M^n u = \lambda^n u$.

PageRank and the Flow Equation

The PageRank vector is defined as the principal eigenvector of the transition matrix $M$.

If $r$ is the limit of the power sequence $M^k u$ as $k \to \infty$, then $r$ satisfies the flow equation:

$$r = M \cdot r$$

This confirms that $r$ is the principal eigenvector of $M$ associated with the eigenvalue $1$. To efficiently solve for $r$ in high-dimensional systems (like the web graph), we typically employ the power iteration method.

Summary:

  • PageRank measures the importance of nodes in a graph using the link structure of the web
  • You can simulate a random web surfer using the stochastic adjacency matrix $M$
  • PageRank solves $r = M \cdot r$, where $r$ can be viewed both as the principal eigenvector of $M$ and as the stationary distribution of a random walk over the graph.

Important questions

At this point we have an intuitive knowledge of how this algorithm works, but to properly understand its essence, it’s necessary to dive into the mathematics that lies behind it. We ought to ask ourselves these important questions:

  • Will this iterative process continue indefinitely or will it converge?
  • Under what circumstances or properties of H is it guaranteed to converge?
  • Will it converge to something that makes sense in the context of the PageRank problem?
  • Will it converge to just one vector or multiple vectors?
  • Does the convergence depend on the starting vector $\pi^{(0)T}$?
  • If it will converge eventually, how long is “eventually”? That is, how many iterations can we expect until convergence?

In the following part of the article, I’m going to answer these questions and work through a concrete instance of the algorithm.

Power iteration

Power iteration is an iterative procedure that updates our rank vector $r$ over time. We start by assigning every node an initial score and continue the process until the scores stabilize.

Repeat until convergence:

$$r_j^{t+1}=\sum_{i \to j} \frac{r_i^t}{d_i}$$

Initialize $r^0 = [\frac{1}{N}, \dots, \frac{1}{N}]^T$. Iterate $r^{t+1} = M r^t$, which is the same as $r_j^{t+1}=\sum_{i \to j} \frac{r_i^t}{d_i}$. Stop when $|r^{t+1}-r^t| < \epsilon$.

About 50 iterations are sufficient to estimate the limiting solution.
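The procedure above can be sketched in a few lines of numpy; the 3-node matrix below is a hypothetical example of mine, while in the PageRank setting $M$ would be the stochastic adjacency matrix of the web graph:

```python
import numpy as np

def power_iteration(M, eps=1e-9, max_iter=100):
    """Power iteration for r = M r, with M column-stochastic."""
    n = M.shape[0]
    r = np.full(n, 1.0 / n)                  # r^0 = [1/N, ..., 1/N]^T
    for _ in range(max_iter):
        r_next = M @ r                       # r^{t+1} = M r^t
        if np.abs(r_next - r).sum() < eps:   # stop when |r^{t+1} - r^t| < eps
            return r_next
        r = r_next
    return r

# Hypothetical 3-node example: column j lists node j's out-link probabilities.
M = np.array([
    [0.0, 0.5, 1.0],
    [0.5, 0.0, 0.0],
    [0.5, 0.5, 0.0],
])

r = power_iteration(M)
print(r)   # converges close to [4/9, 2/9, 3/9]
```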


Markov matrix

A discrete-time Markov chain is a sequence of random variables $X_1, X_2, X_3, \ldots$ with the Markov property, namely that the probability of moving to the next state depends only on the present state and not on the previous states:

$$\Pr\left( X_{n+1} = x \mid X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n \right) = \Pr\left( X_{n+1} = x \mid X_n = x_n \right),$$

If both conditional probabilities are well defined, that is, if

$$\Pr\left( X_1 = x_1, \ldots, X_n = x_n \right) > 0.$$

The possible values of $X_i$ form a countable set $S$ called the state space of the chain.

The matrix describing the Markov chain is called the transition matrix, but it’s also very common to call it the stochastic matrix.

A stochastic matrix is an $n \times n$ matrix where all entries are nonnegative and each row adds up to 1.

  • The ROWS represent NOW, or FROM ($X_t$);
  • The COLUMNS represent NEXT, or TO ($X_{t+1}$);
  • Entry $(i, j)$ is the CONDITIONAL probability that NEXT $= j$, given that NOW $= i$: the probability of going FROM state $i$ TO state $j$.

$$p_{ij} = \mathbb{P}(X_{t+1} = j \mid X_t = i)$$

A probability vector (or stochastic vector) is a vector with non-negative entries that add up to one. Note that this section uses the row-stochastic convention, whereas the PageRank matrix $M$ above was column-stochastic (the transpose convention).
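A tiny example (a made-up two-state “weather” chain of my own) shows the row convention in action; with row-stochastic matrices, distributions are row vectors multiplied on the left:

```python
import numpy as np

# Hypothetical 2-state weather chain; rows = NOW, columns = NEXT.
P = np.array([
    [0.9, 0.1],   # P(sunny -> sunny) = 0.9, P(sunny -> rainy) = 0.1
    [0.5, 0.5],   # P(rainy -> sunny) = 0.5, P(rainy -> rainy) = 0.5
])

assert np.allclose(P.sum(axis=1), 1.0)   # each ROW sums to 1

p = np.array([1.0, 0.0])   # start in state 0 (sunny) with certainty
p_next = p @ P             # left multiplication: p(t+1)^T = p(t)^T P
print(p_next)              # [0.9, 0.1]
```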

Spectral Properties of Markov Matrices

A Markov matrix $A$ always has an eigenvalue $\lambda = 1$. All other eigenvalues $\lambda_i$ satisfy:

$$|\lambda_i| \le 1$$

This property ensures the stability of the stochastic process over time.

Proof:

Consider the vector $\mathbf{1} = (1,1,\ldots,1)^T$. Since $A$ is row-stochastic, the sum of the entries in each row of $A$ is $1$. Thus

$$A \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix},$$

so $A$ has eigenvalue $1$ with eigenvector $\mathbf{1}$.
Since $A$ and $A^T$ have the same characteristic polynomial, they share the same eigenvalues; therefore $A^T$ also has eigenvalue $1$.

Now suppose $v$ is an eigenvector of $A$ with eigenvalue $\lambda$ such that $|\lambda| > 1$. Then

$$A^n v = \lambda^n v,$$

which grows in length exponentially as $n \to \infty$. In particular, for large $n$, some entry of $A^n v$ must become arbitrarily large in magnitude.

However, $A^n$ is also row-stochastic (this can be shown by induction), so all its entries are nonnegative and each row sums to $1$. Hence every entry of $A^n v$ is bounded in magnitude by $\max_j |v_j|$, a fixed constant. This contradicts the assumption that an eigenvalue with $|\lambda| > 1$ exists.

Therefore, all eigenvalues $\lambda$ of $A$ satisfy $|\lambda| \leq 1$, with $1$ always being an eigenvalue.
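These spectral facts are easy to verify numerically; the matrix below is a hypothetical 3-state example of my own:

```python
import numpy as np

# Hypothetical row-stochastic (Markov) matrix.
A = np.array([
    [0.2, 0.5, 0.3],
    [0.1, 0.8, 0.1],
    [0.4, 0.4, 0.2],
])

eigenvalues = np.linalg.eigvals(A)

# lambda = 1 is present (the all-ones vector is a right eigenvector),
# and every eigenvalue lies on or inside the unit circle.
has_one = bool(np.any(np.isclose(eigenvalues, 1.0)))
all_inside = bool(np.all(np.abs(eigenvalues) <= 1.0 + 1e-12))
print(has_one, all_inside)   # True True
```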


Perron–Frobenius Theory

In a presentation titled “Why I Love Perron–Frobenius”, Hans Schneider addressed how the Perron–Frobenius theory of nonnegative matrices is not only extremely useful, but also among the most beautiful theories in mathematics.

The applications involving PageRank, HITS, and other ranking schemes help to underscore this principle. A matrix $A$ is said to be nonnegative when each entry is a nonnegative number (denote this by writing $A \ge 0$). Similarly, $A$ is a positive matrix when each $a_{ij} > 0$ (write $A > 0$). For example, the hyperlink matrix $H$ and the stochastic matrix $M$ that are at the foundation of PageRank are nonnegative matrices, and the Google matrix $G$ is a positive matrix. Consequently, properties of positive and nonnegative matrices govern the behavior of PageRank, and the Perron–Frobenius theory reveals these properties by describing the nature of the dominant eigenvalues and eigenvectors of positive and nonnegative matrices.

For a matrix $A$, the spectral radius is defined as

$$\rho(A) = \max \{ |\lambda| : \lambda \in \sigma(A) \},$$

that is, the maximum absolute value among the eigenvalues of $A$.

In Perron’s theorem, we set $r = \rho(A)$, meaning that the eigenvalue under consideration is precisely the one with the largest modulus. For positive matrices ($A > 0$), this eigenvalue $r$ is real, positive, simple, and associated with an eigenvector having all positive components.

Perron’s Theorem for Positive Matrices

If $A_{n \times n} > 0$ with $r = \rho(A)$, then:

  1. $r > 0$.
  2. $r \in \sigma(A)$ (the Perron root).
  3. $\text{alg mult}_A(r) = 1$ (simple root).
  4. There exists an eigenvector $x > 0$ such that $Ax = rx$.
  5. The Perron vector $p$ is unique: $Ap = rp, \quad p > 0, \quad \|p\|_1 = 1$
  6. $r$ is the only eigenvalue on the spectral circle.
  7. Collatz–Wielandt formula: $r = \max_{x \in N} \min_{\substack{1 \leq i \leq n \\ x_i \neq 0}} \frac{[Ax]_i}{x_i}$
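A numerical sketch of these properties on a small, strictly positive matrix (a hypothetical example of mine):

```python
import numpy as np

# A strictly positive 2x2 matrix (hypothetical example).
A = np.array([
    [2.0, 1.0],
    [1.0, 3.0],
])

eigenvalues, eigenvectors = np.linalg.eig(A)
i = int(np.argmax(np.abs(eigenvalues)))   # index of the spectral radius
r = float(eigenvalues[i].real)            # Perron root r = rho(A) = (5 + sqrt(5))/2
p = np.abs(eigenvectors[:, i].real)
p /= p.sum()                              # Perron vector: p > 0, ||p||_1 = 1

# Collatz-Wielandt: the maximum of f over N is attained at p, where f(p) = r.
cw = float(np.min((A @ p) / p))
print(r, p, cw)
```

At the Perron vector, $[Ap]_i / p_i = r$ for every component, so the Collatz–Wielandt value equals the Perron root exactly.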

Perron’s theorem for positive matrices is a powerful result, so it’s only natural to ask what happens when zero entries appear. Not all is lost if we are willing to be flexible. The next theorem says that part of Perron’s theorem for positive matrices can be extended to nonnegative matrices by sacrificing the existence of a positive eigenvector for a nonnegative one.

Perron’s Theorem for Nonnegative Matrices

If $A_{n \times n} \geq 0$ with $r = \rho(A)$, the following statements are true:

  • $r \in \sigma(A)$ (note that $r = 0$ is possible).
  • There exists an eigenvector $x \geq 0$ such that $Ax = rx$.
  • The Collatz–Wielandt formula remains valid.

Irreducibility and Connectivity

Statement: Irreducibility

A square matrix $A$ is irreducible if and only if its directed graph is strongly connected.

Equivalently, for each pair of indices $i, j$ there exist $t \ge 1$ and intermediate indices $k_1, \dots, k_{t-1}$ such that:

$$a_{i k_1} \, a_{k_1 k_2} \cdots a_{k_{t-1} j} > 0$$

Another equivalent characterization is: $A$ is irreducible if and only if there does not exist a permutation matrix $P$ for which:

$$P^{\top} A P = \begin{bmatrix} X & Y \\ 0 & Z \end{bmatrix}$$

where $X$ and $Z$ are square blocks. In other words, $A$ cannot be put into a nontrivial block upper-triangular form.
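Irreducibility can be tested directly from this characterization, using the standard fact that a nonnegative $n \times n$ matrix is irreducible iff $(I + A)^{n-1}$ has no zero entries (every node reaches every other in at most $n-1$ steps). A sketch with two hypothetical matrices:

```python
import numpy as np

def is_irreducible(A):
    """A nonnegative matrix is irreducible iff its directed graph is
    strongly connected, i.e. (I + A)^(n-1) has no zero entries."""
    n = A.shape[0]
    reach = np.linalg.matrix_power(np.eye(n) + (A > 0), n - 1)
    return bool(np.all(reach > 0))

# The 3-cycle 0 -> 1 -> 2 -> 0 is strongly connected: irreducible.
cycle = np.array([[0, 1, 0],
                  [0, 0, 1],
                  [1, 0, 0]])

# Here state 2 never links back to 0 or 1 (block triangular form): reducible.
blocked = np.array([[0, 1, 1],
                    [1, 0, 1],
                    [0, 0, 1]])

print(is_irreducible(cycle), is_irreducible(blocked))   # True False
```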

Perron-Frobenius Theorem

Frobenius’s contribution was to realize that while properties 1, 3, 4, and 6 in Perron’s theorem for positive matrices can be lost when zeros creep into the picture (i.e., for nonnegative matrices), the trouble is not simply the existence of zero entries, but rather the problem is the location of the zero entries. In other words, Frobenius realized that the lost properties 1, 3, and 4 are in fact not lost when the zeros are in just the right locations—namely the locations that ensure that the matrix is irreducible. Unfortunately irreducibility alone still does not save property 6— it remains lost.

Perron–Frobenius Theorem

Let $A \in \mathbb{R}^{n \times n}$ be an irreducible matrix with $A \ge 0$. Then:

  1. $r = \rho(A) > 0$.
  2. $r \in \sigma(A)$ (the Perron root).
  3. $\operatorname{alg mult}_A(r) = 1$ (the Perron root is simple).
  4. There exists an eigenvector $x > 0$ such that $Ax = rx$.
  5. The Perron vector is the unique vector defined by $Ap = rp, \ p > 0, \ \|p\|_1 = 1$. Except for positive multiples of $p$, there are no other nonnegative eigenvectors of $A$, regardless of the eigenvalue.
  6. $r$ need not be the only eigenvalue on the spectral circle of $A$.
  7. Collatz–Wielandt formula: $r = \max_{x \in N} f(x)$, where
     $$f(x) = \min_{\substack{1 \leq i \leq n \\ x_i \neq 0}} \frac{[Ax]_i}{x_i}, \quad N = \{ x \mid x \ge 0, \, x \neq 0 \}$$

Irreducible Markov Chains

Analyzing limiting properties of Markov chains requires that the class of stochastic matrices (and hence the class of stationary Markov chains) be divided into four mutually exclusive categories.

  1. $M$ is irreducible with $\displaystyle \lim_{k \to \infty} M^k$ existing (i.e., $M$ is primitive).
  2. $M$ is irreducible with $\displaystyle \lim_{k \to \infty} M^k$ not existing (i.e., $M$ is imprimitive).
  3. $M$ is reducible with $\displaystyle \lim_{k \to \infty} M^k$ existing.
  4. $M$ is reducible with $\displaystyle \lim_{k \to \infty} M^k$ not existing.

Let $P$ be the transition probability matrix for an irreducible Markov chain on states $\{S_1, S_2, \dots, S_n\}$, and let $\pi^T$ be the left-hand Perron vector for $P$ (i.e., $\pi^T P = \pi^T$, $\|\pi\|_1 = 1$).

The following hold for every initial distribution $p^T(0)$:

  • The $k^{th}$ step transition matrix is $P^k$. The $(i,j)$-entry of $P^k$ is the probability of moving from $S_i$ to $S_j$ in exactly $k$ steps.

  • The $k^{th}$ step distribution vector is given by

    $$p^T(k) = p^T(0) P^k$$

  • If $P$ is primitive (aperiodic), and if $e$ is the column vector of all 1’s, then

    $$\lim_{k \to \infty} P^k = e \pi^T \quad \text{and} \quad \lim_{k \to \infty} p^T(k) = \pi^T$$

  • If $P$ is imprimitive (periodic), then

    $$\lim_{k \to \infty} \frac{I + P + \cdots + P^{k-1}}{k} = e \pi^T$$

    and

    $$\lim_{k \to \infty} \frac{p^T(0) + p^T(1) + \cdots + p^T(k-1)}{k} = \pi^T$$

  • Regardless of whether $P$ is primitive or imprimitive, the $j^{th}$ component $\pi_j$ of $\pi^T$ represents the long-run fraction of time that the chain is in $S_j$.

  • The vector $\pi^T$ is the unique stationary distribution vector for the chain, because it is the unique probability distribution satisfying

    $$\pi^T P = \pi^T$$
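The imprimitive case can be seen in the smallest possible example, a hypothetical 2-state chain of my own that deterministically flips state:

```python
import numpy as np

# Irreducible but imprimitive (period-2) chain: the state flips every step.
P = np.array([[0.0, 1.0],
              [1.0, 0.0]])

# P^k oscillates between I (k even) and P (k odd), so lim P^k does not exist,
# but the Cesaro averages (I + P + ... + P^(k-1)) / k converge to e pi^T.
k = 1000
avg = sum(np.linalg.matrix_power(P, j) for j in range(k)) / k
print(avg)   # every row approaches pi^T = [0.5, 0.5]
```

Even though the chain never settles, $\pi^T = [0.5, 0.5]$ is still the long-run fraction of time spent in each state.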

So we have seen that a unique positive PageRank vector exists when the Google matrix is stochastic and irreducible. Further, with the additional property of aperiodicity, the power method will converge to this PageRank vector, regardless of the starting vector for the iterative process.

Implementation

Problems to handle: if there are

  • dead ends (pages with no out-links), or
  • spider traps (groups of pages whose out-links all stay within the group),

our random walk could fail. The solution for spider traps is teleportation: at each time step, the random surfer has two options:

  • with probability $\beta$, follow a link at random
  • with probability $1-\beta$, jump to a random page

Common values for $\beta$ are 0.8 to 0.9.

For dead ends, the solution is to teleport with probability $1$. Spider traps are not a mathematical problem; the result is simply not what we want. Dead ends, on the contrary, are a mathematical problem, since with them our matrix would no longer be column-stochastic.

This leads us to the final matrix, also called the Google matrix, which has the shape:

$$G_{ij} = \begin{cases} \beta \dfrac{A_{ij}}{\sum\limits_{i} A_{ij}} + (1-\beta)\dfrac{1}{N}, & \text{if } \sum\limits_{i} A_{ij} \neq 0, \\ \dfrac{1}{N}, & \text{otherwise,} \end{cases} \quad \text{with } \beta = 0.85$$

Here $\dfrac{A_{ij}}{\sum_{i} A_{ij}}$ is just the stochastic matrix $M$. We don’t actually simulate the random walk; we think of it as being run infinitely long, and computing this limit is equivalent to solving the recursive equation by computing the leading eigenvector of $G$.

So the random walk is just an intuition that we never truly simulate.
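The construction of $G$ from an adjacency matrix can be sketched as follows (the helper name `google_matrix` and the 3-page example are my own illustrative choices):

```python
import numpy as np

def google_matrix(A, beta=0.85):
    """Build the Google matrix from a 0/1 adjacency matrix A,
    where A[i, j] = 1 means page j links to page i."""
    n = A.shape[0]
    out_degree = A.sum(axis=0)                 # column sums = out-degrees
    G = np.empty((n, n))
    for j in range(n):
        if out_degree[j] == 0:                 # dead end: teleport with probability 1
            G[:, j] = 1.0 / n
        else:                                  # follow a link, or teleport
            G[:, j] = beta * A[:, j] / out_degree[j] + (1 - beta) / n
    return G

# Hypothetical 3-page web; page 2 is a dead end (its column is all zeros).
A = np.array([[0, 1, 0],
              [1, 0, 0],
              [1, 1, 0]])

G = google_matrix(A)
print(G.sum(axis=0))   # every column sums to 1: G is column-stochastic again
```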

Example:

Take this graph; the goal will be to extract the feature that best represents the importance of each node in the overall graph.

The first step through the process is to represent the graph as an adjacency matrix:

The matrix is going to be a $6 \times 6$ matrix. We set $A_{i,j} = 1$ if page $j$ has a link to page $i$, and $A_{i,j} = 0$ otherwise.

$$A = \begin{bmatrix} 0 & 1 & 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & 1 & 1 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 1 & 0 \end{bmatrix}$$

Note that a $1$ on the diagonal occurs only if there is a self-loop.

Next we are going to assume that all the links in a column are equally likely (equal-probability out-links), so we divide every single entry of a column by the number of $1$s in that column, since each column must sum up to 1.

$$M = \begin{bmatrix} 0 & 1/2 & 1/2 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 1/3 & 0 \\ 0 & 0 & 1/2 & 1 & 1/3 & 0 \\ 0 & 1/2 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 1/3 & 0 \end{bmatrix}$$

Now we need to eliminate spider traps and dead ends.

To obtain this result, we use the definition of the Google matrix I provided before:

$$G = 0.85 \cdot \begin{bmatrix} 0 & 1/2 & 1/2 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 1/3 & 0 \\ 0 & 0 & 1/2 & 1 & 1/3 & 0 \\ 0 & 1/2 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 1/3 & 0 \end{bmatrix} + 0.15 \cdot \begin{bmatrix} 1/6 & 1/6 & 1/6 & 1/6 & 1/6 & 1/6 \\ 1/6 & 1/6 & 1/6 & 1/6 & 1/6 & 1/6 \\ 1/6 & 1/6 & 1/6 & 1/6 & 1/6 & 1/6 \\ 1/6 & 1/6 & 1/6 & 1/6 & 1/6 & 1/6 \\ 1/6 & 1/6 & 1/6 & 1/6 & 1/6 & 1/6 \\ 1/6 & 1/6 & 1/6 & 1/6 & 1/6 & 1/6 \end{bmatrix}$$

The intermediate step in computing this sum of matrices is:

$$G = \begin{bmatrix} 0 & 17/40 & 17/40 & 0 & 0 & 0 \\ 17/20 & 0 & 0 & 0 & 17/60 & 0 \\ 0 & 0 & 17/40 & 17/20 & 17/60 & 0 \\ 0 & 17/40 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 17/20 \\ 0 & 0 & 0 & 0 & 17/60 & 0 \end{bmatrix} + \begin{bmatrix} 1/40 & 1/40 & 1/40 & 1/40 & 1/40 & 1/40 \\ 1/40 & 1/40 & 1/40 & 1/40 & 1/40 & 1/40 \\ 1/40 & 1/40 & 1/40 & 1/40 & 1/40 & 1/40 \\ 1/40 & 1/40 & 1/40 & 1/40 & 1/40 & 1/40 \\ 1/40 & 1/40 & 1/40 & 1/40 & 1/40 & 1/40 \\ 1/40 & 1/40 & 1/40 & 1/40 & 1/40 & 1/40 \end{bmatrix}$$

In the final result, you can see that the stochastic-matrix property is preserved, but we now have a full (dense) matrix that avoids problems like spider traps and dead ends.

$$G = \begin{bmatrix} 1/40 & 18/40 & 18/40 & 1/40 & 1/40 & 1/40 \\ 35/40 & 1/40 & 1/40 & 1/40 & 37/120 & 1/40 \\ 1/40 & 1/40 & 18/40 & 35/40 & 37/120 & 1/40 \\ 1/40 & 18/40 & 1/40 & 1/40 & 1/40 & 1/40 \\ 1/40 & 1/40 & 1/40 & 1/40 & 1/40 & 35/40 \\ 1/40 & 1/40 & 1/40 & 1/40 & 37/120 & 1/40 \end{bmatrix}$$
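This column-stochastic property can be verified exactly with Python's fractions module (same $G$ as just shown):

```python
from fractions import Fraction as F

# The final Google matrix G from above, with exact fractions.
G = [
    [F(1, 40),  F(18, 40), F(18, 40), F(1, 40),  F(1, 40),   F(1, 40)],
    [F(35, 40), F(1, 40),  F(1, 40),  F(1, 40),  F(37, 120), F(1, 40)],
    [F(1, 40),  F(1, 40),  F(18, 40), F(35, 40), F(37, 120), F(1, 40)],
    [F(1, 40),  F(18, 40), F(1, 40),  F(1, 40),  F(1, 40),   F(1, 40)],
    [F(1, 40),  F(1, 40),  F(1, 40),  F(1, 40),  F(1, 40),   F(35, 40)],
    [F(1, 40),  F(1, 40),  F(1, 40),  F(1, 40),  F(37, 120), F(1, 40)],
]

# Each column sums to exactly 1, so G is still column-stochastic.
col_sums = [sum(row[j] for row in G) for j in range(6)]
print(col_sums)   # six copies of Fraction(1, 1)
```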

Now we can iterate until we find a stable state:

  • Initialize $r^0 = [\frac{1}{N}, \dots, \frac{1}{N}]^T$
  • Iterate: $r^{t+1} = G \cdot r^t$
  • Stop when $\lvert r^{t+1} - r^t \rvert < \epsilon$

After a few iterations (depending on the epsilon you chose), you will arrive at a stable solution, which in my case is:

$$r = \begin{bmatrix} 0.24534 \\ 0.25136 \\ 0.26819 \\ 0.13147 \\ 0.06128 \\ 0.04236 \end{bmatrix} \quad \begin{array}{l} \text{Node 1} \\ \text{Node 2} \\ \text{Node 3} \\ \text{Node 4} \\ \text{Node 5} \\ \text{Node 6} \end{array}$$

You can try this for yourself by executing the Python code below.

from fractions import Fraction

G = [
    [Fraction(1,40), Fraction(18,40), Fraction(18,40), Fraction(1,40), Fraction(1,40), Fraction(1,40)],
    [Fraction(35,40), Fraction(1,40), Fraction(1,40), Fraction(1,40), Fraction(37,120), Fraction(1,40)],
    [Fraction(1,40), Fraction(1,40), Fraction(18,40), Fraction(35,40), Fraction(37,120), Fraction(1,40)],
    [Fraction(1,40), Fraction(18,40), Fraction(1,40), Fraction(1,40), Fraction(1,40), Fraction(1,40)],
    [Fraction(1,40), Fraction(1,40), Fraction(1,40), Fraction(1,40), Fraction(1,40), Fraction(35,40)],
    [Fraction(1,40), Fraction(1,40), Fraction(1,40), Fraction(1,40), Fraction(37,120), Fraction(1,40)],
]

# Initial rank vector (uniform)
r = [Fraction(1,6) for _ in range(6)]

# Stopping threshold epsilon
epsilon = Fraction(1,1000)

def multiply_matrix_vector(M, vec):
    """Multiply matrix M by column vector vec (both using Fraction)."""
    return [sum(m*v for m, v in zip(row, vec)) for row in M]

def vector_diff(v1, v2):
    """Max absolute difference between two vectors."""
    return max(abs(a-b) for a, b in zip(v1, v2))

# Iterative multiplication until convergence
iteration = 0
while True:
    r_next = multiply_matrix_vector(G, r)
    diff = vector_diff(r, r_next)
    iteration += 1
    print(f"Iteration {iteration}: {r_next}  | diff = {diff}")
    if diff < epsilon:
        break
    r = r_next

r_final = [round(float(x), 5) for x in r_next]
print("\nConverged PageRank:")
for i, val in enumerate(r_final, 1):
    print(f"Node {i}: {val}")

This vector represents the importance of the nodes in the graph. A representative image of the importance of nodes in the graph is provided below.

Node 5’s PageRank is relatively small: according to the adjacency matrix above, its only in-link comes from node 6, which itself has a low rank. More generally, PageRank distributes a node’s “importance” proportionally across all of its outgoing links, so the portion of a node’s rank that reaches any single neighbor is only a fraction of its total importance. In other words, PageRank measures the probability that a random surfer lands on a node, taking into account both the importance of the linking nodes and how their rank is split among their outgoing links. Therefore, even a node that is linked by important nodes may still have a low rank if those nodes distribute their rank widely.

References 📚

Research Papers

  • Jure Leskovec et al., Pixie: System for Large-Scale Graph Mining, PDF
  • Bindel, Lecture Notes CS6210, Cornell University, PDF
  • Vesak, Stochastic Processes Notes, PDF

Books

  • Horn, R.A. & Johnson, C.R., Matrix Analysis, 2nd Edition, PDF, p. 549
  • Langville & Meyer, Google’s PageRank and Beyond, PDF, p. 186
  • Horn & Johnson, Matrix Analysis, Chapter 8, PDF

Markov Chains & Linear Algebra

  • Knill, Lecture Notes on Markov Chains, Harvard University, Lecture 33, Lecture 34
  • Knill, Linear Algebra Probability Summary, Lecture 36
  • MathOverflow, Examples on Non-Diagonalizable Stochastic Matrices, Link

Exercises

  • Knight, Markov Chains Exercise Sheet with Solutions, PDF
  • Probability Course, Solved Problems, Link
  • Gordon, Solved Problems on Markov Chains, PDF

Reference Notes

  • Fewster, Transition Matrix Definitions, PDF
  • Stanford EE363, Perron-Frobenius Theory, PDF
  • YouTube, Perron-Frobenius Theory Lecture, Watch