Keet Malin Sugathadasa

Website Ranking using Google PageRank Algorithm


Searching on Google is something we do every day. But have you ever wondered how the results are ordered? Google uses a collection of different algorithms to return the set of documents most relevant to the user's information need. I described how the TF-IDF algorithm works in a previous blog post. In this post, I will talk about the PageRank algorithm that Google Search uses to rank its results by relevance.

PageRank is only one of the algorithms that Google uses to order its search results, but it was the first algorithm used by Google Search and it is the best known as well. With this post, you will get a clear idea of what the algorithm is about and see practical scenarios that show why it is considered a prominent one. Are you a web developer? Then investing some time in understanding PageRank will help you get your website into the top listings in Google Search.

These are the contents addressed in this blog.

  1. What is PageRank

  2. The PageRank Algorithm

  3. How the PageRank Algorithm works

  4. Important points to Understand in Google PageRank

1) What is PageRank?

According to the definition in the Google paper, PageRank is essentially a voting system that treats links as votes. If many different websites link to "Website A", then "Website A" is considered to be important. Not all votes count equally: each vote carries a weight, and each weight is determined by various factors.

PageRank was initially developed by Larry Page and Sergey Brin to rank webpages in search engines. It ranks pages based on the links pointing to them. Even at present, it is considered a basis for modern search engines. Initially, this is what made Google's search results so successful compared with many other search engines of that day. Although Google was not the first to look at links and create rankings, it was the one that did so in a meaningful way.

Google PageRank refers to both the algorithm and the score that the algorithm produces. PageRank is not the only technique that Google uses to rank webpages; it is the mix of techniques that makes Google successful in the web search industry. What we know about the PageRank algorithm, and what Google has released, is merely a simplified version of it. Google does not clearly state anything about the value that the algorithm outputs, but studies suggest that this value sits on a logarithmic scale with a base of about 16.

For example:

  • PageRank 2 is 16 times bigger than a PageRank 1

  • PageRank 3 is 256 times bigger than a PageRank 1

  • PageRank 4 is 4,096 times better than PageRank 1

  • PageRank 5 is 65,536 times better than PageRank 1

  • PageRank 7 is 16,777,216 (over 16 million) times better than PageRank 1
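The arithmetic behind these figures can be sketched in a few lines of Python. This assumes the base-16 logarithmic scale mentioned above, which is an estimate rather than a figure published by Google; the function name `relative_weight` is made up for this illustration.

```python
# Assumption: PageRank levels sit on a logarithmic scale with base 16.
BASE = 16

def relative_weight(level, reference_level=1, base=BASE):
    """How many times larger `level` is than `reference_level` on the log scale."""
    return base ** (level - reference_level)

# Each step up the scale multiplies the weight by the base.
for level in (2, 3, 4, 5):
    print(f"PageRank {level} is {relative_weight(level):,} times PageRank 1")
```

Each step up the scale multiplies the previous figure by 16, which is why the numbers grow so quickly.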

2) The PageRank Algorithm

The PageRank algorithm gives a rating of a page's importance in the given context. This rating is computed recursively, because the importance of one page depends on the importance of the other pages that link to it. The algorithm produces a probability distribution that describes how likely a person who clicks on links at random is to arrive at a given page.
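As a rough guide, the simplified form of the update used in the walkthrough in section 3 can be written as follows. The symbols PR and Out match the ones used in that walkthrough; B_u, the set of pages that link to page u, is notation introduced here for illustration. The published formulation also includes a damping factor, which this simplified form leaves out.

```latex
PR(u) = \sum_{v \in B_u} \frac{PR(v)}{Out(v)}
```

In words: every page v that links to u passes on its own PageRank, divided equally among all of its outgoing links.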

Once you have looked at the section below, you will have a clear idea of how to apply the algorithm to come up with PageRanks.

3) How the PageRank Algorithm Works

Let's try to understand how this algorithm works through a step-by-step procedure and see how the rank of each page changes with each step. Consider a graph with four nodes: A, B, C and D. Each node in this graph is a webpage, and each arrow shows a reference from one page to another.

For example, Website A references Website B. Website A also references Website D, and vice versa. In full, the links used in the calculations below are A → B, A → D, B → C, C → D, D → A, D → B and D → C.
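One possible way to hold this graph in code is an adjacency dictionary. The sketch below is an illustration only: the edge list is inferred from the worked calculations in the steps that follow, and the names `graph`, `out_degree` and `incoming` are made up for this example.

```python
# Each key is a page; each value lists the pages it links to.
# Edges are inferred from the worked calculations (an assumption).
graph = {
    "A": ["B", "D"],
    "B": ["C"],
    "C": ["D"],
    "D": ["A", "B", "C"],
}

# Out(X): the number of outgoing links of each page.
out_degree = {page: len(targets) for page, targets in graph.items()}

# For each page, the pages that point at it.
incoming = {page: [] for page in graph}
for source, targets in graph.items():
    for target in targets:
        incoming[target].append(source)

print(out_degree)
print(incoming)
```

The `incoming` mapping gives the "nodes pointing at X" used in every step, and `out_degree` gives Out(X).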

Step 1 (Initialization)

Initialize all the webpages to have an equal probability. This is the PageRank that every node would have if there were no edges in the graph. It is basically the probability of a user reaching a node by a random click. This will be the zeroth iteration.

Initial Probability = 1 / n (n is the total number of nodes or webpages in the network)

In the above graph, the initial probability = 1/4 = 0.25 for all nodes
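The initialization step amounts to very little code. A minimal sketch, assuming the four pages are simply named "A" to "D":

```python
pages = ["A", "B", "C", "D"]
n = len(pages)

# Zeroth iteration: every page starts with the same probability, 1 / n.
ranks = {page: 1 / n for page in pages}
print(ranks)
```

Because the values form a probability distribution, they add up to 1.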

Step 2 (Iteration 1)

Now let's apply the PageRank Algorithm

NODE A

The only node pointing at A is D. D has 3 outgoing links.

Out(D) = 3, PR(D) = 0.25

So PR(A) = PR(D)/Out(D) = 0.25/3 = 0.083333

NODE B

The nodes pointing at B are A and D.

Out(A) = 2, PR(A) = 0.25

Out(D) = 3, PR(D) = 0.25

So PR(B) = PR(A)/Out(A) + PR(D)/Out(D) = 0.25/2 + 0.25/3 = 0.208333

NODE C

The nodes pointing at C are B and D.

Out(B) = 1, PR(B) = 0.25

Out(D) = 3, PR(D) = 0.25

So PR(C) = PR(B)/Out(B) + PR(D)/Out(D) = 0.25/1 + 0.25/3 = 0.333333

NODE D

The nodes pointing at D are A and C.

Out(A) = 2, PR(A) = 0.25

Out(C) = 1, PR(C) = 0.25

So PR(D) = PR(A)/Out(A) + PR(C)/Out(C) = 0.25/2 + 0.25/1 = 0.375

At the end of the first iteration, the PageRank values are as follows.

  • PR(A) = 0.083333

  • PR(B) = 0.208333

  • PR(C) = 0.333333

  • PR(D) = 0.375
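The first iteration can also be sketched in Python. This is a simplified illustration, not Google's implementation: it has no damping factor, it assumes every page has at least one outgoing link, and the function name `pagerank_step` and the edge list (inferred from the calculations above) are assumptions of this example.

```python
graph = {
    "A": ["B", "D"],
    "B": ["C"],
    "C": ["D"],
    "D": ["A", "B", "C"],
}

def pagerank_step(graph, ranks):
    """One update: each page splits its current rank equally among its outgoing links."""
    new_ranks = {page: 0.0 for page in graph}
    for source, targets in graph.items():
        share = ranks[source] / len(targets)  # PR(source) / Out(source)
        for target in targets:
            new_ranks[target] += share
    return new_ranks

# Step 1: equal probabilities. Step 2: apply the update once.
ranks = {page: 1 / len(graph) for page in graph}
ranks = pagerank_step(graph, ranks)

for page, score in ranks.items():
    print(page, round(score, 6))
```

The scores it prints should match the hand calculation for each node above.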

Step 3 (Iteration 2)

Now let's apply the PageRank Algorithm again.

NODE A

The only node pointing at A is D. D has 3 outgoing links.

Out(D) = 3, PR(D) = 0.375

So PR(A) = PR(D)/Out(D) = 0.375/3 = 0.125

NODE B

The nodes pointing at B are A and D.

Out(A) = 2, PR(A) = 0.083333

Out(D) = 3, PR(D) = 0.375

So PR(B) = PR(A)/Out(A) + PR(D)/Out(D) = 0.083333/2 + 0.375/3 = 0.166667

NODE C

The nodes pointing at C are B and D.

Out(B) = 1, PR(B) = 0.208333

Out(D) = 3, PR(D) = 0.375

So PR(C) = PR(B)/Out(B) + PR(D)/Out(D) = 0.208333/1 + 0.375/3 = 0.333333

NODE D

The nodes pointing at D are A and C.

Out(A) = 2, PR(A) = 0.083333

Out(C) = 1, PR(C) = 0.333333

So PR(D) = PR(A)/Out(A) + PR(C)/Out(C) = 0.083333/2 + 0.333333/1 = 0.375

At the end of the second iteration, the PageRank values are as follows.

  • PR(A) = 0.125

  • PR(B) = 0.166667

  • PR(C) = 0.333333

  • PR(D) = 0.375
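Repeating the same update gives the second iteration. The sketch below is a simplified illustration under the same assumptions as the hand calculation: no damping factor, every page has at least one outgoing link, and the edge list is inferred from the steps above. The function name `pagerank` is hypothetical.

```python
graph = {
    "A": ["B", "D"],
    "B": ["C"],
    "C": ["D"],
    "D": ["A", "B", "C"],
}

def pagerank(graph, iterations):
    """Simplified PageRank: start from 1/n and repeat the update a fixed number of times."""
    ranks = {page: 1 / len(graph) for page in graph}
    for _ in range(iterations):
        new_ranks = {page: 0.0 for page in graph}
        for source, targets in graph.items():
            share = ranks[source] / len(targets)  # PR(source) / Out(source)
            for target in targets:
                new_ranks[target] += share
        ranks = new_ranks
    return ranks

ranks = pagerank(graph, iterations=2)
for page, score in ranks.items():
    print(page, round(score, 6))
```

In practice the update is usually repeated until the scores stop changing noticeably, rather than for a fixed two rounds.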

Step 4 (PageRanks)

After the above iterations, we can rank the webpages as given below. The higher the score, the higher the rank. The higher the rank, the higher the importance.

The importance of the webpages increases from A, B, C, to D.
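Turning scores into a ranking is just a sort. A small sketch, using the approximate scores from the second iteration as input (the variable names are made up for this example):

```python
# Approximate scores after the second iteration.
scores = {"A": 0.125, "B": 0.166667, "C": 0.333333, "D": 0.375}

# Highest score first: position 1 is the most important page.
ranking = sorted(scores, key=scores.get, reverse=True)

for position, page in enumerate(ranking, start=1):
    print(position, page, scores[page])
```

D comes first and A comes last, which is the same order as above read from most to least important.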

We can see that D is a very important website. But why is C the second most important website? Intuitively, the graph suggests that C is not very important. But node C becomes important because a very important node, D, points to it. It's very important to understand this basic concept behind Google's PageRank algorithm.

4) Important points to Understand in Google PageRank

If you are a web developer, you must be very keen to know how to get your website among the top results in Google. You may have tried various mechanisms, such as using tags, refreshing content periodically, adding metadata or even choosing catchy names. But as shown in the above example, if we can get a reference from at least one important website, this will boost our website's ranking rapidly.

Try and understand how this works with an analogy.

Let's say you are preparing for a job interview and you collect a set of recommendation letters. If the letters come from unimportant people, none of them will be accepted as a proper recommendation. So having many recommendation letters from unimportant people is not valuable.

Similarly, just having many unimportant websites reference your website does not add much value to its PageRank. Many people try this by creating dummy websites that reference their main website. But if you take a closer look at the algorithm, you will realize that this has very little impact on the final ranking of the main website.

Hence, it is important to note that, rather than having many unimportant websites referencing yours, it is always much more valuable to have at least one very important website referencing yours.
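This point can be illustrated with a small experiment. The sketch below is hypothetical: it extends the four-page example graph with a page called "Main", compares five dummy pages (F1 to F5, which nobody links to) against a single link from the important page D, and uses the simplified algorithm without a damping factor. In a model with damping, each dummy page would keep a small baseline score, so their contribution would be small rather than exactly zero.

```python
def pagerank(graph, iterations=20):
    """Simplified PageRank (no damping factor), repeated for a fixed number of rounds."""
    ranks = {page: 1 / len(graph) for page in graph}
    for _ in range(iterations):
        new_ranks = {page: 0.0 for page in graph}
        for source, targets in graph.items():
            share = ranks[source] / len(targets)
            for target in targets:
                new_ranks[target] += share
        ranks = new_ranks
    return ranks

base = {"A": ["B", "D"], "B": ["C"], "C": ["D"], "D": ["A", "B", "C"]}

# Scenario 1: five dummy pages link to "Main", but nothing links to them.
farm = {page: list(targets) for page, targets in base.items()}
farm["Main"] = ["A"]
for i in range(1, 6):
    farm[f"F{i}"] = ["Main"]

# Scenario 2: a single link from the important page D.
endorsed = {page: list(targets) for page, targets in base.items()}
endorsed["D"].append("Main")
endorsed["Main"] = ["A"]

farm_score = pagerank(farm)["Main"]
endorsed_score = pagerank(endorsed)["Main"]
print("Main with five dummy links:", farm_score)
print("Main with one link from D: ", endorsed_score)
```

Since no page links to the dummy pages, they have no rank of their own to pass on after the first round, and "Main" ends up with nothing. The single link from D keeps feeding it a share of D's rank on every round.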
