The Implementation of PageRank in the Google Search Engine

To understand the implementation of PageRank, it is first
important to see how PageRank is integrated into the general
ranking of web pages by the Google search engine. The procedure
has been described by Lawrence Page and Sergey Brin in several
publications. Initially, the ranking of web pages by the Google
search engine was determined by three factors:
- Page specific factors
- Anchor text of inbound links
- PageRank
Page specific factors are, besides the body text, for instance
the content of the title tag or the URL of the document. Since
the publications of Page and Brin, more factors have very likely
been added to the ranking methods of the Google search engine,
but they are not of interest here.
In order to provide search results, Google computes an IR score
from the page specific factors and the anchor text of inbound
links of a page, weighted by the position and accentuation of
the search term within the document. This way the relevance of a
document for a query is determined. The IR score is then
combined with PageRank as an indicator of the general importance
of the page. To combine the IR score with PageRank, the two
values are multiplied. They cannot simply be added, since
otherwise pages with a very high PageRank would rank high in
search results even if they are not related to the search
query.
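
The effect of multiplying rather than adding can be illustrated
with a minimal Python sketch. The function names, the page
labels and all numeric values below are made-up assumptions for
illustration; the publications of Page and Brin do not specify
them.

```python
# Hypothetical illustration: combining a relevance (IR) score with PageRank.
# All numbers are invented; only the multiply-vs-add contrast matters.

def combined_by_product(ir_score: float, pagerank: float) -> float:
    """Combine the two values by multiplication, as described in the text."""
    return ir_score * pagerank


def combined_by_sum(ir_score: float, pagerank: float) -> float:
    """Counter-example: combine the two values by addition."""
    return ir_score + pagerank


# (IR score for the query, PageRank) per page
pages = {
    "relevant page, modest PageRank": (0.8, 3.0),
    "unrelated page, very high PageRank": (0.0, 1000.0),
}

for combine in (combined_by_product, combined_by_sum):
    ranking = sorted(pages, key=lambda name: combine(*pages[name]), reverse=True)
    print(combine.__name__, "->", ranking[0])
```

With the product, the unrelated page scores zero however high
its PageRank is; with the sum, it would outrank the relevant
page.
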
Especially for queries consisting of two or more search terms,
the content-related ranking criteria have a far bigger
influence, whereas the impact of PageRank is mainly visible for
unspecific single-word queries. Webmasters who target search
phrases of two or more words can therefore achieve better
rankings than pages with a high PageRank by means of classical
search engine optimisation.
If pages are optimised for highly competitive search terms, a
high PageRank is essential for good rankings, even if a page is
well optimised in terms of classical search engine optimisation.
The reason is that, to avoid spam by extensive keyword
repetition, the increase in IR score diminishes the more often
the keyword occurs within the document or the anchor texts of
inbound links. Thereby, the potential of classical search engine
optimisation is limited and PageRank becomes the decisive factor
in highly competitive areas.
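
The diminishing increase of the IR score can be sketched in
Python as follows. The logarithmic damping used here is purely
an assumption chosen for illustration; it is not known which
function Google applies, and the keyword counts and PageRank
values are invented.

```python
import math


def damped_ir_score(keyword_count: int) -> float:
    """Hypothetical IR score: every further occurrence adds less than the last."""
    return math.log1p(keyword_count)


# Gain in IR score from the 1st, 2nd, ... 5th occurrence of the keyword.
gains = [damped_ir_score(n + 1) - damped_ir_score(n) for n in range(5)]
print([round(g, 3) for g in gains])

# Two heavily optimised pages: doubling the keyword count barely helps,
# so the page with the higher PageRank wins after multiplication.
page_a = damped_ir_score(40) * 2.0   # more keywords, lower PageRank
page_b = damped_ir_score(20) * 4.0   # fewer keywords, higher PageRank
print(page_a < page_b)
```
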

The PageRank Display of the Google Toolbar

PageRank became widely known through the PageRank display of
the Google Toolbar. The Google Toolbar is a browser plug-in for
Microsoft Internet Explorer which can be downloaded from the
Google web site. It provides some features for searching Google
more comfortably.
The Google Toolbar displays PageRank on a scale from 0 to 10.
The PageRank of the currently visited page can first be
estimated from the width of the green bar within the display. If
the user hovers the mouse over the display, the Toolbar also
shows the PageRank value.
Caution: The PageRank display is one of the advanced features of
the Google Toolbar, and if those advanced features are enabled,
Google collects usage data. Additionally, the Toolbar is
self-updating and the user is not informed about updates. So,
Google has access to the user's hard drive.
If we take into account that PageRank can theoretically reach a
maximum value of dN+(1-d), where N is the total number of web
pages and d is usually set to 0.85, PageRank has to be scaled
for the display on the Google Toolbar. It is generally assumed
that the scaling is not linear but logarithmic. At a damping
factor of 0.85 and, therefore, a minimum PageRank of 0.15, and
at an assumed logarithmic base of 6, we get the following
scaling:
Toolbar PageRank    Real PageRank
0/10                0.15 - 0.9
1/10                0.9 - 5.4
2/10                5.4 - 32.4
3/10                32.4 - 194.4
4/10                194.4 - 1,166.4
5/10                1,166.4 - 6,998.4
6/10                6,998.4 - 41,990.4
7/10                41,990.4 - 251,942.4
8/10                251,942.4 - 1,511,654.4
9/10                1,511,654.4 - 9,069,926.4
10/10               9,069,926.4 - 0.85 × N + 0.15
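
The ranges listed here can be reproduced with a short Python
sketch, assuming a strictly logarithmic scaling with base 6 and
a minimum PageRank of 0.15. The function names are hypothetical,
and the real scaling may well be adjusted manually.

```python
MINIMUM_PAGERANK = 0.15   # 1 - d for a damping factor d = 0.85
ASSUMED_BASE = 6.0        # assumed logarithmic base, not confirmed by Google


def toolbar_range(level: int) -> tuple:
    """Lower and upper real PageRank bound for Toolbar levels 0 to 9."""
    return (MINIMUM_PAGERANK * ASSUMED_BASE ** level,
            MINIMUM_PAGERANK * ASSUMED_BASE ** (level + 1))


def toolbar_pagerank(real_pagerank: float) -> int:
    """Map a real PageRank value to the assumed Toolbar level 0 to 10."""
    if real_pagerank < MINIMUM_PAGERANK:
        raise ValueError("real PageRank cannot be below the minimum of 0.15")
    level = 0
    # Level 10 is open-ended (up to 0.85 * N + 0.15), so stop there.
    while level < 10 and real_pagerank >= toolbar_range(level)[1]:
        level += 1
    return level


for level in range(10):
    low, high = toolbar_range(level)
    print(f"{level}/10: {low:,.2f} - {high:,.2f}")
print(toolbar_pagerank(100.0))
```

Each step on the Toolbar scale thus corresponds to a sixfold
increase of the real PageRank under this assumption.
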
It is uncertain whether a logarithmic scaling in a strictly
mathematical sense actually takes place. More likely there is a
manual scaling which follows a logarithmic scheme, so that
Google has control over the number of pages within each Toolbar
PageRank range. The logarithmic base for this scheme should be
between 6 and 7, which can, for instance, be roughly deduced
from the number of inbound links that pages with a high Toolbar
PageRank receive from pages with a Toolbar PageRank higher than
4, as shown by Google for the link command.

The Toolbar's PageRank Files

Even webmasters who do not want to use the Google Toolbar or
Internet Explorer permanently because of security and privacy
concerns can check the PageRank values of their pages. Google
submits PageRank values to the Toolbar in simple text files.
Previously, this happened via XML. The switch to text files
occurred in August 2002.
The PageRank files can be requested directly from the domain
www.google.com. Basically, the URLs for those files look as
follows (without line breaks):
http://www.google.com/search?
client=navclient-auto&
ch=0123456789&
features=Rank&
q=info:http://www.domain.com/
There is only one line of text in the PageRank files. The last
figure in this line is the PageRank.
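
Reading the value from such a file can be sketched in Python as
follows. All that is stated here is that the file holds a single
line whose last figure is the PageRank; the sample line below
and the function name are assumptions made for illustration.

```python
import re


def pagerank_from_file(content: str) -> int:
    """Return the trailing number of the single line in a PageRank file."""
    line = content.strip()
    match = re.search(r"(\d+)$", line)
    if match is None:
        raise ValueError("no trailing number found in PageRank file")
    return int(match.group(1))


# Hypothetical file content; the exact layout of the line is an assumption.
sample = "Rank_1:1:5\n"
print(pagerank_from_file(sample))
```
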
The parameters incorporated in the URL shown above are
indispensable for the display of the PageRank files in a
browser. The value "navclient-auto" for the parameter "client"
identifies the Toolbar. The URL is submitted via the parameter
"q". The value "Rank" for the parameter "features" determines
that the PageRank files are requested. If it is omitted,
Google's servers still transmit XML files. The parameter "ch"
transfers a checksum for the URL to Google; this checksum can
only change when the Toolbar version is updated by Google.
Thus, it is necessary to install the Toolbar at least once to
find out the checksums of one's URLs. To track the communication
between the Toolbar and Google, the use of packet sniffers,
local proxies and similar tools is often suggested. But this is
not strictly necessary, since the PageRank files are cached by
Internet Explorer. So, the checksums can simply be found by
having a look at the folder Temporary Internet Files. Knowing
the checksums of your URLs, you can view the PageRank files in
your browser and you do not have to accept Google's cookies,
which last 36 years.
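
Assembling such a URL from a known checksum can be sketched in
Python with the standard library. The function name is
hypothetical, and the checksum and domain are the placeholder
values used above; the sketch only builds the string and sends
no request.

```python
from urllib.parse import urlencode


def pagerank_file_url(page_url: str, checksum: str,
                      host: str = "www.google.com") -> str:
    """Build the URL of the PageRank file for page_url (no request is sent)."""
    parameters = {
        "client": "navclient-auto",   # identifies the Toolbar
        "ch": checksum,               # checksum taken from the browser cache
        "features": "Rank",           # request the text file instead of XML
        "q": "info:" + page_url,      # the page whose PageRank is wanted
    }
    # Keep ":" and "/" unescaped so the result matches the form shown above.
    return "http://" + host + "/search?" + urlencode(parameters, safe=":/")


print(pagerank_file_url("http://www.domain.com/", "0123456789"))
```
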
Since the PageRank files are kept in the browser cache and,
thus, are clearly visible, watching the PageRank files in a
browser should not be a violation of Google's Terms of Service
as long as requests are not automated. However, you should be
cautious. The Toolbar submits its own User-Agent to Google. It
is:
Mozilla/4.0 (compatible; GoogleToolbar 1.1.60-deleon; OS SE 4.10)
1.1.60-deleon is a Toolbar version, which may of course change.
OS is the operating system that you have installed. So, Google
is able to identify requests by browsers if they do not go out
via a proxy and if the User-Agent is not modified accordingly.
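
How such requests could be told apart may be sketched in Python
as follows. The check is a hypothetical, simplified illustration
of what a server could do; it is not known how Google actually
evaluates the User-Agent, and the browser string is merely an
example of a typical browser identification.

```python
TOOLBAR_USER_AGENT = ("Mozilla/4.0 (compatible; GoogleToolbar "
                      "1.1.60-deleon; OS SE 4.10)")


def looks_like_toolbar(user_agent: str) -> bool:
    """Hypothetical server-side check: does the request claim to be the Toolbar?"""
    return "GoogleToolbar" in user_agent


# Example of a plain browser User-Agent, used only for contrast.
browser_user_agent = "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)"

print(looks_like_toolbar(TOOLBAR_USER_AGENT))
print(looks_like_toolbar(browser_user_agent))
```
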
Taking a look at IE's cache, one will normally notice that the
PageRank files are not requested from the domain www.google.com
but from IP addresses like 216.239.33.102. Additionally, the
PageRank files' URLs often contain a parameter "failedip" that
is set to values like "216.239.35.102;1111" (its function is not
entirely clear). Each of the IP addresses belongs to one of
Google's seven data centers, and the reason for the Toolbar
querying IP addresses is most likely to control the PageRank
display in a better way, especially in times of the "Google
Dance".
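
Reading the checksum and the other parameters from a cached URL
can be sketched in Python as follows. The URL is a hypothetical
example assembled from the values mentioned above, and the
helper name is an assumption. The query string is split by hand
on "&" so that the semicolon inside "failedip" stays part of the
value.

```python
from urllib.parse import urlsplit, unquote


def toolbar_request_details(url: str) -> dict:
    """Return the queried host and the query parameters of a cached URL."""
    parts = urlsplit(url)
    details = {"host": parts.hostname}
    for pair in parts.query.split("&"):
        name, _, value = pair.partition("=")
        details[unquote(name)] = unquote(value)
    return details


# Hypothetical cache entry built from the example values in the text.
cached_url = ("http://216.239.33.102/search?client=navclient-auto"
              "&failedip=216.239.35.102;1111&ch=0123456789"
              "&features=Rank&q=info:http://www.domain.com/")

details = toolbar_request_details(cached_url)
print(details["host"], details["ch"], details["failedip"])
```
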

The PageRank Display at the Google Directory

Webmasters who do not want to check the PageRank files that are
used by the Toolbar have another way to obtain information about
the PageRank of their sites: the Google Directory
(directory.google.com).
The Google Directory is a dump of the Open Directory Project
(dmoz.org) which shows the PageRank of listed documents, scaled
and by means of a green bar, similarly to the Google Toolbar
display. In contrast to the Toolbar, the scale is from 1 to 7.
The exact value is not displayed, but if one is not sure by
looking at the bar, it can be determined from the divided bar or
from the width of the individual graphics in the source code of
the page.
By comparing the Toolbar PageRank of a document with its
Directory PageRank, a more exact estimation of a page's PageRank
can be deduced if the page is listed in the ODP. This connection
was first mentioned by Chris Raimondi
(www.searchnerd.com/pagerank).
Especially for pages with a Toolbar PageRank of 5 or 6, one can
assess whether the page is at the upper or the lower end of its
Toolbar range. It should be noted that the Toolbar PageRank of 0
was not taken into account for the comparison. That this is
appropriate can easily be verified by looking at pages with a
Toolbar PageRank of 3. However, for such a verification, pages
of the Google Directory or the ODP with a Toolbar PageRank of 4
or lower have to be chosen, since otherwise no pages linked from
there with a Toolbar PageRank of 3 will be found.
PageRank and Google are trademarks of Google Inc.,
Mountain View CA, USA.
PageRank is protected by US Patent
6,285,999.
The content of this document may be reproduced on the web
provided that a copyright notice is included and that there is a
straight HTML hyperlink to the corresponding page at
pr.efactory.de in direct context.