The Effect of Outbound Links

Since PageRank is based on the linking structure of the whole web, it is inescapable that if the inbound links of a page influence its PageRank, its outbound links also have some impact. To illustrate the effects of outbound links, we take a look at a simple example.

We regard a web consisting of two websites, each having two web pages. One site consists of pages A and B, the other consists of pages C and D. Initially, the two pages of each site link only to each other. It is obvious that each page then has a PageRank of one. Now we add a link which points from page A to page C. At a damping factor of 0.75, we therefore get the following equations for the single pages' PageRank values:

PR(A) = 0.25 + 0.75 PR(B)
PR(B) = 0.25 + 0.375 PR(A)
PR(C) = 0.25 + 0.75 PR(D) + 0.375 PR(A)
PR(D) = 0.25 + 0.75 PR(C)

Solving the equations gives us the following PageRank values for the first site:

PR(A) = 14/23
PR(B) = 11/23

We therefore get an accumulated PageRank of 25/23 for the first site. The PageRank values of the second site are given by

PR(C) = 35/23
PR(D) = 32/23

So, the accumulated PageRank of the second site is 67/23. The total PageRank for both sites is 92/23 = 4. Hence, adding a link has no effect on the total PageRank of the web. Additionally, the PageRank benefit for one site equals the PageRank loss of the other: the first site drops from 2 to 25/23, a loss of 21/23, while the second site rises from 2 to 67/23, a gain of 21/23.
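
Because the four equations are linear, they can be solved exactly. The following Python sketch is one way to check the numbers; `solve_pagerank` is a hypothetical helper written for this illustration (not part of any library), and it assumes every link target also appears as a key of the `links` dictionary. It rearranges PR(p) = (1 - d) + d (PR(q)/C(q) summed over the pages q linking to p) into a linear system and solves it by Gauss-Jordan elimination over Python's `fractions.Fraction`, so the results come out as exact fractions.

```python
from fractions import Fraction


def solve_pagerank(links, d=Fraction(3, 4)):
    """Solve PR(p) = (1 - d) + d * sum(PR(q) / C(q) for q linking to p) exactly.

    `links` maps every page to the list of pages it links to (hypothetical
    helper for illustration; every link target must also be a key).
    """
    pages = sorted(links)
    index = {p: i for i, p in enumerate(pages)}
    n = len(pages)
    # Augmented matrix for PR(p) - d * sum(PR(q) / C(q)) = 1 - d.
    rows = [[Fraction(int(i == j)) for j in range(n)] + [1 - d] for i in range(n)]
    for q, targets in links.items():
        for p in targets:
            rows[index[p]][index[q]] -= d / len(targets)
    # Gauss-Jordan elimination with exact rational arithmetic.
    for col in range(n):
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        pivot_value = rows[col][col]
        rows[col] = [x / pivot_value for x in rows[col]]
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    # Row `col` now holds the solution for the page in column `col`.
    return {p: rows[index[p]][n] for p in pages}


# Two sites: A <-> B and C <-> D, plus the added link from A to C.
links = {"A": ["B", "C"], "B": ["A"], "C": ["D"], "D": ["C"]}
pr = solve_pagerank(links)
print(pr["A"], pr["B"], pr["C"], pr["D"])                      # 14/23 11/23 35/23 32/23
print(pr["A"] + pr["B"], pr["C"] + pr["D"], sum(pr.values()))  # 25/23 67/23 4
```

Exact fractions are used instead of floating point so that the output can be compared directly with the values given in the text.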

The Actual Effect of Outbound Links

As has already been shown, the PageRank benefit for a closed system of web pages from an additional inbound link is given by

(d / (1-d)) × (PR(X) / C(X)),

where X is the linking page, PR(X) is its PageRank and C(X) is the number of its outbound links. Hence, this value also represents the PageRank loss of a formerly closed system of web pages when a page X within this system now links to an external page.
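
As a quick consistency check, the formula can be evaluated for page A of the two-site example, where d = 0.75, PR(A) = 14/23 and C(A) = 2. The function name `outbound_link_loss` is made up for this sketch; it assumes PR(X) and C(X) are the values after the external link has been added, as the formula requires.

```python
from fractions import Fraction


def outbound_link_loss(pr_x, c_x, d=Fraction(3, 4)):
    """PageRank that a closed system loses when its page X adds one external
    link: (d / (1 - d)) * (PR(X) / C(X)), with PR(X) and C(X) taken after the
    link has been added (hypothetical helper for illustration)."""
    return (d / (1 - d)) * (Fraction(pr_x) / c_x)


loss = outbound_link_loss(Fraction(14, 23), 2)
print(loss)        # 21/23
print(2 - loss)    # 25/23, the first site's accumulated PageRank in the example
```

With d = 0.75 the factor d / (1-d) equals 3, so the loss is 3 × 7/23 = 21/23, which is exactly the difference between the first site's original PageRank of 2 and its new value of 25/23.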

The validity of the above formula requires that the page which receives the link from the formerly closed system of pages does not link back to that system, since the system would otherwise gain back some of the lost PageRank. Of course, this effect may also occur when it is not the page receiving the link from the formerly closed system that links back directly, but another page which has an inbound link from that page. However, this effect may be disregarded because of the damping factor if there are enough other web pages in between in the link recursion. The validity of the formula also requires that the linking site has no other external outbound links. If it has other external outbound links, the PageRank loss of the regarded site diminishes, and the pages already receiving a link from that page lose PageRank accordingly.

Even if the actual PageRank values for the pages of an existing web site were known, it would not be possible to calculate to what extent an added outbound link diminishes the PageRank loss of the site, since the formula presented above describes the state after the link has been added.
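
To see why the formula requires that the linked page does not link back, one can modify the two-site example hypothetically (this variant does not appear in the text) so that page C links to page A as well as to page D. The sketch below uses plain power iteration in floating point; `pagerank` is a hypothetical helper for this illustration, and it assumes every page has at least one outbound link.

```python
def pagerank(links, d=0.75, iterations=200):
    """Iterate PR(p) = (1 - d) + d * sum(PR(q) / C(q)) starting from PR = 1.

    `links` maps every page to the pages it links to (hypothetical helper;
    assumes every page has at least one outbound link).
    """
    pr = {p: 1.0 for p in links}
    for _ in range(iterations):
        new = {p: 1 - d for p in links}
        for q, targets in links.items():
            for p in targets:
                new[p] += d * pr[q] / len(targets)
        pr = new
    return pr


one_way = {"A": ["B", "C"], "B": ["A"], "C": ["D"], "D": ["C"]}
# Hypothetical variant: C also links back to A.
link_back = {"A": ["B", "C"], "B": ["A"], "C": ["D", "A"], "D": ["C"]}

pr_one_way = pagerank(one_way)
pr_link_back = pagerank(link_back)
print(round(pr_one_way["A"] + pr_one_way["B"], 4))      # 1.087, i.e. 25/23
print(round(pr_link_back["A"] + pr_link_back["B"], 4))  # 2.0
```

Because this variant is symmetric (C plays the role of A, and D that of B), each site gives away exactly as much PageRank as it receives, so the first site regains all of its loss. In general, how much is regained depends on the link structure.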

Intuitive Justification of the Effect of Outbound Links

The intuitive justification for the loss of PageRank through an additional external outbound link, according to the Random Surfer Model, is that by adding an external outbound link to one page, the surfer becomes less likely to follow an internal link on that page. So, the probability of the surfer reaching other pages within the site diminishes. If those other pages of the site have links back to the page to which the external outbound link has been added, this page's PageRank will also deplete.
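
The Random Surfer intuition can be checked by simulation. The sketch below is an illustration under our own assumptions, not how Google computes PageRank: a surfer follows a random link on the current page with probability d = 0.75 and otherwise jumps to a page chosen at random. Multiplying each page's visit frequency by the number of pages makes the estimates comparable to the PR values in the text, which sum to the number of pages. `simulate_surfer` and its parameters are hypothetical, and so is its fallback of jumping at random from a page without links.

```python
import random


def simulate_surfer(links, d=0.75, steps=400_000, seed=1):
    """Estimate PageRank by letting a random surfer walk the link graph.

    With probability d the surfer follows a random link on the current page;
    otherwise (or, as an assumption, if the page has no links) it jumps to a
    random page. Returns visit frequencies scaled by the number of pages.
    """
    rng = random.Random(seed)
    pages = sorted(links)
    visits = dict.fromkeys(pages, 0)
    page = rng.choice(pages)
    for _ in range(steps):
        visits[page] += 1
        if links[page] and rng.random() < d:
            page = rng.choice(links[page])
        else:
            page = rng.choice(pages)
    return {p: len(pages) * count / steps for p, count in visits.items()}


before = {"A": ["B"], "B": ["A"], "C": ["D"], "D": ["C"]}
after = {"A": ["B", "C"], "B": ["A"], "C": ["D"], "D": ["C"]}
estimate_before = simulate_surfer(before)
estimate_after = simulate_surfer(after)
print({p: round(v, 2) for p, v in estimate_before.items()})
print({p: round(v, 2) for p, v in estimate_after.items()})
```

Before the link from A to C is added, all four estimates should come out close to 1; afterwards they should approach 14/23, 11/23, 35/23 and 32/23 (about 0.61, 0.48, 1.52 and 1.39), up to sampling noise.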

We can conclude that external outbound links diminish the total PageRank of a site and probably also the PageRank of each single page of a site. But, since links between web sites are the foundation of PageRank and indispensable for its functioning, there is the possibility that outbound links have positive effects within other parts of Google's ranking criteria. Lastly, relevant outbound links do contribute to the quality of a web page, and a webmaster who points to other pages integrates their content in some way into his own site.

Dangling Links

An important aspect of outbound links is their absence on some web pages. When a web page has no outbound links, its PageRank cannot be distributed to other pages. Lawrence Page and Sergey Brin characterise links to such pages as dangling links.

The effect of dangling links shall be illustrated by
a small example website. We take a look at a site consisting
of three pages A, B and C. In our example, the pages A and B
link to each other. Additionally, page A links to page C. Page
C itself has no outbound links to other pages. At a damping
factor of 0.75, we get the following equations for the single
pages' PageRank values:

PR(A) = 0.25 + 0.75 PR(B)
PR(B) = 0.25 + 0.375 PR(A)
PR(C) = 0.25 + 0.375 PR(A)

Solving the equations gives us the following PageRank values:

PR(A) = 14/23
PR(B) = 11/23
PR(C) = 11/23

So, the accumulated PageRank of all three pages is 36/23, which is just over half the value of 3 that we could have expected if page C had links to one of the other pages. According to Page and Brin, the number of dangling links in Google's index is fairly high. One reason for this is that many linked pages are not indexed by Google, for example because indexing is disallowed by a robots.txt file. Additionally, Google now indexes several file types and not HTML only. PDF or Word files do not really have outbound links and, hence, dangling links could have a major impact on PageRank.
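
The three equations can also be solved by hand: substituting the equation for PR(B) into the one for PR(A) leaves a single unknown. The short Python sketch below performs this substitution with exact fractions (the variable names are our own) and reproduces the values above. For comparison, if page C linked to one of the other pages, every page would pass on its PageRank, and summing the three equations would give a total of 3(1-d) + d × total, i.e. a total of 3.

```python
from fractions import Fraction

d = Fraction(3, 4)

# PR(A) = (1 - d) + d * PR(B)  and  PR(B) = (1 - d) + (d / 2) * PR(A).
# Substituting the second into the first:
# PR(A) = (1 - d) * (1 + d) + (d * d / 2) * PR(A)
pr_a = (1 - d) * (1 + d) / (1 - d * d / 2)
pr_b = (1 - d) + d / 2 * pr_a
# Page C receives the other half of page A's vote, exactly like page B.
pr_c = (1 - d) + d / 2 * pr_a

print(pr_a, pr_b, pr_c)        # 14/23 11/23 11/23
print(pr_a + pr_b + pr_c)      # 36/23
```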

In order to protect PageRank from the negative effects of dangling links, pages without outbound links have to be removed from the database until the PageRank values have been computed. According to Page and Brin, the number of outbound links on pages with dangling links is thereby normalised. As shown in our illustration, removing one page can cause new dangling links and, hence, removing pages has to be an iterative process. After the PageRank calculation is finished, PageRank can be assigned to the formerly removed pages based on the PageRank algorithm. This requires as many iterations as were needed for removing the pages. Regarding our illustration, page C could be processed before page B. At that point, page B has no PageRank yet and, so, page C will not receive any either. Then, page B receives PageRank from page A, and during the second iteration, page C also gets its PageRank.
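
The removal and re-insertion procedure can be sketched in Python as follows. This is a simplified illustration under our own assumptions, not Google's implementation: dangling pages are stripped round by round, PageRank is iterated on the remaining pages using the normalised link counts, and the removed pages are then assigned PageRank one round at a time, starting with the pages removed last, so that every page linking to them already has a value. For the assignment step, the sketch divides a linking page's PageRank by its original number of outbound links, as the text does for page C (0.25 + 0.375 PR(A)). The function names and the graph (A and D link to each other, A also links to B, B links only to C, and C has no outbound links) are hypothetical.

```python
def remove_dangling(links):
    """Strip pages without outbound links, round by round.

    Links to removed pages are dropped, so a page may become dangling in a
    later round. Returns the remaining graph and the list of removal rounds.
    Assumes every link target is also a key of `links`.
    """
    remaining = {p: list(targets) for p, targets in links.items()}
    rounds = []
    while True:
        dangling = sorted(p for p, targets in remaining.items() if not targets)
        if not dangling:
            return remaining, rounds
        rounds.append(dangling)
        for p in dangling:
            del remaining[p]
        for p in remaining:
            remaining[p] = [t for t in remaining[p] if t not in dangling]


def pagerank_without_dangling(links, d=0.75, iterations=200):
    remaining, rounds = remove_dangling(links)
    # Iterate PR(p) = (1 - d) + d * sum(PR(q) / C(q)) on the remaining pages,
    # where C(q) now only counts links to remaining pages.
    pr = {p: 1.0 for p in remaining}
    for _ in range(iterations):
        new = {p: 1 - d for p in remaining}
        for q, targets in remaining.items():
            for p in targets:
                new[p] += d * pr[q] / len(targets)
        pr = new
    # Re-insert removed pages, last round first; their linking pages are
    # already assigned, and C(q) is the original number of outbound links.
    for dangling in reversed(rounds):
        for p in dangling:
            pr[p] = (1 - d) + d * sum(
                pr[q] / len(targets) for q, targets in links.items() if p in targets
            )
    return pr


# Hypothetical graph: removing C makes B dangling, so two rounds are needed.
chain = {"A": ["B", "D"], "B": ["C"], "C": [], "D": ["A"]}
print(remove_dangling(chain)[1])          # [['C'], ['B']]
print(pagerank_without_dangling(chain))   # {'A': 1.0, 'D': 1.0, 'B': 0.625, 'C': 0.71875}
```

Processing the rounds in reverse order needs one pass per removal round, which matches the statement that as many iterations are needed as for removing the pages.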

Regarding our example website for dangling links, removing page C from the database results in pages A and B each having a PageRank of 1. After the calculations, page C is assigned a PageRank of 0.25 + 0.375 PR(A) = 0.625. So, the accumulated PageRank does not equal the number of pages, but at least all pages which have outbound links are not harmed by the dangling links problem.
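
The comparison behind the last sentence can be written out with exact fractions. The values without removal are the ones computed for this example above; the values with removal follow because A and B then link only to each other. The variable names are our own.

```python
from fractions import Fraction

d = Fraction(3, 4)

# Without removal, A and B leak PageRank into the dangling page C.
without_removal = {"A": Fraction(14, 23), "B": Fraction(11, 23)}
# With C removed, A and B only link to each other: PR = (1 - d) + d * PR,
# whose solution is PR = 1 for both pages.
with_removal = {"A": Fraction(1), "B": Fraction(1)}
# Afterwards, C is assigned PageRank from A, whose vote is split over C(A) = 2.
pr_c = (1 - d) + d * with_removal["A"] / 2

print(pr_c, float(pr_c))                                         # 5/8 0.625
print(sum(with_removal.values()) + pr_c)                         # 21/8, i.e. 2.625 for 3 pages
print(all(with_removal[p] > without_removal[p] for p in "AB"))   # True
```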

Since dangling links are removed from the database, they do not have any negative effect on the PageRank of the rest of the web. Since links to PDF files are dangling links, they do not diminish the PageRank of the linking page or site. So, PDF files can be a good means of search engine optimisation for Google.

PageRank and Google are trademarks of Google Inc., Mountain View CA, USA.
PageRank is protected by US Patent 6,285,999.
The content of this document may be reproduced on the web provided that a copyright notice is included and that there is a straight HTML hyperlink to the corresponding page at pr.efactory.de in direct context.