Hilltop is one of the major concepts underpinning Google's search algorithm, yet its workings and implications are often misunderstood. After the infamous Florida Update, many webmasters were aghast as their rankings plummeted; and again, when the mysterious "sandbox" was implemented, some webmasters could not get a Web site to rank well, period. Part of the reason that some Web sites get shuffled out of the SERPs when new algorithmic features are implemented is that those sites never gained authority in the eyes of the search engines; that is, they were not sufficiently meshed into their local topical communities. This concept of authority was pioneered in a paper titled "Hilltop: A Search Engine based on Expert Documents," written by Krishna Bharat and George A. Mihaila. The full text is available online at http://www.cs.toronto.edu/~georgem/hilltop/.
(Note: Google has obviously not implemented Hilltop in its pure form, but rather uses the principles of topical communities and authority in its algorithm. Likewise, other search engines such as MSN and Yahoo! are not using Hilltop per se, but rather similar algorithmic features. Thus when I mention "Hilltop" I am referring not just to the specific paper published by Bharat and Mihaila, but also to the fundamental theory upon which any authority-based link popularity algorithm is based. This theory applies to Topic-Sensitive PageRank, etc.)
The Basics of Hilltop
Google's PageRank formula revolutionized search, but it has a major flaw: it gives each page an absolute measure of importance, regardless of what is being searched for. Recognizing that a page's importance should be interpreted in light of a given query topic, the Hilltop formula uses the link structure of the topical community related to the query topic when determining relevance.
For a given topic query, some pages are considered to be "expert documents," and others are "authorities." A page is an expert document if it "is about a certain topic and has links to many non-affiliated pages on that topic" (this type of page is also sometimes called a hub). A page is an authority "if and only if some of the best experts on the query topic point to it." To summarize: hubs link to authorities; authorities are linked to by hubs.
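To make the two definitions concrete, here is a minimal sketch in Python. It is not the Hilltop implementation, and certainly not Google's. The page names, the `topic` and `host` fields, and the thresholds `min_links` and `min_experts` are all assumptions made for illustration. In particular, "non-affiliated" is reduced here to "on a different host," and "some of the best experts" to "at least `min_experts` experts."

```python
from collections import defaultdict

# Hypothetical toy link graph: every name, topic, and host is made up.
pages = {
    "list_a":   {"topic": "tea",  "host": "a.example",    "links": ["tea_shop", "tea_blog", "a_about"]},
    "list_b":   {"topic": "tea",  "host": "b.example",    "links": ["tea_shop", "tea_blog"]},
    "a_about":  {"topic": "tea",  "host": "a.example",    "links": []},
    "tea_shop": {"topic": "tea",  "host": "shop.example", "links": []},
    "tea_blog": {"topic": "tea",  "host": "blog.example", "links": ["tea_shop"]},
    "car_page": {"topic": "cars", "host": "c.example",    "links": ["tea_shop", "tea_blog"]},
}


def find_experts(pages, topic, min_links=2):
    """Map each expert (hub) to the on-topic, other-host pages it links to.

    Assumption: a page counts as an expert when it is on topic and links
    to at least `min_links` on-topic pages hosted elsewhere.
    """
    experts = {}
    for name, page in pages.items():
        if page["topic"] != topic:
            continue  # off-topic pages cannot be experts for this query
        targets = {
            t for t in page["links"]
            if t in pages
            and pages[t]["topic"] == topic
            and pages[t]["host"] != page["host"]  # crude "non-affiliated" test
        }
        if len(targets) >= min_links:
            experts[name] = targets
    return experts


def find_authorities(experts, min_experts=2):
    """Return the pages pointed to by at least `min_experts` experts (assumed threshold)."""
    votes = defaultdict(set)
    for expert, targets in experts.items():
        for target in targets:
            votes[target].add(expert)
    return {page for page, voters in votes.items() if len(voters) >= min_experts}


experts = find_experts(pages, "tea")
authorities = find_authorities(experts)
print(sorted(experts))      # ['list_a', 'list_b']
print(sorted(authorities))  # ['tea_blog', 'tea_shop']
```

In this toy graph the link from `car_page` counts for nothing because the page is off topic, and the link from `list_a` to `a_about` is ignored because both pages share a host. Only links from on-topic, unaffiliated hubs confer authority.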
The Challenge for New Web Sites
The nature of the World Wide Web dictates that it will take time for a new Web site to get links from within its topical community. Many hubs, such as resource lists or niche directories, are only updated periodically with new links. Still others are static pages that will never be changed.
Then there is the "human factor." It takes time for a Web site to be recognized as valuable, and for webmasters to trust it enough to link to it. Older authority sites and hubs also tend to link to other older authority sites, creating a sort of self-perpetuating authority set (Mike Grehan refers to this phenomenon in his article "Filthy Linking Rich," available online at http://www.e-marketing-news.co.uk/Oct04/RichLinking.html). This all adds up to the fact that it is very hard to make a new Web site an authority in the eyes of the search engine, which raises the question: How can a new Web site become entrenched in its topical neighborhood more quickly?