I. The Genetic Blueprint

A decade after the invention of the World Wide Web, Tim Berners-Lee is promoting the "Semantic Web". The Internet, hitherto, is a repository of digital content. It has a rudimentary inventory system and very crude data location services. As a sad result, most of the content is invisible and inaccessible. Moreover, the Internet manipulates strings of symbols, not logical or semantic propositions. In other words, the Net compares values but does not know the meaning of the values it thus manipulates. It is unable to interpret strings, to infer new facts, to deduce, induce, derive, or otherwise comprehend what it is doing. In short, it does not understand language. Run an ambiguous term by any search engine and these shortcomings become painfully evident. This lack of understanding of the semantic foundations of its raw material (data, information) prevents applications and databases from sharing resources and feeding each other. The Internet is discrete, not continuous. It resembles an archipelago, with users hopping from island to island in a frantic search for relevancy.
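To make the point concrete, here is a minimal Python sketch of what pre-semantic search amounts to: pure symbol comparison, with no notion of what the matched strings denote. The miniature corpus and the query are invented for illustration.

```python
# A minimal sketch of pre-semantic search: pure string comparison.
# The documents and query below are invented for illustration.

documents = [
    "Book a room at the Hilton in London.",
    "Paris Hilton attended the film premiere.",
    "Hilton Head Island is a popular resort town.",
]

def naive_search(query, docs):
    """Return every document containing the query string.

    The match is purely lexical: the engine compares symbols,
    but has no idea whether 'Hilton' names a hotel, a person,
    or a place - all three documents are 'equally relevant'.
    """
    return [doc for doc in docs if query.lower() in doc.lower()]

print(naive_search("hilton", documents))
# All three documents match, although they concern three
# entirely different entities that merely share a name.
```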
Even visionaries like Berners-Lee do not contemplate an "intelligent Web". They are simply proposing to let users, content creators, and web developers assign descriptive meta-tags ("name of hotel") to fields, or to strings of symbols ("Hilton"). These meta-tags (arranged in semantic and relational "ontologies" - lists of meta-tags, their meanings, and how they relate to each other) will be read by various applications, allowing them to process the associated strings of symbols correctly (place the word "Hilton" in your address book under "hotels"). This will make information retrieval more efficient and reliable, and the information retrieved is bound to be more relevant and amenable to higher-level processing (statistics, the development of heuristic rules, etc.). The shift is from HTML (whose tags are concerned with visual appearances and content indexing) to languages such as the DARPA Agent Markup Language (DAML), OIL (Ontology Inference Layer or Ontology Interchange Language), or even XML (whose tags are concerned with content taxonomy, document structure, and semantics). This would bring the Internet closer to the classic library card catalogue.
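A rough Python sketch of the idea follows, with an invented miniature "ontology" standing in for DAML or OIL: once the string "Hilton" carries a machine-readable tag, an application can file it correctly instead of treating it as an opaque symbol.

```python
# A toy illustration of semantic meta-tagging. The mini-ontology
# and the record format below are invented stand-ins for DAML/OIL.

ontology = {
    "hotel": {"is_a": "lodging", "file_under": "hotels"},
    "person": {"is_a": "agent", "file_under": "contacts"},
    "place": {"is_a": "location", "file_under": "destinations"},
}

# The same ambiguous string, disambiguated by descriptive meta-tags.
records = [
    {"value": "Hilton", "type": "hotel"},
    {"value": "Paris Hilton", "type": "person"},
    {"value": "Hilton Head", "type": "place"},
]

address_book = {}
for record in records:
    # The application reads the tag, consults the ontology, and
    # files the string where it belongs - no guessing required.
    category = ontology[record["type"]]["file_under"]
    address_book.setdefault(category, []).append(record["value"])

print(address_book)
# {'hotels': ['Hilton'], 'contacts': ['Paris Hilton'],
#  'destinations': ['Hilton Head']}
```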
Even in its current, pre-semantic, hyperlink-dependent phase, the Internet brings to mind Richard Dawkins' seminal work "The Selfish Gene" (OUP, 1976). This would be doubly true for the Semantic Web.
Dawkins suggested generalizing the principle of natural selection to a law of the survival of the stable. "A stable thing is a collection of atoms which is permanent enough or common enough to deserve a name". He then proceeded to describe the emergence of "Replicators" - molecules which created copies of themselves. The Replicators that survived the competition for scarce raw materials were characterized by high longevity, fecundity, and copying-fidelity. Replicators (now known as "genes") constructed "survival machines" (organisms) to shield them from the vagaries of an ever-harsher environment.
This is very reminiscent of the Internet. The "stable things" are HTML-coded web pages. They are replicators - they create copies of themselves every time their "web address" (URL) is clicked. The HTML coding of a web page can be thought of as "genetic material". It contains all the information needed to reproduce the page. And, exactly as in nature, the higher the longevity, fecundity (measured in links to the web page from other web sites), and copying-fidelity of the HTML code - the higher its chances to survive (as a web page).
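The analogy can be expressed as a back-of-the-envelope calculation. The Python sketch below scores a page's "fitness" from the three replicator traits named above; the formula and the sample figures are invented for illustration, not drawn from Dawkins or from any real ranking algorithm.

```python
# An illustrative 'replicator fitness' score for a web page.
# The formula and the sample numbers are invented for this sketch:
# fitness rises with longevity, fecundity, and copying-fidelity,
# echoing Dawkins' three traits.

def survival_score(longevity_days, inbound_links, copy_fidelity):
    """Multiply the three replicator traits into a single score.

    longevity_days  - how long the page has stayed online
    inbound_links   - 'fecundity': links pointing at the page
    copy_fidelity   - fraction (0.0-1.0) of mirrors and cached
                      copies whose HTML matches the original
    """
    return longevity_days * inbound_links * copy_fidelity

pages = {
    "stable, well-linked page": survival_score(3650, 120, 0.95),
    "ephemeral, orphaned page": survival_score(30, 2, 0.50),
}
for name, score in pages.items():
    print(f"{name}: {score:,.0f}")
# The long-lived, heavily linked, faithfully copied page dwarfs
# the orphan - it is far more likely to 'survive' on the Web.
```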