Disallow: /duplicate/
The * after User-agent says that this rule applies to all crawlers, and /duplicate/ after Disallow tells all crawlers to ignore this directory and not search it. Each User-agent and Disallow group must be separated from the next by a blank line in order to function correctly. So this is how you would combine the above two commands into a robots.txt file:
# this identifies the Wayback Machine
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow: /duplicate/
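As a quick check of how crawlers would read these rules, here is a minimal sketch using Python's standard urllib.robotparser module. The host example.com, the crawler name SomeOtherBot, and the helper is_allowed are placeholders made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# The two rule groups from the article, separated by a blank line.
ROBOTS_TXT = """\
# this identifies the Wayback Machine
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow: /duplicate/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_allowed(user_agent: str, path: str) -> bool:
    """Return True if the rules above let user_agent fetch path."""
    # example.com is a placeholder host; only the path matters here.
    return parser.can_fetch(user_agent, "http://example.com" + path)


if __name__ == "__main__":
    for agent, path in [
        ("ia_archiver", "/index.html"),
        ("SomeOtherBot", "/duplicate/page.html"),
        ("SomeOtherBot", "/index.html"),
    ]:
        print(agent, path, is_allowed(agent, path))
```

Running the script prints one line per crawler and path, so you can see which requests the file blocks before you upload it.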
One thing to note that is very important: anyone can access the robots.txt file of a site. So if you have information that you don't want anyone to see, don't include it in the robots.txt file. If the directory that you don't want anyone to see is not linked to from your web site, the crawlers won't index it anyway.
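The reason anyone can read the file is that robots.txt always sits at the root of a host, so its address can be worked out from any page on the site. A minimal Python sketch of that lookup follows; the function name robots_txt_url and the example.com address are made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt address for the site that serves page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, drop the page path and query string.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    # Hypothetical page address used only for this example.
    print(robots_txt_url("http://www.example.com/private/reports/index.html"))
```

Any directory name you list under Disallow is therefore visible to a curious visitor as well as to the crawlers.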
An alternative to blocking indexing of your site is to put a meta tag into the page. It looks like this:

<meta name="robots" content="noindex,nofollow">

You put this into the <head> tag of your web page. This line tells the robot crawlers not to index (search) the page and not to follow any of the hyperlinks on the page. So as an example,

<meta name="robots" content="noindex,follow">

tells the robot crawlers not to index the page, but to follow the hyperlinks on this page. Did you know that Google has its own tag?
It looks like this:

<meta name="googlebot" content="noindex,nofollow,noarchive">

This tells the Google robot crawler not to index the page, not to follow any of the links, and not to store cached versions of your web site. You will want this done if you update the content on your site frequently. This prevents the web user from seeing outdated content that isn't refreshed because of storage in the cache.
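To see which instructions a page is actually giving the crawlers, the meta tags can be read back with Python's standard html.parser module. This is only a sketch: the class name RobotsMetaParser, the helper read_robots_meta, the sample page, and the exact attribute spelling of the two tags are assumptions made for illustration.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives from robots-style meta tags in a page."""

    # Tag names assumed for this sketch: the generic one and Google's own.
    CRAWLER_NAMES = {"robots", "googlebot"}

    def __init__(self):
        super().__init__()
        self.directives = {}  # e.g. {"robots": {"noindex", "follow"}}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {key: (value or "") for key, value in attrs}
        name = attr.get("name", "").strip().lower()
        if name not in self.CRAWLER_NAMES:
            return
        # The content attribute is a comma-separated list of directives.
        values = {v.strip().lower() for v in attr.get("content", "").split(",")}
        values.discard("")
        self.directives.setdefault(name, set()).update(values)


def read_robots_meta(html):
    """Return a dict mapping crawler name to its set of directives."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


if __name__ == "__main__":
    page = """<html><head>
    <meta name="robots" content="noindex,follow">
    <meta name="googlebot" content="noindex,nofollow,noarchive">
    </head><body></body></html>"""
    for crawler, rules in read_robots_meta(page).items():
        print(crawler, sorted(rules))
```

A page with neither tag yields an empty result, which means it gives crawlers no page-level instructions at all.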
You can use the googlebot meta tag to specifically talk to Google's robots to avoid complications or if you are optimizing your site for Google's search engine.

This concludes this month's article. Until the next article, have a great day!
Copyright © Michael Rock
(You have permission to copy this article as long as it remains intact with the author's byline)
Web development contractor (Web Design and Hosting)
Internet Presence
www.TheInternetPresence.com
The owner of this registered company has over twenty years of experience with DOS, Windows business applications, numerous programming languages, artistic development, and web design. Other areas of interest include web marketing, web promotion, and business marketing and development. After the persuasion of those praising his work, he decided to go into business himself and highly suggests everyone else do the same.

Internet Presence was founded in 2003 from a desire to become independent. Less than one year later, Internet Presence has had accounts in three different states, ranging from a locally owned auto collision repair shop to a glass packaging company that sells its product worldwide.