Disallow: /duplicate/
The * after User-agent says that this record applies to all crawlers, and /duplicate/ after Disallow tells all crawlers to ignore that directory and not crawl it. A User-agent line and the Disallow lines under it belong together as one record, and each record must be separated from the next by a blank line for the file to be read correctly. So this is how you would combine the two records above into a single robots.txt file:
# this identifies the Wayback Machine
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow: /duplicate/
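To illustrate how a well-behaved crawler might interpret the file above, the sketch below feeds it to Python's standard-library urllib.robotparser module and asks which URLs each crawler may fetch. The www.example.com URLs and the crawler name SomeBot are hypothetical, and real crawlers can differ in the details of how they parse robots.txt.

```python
from urllib.robotparser import RobotFileParser

# The example robots.txt from the article, with a blank line between records.
ROBOTS_TXT = """\
# this identifies the Wayback Machine
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow: /duplicate/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse text directly instead of downloading it

# Hypothetical URLs on a placeholder domain; "SomeBot" stands in for any other crawler.
print(parser.can_fetch("ia_archiver", "http://www.example.com/index.html"))       # False
print(parser.can_fetch("SomeBot", "http://www.example.com/duplicate/page.html"))  # False
print(parser.can_fetch("SomeBot", "http://www.example.com/index.html"))           # True
```

ia_archiver is refused everywhere because its record disallows /, while every other crawler falls back to the * record and is refused only under /duplicate/.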
One very important thing to note: anyone can read a site's robots.txt file. So if there is a directory you don't want anyone to see, don't list it in robots.txt. If that directory is not linked from your web site (or from anyone else's), crawlers generally won't find it, and so won't index it, anyway.
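To show why listing a private directory in robots.txt backfires, here is a small Python sketch of what any visitor can do: the file always sits at the root of the site, and every Disallow line reads as plain text. The site URL and the /secret-drafts/ directory are hypothetical, and the parsing is deliberately simplified; it is not a full robots.txt parser.

```python
from urllib.parse import urljoin


def robots_url(site: str) -> str:
    """Return where robots.txt lives: always at the root of the site."""
    return urljoin(site, "/robots.txt")


def disallowed_paths(robots_txt: str) -> list[str]:
    """List every path named on a Disallow line, exactly as a visitor would read it.

    Simplified for illustration: comments are stripped and field names are
    matched case-insensitively, but record grouping is ignored.
    """
    paths = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding spaces
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


# Hypothetical site page and robots.txt contents, for illustration only.
SITE = "http://www.example.com/about/team.html"
ROBOTS_TXT = """\
User-agent: *
Disallow: /duplicate/
Disallow: /secret-drafts/
"""

print(robots_url(SITE))              # http://www.example.com/robots.txt
print(disallowed_paths(ROBOTS_TXT))  # ['/duplicate/', '/secret-drafts/']
```

The "hidden" directory is revealed by the very file meant to keep crawlers out of it.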
An alternative to blocking indexing through robots.txt is to put a robots meta tag into the page itself. It looks like this:
<meta name="robots" content="noindex, nofollow">
You put this inside the <head> tag of your web page. This line tells robot crawlers not to index the page and not to follow any of the hyperlinks on it. As another example, <meta name="robots" content="noindex, follow"> tells robot crawlers not to index the page but to follow the hyperlinks on it. Did you know that Google has its own tag?
It looks like this: <meta name="googlebot" content="noindex, nofollow, noarchive">
This tells Google's crawler not to index the page, not to follow any of its links, and not to offer a cached copy of the page. You will want this if you update the content on your site frequently, because it keeps web users from seeing outdated content from Google's cache instead of your current page.
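To show what these tags mean from a crawler's side, the following Python sketch reads a page's meta tags with the standard-library html.parser module and works out which directives a given crawler should obey. It assumes, as a simplification, that a crawler honours the generic robots tag plus any tag carrying its own name (such as googlebot), ignores tags addressed to other crawlers, and lets a restrictive directive in either tag win; bingbot is used only as an example of another crawler's name. This is an illustration, not how any search engine actually implements it.

```python
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Collect the comma-separated content of every <meta name="..."> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.by_name: dict[str, set[str]] = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        content = attributes.get("content") or ""
        if name:
            values = {part.strip().lower() for part in content.split(",") if part.strip()}
            self.by_name.setdefault(name, set()).update(values)


def directives_for(html: str, crawler: str) -> set[str]:
    """Directives a crawler obeys: the generic "robots" tag plus its own named tag.

    Simplifying assumption: both sets are merged, so a restrictive directive
    (noindex, nofollow) in either tag takes effect.
    """
    collector = MetaTagCollector()
    collector.feed(html)
    collector.close()
    generic = collector.by_name.get("robots", set())
    specific = collector.by_name.get(crawler.lower(), set())
    return generic | specific


# Hypothetical page using both tags from the article.
PAGE = """<html>
<head>
  <meta name="robots" content="noindex, follow">
  <meta name="googlebot" content="noindex, nofollow, noarchive">
  <title>Example page</title>
</head>
<body><a href="/next.html">Next</a></body>
</html>"""

for bot in ("googlebot", "bingbot"):
    found = directives_for(PAGE, bot)
    print(bot, "index:", "noindex" not in found, "follow links:", "nofollow" not in found)
# googlebot index: False follow links: False
# bingbot index: False follow links: True
```

Only the crawler named in the googlebot tag picks up its extra nofollow and noarchive; every other crawler sees just the generic robots tag.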
You can use the googlebot tag to address Google's crawler specifically, so that other search engines' crawlers are not affected, or if you are optimizing your site for Google's search engine. This concludes this month's article.
Until the next article, have a great day!
Copyright © Michael Rock
(You have permission to copy this article as long as it remains intact with the author's byline.)
Web development contractor (Web Design and Hosting)
Internet Presence
www.TheInternetPresence.com
The owner of this registered company has over twenty years of experience with DOS, Windows business applications, numerous programming languages, artistic development, and web design. His other areas of interest include web marketing, web promotion, and business marketing and development. Persuaded by those who praised his work, he decided to go into business for himself, and he highly recommends that everyone else do the same.
Internet Presence was founded in 2003 out of a desire to become independent. In less than a year, Internet Presence has had accounts in three different states, ranging from a locally owned auto collision repair shop to a glass packaging company that sells its products worldwide.