If you think you have developed a truly great website that is keyword-rich, full of unique content, fully optimized for the search engines and attractive to visitors, that's fine. But do you know you may be missing something? A robots.txt file. Did you include one? By the way, do you know why a robots.txt file is important?
The success of big companies lies in keeping their confidential data secret, hidden from everyone. They tell the world one thing and do another. This lets them carry out their future course of action easily and change plans as the situation demands. The job of a robots.txt file is similar: it can allow or disallow a search engine from visiting some or all of your web pages. Of course, a human visitor is still free to visit these pages. That being the case, your website may look different to the search engines than it does to a visitor. Keep in mind that robots.txt is a request, not a lock: well-behaved robots honor it, but the file itself can be read by anyone. If you think one or more of your pages/files aren't good enough to be visited by a particular search engine or engines, you can block them. This is not recommended, though; your website should be built in such a way that it does not need to hide from the search engines. Nevertheless, it's always better to know the basics of writing a robots.txt file. It will help you, and as we will discuss further below, a robots.txt file is important. I repeat: don't make pages you think should be hidden from the search engines. If a search engine thinks you are up to some tricks, it may penalize your site and drop it from the rankings, in the worst case forever!
Every search engine has a "robot" (a software program) that does the job of visiting websites. Its purpose is to "know" a website: what it is all about, and to gather all the information about it. Search engine robots bring this information back to their databases so it can be shown in their search results. So, if your site is not in their database, it never shows up in the search results.
Web robots are sometimes referred to as web crawlers or spiders. Therefore the process of a robot visiting your website is called "spidering" or "crawling". When somebody says "the search engines have spidered my website," it means the search engine robots have visited their website. Each robot is known by a name and crawls from its own IP addresses. The IP addresses are of no importance to us, but knowing the robots' names will help, since these names are used when we create a robots.txt file. This is why the file is called "robots.txt." Given below is a list of the robots of some very popular search engines:
Search Engine - Robot
Alexa.com - ia_archiver
Altavista.com - Scooter (Bought by Yahoo)
UK.Altavista.com - AltaVista-Intranet (Bought by Yahoo)
Alltheweb.com - FAST-WebCrawler (Bought by Yahoo)
Excite.com - ArchitextSpider
Euroseek.net - Arachnoidea
Gendoor.com (Genealogical Search Engine) - GenCrawler
Google.com - Googlebot (http://www.google.com/bot.html)
Hotbot.com (uses Inktomi's robot) - Slurp
Inktomi.com - Slurp (slurp@inktomi.com) (Bought by Yahoo)
Infoseek.com - UltraSeek
Looksmart.com - MantraAgent
Lycos.com - Lycos_Spider_(T-Rex)
Northernlight.com - Gulliver
Nationaldirectory.com - NationalDirectory-SuperSpider
UKSearcher.co.uk - UK Searcher Spider
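Because robots announce themselves by these names, a site can pick them out of the User-Agent header of each visit. Below is a minimal Python sketch, assuming a subset of the names in the list above (the list is dated, and real crawlers may identify themselves differently today). The identify_robot helper and KNOWN_ROBOTS table are hypothetical; the case-insensitive substring test is similar in spirit to how Python's urllib.robotparser matches User-agent lines.

```python
from typing import Optional

# Hypothetical subset of robot names from the list above. The list is dated;
# real crawlers may announce themselves with other tokens today.
KNOWN_ROBOTS = {
    "ia_archiver": "Alexa.com",
    "Scooter": "Altavista.com",
    "FAST-WebCrawler": "Alltheweb.com",
    "ArchitextSpider": "Excite.com",
    "Googlebot": "Google.com",
    "Slurp": "Inktomi.com / Hotbot.com",
    "Gulliver": "Northernlight.com",
}


def identify_robot(user_agent: str) -> Optional[str]:
    """Return the known robot name contained in user_agent, or None."""
    ua = user_agent.lower()
    for name in KNOWN_ROBOTS:
        # Case-insensitive substring test on the User-Agent header.
        if name.lower() in ua:
            return name
    return None


print(identify_robot("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"))  # Googlebot
print(identify_robot("Mozilla/5.0 (Windows NT 10.0; Win64; x64)"))  # None
```

A None result simply means the visitor is not one of the listed robots, most likely a human using a browser.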
Writing Robots.txt:
Let's learn to write robots commands. Note that there are two ways to write them. One is to put all the commands in a text file called "robots.txt"; the other is to write them in a meta tag.
We will learn both ways of writing robots commands.
Writing robots commands in the meta tag:
There are 4 things you can tell a search engine robot when it requests (visits) your page:
1) Do not index this page - the search engines will not index the page.
2) Do not follow any links on this page - the search engines will not follow the links included in the page, i.e. they will not use this page to find and index the pages it links to.
3) Do index this page - the search engines will index the page.
4) Do follow the links - the search engines will follow the links and can index the pages that this page links to.
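To make the four directives concrete, here is a minimal Python sketch of how a robot might read them from a page, using the standard library's html.parser module. The RobotsMetaParser class and robots_rules function are hypothetical names, and real crawlers apply more rules than this (for example, meta tags addressed to one robot by name).

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives of every <meta name="robots"> tag on a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        if (attrs.get("name") or "").lower() != "robots":
            return
        # content="noindex, follow" -> {"noindex", "follow"}
        for token in (attrs.get("content") or "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_rules(html: str) -> dict:
    """Return {"index": bool, "follow": bool} for a page (hypothetical helper)."""
    parser = RobotsMetaParser()
    parser.feed(html)
    found = parser.directives
    # With no tag saying otherwise, a robot indexes the page and follows links.
    # "none" is shorthand for "noindex, nofollow".
    return {
        "index": not ({"noindex", "none"} & found),
        "follow": not ({"nofollow", "none"} & found),
    }


page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(robots_rules(page))  # {'index': False, 'follow': True}
```

A page with no robots meta tag at all comes out as index and follow, which matches cases 3) and 4) above.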
Note that "index" is different from "spider". A search engine first spiders a page and then indexes it. Indexing gives the page a certain importance on the basis of its content, information, meta tags and link popularity with respect to the searched keyword, and all this is decided at the time of the search. When you tell the search engines not to index a page, they know that the page exists but do not rank it. That is, a no-index page will never be shown in their search results. This does not mean a no-index page will get no visitors; it may still get visitors indirectly from a page that links to it. It just gets no direct visitors from the search engines.
Suppose you want the search engines to index a page and also follow its links (and index the linked pages); then include the following command in the meta tag:
<meta name="robots" content="index, follow">
Suppose you want the search engines to index a page but not follow its links; then include the following command in the meta tag:
<meta name="robots" content="index, nofollow">
Suppose you do not want the search engines to index a page but do want them to follow its links; then include the following command in the meta tag:
<meta name="robots" content="noindex, follow">
Suppose you want the search engines neither to index a page nor to follow its links; then include the following command in the meta tag:
<meta name="robots" content="noindex, nofollow">
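The four cases above differ only in two yes/no choices, so the tags can be generated mechanically. A small Python sketch; the robots_meta_tag function is a hypothetical helper, and the comma-separated "index, follow" form is simply the common convention for the content attribute.

```python
def robots_meta_tag(index: bool, follow: bool) -> str:
    """Build a robots meta tag from two yes/no choices (hypothetical helper)."""
    content = ", ".join([
        "index" if index else "noindex",
        "follow" if follow else "nofollow",
    ])
    return f'<meta name="robots" content="{content}">'


# Print the four combinations in the same order as the cases above.
for index in (True, False):
    for follow in (True, False):
        print(robots_meta_tag(index, follow))
```

Running the loop prints the index/follow, index/nofollow, noindex/follow and noindex/nofollow tags in that order.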
Note: Google makes a "cached" copy of every file it spiders. It's a small snapshot of the page. Want to stop Google from doing so? Include the following meta tag:
<meta name="robots" content="noarchive">
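Like any meta tag, the tags above should be placed in the HEAD section of an HTML page:
<html>
<head>
<title>your title</title>
<meta name="robots" content="index, follow">
</head>
<body>
</body>
</html>
Creating robots.txt file: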
A robots.txt file is an independent file and should be written in a plain text editor like Notepad. Do not use MS-Word or any other word processor to create robots.txt. The bottom line is that this file must be plain text with the extension ".txt"; otherwise it will be useless.
Let's begin. Open Notepad (it comes free with Microsoft Windows) and save the file with the name "robots.txt". Make sure that the extension is .txt.
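If you would rather generate the file from a script than from Notepad, the same rules apply: plain text, named exactly robots.txt. A minimal Python sketch; the write_robots_txt helper and its folder argument are hypothetical, and the allow-everything content is only a placeholder written in the standard User-agent/Disallow format.

```python
import tempfile
from pathlib import Path

# Placeholder rules: "*" addresses every robot, and an empty Disallow
# line means nothing is blocked.
ROBOTS_TXT = "User-agent: *\nDisallow:\n"


def write_robots_txt(folder: str = ".") -> Path:
    """Save ROBOTS_TXT as plain text in a file named exactly robots.txt."""
    path = Path(folder) / "robots.txt"
    path.write_text(ROBOTS_TXT, encoding="utf-8")
    return path


# Demo in a throwaway folder so nothing real is overwritten.
with tempfile.TemporaryDirectory() as folder:
    path = write_robots_txt(folder)
    print(path.name, path.suffix)  # robots.txt .txt
```

Writing through a script sidesteps the classic Notepad mistake of saving the file as robots.txt.txt.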
By the way, did you notice that we did not use the name of any robot in the meta tag? What does that indicate? Simple: with the meta tag you tell all the search engines to do or not do something on a page. You cannot single out any one search engine. The solution is robots.txt.
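Here is a small preview of that extra control, using Python's standard urllib.robotparser module to check rules the way a well-behaved robot would. The rules and the example.com URL are made up for illustration: one robot (Googlebot) is kept out of a folder while every other robot is let in, which a name="robots" meta tag, addressed to all robots at once, cannot do.

```python
from urllib.robotparser import RobotFileParser

# Made-up rules: keep Googlebot out of /private/, let every other robot in.
RULES = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

url = "https://www.example.com/private/page.html"
for robot in ("Googlebot", "Slurp"):
    # can_fetch() answers: may this robot request this URL?
    print(robot, parser.can_fetch(robot, url))
# Googlebot False
# Slurp True
```

Slurp has no record of its own, so it falls back to the "*" record, whose empty Disallow line blocks nothing.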