Some people believe that they should create different pages for different search engines, each page optimized for one keyword and for one search engine. Now, while I don't recommend creating such pages, if you do decide to create them, there is one issue that you need to be aware of. These pages, although optimized for different search engines, often turn out to be pretty similar to each other. The search engines now have the ability to detect when a site has created such similar-looking pages and are penalizing or even banning such sites. In order to prevent your site from being penalized for spamming, you need to prevent the search engine spiders from indexing pages which are not meant for them, i.e. you need to prevent AltaVista from indexing pages meant for Google and vice versa. The best way to do that is to use a robots.txt file.
You should create a robots.txt file using a plain text editor like Windows Notepad. Don't use a word processor to create such a file - word processors add formatting of their own which can prevent the spiders from reading the file correctly.
Here is the basic syntax of the robots.txt file:
User-Agent: [Spider Name]
Disallow: [File Name]
For instance, to tell AltaVista's spider, Scooter, not to spider the file named myfile1.html residing in the root directory of your server, you would write:
User-Agent: Scooter
Disallow: /myfile1.html
To tell Google's spider, called Googlebot, not to spider the files myfile2.html and myfile3.html, you would write:
User-Agent: Googlebot
Disallow: /myfile2.html
Disallow: /myfile3.html
You can, of course, put multiple User-Agent statements in the same robots.txt file. Hence, to tell AltaVista not to spider the file named myfile1.html, and to tell Google not to spider the files myfile2.html and myfile3.html, you would write:
User-Agent: Scooter
Disallow: /myfile1.html

User-Agent: Googlebot
Disallow: /myfile2.html
Disallow: /myfile3.html

Note the blank line between the two records - the robots.txt standard requires that each User-Agent record be separated from the next by a blank line.
If you want to prevent all robots from spidering a file named myfile4.html, you can use the * wildcard character in the User-Agent line, i.e. you would write:
User-Agent: *
Disallow: /myfile4.html
However, you cannot use the wildcard character in the Disallow line.
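While wildcards are not allowed in the Disallow line, each Disallow value is treated as a path prefix, which means you can still block an entire directory with a single line. Here is a small sketch, assuming a hypothetical directory called /testpages/ that holds the pages you want kept out of all indexes:

# /testpages/ is a hypothetical directory name, used only for illustration
User-Agent: *
Disallow: /testpages/

This tells every robot to skip any URL whose path begins with /testpages/.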
Once you have created the robots.txt file, you should upload it to the root directory of your domain. Uploading it to a sub-directory won't work - the robots.txt file needs to be in the root directory, since that is the only place the spiders will look for it (i.e. at http://www.yourdomain.com/robots.txt).
I won't discuss the syntax and structure of the robots.txt file any further - you can get the complete specifications from here.
Now we come to how the robots.txt file can be used to prevent your site from being penalized for spamming if you are creating different pages for different search engines. What you need to do is to prevent each search engine from spidering the pages which are not meant for it.
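For example, suppose you have created mypage-altavista.html (optimized for AltaVista) and mypage-google.html (optimized for Google) - hypothetical file names, used only for illustration. Your robots.txt file would then look something like this:

# Hypothetical file names, used only for illustration.
# Keep AltaVista's spider away from the page meant for Google...
User-Agent: Scooter
Disallow: /mypage-google.html

# ...and keep Google's spider away from the page meant for AltaVista
User-Agent: Googlebot
Disallow: /mypage-altavista.html

With this file in place, each spider can only index the page that was written for it, and neither engine sees the near-duplicate meant for the other.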