Let's suppose yours is a dynamic, database-driven site containing information about your newsletter subscribers and customers: their addresses, phone numbers and so on. All this confidential information is kept in a separate directory called "admin". (It is recommended to keep such information in a separate directory: handling the data will be easier for you, and so will keeping search engines away, as we will see shortly.) I am sure you would never want any unauthorized person to visit this area, let alone search engines. It does not help search engines either, since they have nothing to do with the data or files there. Here comes the role of the robots.txt file. Write the following in your robots.txt file. (Ignore the horizontal rules; they are included only to separate commands from the rest of the text.)
--------------------------------------------------------------------------------
User-agent: *
Disallow: /admin/
--------------------------------------------------------------------------------
This prevents spiders from indexing anything in the admin directory, including its sub-directories, if any.
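Incidentally, you can test such rules locally before uploading. Python's standard urllib.robotparser module evaluates robots.txt rules the same way a well-behaved crawler does; here is a small sketch (the domain and bot name are hypothetical):

```python
import urllib.robotparser

# The rule from above: keep every robot out of /admin/
rules = """\
User-agent: *
Disallow: /admin/
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# A crawler checking these rules would get:
print(rp.can_fetch("AnyBot", "http://www.example.com/admin/customers.html"))  # False
print(rp.can_fetch("AnyBot", "http://www.example.com/index.html"))            # True
```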
The asterisk (*) indicates all search engines. How do you stop a particular search engine from spidering your files or directories?
Suppose you want to stop Excite (whose robot is named ArchitextSpider) from spidering this directory:
--------------------------------------------------------------------------------
User-agent: ArchitextSpider
Disallow: /admin/
--------------------------------------------------------------------------------
Suppose you want to stop Excite and Google from spidering this directory:
--------------------------------------------------------------------------------
User-agent: ArchitextSpider
Disallow: /admin/

User-agent: Googlebot
Disallow: /admin/
--------------------------------------------------------------------------------
Files are no different. Suppose you do not want a file, datafile.html, to be spidered by Excite:
--------------------------------------------------------------------------------
User-agent: ArchitextSpider
Disallow: /datafile.html
--------------------------------------------------------------------------------
Similarly, suppose you do not want it to be spidered by Google either:
--------------------------------------------------------------------------------
User-agent: ArchitextSpider
Disallow: /datafile.html

User-agent: Googlebot
Disallow: /datafile.html
--------------------------------------------------------------------------------
Suppose you want two files, datafile1.html and datafile2.html, not to be spidered by Excite:
--------------------------------------------------------------------------------
User-agent: ArchitextSpider
Disallow: /datafile1.html
Disallow: /datafile2.html
--------------------------------------------------------------------------------
Can you guess what the following means?
--------------------------------------------------------------------------------
User-agent: ArchitextSpider
Disallow: /datafile1.html
Disallow: /datafile2.html

User-agent: Googlebot
Disallow: /datafile1.html
--------------------------------------------------------------------------------
Excite will spider neither datafile1.html nor datafile2.html. Google will not spider datafile1.html, but it will spider datafile2.html and the rest of the files in the directory.
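You can verify this per-robot behaviour locally with Python's standard urllib.robotparser module; here is a quick sketch, using hypothetical URLs:

```python
import urllib.robotparser

# The combined rules from the example above
rules = """\
User-agent: ArchitextSpider
Disallow: /datafile1.html
Disallow: /datafile2.html

User-agent: Googlebot
Disallow: /datafile1.html
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(rules.splitlines())

base = "http://www.example.com"
print(rp.can_fetch("ArchitextSpider", base + "/datafile2.html"))  # False
print(rp.can_fetch("Googlebot", base + "/datafile2.html"))        # True
print(rp.can_fetch("Googlebot", base + "/datafile1.html"))        # False
```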
Imagine you have a file kept in a sub-directory that you would not like to be spidered. What do you do? Let's suppose the sub-directory is "official" and the file is "confidential.html".
--------------------------------------------------------------------------------
User-agent: *
Disallow: /official/confidential.html
--------------------------------------------------------------------------------
I hope that's enough; a little practice is, of course, required. If the syntax of your robots.txt file is not correct, search engines will ignore the faulty command, so double-check for possible errors before uploading the file. You must upload robots.txt to the ROOT directory of your server: search engines look for robots.txt only in the root directory, and otherwise ignore it completely. The root directory is usually the directory where your index page is kept; in that case, keep robots.txt in the same directory as the index file.
I know a user-friendly piece of software that will write the robots commands for you (it was introduced at the beginning of this article). It can produce an error-free robots.txt file very easily. This software, RoboGen, is a great tool: never again will you have to check the syntax of your robots.txt file, or even write one yourself. RoboGen is a visual editor for robot exclusion files and is easy to use. Just select the files you want search engines to visit or to skip, and it creates the robots.txt file for you. You can also select the search engines of your choice: RoboGen maintains a database of over 180 search engine user-agents, selectable from a drop-down menu. It is the BEST and ONLY software on the Internet for writing a robots.txt file correctly and effectively, and this great tool is cheaper than you might expect. CLICK HERE NOW to learn more!
Note: you should be able to see your robots.txt file if you type the following in the address bar of your browser:
http://www.your-domain.com/robots.txt
(Where your-domain is the domain name of your website. If yours is not a .com site, replace .com with your website's extension, e.g. .net, .us, .org.)
You must be wondering whether to use the meta tag or robots.txt, and which of them is more effective!
A correctly written robots.txt is more effective than the meta tag. All search engines support robots.txt, but not all of them support robots commands written in meta tags. I recommend using both, so that your site is covered in either case. RoboGen will help you write both!
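For reference, the robots meta tag goes in the head section of an individual page. A typical form, which asks robots neither to index the page nor to follow its links, looks like this:

```html
<!-- Placed inside the page's <head> section -->
<meta name="robots" content="noindex, nofollow">
```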
One last thing - you can look in your web server's log files to see which search engine robots have visited. They all leave signatures that can be detected; these signatures are simply the names of their robots, recorded in the user-agent field of each log entry. For instance, if Google has spidered your site, its requests will appear in your log with the user-agent Googlebot. This is how you know which search engine has spidered your pages, and when!
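As a rough illustration, here is how you might scan a log line for known robot signatures in Python; the sample log line and the list of names are made up for the example:

```python
# A typical "combined" format access-log line; this sample entry is made up
log_line = ('66.249.66.1 - - [10/Oct/2023:13:55:36 +0000] '
            '"GET /index.html HTTP/1.1" 200 2326 "-" '
            '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"')

# Robot signatures we want to recognize (an illustrative, partial list)
robot_names = ["Googlebot", "ArchitextSpider", "Slurp"]

def detect_robot(line):
    """Return the first known robot name found in the log line, else None."""
    for name in robot_names:
        if name in line:
            return name
    return None

print(detect_robot(log_line))  # Googlebot
```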
Senior Manager - Internet Promotions http://www.searchengineoptimizationpromotion.com