Continued from page 1
Change file names:
Change stats filename from index.htm to something different, such as stats- new.htm so that your stats URL now becomes www.domain.com/stats/stats-new.htm
Place a simple text file containing text, "Sorry you are not authorized to view this page", and save it as index.htm in your /stats/directory.
This way snooper cannot guess your actual filename and get to your banned content.
Use login passwords: Password-protect sensitive content listed in your robots.txt file.
Optimization of robots.txt file : -
The right commands in robots.txt : Use correct commands. Most common errors include - putting command meant for "User-agent" field in "Disallow field" and vice-versa. Please also note that there is no "Allow" command. Content not blocked in "Disallow" field is considered allowed. Currently, only two fields are recognized: "The User-agent field" and "Disallow field". Experts are considering addition of more robot recognizable commands to make robots.txt file more Webmaster and robot friendly.
Bad Syntax: Do not put multiple file URLs in one Disallow line in robots.txt file. Use a new Disallow line for every directory that you want to block access to. Incorrect Robots.txt example :
User-agent: * Disallow: /concepts/ /links/ /images/
Correct robots.txt example:
User-agent: * Disallow: /concepts/ Disallow: /links/ Disallow: /images/
Files and directories: If a specific file has to be disallowed, end it with file extension and without a forward slash in end. Study following Robots.txt example :
For file:
User-agent: * Disallow: /hilltop.html
For Directory:
User-agent: * Disallow: /concepts/
Remember if you have to block access to all files in directory, you don't have to specify each and every file in robots.txt . You can simply block directory as shown above. Another common error is leaving out slashes altogether. This would leave a very different message than intended.
The right location for robots.txt file: No robot will access a badly placed robots.txt file. Make sure that location is www.domain.com/robots.txt.
Capitalization in robots.txt : Never capitalize your syntax commands. Directory and filenames are case sensitive in Unix platforms. The only capitals used per standard are: "User-agent " and "Disallow "
Correct Order for robots.txt : If you want to block access to all but one or more than one robot, then specific ones should be mentioned first. Lets study this robots.txt example :
User-agent: * Disallow: /
User-agent: googlebot Disallow:
In above case, Googlebot would simply leave site without indexing after reading first command. Correct syntax is:
User-agent: googlebot Disallow:
User-agent: * Disallow: /
The robots.txt file : Not having a robots.txt file at all could generate a 404 error for search engine robots, which could redirect robot to default 404-error page or your customized 404-error page. If this happens seamlessly, it is up to robot to decide if target file is a robots.txt file or an html file. Typically it would not cause many problems but you may not want to risk it. It's always a better idea to put standard robots.txt file in root directory, than not having it at all.
The standard robots.txt file for allowing all robots to index all pages is:
User-agent: * Disallow:
Using # Carefully in robots.txt file: Adding comments after syntax commands is not a good idea using "#". Some robots might misinterpret line although it is acceptable as per robots exclusion standard. New lines are always preferred for comments.
Using robots.txt file : -
Robots are configured to read text. Too much graphic content could render your pages invisible to search engine. Use robots.txt file to block irrelevant and graphic-only content.
Indiscriminate access to all files, it is believed, can dilute relevance to your site content after being indexed by robots. This could seriously affect your site's ranking with search engines. Use robots.txt file to direct robots to content relevant to your site's theme by blocking irrelevant files or directories.
The robots.txt file can be used for multilingual websites to direct robots to relevant content for relevant topics for different languages. It ultimately helps search engines to present relevant results for specific languages. It also helps search engine in its advanced search options where language is a variable.
Some robots could cause severe server loading problems by rapid firing too many requests at peak hours. This could affect your business. By excluding some robots that might be irrelevant to your site, in robots.txt file, this problem can be taken care of. It is really not a good idea to let malevolent robots use up precious bandwidth to harvest your emails, images etc.
Use robots.txt file to block out folders with sensitive information, text content, demo areas or content yet to be approved by your editors before it goes live. The robots.txt file is an effective tool to address certain issues regarding website ranking. Used in conjunction with other SEO strategies, it can significantly enhance a website's presence on net.
Related Reading : -
A Standard for Robots Exclusion.
Guide to The Robots Exclusion Protocol
W3C Recommendations
Article last updated : 11th March 2004
(c) Copyright 2004 Jagdeep.S. Pannu, SEORank ----------------------------------------------
This Article is Copyright protected. If you have comments; or would like to have this article republished on your site, please contact author here: SEO Articles Feedback. We just require all due credits carried; and text, hyperlinks and headers unaltered. This article must not be used in unsolicited mail.
Jagdeep.S.Pannu is Manager-Online Marketing, at www.SEORank.com, a leading Search Engine Optimization Services Company.