Change file names: Change the stats filename from index.htm to something different, such as stats-new.htm, so that your stats URL becomes www.domain.com/stats/stats-new.htm.
Place a simple text file containing the text, "Sorry, you are not authorized to view this page", and save it as index.htm in your /stats/ directory. This way a snooper cannot guess your actual filename and get to your blocked content.
Use login passwords: Password-protect
sensitive content listed in your robots.txt file.
Optimization of the robots.txt file:
The right commands in robots.txt: Use correct commands. The most common errors include putting a command meant for the "User-agent" field in the "Disallow" field and vice versa. Please also note that there is no "Allow" command; content not blocked in a "Disallow" field is considered allowed. Currently, only two fields are recognized: the "User-agent" field and the "Disallow" field. Experts are considering the addition of more robot-recognizable commands to make the robots.txt file more webmaster- and robot-friendly.
Bad syntax: Do not put multiple file URLs in one Disallow line in the robots.txt file. Use a new Disallow line for every directory that you want to block access to. Incorrect robots.txt example:
User-agent: *
Disallow: /concepts/ /links/ /images/
Correct robots.txt example:
User-agent: *
Disallow: /concepts/
Disallow: /links/
Disallow: /images/
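You can verify this behavior yourself with Python's standard-library robots.txt parser, urllib.robotparser. A minimal sketch, assuming the same directories as above (the domain and robot name are illustrative):

```python
from urllib.robotparser import RobotFileParser

# The corrected rules: one Disallow line per directory.
rules = """\
User-agent: *
Disallow: /concepts/
Disallow: /links/
Disallow: /images/
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# Each directory is blocked for all robots; everything else is allowed.
print(rp.can_fetch("AnyBot", "http://www.domain.com/concepts/page.html"))  # False
print(rp.can_fetch("AnyBot", "http://www.domain.com/links/"))              # False
print(rp.can_fetch("AnyBot", "http://www.domain.com/about.html"))          # True
```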
Files and directories: If a specific file has to be disallowed, end the entry with the file extension and without a forward slash at the end. Study the following robots.txt examples:
For file:
User-agent: *
Disallow: /hilltop.html
For Directory:
User-agent: *
Disallow: /concepts/
Remember, if you have to block access to all files in a directory, you don't have to specify each and every file in robots.txt. You can simply block the directory as shown above. Another common error is leaving out the slashes altogether; this sends a very different message than intended.
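The difference between the file rule and the directory rule above can be checked with Python's urllib.robotparser; the file and directory names are the article's own examples, and the domain is illustrative:

```python
from urllib.robotparser import RobotFileParser

# Rule ending in a file extension: blocks only that one file.
file_rule = RobotFileParser()
file_rule.parse("User-agent: *\nDisallow: /hilltop.html".splitlines())

# Rule ending in a forward slash: blocks the whole directory.
dir_rule = RobotFileParser()
dir_rule.parse("User-agent: *\nDisallow: /concepts/".splitlines())

print(file_rule.can_fetch("AnyBot", "http://www.domain.com/hilltop.html"))    # False
print(file_rule.can_fetch("AnyBot", "http://www.domain.com/other.html"))      # True
print(dir_rule.can_fetch("AnyBot", "http://www.domain.com/concepts/a.html"))  # False
```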
The right location for the robots.txt file: No robot will access a badly placed robots.txt file. Make sure that the location is www.domain.com/robots.txt.
Capitalization in robots.txt: Never capitalize your syntax commands. Directory and file names are case sensitive on Unix platforms. The only capitals used per the standard are in "User-agent" and "Disallow".
Correct order for robots.txt: If you want to block access for all but one or more robots, the specific ones should be mentioned first. Let's study this robots.txt example:
User-agent: *
Disallow: /

User-agent: googlebot
Disallow:
In the above case, Googlebot would simply leave the site without indexing after reading the first record. The correct order is:
User-agent: googlebot
Disallow:

User-agent: *
Disallow: /
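A sketch of how a parser applies these groups, using Python's urllib.robotparser (robot and domain names are illustrative). Note that this particular parser matches robots to their most specific User-agent group rather than reading the file strictly top to bottom, so it resolves the corrected file as intended: Googlebot gets its own empty Disallow list, and every other robot falls under "User-agent: *":

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: googlebot
Disallow:

User-agent: *
Disallow: /
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# Googlebot matches its own group; an empty Disallow means "allow all".
print(rp.can_fetch("googlebot", "http://www.domain.com/page.html"))     # True
# Any other robot falls through to the catch-all group and is blocked.
print(rp.can_fetch("SomeOtherBot", "http://www.domain.com/page.html"))  # False
```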
The robots.txt file: Not having a robots.txt file at all could generate a 404 error for search engine robots, which could redirect the robot to the default 404-error page or your customized 404-error page. If this happens seamlessly, it is up to the robot to decide whether the target file is a robots.txt file or an HTML file. Typically this would not cause many problems, but you may not want to risk it. It's always better to put a standard robots.txt file in the root directory than not to have one at all.
The standard robots.txt file for allowing all robots to index all pages is:
User-agent: *
Disallow:
Using # carefully in the robots.txt file: Adding "#" comments after syntax commands on the same line is not a good idea. Some robots might misinterpret the line, although it is acceptable per the robots exclusion standard. Comments are always better placed on their own lines.
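A small sketch of the safer style, with the comment on its own line (the directory name is illustrative):

```
# Keep the demo area out of the index
User-agent: *
Disallow: /demo/
```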
Using the robots.txt file:
Robots are configured to read text. Too much graphic content could render your pages invisible to a search engine. Use the robots.txt file to block irrelevant and graphics-only content.
Indiscriminate access to all files, it is believed, can dilute the relevance of your site's content after it is indexed by robots. This could seriously affect your site's ranking with search engines. Use the robots.txt file to direct robots to content relevant to your site's theme by blocking irrelevant files or directories.
The robots.txt file can be used on multilingual websites to direct robots to the content relevant to each language. This ultimately helps search engines present relevant results for specific languages, and it also helps a search engine in its advanced search options where language is a variable.
Some robots could cause severe server-loading problems by rapid-firing too many requests at peak hours, and this could affect your business. By excluding robots that are irrelevant to your site in the robots.txt file, this problem can be taken care of. It is really not a good idea to let malevolent robots use up precious bandwidth to harvest your emails, images, etc.
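For example, an unwanted robot can be shut out by name while everyone else remains welcome. A sketch, assuming the robot's User-agent string is "BadHarvester" (a made-up name; substitute the actual name from your server logs, and note that a truly malevolent robot may simply ignore robots.txt):

```
# Turn away one named robot entirely
User-agent: BadHarvester
Disallow: /

# Allow everyone else
User-agent: *
Disallow:
```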
Use the robots.txt file to block out folders with sensitive information, text content, demo areas, or content yet to be approved by your editors before it goes live. The robots.txt file is an effective tool for addressing certain issues regarding website ranking. Used in conjunction with other SEO strategies, it can significantly enhance a website's presence on the net.
Related Reading:
A Standard for Robots Exclusion.
Guide to The Robots Exclusion Protocol
W3C Recommendations
Article last updated: 11th March 2004
(c) Copyright 2004 Jagdeep.S. Pannu, SEORank ----------------------------------------------
This article is copyright protected. If you have comments, or would like to have this article republished on your site, please contact the author here: SEO Articles Feedback. We require only that all due credits be carried and that text, hyperlinks, and headers remain unaltered. This article must not be used in unsolicited mail.

Jagdeep S. Pannu is Manager, Online Marketing at www.SEORank.com, a leading Search Engine Optimization services company.