Working with the robots.txt file

Written by Jagdeep.S.Pannu



What is the robots.txt file?

Working with the robots.txt file

Advantages of robots.txt

Disadvantages of the robots.txt file

Optimization of the robots.txt file

Using the robots.txt file

What is the robots.txt file?

The robots.txt file is an ASCII text file containing instructions that tell search engine robots which content they are not allowed to index. These instructions determine how a search engine indexes your website's pages. The universal address of the robots.txt file is: www.domain.com/robots.txt. This is the first file that a robot visits. It picks up instructions for indexing the site content and follows them. Each record in the file uses two text fields. Let's study this robots.txt example:

User-agent: *
Disallow:

The User-agent field names the robot to which the access policy in the Disallow field applies. The Disallow field specifies the URLs that the named robots may not access. Another robots.txt example:

User-agent: *
Disallow: /

Here "*" means all robots and "/" means all URLs. This is read as "No access for any search engine to any URL." Since every URL path begins with "/", a Disallow value of "/" with nothing after it bans access to all URLs. If only partial access is to be given, only the banned URL is specified in the Disallow field. Let's consider this robots.txt example:

# Research access for Googlebot.
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /concepts/new/

Here we see that both fields have been repeated. Separate records can be given for different user agents. The above commands mean that all user agents are banned from /concepts/new/ except Googlebot, which has full access. Characters following # are ignored up to the end of the line, as they are considered comments.
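The three examples above can be checked with a short script. This is a minimal sketch using the robots.txt parser in Python's standard library; the helper name allowed, the robot name "AnyBot" and the page URLs are made up for illustration and are not part of the original article.

```python
from urllib.robotparser import RobotFileParser

# The three example files discussed above, one field per line.
ALLOW_ALL = "User-agent: *\nDisallow:\n"
BAN_ALL = "User-agent: *\nDisallow: /\n"
MIXED = """\
# Research access for Googlebot.
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /concepts/new/
"""

# Hypothetical URLs on the placeholder domain used in the article.
NEW_PAGE = "http://www.domain.com/concepts/new/page.html"
HOME_PAGE = "http://www.domain.com/index.html"


def allowed(robots_txt, agent, url):
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


print(allowed(ALLOW_ALL, "AnyBot", NEW_PAGE))  # empty Disallow: nothing is banned
print(allowed(BAN_ALL, "AnyBot", HOME_PAGE))   # "/" bans every URL
print(allowed(MIXED, "Googlebot", NEW_PAGE))   # Googlebot has full access
print(allowed(MIXED, "AnyBot", NEW_PAGE))      # everyone else is kept out of /concepts/new/
print(allowed(MIXED, "AnyBot", HOME_PAGE))     # but may visit the rest of the site
```

A real robot may interpret edge cases differently from this parser, so treat the result as a sanity check rather than a guarantee of how any particular search engine behaves.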

Working with the robots.txt file : -

The robots.txt file is always named in all lowercase (e.g. Robots.txt or robots.Txt is incorrect).

Wildcards are not supported in either field. Only * can be used in the User-agent field, because it is a special character denoting "all". Googlebot is the only robot that now supports some wildcard file extensions. Ref: http://www.google.com/webmasters/faq.html#12

The robots.txt file is an exclusion file meant for search engine robots to consult; a website does not need it in order to function. An empty or absent file simply means that all robots are welcome to index any part of the website.

Only one robots.txt file can be maintained per domain.

Website owners who do not have administrative rights sometimes cannot create a robots.txt file. In such situations, the Robots Meta Tag can be configured to serve the same purpose. Keep in mind that questions have lately been raised about how robots treat the Robots Meta Tag; some robots might skip it altogether. The protocol obliges all robots to start with the robots.txt file, making it the default starting point for all robots.

Separate lines are required for each user agent, and a Disallow field should not carry more than one command per line in the robots.txt file. There is no limit to the number of lines, though; both the User-agent and Disallow fields can be repeated with different commands any number of times. Blank lines must not appear within a single record, that is, between a User-agent line and the Disallow lines that belong to it.
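The two formatting rules above (one command per Disallow line, no blank line inside a record) can be checked mechanically. The following is only a rough sketch: the function name lint_robots and its messages are invented for this illustration, and it covers just these two rules, not the whole standard.

```python
def lint_robots(text):
    """Return (line_number, message) pairs for the two rules described above."""
    problems = []
    awaiting_rule = False  # a User-agent line was seen but no Disallow line yet
    for number, raw in enumerate(text.splitlines(), start=1):
        if not raw.strip():
            # A blank line right after User-agent splits the record in two.
            if awaiting_rule:
                problems.append((number, "blank line inside a record"))
                awaiting_rule = False
            continue
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue  # comment-only line
        field, _, value = line.partition(":")
        field = field.strip().lower()
        value = value.strip()
        if field == "user-agent":
            awaiting_rule = True
        elif field == "disallow":
            awaiting_rule = False
            # More than one whitespace-separated path on a single line.
            if len(value.split()) > 1:
                problems.append((number, "more than one path on a Disallow line"))
    return problems


good = "User-agent: *\nDisallow: /tmp/\nDisallow: /cgi-bin/\n"
bad = "User-agent: *\n\nDisallow: /tmp/ /cgi-bin/\n"

print(lint_robots(good))
print(lint_robots(bad))
```

The first file follows both rules and yields no problems; the second is flagged on line 2 for the blank line and on line 3 for the two paths sharing one Disallow field.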

Use lowercase for all robots.txt file content. Please also note that filenames on Unix systems are case sensitive, so be careful about case when naming directories or files for Unix-hosted domains. You can use this great tool from www.searchengineworld.com to check your robots.txt:

The robots.txt Validator

Please note that the full path to the robots.txt file must be entered in the field.

Advantages of the robots.txt file : -

Protocol demands that all search engine robots start with the robots.txt file. This is the default entry point for robots if the file is present. Specific instructions can be placed in this file to help index your site on the web. Major search engines will never violate the Standard for Robots Exclusion.

The robots.txt file can be used to keep out unwanted robots such as email retrievers and image strippers.

The robots.txt file can be used to specify the directories on your server that you don't want robots to access and/or index, e.g. temporary, cgi, and private/back-end directories.

An absent robots.txt file could generate a 404 error and redirect the robot to your default 404 error page. Careful research showed that sites with no robots.txt file and a customized 404 error page would serve that page to the robots. The robot is bound to treat it as the robots.txt file, which can confuse its indexing.

The robots.txt file is used to direct select robots to relevant pages to be indexed. This comes in especially handy where the site has multilingual content or where the robot is searching for only specific content.

The need for the robots.txt file was also felt to stop robots from deluging servers with rapid-fire requests or re-indexing the same files repeatedly.

If you have duplicate content on your site for any reason, you can keep it from being indexed. This will help you avoid any duplicate content penalties.
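As an illustration of the first two advantages, here is a sketch of a file that shuts out one unwanted robot entirely and keeps all other robots away from a few directories. The robot names "BadBot" and "SomeCrawler" and the directory names are hypothetical, chosen only for this example; the check again uses the parser from Python's standard library.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: ban one robot completely, and keep every other
# robot out of the temporary, cgi and private directories.
ROBOTS_TXT = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /tmp/
Disallow: /cgi-bin/
Disallow: /private/
"""


def may_fetch(agent, url):
    """Return True if `agent` may fetch `url` under ROBOTS_TXT."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


print(may_fetch("BadBot", "http://www.domain.com/index.html"))
print(may_fetch("SomeCrawler", "http://www.domain.com/index.html"))
print(may_fetch("SomeCrawler", "http://www.domain.com/cgi-bin/form.pl"))
```

Keep in mind that a robot built to harvest email addresses may simply ignore the file, so this only keeps out robots that choose to honor it.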

Disadvantages of the robots.txt file : -

Careless handling of directory and file names can lead hackers to snoop around your site by studying the robots.txt file, since you may sometimes list filenames and directories that hold classified content. This is not a serious issue, as deploying effective security checks on the content in question can take care of it. For example, if your traffic log is on your site at a URL such as www.domain.com/stats/index.htm, which you do not want robots to index, then you would have to add a command to your robots.txt file. As an example:

User-agent: *
Disallow: /stats/

However, it is easy for a snooper to guess what you are trying to hide, and simply typing the URL www.domain.com/stats into a browser would give access to it. This calls for one of the following remedies -

The Basics of Search Engine Optimization (S.E.O.)

Written by Jeff McIntire-Strasburg, Ph.D.


One means of making sure your site receives maximum visibility is search engine optimization (S.E.O.). You must think in terms of search engine optimization from the earliest stages of planning and design if you want to receive heavy traffic from the major search engines. In order to optimize your site, you need to understand how search engines work, how web surfers search, and what elements of your site contribute to successful search placement. While much information (and misinformation) exists about search engine optimization, experts agree that the elements you must focus on are keywords, meta tags and links from other sites.

How Search Engines Work

Before delving into specific elements, it’s necessary to understand how search engines work. True search engines (as opposed to directories like Yahoo!) use a program called a “robot” or “spider” to “crawl” the web while gathering and indexing the information they find. A searcher receives the results of these crawls when s/he conducts a search. Sounds simple, right? Unfortunately, every search engine (and literally thousands of engines exist) has its own criteria for indexing sites, and further criteria for determining the ranking of sites. It’s easy to see why S.E.O. consultants can make a handsome living – it’s virtually impossible for a single webmaster to know all of the various rules that will optimize his/her site for every engine.

Feeling Overwhelmed Yet?

While a search engine optimization consultant can provide more detailed knowledge to an individual webmaster, experts tend to agree that there are a few rules that almost anyone can apply to optimize his/her site.

First, while thousands of search engines exist, most surfers do their searching on a handful of sites. Yahoo! generally accounts for half of all web searches. While they charge a fee for simply considering a business listing in their directory (currently $299), many business people may find that a reasonable investment, considering the popularity of this portal. Among true search engines, the most popular by far are Google and Overture. It’s important to know that many search engines actually get their results from these giants, so applying the rules that work for these two can help get your site listed in many engines.

Optimizing Your Site

In order to make sure your site is listed in searches that directly apply to your type of business, it’s essential to use the best keywords and phrases. Keywords are the words that searchers use to find web sites that will give them the information they want. To take advantage of the power of keywords, you must find out which keywords and key phrases are most popular among the searchers you want to attract to your site, so you may want to consider using a service like Wordtracker (http://www.wordtracker.com) to find out the words and phrases that searchers are using. Once you’ve determined this, you want to place those words and phrases at the points in your page most likely to be noticed by search engines. Experts tend to agree that the best places to use keywords are in your page title, your page’s copy, and in the site’s meta tags.

The page title is likely the most important place to use the proper keywords, as all search engine spiders tend to look at it. Precision is key – use the keywords that are most likely to bring the right visitors to that particular page. While some consultants suggest using many keywords, or many slight variations of keywords, or repeated keywords, search engines have caught on to these strategies and may actually penalize your site for using such tactics. According to Webmonkey.com’s Paul Boutin, it’s best not to include extra words, such as a company name; using only the keywords as the page title will generally bring the best results.

ImproveHomeLife.com © 2005