Why Robots.txt?

Written by Matt Benya


I am sure that a lot of you have heard of the file named robots.txt (also called a "robot exclusion file") before. But what does this file really do? Basically, you can think of a robots.txt file as a list of rules that search engines follow when they spider your site. A robots.txt file gives you, the Webmaster, a say in what does and does not get indexed when spiders come to your little corner of the web.

Okay, I can hear a few people asking why anyone would want to keep some things from being indexed. I thought the goal was to get indexed, right? Well, yes and no; there are quite a few instances when blocking spider access to certain areas or pages is almost a must. Here are several examples of content a person might want to restrict access to: temporary files or directories, presentations, information with a specific sequential order, testing directories, or the cgi-bin. As you can see from just these few examples, there are definitely files that you would want to keep from being indexed. While there is a Meta tag (<meta name="Robots" content="attributes">) available that does in essence the same thing as a robots.txt file, it is not currently 100% supported by search engines. Another drawback is that the tag needs to go on every page you do not want indexed, as opposed to one central point of control.

Writing 101

All right, I have given you a few vague examples of what might be included in such a file. There is never going to be a set list of things that should and should not be indexed; a robots.txt file needs to be tailored to your site and your content. There is, however, a very specific format that needs to be followed when creating a robots.txt file.

Step 1: First, a robots.txt file needs to be created in Unix format, or Unix line-ending mode. The reason for this is to ensure that there are no carriage returns inserted into your file. I would suggest looking at Notepad++, my personal favorite text editor due to the number of languages and formats it supports. Notepad++ can create a document directly in Unix format by selecting "Convert to Unix Format" from the "Format" menu. Other plain-text editors should be able to achieve the same results; however, stay away from editors like WordPad or Microsoft Word when creating your robots.txt file. I also do not recommend using HTML editors for this task.

Step 2: Now let's begin adding some content to our file. A robots.txt file is made up of two fields. The first line is the User-agent line. This line specifies the spider/robot that we intend to limit or allow. An example of this would be:

User-agent: googlebot

In addition to allowing or restricting specific spiders you can use a wildcard and target all spiders coming to your site. To do this you simply need to place an asterisk (*) in for your User-agent. Example:

User-agent: *

Step 3: Now we will begin to disallow our desired content; either a file or a whole directory can be kept from being indexed with a robots.txt file. We will do this with the second line of our file, the Disallow: directive. Here is an example:

Disallow: /cgi-bin/

Or for a file:

Disallow: /temp/temp.html

Moreover, you are not limited to just one Disallow per User-agent; in fact, you can get pretty granular as to what you give spiders access to. Just make sure that you give each Disallow its own line. If you leave the Disallow field empty (i.e. Disallow: ), you are giving permission for all files and directories to be indexed.
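Putting the pieces together, a complete robots.txt file might look like the following (the directory and file names here are purely illustrative):

User-agent: *
Disallow: /cgi-bin/
Disallow: /temp/
Disallow: /presentations/draft.html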

One word of caution when writing your robot exclusion file: if you are not careful, you can shut off one or all spiders' access to your site completely. This is done by prohibiting access at the root level using a slash (/). Example:

Disallow: /

If you were to use the asterisk wildcard to specify your User-agent with the above example, you would block all search engines from every part of your site.
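In other words, these two lines together tell every compliant spider to stay out of your entire site, so use them only if that is truly what you intend:

User-agent: *
Disallow: /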

Step 4: That is all there is to creating a robots.txt file. The final step is to upload it to the root directory of your site, so that it is reachable at www.yoursite.com/robots.txt. Make sure that you upload it as ASCII, just like all other text files, and you are done.

Step 5: Writing a robots.txt file is pretty straightforward once you get comfortable with the file's configuration. Once your file is complete and uploaded, it is good practice to have it validated; you can do this through www.searchengineworld.com.
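If you would rather check your rules locally, Python's standard urllib.robotparser module can parse robots.txt rules and answer allow/deny questions. Here is a minimal sketch using the example rules from earlier; the example.com URLs are placeholders, not part of the original article:

```python
from urllib import robotparser

# Hypothetical rules mirroring the examples above; swap in your own file's lines.
rules = [
    "User-agent: *",
    "Disallow: /cgi-bin/",
    "Disallow: /temp/temp.html",
]

rp = robotparser.RobotFileParser()
rp.parse(rules)  # parse() accepts the file's contents as a list of lines

# can_fetch(agent, url) answers: may this agent crawl this URL?
print(rp.can_fetch("*", "http://www.example.com/cgi-bin/script.pl"))  # False
print(rp.can_fetch("*", "http://www.example.com/index.html"))         # True
```

Running a quick check like this before uploading can catch an accidental "Disallow: /" before it costs you your listings.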

Notes: Aside from search-engine-specific information, you are also able to comment your robots.txt file. This is achieved by using the pound sign (#). Though you can place a comment after the Disallow field, it is not recommended. Instead, make sure that you begin your comments on a new line starting with the pound sign. Example:

# Just making a comment

User-agent: googlebot
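For instance, a commented file that sets one rule for googlebot and stricter rules for all other spiders might look like this (the directory names are hypothetical):

# Keep Google out of the testing area only
User-agent: googlebot
Disallow: /testing/

# All other spiders: stay out of the cgi-bin as well
User-agent: *
Disallow: /testing/
Disallow: /cgi-bin/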


Characteristics of Successful URLs

Written by Sue and Chuck DeFiore


While there are many things that make a successful web address, there are certain attributes that all successful web addresses share.

They are:

First, they are short and simple. Shorter is always better. Short web addresses are easier to remember and reduce the possibility of typing errors that can trip up potential visitors.

Second, they are descriptive. Your web address should be a preview of what visitors will encounter when they visit. Your name, in itself, may not provide enough of a clue to your web site's contents to entice visitors to drop in.

Third, they are memorable. The very best URLs combine simplicity and description with a unique element, often a play on words, that helps potential visitors remember your web site address hours, days, or even months after they first encounter it.

As a result, a successful web address often requires more thought and creativity than is immediately obvious. Your first instinct, usually to name your URL after yourself or your firm, may not, in the long run, be the best choice.

So how does your present or proposed URL measure up?

Is it short? Is your URL as short and easy to type as possible?

Is it descriptive? Does your URL do a good job of describing the content visitors will find when they visit?

Is it memorable? Is your URL easy to remember, even if your name or your firm's name isn't? Is it distinctive or humorous?

Remember, the better your business name, the easier it will be to choose a URL. Like good business names, effective URLs are short, descriptive, and memorable.

ImproveHomeLife.com © 2005