How to Control Search Engine Robots

Written by Michael Rock

Wouldn't it be nice to be able to leave some code in your web site to tell the search engine spiders to make your site number one? Unfortunately, a robots.txt file or robots meta tag won't do that, but they can help the crawlers index your site better and block out the unwanted ones.

First, a few definitions:

Search Engine Spiders or Crawlers - A web crawler (also known as a web spider) is a program which browses the World Wide Web in a methodical, automated manner. Web crawlers are mainly used to create a copy of all the visited pages for later processing by a search engine, which indexes the downloaded pages to provide fast searches.

A web crawler is one type of bot, or software agent. In general, it starts with a list of URLs to visit. As it visits these URLs, it identifies all the hyperlinks in the page and adds them to the list of URLs to visit, recursively browsing the Web according to a set of policies.
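The crawl loop described above can be sketched in a few lines of Python. This is only an illustration, not how any particular search engine works: the PAGES dictionary is a made-up, in-memory stand-in for the Web (so nothing is downloaded), and the names LinkExtractor and crawl are hypothetical.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical in-memory "web": URL -> HTML. A real crawler would download pages.
PAGES = {
    "http://example.com/": '<a href="/a.html">A</a> <a href="/b.html">B</a>',
    "http://example.com/a.html": '<a href="/b.html">B</a>',
    "http://example.com/b.html": '<a href="/">Home</a>',
}


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed_urls, max_pages=10):
    """Visit pages breadth-first, starting from the seed list of URLs."""
    frontier = deque(seed_urls)   # the list of URLs still to visit
    seen = set(seed_urls)         # URLs already queued, to avoid loops
    visited = []
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        html = PAGES.get(url)
        if html is None:          # unknown page: skip it
            continue
        visited.append(url)
        extractor = LinkExtractor()
        extractor.feed(html)
        for href in extractor.links:
            absolute = urljoin(url, href)   # resolve relative links
            if absolute not in seen:
                seen.add(absolute)
                frontier.append(absolute)
    return visited


ORDER = crawl(["http://example.com/"])
print(ORDER)
```

The max_pages limit stands in for the "set of policies" mentioned above; real crawlers also apply rules about politeness, revisit frequency and which links to follow.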

Robots.txt - The robots exclusion standard, or robots.txt protocol, is a convention to prevent well-behaved web spiders and other web robots from accessing all or part of a website. The parts that should not be accessed are listed in a file called robots.txt in the top-level directory of the website.

The robots.txt protocol is purely advisory and relies on the cooperation of the web robot, so marking an area of your site out of bounds with robots.txt does not guarantee privacy. Many web site administrators have been caught out trying to use the robots file to make private parts of a website invisible to the rest of the world. However, the file is necessarily publicly available and is easily checked by anyone with a web browser.

The robots.txt patterns are matched by simple comparison against the start of each path, so take care that patterns meant to match a directory end with the final '/' character. Otherwise every file whose name starts with that pattern will match, rather than just those in the directory intended.
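A small Python sketch makes the effect of the trailing '/' visible. It assumes plain "starts with" matching as described above; the function name is_disallowed and the sample paths are made up for illustration.

```python
def is_disallowed(path, pattern):
    """Return True if a Disallow pattern matches the start of the path."""
    return path.startswith(pattern)


# Hypothetical paths on a site that wants to hide only its /private/ directory.
PATHS = [
    "/private/report.html",
    "/private.html",
    "/private-notes/todo.txt",
    "/public/index.html",
]

# "Disallow: /private" also catches /private.html and /private-notes/.
without_slash = [p for p in PATHS if is_disallowed(p, "/private")]

# "Disallow: /private/" catches only files inside the directory.
with_slash = [p for p in PATHS if is_disallowed(p, "/private/")]

print(without_slash)
print(with_slash)
```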

Meta Tag - Meta tags are used to provide structured data about data.

In the early 2000s, search engines veered away from reliance on Meta tags, as many web sites used inappropriate keywords, or were keyword stuffing to obtain any and all traffic possible.

Some search engines, however, still take Meta tags into some consideration when delivering results. In recent years, search engines have become smarter, penalizing websites that are cheating (by repeating the same keyword several times to get a boost in the search ranking). Instead of going up in the rankings, these websites will go down or, on some search engines, will be removed from the search engine completely.

Index a site - The act of crawling your site and gathering information.

How can the robots.txt file and meta tag help you?

In the robots.txt file you can tell harmful web crawlers to leave your web site alone, and give helpful hints to the ones you want to crawl your site. Here is an example of how to disallow a web crawler from searching your site:

# this identifies the wayback machine
User-agent: ia_archiver
Disallow: /

ia_archiver is the crawler name for the Wayback Machine, which you may have heard of, and the / after Disallow tells ia_archiver not to index any of your site. The # allows you to write comments to yourself so you can keep track of what you typed.
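One way to check what these lines do, without waiting for a crawler to visit, is to feed them to the robots.txt parser in Python's standard library (urllib.robotparser). This is only a sketch: the www.example.com address and the crawler name "ExampleBot" are placeholders, and the parser shows how a well-behaved robot would read the file, not what any given robot will actually do.

```python
from urllib.robotparser import RobotFileParser

# The three lines from the example above.
ROBOTS_TXT = """\
# this identifies the wayback machine
User-agent: ia_archiver
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# ia_archiver is shut out of the whole site ...
archiver_allowed = parser.can_fetch(
    "ia_archiver", "http://www.example.com/index.html"
)

# ... while a crawler the file does not mention is unaffected.
# "ExampleBot" is a hypothetical crawler name.
other_allowed = parser.can_fetch(
    "ExampleBot", "http://www.example.com/index.html"
)

print(archiver_allowed, other_allowed)
```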

Type the above three lines into Notepad on your computer and save the file to the root directory of your web site as robots.txt. Web crawlers look for this file first at a web site before doing anything else. This helps the crawler do its job, and helps the web site owner tell the spider what to do. Say, for instance, you have some data that you don't want the crawlers to see (like duplicate content for other browser referrer pages). You can deter crawlers from indexing the 'duplicate' directory by typing this into your robots.txt file:

User-agent: *

You can also have the robots.txt file created for you by an online generator, and validate the finished file with an online checker to make sure it works properly.
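The 'duplicate' directory example is cut short in this copy of the article, so what follows is only a sketch. It assumes the missing rule is "Disallow: /duplicate/" and combines it with the ia_archiver block from earlier; the www.example.com address, the page names and the crawler name "ExampleBot" are placeholders. Python's standard-library parser is used to show how a cooperative crawler would read the combined file.

```python
from urllib.robotparser import RobotFileParser

# Assumed complete file: the Disallow line under "User-agent: *" is a guess
# at the truncated example, not text taken from the article.
ROBOTS_TXT = """\
# this identifies the wayback machine
User-agent: ia_archiver
Disallow: /

# every other crawler: stay out of the duplicate content only
User-agent: *
Disallow: /duplicate/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

SITE = "http://www.example.com"  # placeholder address

# A crawler not named in the file follows the "*" record.
bot_home = parser.can_fetch("ExampleBot", SITE + "/index.html")
bot_duplicate = parser.can_fetch("ExampleBot", SITE + "/duplicate/page.html")

# ia_archiver has its own record and is blocked everywhere.
archiver_home = parser.can_fetch("ia_archiver", SITE + "/index.html")

print(bot_home, bot_duplicate, archiver_home)
```

The blank line between the two records matters: it is what separates the rules for ia_archiver from the rules for everyone else.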
