* June 4 - Teoma returns and crawls 590 pages! No others.
* June 5 - Teoma returns and crawls 1902 pages! No others.
* June 6 - Teoma returns and crawls 290 pages. No others.
* June 7 - Teoma returns and crawls 471 pages. No others.
* June 8-14 - Odd spider behavior, looking at robots.txt only.
* June 15 - Slurp gets thirsty, gulps 1396 pages! No others.
* June 16 - Slurp still thirsty, gulps 1379 pages! No others.
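Daily tallies like the ones above can be pulled from a server access log. Here is a minimal sketch of one way to do it, assuming the log uses the Apache "combined" format and that each crawler can be spotted by a substring of its user-agent string. The marker substrings and the function name `tally_crawls` are illustrative assumptions, not the method used for this case study.

```python
import re
from collections import Counter

# Assumed user-agent substrings for the crawlers named in this article.
BOT_MARKERS = {
    "Googlebot": "googlebot",
    "Slurp": "slurp",
    "MSNbot": "msnbot",
    "Teoma": "teoma",
}

# Assumes Apache "combined" log format, for example:
# host - - [dd/Mon/yyyy:hh:mm:ss zone] "GET /path HTTP/1.0" 200 512 "referer" "agent"
LINE_RE = re.compile(
    r'\[(?P<day>\d{2}/\w{3}/\d{4}):[^\]]*\]\s+'
    r'"(?:GET|HEAD)\s+(?P<path>\S+)[^"]*"\s+\d{3}\s+\S+\s+'
    r'"[^"]*"\s+"(?P<agent>[^"]*)"'
)


def tally_crawls(lines):
    """Count page requests per (day, bot), ignoring robots.txt fetches."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # not a log line in the assumed format
        if match.group("path") == "/robots.txt":
            continue  # a robots.txt check is not a page crawl
        agent = match.group("agent").lower()
        for bot, marker in BOT_MARKERS.items():
            if marker in agent:
                counts[(match.group("day"), bot)] += 1
                break
    return counts
```

Passing an open log file to `tally_crawls` returns a counter keyed by day and crawler name. Skipping `/robots.txt` keeps a week like June 8-14 from showing up as page crawls.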
So we'll take a break here at the five-week point and take note of the very different behavior of the top crawlers. Googlebot visits once and looks at a substantial number of pages but doesn't return for over a month. Slurp finds bad links and seems addicted to them: it stops crawling good pages until it is told to lay off the bad liquor, er, that is, links, by getting robots.txt to slap Slurp to its senses. MSNbot visits looking for robots.txt and won't crawl any pages until told what NOT to do by the robots.txt file. Teoma just crawls like crazy, takes breaks, then comes back for more.
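To make the robots.txt fix concrete, here is a small sketch that checks a set of rules before they go live, using Python's standard `urllib.robotparser`. The rules and the `/old-links/` path are invented for illustration; the actual robots.txt used on the site is not shown in this article.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: /old-links/ stands in for the bad links Slurp kept
# requesting. Every other crawler is allowed everywhere.
ROBOTS_TXT = """\
User-agent: Slurp
Disallow: /old-links/

User-agent: *
Disallow:
"""


def is_allowed(robots_text, bot_name, path):
    """Return True if robots_text lets bot_name fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser.can_fetch(bot_name, path)


print(is_allowed(ROBOTS_TXT, "Slurp", "/old-links/page.html"))   # False
print(is_allowed(ROBOTS_TXT, "Slurp", "/articles/page.html"))    # True
print(is_allowed(ROBOTS_TXT, "msnbot", "/old-links/page.html"))  # True
```

A check like this catches a bobbled rule before a crawler does. The empty `Disallow:` line under `User-agent: *` means "nothing is off limits," which gives a crawler that waits for instructions an explicit answer.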
This behavior may imitate the differing personalities of the software engineers who designed them. Teoma is tenacious and hardworking. MSNbot is timid and needs instruction and some reassurance that it is doing the right thing; it picks up pages slowly and carefully. Slurp has an addictive personality and performs erratically on a random schedule. Googlebot takes a good long look and leaves. Who knows whether it will be back, and when?
Now let's look at indexing by each engine. As of this writing on July 7, each engine also shows differing indexing behavior. Google shows no pages indexed although it crawled 250 pages nearly two months ago. Yahoo has three pages indexed in a clear aging routine that doesn't list any of the nearly 8,000 pages it has crawled to date (not all itemized above). MSN has 187 pages indexed while crawling fewer pages than any of the others. Ask Jeeves has crawled more pages to date than any other search engine, yet has not indexed a single page.
Each of the engines will show the number of pages indexed if you use the query operator "site:publish101.com" without the quotes: MSN 187 pages, Ask none, Yahoo 3 pages, Google none.
The daily activity not listed in the three weeks since June 16 above has not varied dramatically, with Teoma crawling a bit more than the other engines, Slurp erratically up and down, and MSN slowly gathering 30 to 50 pages daily. Google is absent.
The linking campaign has been minimal, with posts to discussion lists, a couple of articles and some blog activity. Looking back over this time, it is apparent that a listing delay is actually quite sensible from the point of view of the search engines. Our site restructuring and bobbled robots.txt implementation seem to have abruptly stalled crawling, but the indexing behavior of each engine displays a distinctly different policy by each major player.
The sandbox is apparently not just Google's playground, but it is certainly tiresome after nearly two months. I think I'd like to leave for home, have some lunch and take a nap now.
Back to class before we leave for the day, kiddies. What did we learn today? Watch early crawler activity, and be certain to implement robots.txt early and adjust it often for bad bots. Oh yes, and the sandbox belongs to all search engines.

Mike Banks Valentine is a search engine optimization specialist who operates http://WebSite101.com and will continue reports of the case study chronicling search indexing of http://Publish101.com