There has been endless webmaster speculation and worry about the so-called "Google Sandbox" - an indexing delay for new domain names - rumored to last at least 45 days from the date of first "discovery" by Googlebot. This recognized listing delay came to be called the "Google Sandbox effect."
Ruminations on the algorithmic elements behind this sandbox delay have ranged widely since the indexing delay was first noticed in the spring of 2004. Some believe it hinges on a single element of good search engine optimization, such as linking campaigns. Link building has been the focus of most discussion, but others have pointed to the size of a new site, its internal linking structure, or simple time delays as the most relevant algorithmic elements.
Rather than contribute to this speculation and further muddy the Sandbox waters, we'll look at a case study of a site on a new domain name, established May 11, 2005, with specific site structure, submission activity, and external and internal linking. We'll see how this plays out in search engine spider activity versus indexing dates at the top four search engines.
Ready? We'll list dates and crawler activity day by day and see how it all plays out on this single new site over time.
* May 11, 2005 - Basic text on a large site posted to a newly purchased domain name, going live by day's end. Search-friendly structure implemented with text links, making full discovery of all content possible by robots. Home page updated, with 10 new text content pages added daily. Submitted the site at Google's "Add URL" submission page.
* May 12 - 14 - No visits by Slurp, MSNbot, Teoma, or Googlebot. (Slurp is Yahoo's spider and Teoma is from Ask Jeeves.) Posted a link on WebSite101 to the new domain at Publish101.com.
* May 15 - Googlebot arrives and eagerly crawls 245 pages on the new domain after looking for, but not finding, a robots.txt file. Oops! Gotta add that robots.txt file!
* May 16 - Googlebot returns for 5 more pages and stops. Slurp greedily gobbles 1480 pages and 1892 bad links! Those bad links were caused by our email masking, meant to keep out bad bots. How ironic that Slurp likes them.
* May 17 - Slurp finds 1409 more masking links and only 209 new content pages. MSNbot visits for the first time and asks for robots.txt 75 times during the day, but leaves when it finds the file missing! We finally get around to adding robots.txt by day's end, to stop Slurp from crawling the email masking links and to let MSNbot know it's safe to come in!
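A robots.txt fix along these lines would block the email masking links for all crawlers; the `/mask/` path below is purely illustrative, since the article doesn't give the actual URL pattern the masking script used:

```
# Keep all compliant crawlers out of the email-masking links
# (the /mask/ path is a hypothetical stand-in for the real pattern)
User-agent: *
Disallow: /mask/
```

Simply having the file present also stops polite bots like MSNbot from repeatedly requesting a missing robots.txt and leaving.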
* May 23 - The Teoma spider shows up for the first time and crawls 93 pages. The site gets slammed by BecomeBot, a spider that hits a page every 5 to 7 seconds and strains our resources with 2409 rapid-fire page requests. Added BecomeBot to the robots.txt exclusion list to keep 'em out.
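An exclusion like the one added for BecomeBot is a standard robots.txt entry - a sketch of what it would look like:

```
# Ban BecomeBot from the entire site
User-agent: BecomeBot
Disallow: /
```

This only works for bots that honor the Robots Exclusion Protocol; truly abusive spiders have to be blocked at the server level instead.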
* May 24 - MSNbot hasn't shown up for a week since finding the robots.txt file missing. Slurp shows up every few hours, looks at robots.txt, and leaves again without crawling anything now that it is excluded from the email masking links. BecomeBot appears to honor the robots.txt exclusion, but asks for that file 109 times during the day. Teoma crawls 139 more pages.
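Well-behaved crawlers check robots.txt before fetching pages, which is why BecomeBot's behavior changed once the exclusion was in place. A quick sketch with Python's standard `urllib.robotparser` (domain and paths are placeholders) shows how a BecomeBot ban plays out for different user agents:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt mirroring the exclusions described in the article:
# BecomeBot banned entirely, everyone else allowed everywhere.
robots_txt = """\
User-agent: BecomeBot
Disallow: /

User-agent: *
Disallow:
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# BecomeBot is refused; other spiders may crawl freely.
print(rp.can_fetch("BecomeBot", "http://example.com/articles/page1.html"))  # False
print(rp.can_fetch("Googlebot", "http://example.com/articles/page1.html"))  # True
```

Note that the exclusion doesn't stop a bot from requesting robots.txt itself, which is why BecomeBot still asked for the file 109 times in a day.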
* May 25 - We realize we need to re-allocate server resources and redesign the database, and this requires changes to URLs - which means all previously crawled pages are now bad links! We implement subdomains and wonder: now what? Slurp shows up and finds thousands of new email masking links, because robots.txt was not moved to the new directory structure. Spiders are getting error pages on their new visits. Scrambling to put out fires after wide-ranging changes to the site, we miss this for a week. Spider action is spotty for 10 days until we fix robots.txt.
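The lesson here is that robots.txt is fetched per hostname: a crawler requests it from the root of each host it visits, so a move to subdomains means each subdomain needs its own copy. A sketch with hypothetical hostnames:

```
# Each hostname is checked separately by crawlers
# (subdomain names are illustrative):
http://www.example.com/robots.txt        covers www.example.com only
http://articles.example.com/robots.txt   needed separately for this subdomain
```

Forgetting this after a restructure leaves the new hosts with no robots.txt at all - exactly the situation that let Slurp back into the email masking links.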