Our corporate intranet is a non-framed environment with both Lotus Domino and IIS (.NET and classic ASP) applications and content. We have between 300,000 and 500,000 pages of web content and documents across more than 1,200 “sites” on approximately 30 unique domains. We used to run Inktomi’s UltraSeek Server 3.0 as our intranet search engine, and it was beginning to show its age (purchased in 1998). The Inktomi product did not handle attachments well (DOC, PPT, PDF, etc.), would not crawl our secured sites, and was no longer supported by the vendor. We did a cursory review of search vendors and were immediately attracted to Google’s 30-day trial offer for the Google Search Appliance (GSA). After we signed a standard agreement, Google shipped us a shiny new yellow unit that we could test for 30 days before returning or purchasing it.
Product info The GSA is a “black box” 1U standard rack-mountable server. By “black box” I mean that Google gives you a web interface to administer the device but does not want you to access the operating system (a heavily Google-customized version of Linux). In fact, the license agreement stipulates that you will not tamper with the hardware or OS of the appliance in any way. The device has no need for a keyboard, mouse or video; all you need for normal operation is a network cable and standard power input.
The GSA comes in different flavors to fit different needs, varying by size of hardware and, correspondingly, size of license. (Licensing is based on the number of URLs crawled by the appliance.) There are three hardware configurations: GB-1001, GB-5005, and GB-8008. They break down as follows:
- GB-1001 – 150K documents for $28K, 300K documents for $50K
- GB-5005 – 1.5M documents for $230K
- GB-8008 – 4M documents for $450K
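To make the tiers above easier to compare, here is a minimal Python sketch that picks the lowest-priced tier for a given document count and works out the price per document. The figures are copied from the list above; the function names are my own and purely illustrative, not anything Google provides.

```python
# Capacity/price tiers as quoted in the list above: (model, max documents, price in USD).
TIERS = [
    ("GB-1001", 150_000, 28_000),
    ("GB-1001", 300_000, 50_000),
    ("GB-5005", 1_500_000, 230_000),
    ("GB-8008", 4_000_000, 450_000),
]


def cheapest_tier(doc_count):
    """Return the lowest-priced tier that can hold doc_count documents, or None."""
    candidates = [tier for tier in TIERS if tier[1] >= doc_count]
    return min(candidates, key=lambda tier: tier[2]) if candidates else None


def price_per_document(tier):
    """Price in USD per licensed document for a (model, capacity, price) tier."""
    _, capacity, price = tier
    return price / capacity


if __name__ == "__main__":
    for docs in (300_000, 500_000):
        model, capacity, price = cheapest_tier(docs)
        print(f"{docs:,} documents -> {model} ({capacity:,} docs, ${price:,})")
```

For an intranet like ours, the low end of our page count (300,000) just fits the larger GB-1001 license, while the high end (500,000) would require stepping up to the GB-5005.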
Why Google? As advertised, the GSA met all of our needs: it indexes the large variety of file types we have in our environment, accesses secured content, has a documented API, etc. The Google brand power was another big selling factor. When we told our users that they were going to get a Google-based search engine, they knew their days of troubled searching were over. Lastly, the 30-day trial experience we had with the GSA sealed the deal. The appliance is the easiest enterprise solution I’ve ever had to install, configure and maintain. We were literally up and running within an hour of opening the shipping box.
Installation The appliance has two network ports on the back panel: one for normal operation and the other used exclusively for network configuration. To configure the network settings we connected a laptop to the appliance via the special orange Ethernet cable that is included (some of its pin-outs are non-standard). The installation process was about as easy as one can imagine for a “black box.”
First we plugged in the normal-operation network cable and then the power. The power plug on the appliance IS the power switch: plug it in to turn the unit on and unplug it to turn it off. After plugging it in, we waited about 5 minutes for the appliance to play a tune, which is the signal to continue. Next, we hooked up our laptop (already set to DHCP mode) to the appliance and powered the laptop up. After logging in to the laptop and making sure it had the correct IP assigned by the appliance’s built-in DHCP server, we were ready to configure the network settings. Total elapsed time (excluding rack mounting): 10 minutes.
Configuration Network configuration, like normal administration, is done entirely through a browser and is a simple 5-step process. The first screens ask you for basic network information: IP address, subnet mask, default gateway, and DNS. Subsequent screens collect the SMTP server, the “From” address for GSA notification messages, the time zone, NTP (time) servers and the admin account name/password. The last step is to test a few of the URLs you will be crawling to make sure you’ve done the setup correctly. After a final settings review screen, configuration is complete and you can unplug your laptop and get to the good part: crawling. Total elapsed time: 10 minutes.
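Since the first configuration screens ask for an IP address, subnet mask, default gateway and DNS, it can help to sanity-check those values before typing them in. The sketch below is not part of the GSA; it is a small stand-alone Python check using the standard `ipaddress` module, and the function name and all addresses shown are hypothetical placeholders.

```python
import ipaddress


def check_network_settings(ip, netmask, gateway, dns_servers):
    """Return a list of problems found; an empty list means the values look consistent."""
    try:
        # strict=False allows a host address (not just a network address) before the mask.
        network = ipaddress.ip_network(f"{ip}/{netmask}", strict=False)
        host = ipaddress.ip_address(ip)
        gw = ipaddress.ip_address(gateway)
    except ValueError as exc:
        return [f"invalid address: {exc}"]

    problems = []
    if host in (network.network_address, network.broadcast_address):
        problems.append("IP address is the network or broadcast address")
    if gw not in network:
        problems.append("default gateway is outside the appliance's subnet")
    if gw == host:
        problems.append("default gateway is the same as the appliance IP")
    for dns in dns_servers:
        try:
            ipaddress.ip_address(dns)
        except ValueError:
            problems.append(f"invalid DNS server address: {dns}")
    return problems


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(check_network_settings("10.1.2.50", "255.255.255.0", "10.1.2.1", ["10.1.0.10"]))
```

An empty list means the gateway sits inside the subnet and every address parses; anything else is worth fixing before you walk over to the rack.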
Crawling site(s) All administration of the GSA is done remotely, using the URL provided. After logging in with the ID/password we provided in the previous step, we were presented with the Administration console. We created a new collection to hold our index, put in the “Start crawling from” URL, copied that same URL into the “Follow and Crawl only URLs with Following patterns” box and we were done. We saved our settings and then clicked the “Start crawling” button. We then went over to the “Crawl status” screen and watched the “Crawled URLs” counter increase. Google advertises that the appliance can crawl about 4,000 URLs in roughly 15 minutes. We found that crawl time increased significantly when those URLs linked to documents (Word, PDF, Excel, etc.).
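To put the advertised crawl rate in perspective, here is a back-of-the-envelope Python calculation of how long a first full crawl might take at about 4,000 URLs per 15 minutes. It assumes the rate stays constant, which our experience with linked documents suggests it will not, so treat the result as a best case rather than a measurement. The function name is my own.

```python
# Advertised rate quoted above: about 4,000 URLs in about 15 minutes.
ADVERTISED_URLS = 4_000
ADVERTISED_MINUTES = 15


def estimated_crawl_hours(url_count):
    """Best-case hours to crawl url_count URLs at the advertised rate."""
    minutes = url_count * ADVERTISED_MINUTES / ADVERTISED_URLS
    return minutes / 60


if __name__ == "__main__":
    for urls in (300_000, 500_000):
        print(f"{urls:,} URLs: about {estimated_crawl_hours(urls):.2f} hours")
```

At that rate, an intranet of our size (300,000 to 500,000 pages) works out to somewhere between 18.75 and 31.25 hours for a first pass, before allowing for the slowdown caused by attachments.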