Our corporate intranet is a non-framed environment with both Lotus Domino and IIS (.NET and classic ASP) applications and content. We have between 300,000 and 500,000 pages of web content and documents across more than 1,200 “sites” on approximately 30 unique domains. We used to have Inktomi’s UltraSeek Server 3.0 as our intranet search engine, which was beginning to show its age (purchased in 1998). The Inktomi product did not handle attachments well (DOC, PPT, PDF, etc.), would not crawl our secured sites, and was no longer supported by the vendor. We did a cursory review of the search vendors and were immediately attracted to Google’s 30-day trial offer for their Google Search Appliance (GSA). After we signed a standard agreement, they shipped us a brand new shiny yellow unit which we could test for 30 days before returning or purchasing.

Product info
The GSA is a “black box” 1U standard rack-mountable server. By “black box” I mean that Google gives you a web interface to administer the device but does not want you to access the operating system (a heavily Google-customized version of Linux). In fact, the license agreement stipulates that you will not tamper with the hardware or OS of the appliance in any way. The device has no need for a keyboard, mouse or video; all you need for normal operation is a network cable and standard power input.
The GSA comes in different flavors to fit different needs, varying by the size of the hardware and, correspondingly, the size of the license. (Licensing is based on the number of URLs crawled by the appliance.) There are 3 different hardware configurations: the GB-1001, GB-5005, and GB-8008. These are broken down as follows:
- GB-1001 – 150K documents for $28K, 300K documents for $50K
- GB-5005 – 1.5M documents for $230K
- GB-8008 – 4M documents for $450K
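To make the price list easier to compare, here is a minimal Python sketch that works out the cost per licensed document for each tier and picks the smallest tier that covers a given page count. The tier figures are the ones quoted above; the function names are our own, and the sketch assumes every crawled URL counts as one licensed document.

```python
# License tiers as quoted in the list above: (model, max documents, price in USD).
TIERS = [
    ("GB-1001", 150_000, 28_000),
    ("GB-1001", 300_000, 50_000),
    ("GB-5005", 1_500_000, 230_000),
    ("GB-8008", 4_000_000, 450_000),
]


def cost_per_document(max_docs, price):
    """Price divided by the document limit of a tier."""
    return price / max_docs


def smallest_tier_for(doc_count):
    """Return the first (smallest) tier whose limit covers doc_count, or None.

    Assumption: one crawled URL consumes one licensed document.
    """
    for model, max_docs, price in TIERS:
        if doc_count <= max_docs:
            return model, max_docs, price
    return None


if __name__ == "__main__":
    for model, max_docs, price in TIERS:
        per_doc = cost_per_document(max_docs, price)
        print(f"{model}: {max_docs:,} documents for ${price:,} (${per_doc:.3f} per document)")
    # The low and high ends of our own page-count estimate.
    for estimate in (300_000, 500_000):
        print(f"{estimate:,} pages ->", smallest_tier_for(estimate))
```

For our estimated 300,000 to 500,000 pages, the low end just fits the 300K license on the GB-1001, while the high end would need the GB-5005.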
Why Google?
As advertised, the GSA met all of our needs: it indexes the large variety of file types we have in our environment, accesses secured content, has a documented API, and so on. The Google brand power was another big selling factor. When we told our users that they were going to get a Google-based search engine, they knew their days of troubled searching were over. Lastly, the 30-day trial run experience we had with the GSA sealed the deal. The appliance is the easiest enterprise solution I’ve ever had to install, configure and maintain. We were literally up and running within an hour of opening the shipping box.
Installation
The appliance has two network ports on the back panel: one for normal operation and the other used exclusively for network configuration. To configure the network settings we connected a laptop to the appliance via the included special orange Ethernet cable (some pin-outs are non-standard). The installation process was about as easy as one can imagine for a “black box.”
First we plugged in the normal operation network cable and then the power. The power plug on the appliance IS the power switch: plug it in to turn it on and unplug it to turn it off. After plugging it in, we waited about 5 minutes for the appliance to play a tune, which is the signal to continue. Next, we hooked up our laptop (already set to DHCP mode) to the appliance and powered the laptop up. After logging in to our laptop and making sure we had the correct IP assigned by the appliance’s built-in DHCP server, we were ready to configure the network settings. Total elapsed time (excluding rack mounting): 10 minutes.
Configuration
Network configuration, like normal administration, is done entirely through a browser and is a simple 5-step process. The first screens ask you for basic network information: IP address, subnet mask, default gateway, and DNS. Subsequent screens collect the SMTP server, the “From” address for GSA notification messages, the time zone, NTP (time) servers and the admin account name/password. The last step is to test a few URLs which you will be crawling to make sure you’ve done the setup correctly. After a final settings review screen, configuration is complete and you can unplug your laptop and get to the good part: start crawling. Total elapsed time: 10 minutes.
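Before typing the basic network values into the first screens, it can help to sanity-check them. The following is a minimal Python sketch, not part of the GSA itself, that uses the standard `ipaddress` module to confirm the default gateway sits on the same subnet as the appliance address and that the DNS entries are well-formed. The function name and all addresses shown are hypothetical placeholders.

```python
import ipaddress


def check_network_settings(ip, netmask, gateway, dns_servers):
    """Return a list of problems found in the settings (empty if none)."""
    try:
        # Combine address and mask, e.g. "10.1.2.50/255.255.255.0".
        iface = ipaddress.ip_interface(f"{ip}/{netmask}")
    except ValueError as exc:
        return [f"invalid IP address or subnet mask: {exc}"]

    problems = []
    network = iface.network

    try:
        gw = ipaddress.ip_address(gateway)
    except ValueError:
        problems.append(f"invalid gateway address: {gateway}")
    else:
        if gw not in network:
            problems.append(f"gateway {gw} is outside {network}")
        elif gw == iface.ip:
            problems.append("gateway is the same as the appliance address")

    for dns in dns_servers:
        try:
            ipaddress.ip_address(dns)
        except ValueError:
            problems.append(f"invalid DNS server address: {dns}")

    return problems


if __name__ == "__main__":
    # Hypothetical example values, not taken from our real network.
    issues = check_network_settings(
        ip="10.1.2.50",
        netmask="255.255.255.0",
        gateway="10.1.2.1",
        dns_servers=["10.1.0.10", "10.1.0.11"],
    )
    print(issues or "settings look consistent")
```

The check only covers the addressing fields; the SMTP, NTP and admin account values still have to be verified by hand or by the appliance's own URL test in the last step.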
Crawling the site(s)
Using the URL provided, all administration of the GSA is done remotely. After logging in with the ID/password we provided in the previous step, we were presented with the Administration console. We created a new collection to hold our index, put in the “Start crawling from” URL, copied that same URL into the “Follow and Crawl only URLs with the Following patterns” box and we were done. We saved our settings and then clicked the “Start crawling” button. We then went over to the “Crawl status” screen and watched the “Crawled URLs” counter increase. Google advertises that it can crawl about 4,000 URLs in about 15 minutes. We found that the crawl time increases significantly if there are documents (Word, PDF, Excel, etc.) linked to from those URLs.
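To get a feel for what the advertised rate means for an intranet of our size, here is a rough Python sketch. It does two things: it models the “follow” box as a plain prefix match (an assumption on our part; the text above does not describe the appliance's real pattern syntax), and it extrapolates the advertised 4,000 URLs per 15 minutes to a larger page count. The function names and the example URLs are hypothetical.

```python
# Advertised rate quoted above: about 4,000 URLs in about 15 minutes.
URLS_PER_BATCH = 4_000
MINUTES_PER_BATCH = 15


def in_crawl_scope(url, follow_patterns):
    """True if url starts with any of the follow patterns.

    Assumption: patterns are treated as simple URL prefixes.
    """
    return any(url.startswith(pattern) for pattern in follow_patterns)


def estimated_crawl_hours(url_count):
    """Best-case crawl time in hours at the advertised rate.

    Linked documents (Word, PDF, Excel) slowed our crawls noticeably,
    so treat the result as a lower bound.
    """
    minutes = url_count / URLS_PER_BATCH * MINUTES_PER_BATCH
    return minutes / 60


if __name__ == "__main__":
    patterns = ["http://intranet.example.com/"]  # hypothetical start URL
    for url in (
        "http://intranet.example.com/hr/policies.html",
        "http://www.example.org/news.html",
    ):
        print(url, "->", "crawl" if in_crawl_scope(url, patterns) else "skip")

    for pages in (300_000, 500_000):
        print(f"{pages:,} URLs: at least {estimated_crawl_hours(pages):.2f} hours")
```

At the advertised rate, 300,000 URLs works out to about 18.75 hours and 500,000 URLs to about 31.25 hours, before allowing for the extra time that attachments add.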