Review: Implementing the Google Search Appliance in an Intranet environment

Written by Bryan A. Mjaanes



After the crawl is done, the collection is automatically indexed and then checked against the Serving Prerequisites (any criteria you wish to use to determine whether to move an indexed collection to production). The collection is then either moved to Production (and consequently becomes searchable) or moved to Staging. The Staging area lets you validate new crawls before letting users search against them.

Crawling configuration

After your first crawl you may find the need to go back and tweak the crawling parameters. Google gives you a good amount of control over how sites are crawled, the frequency, how many threads are used, etc. For sites with security, the GSA supports Basic Authentication, and an additional security module is available which supports Forms Authentication. The most challenging configuration aspect for us was determining the right combination of URL patterns to exclude from the search. If you are a Domino shop and looking to use the GSA, you may need to spend some time getting the crawler configuration just right to support the sometimes convoluted Domino query string parameters.
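To give a feel for the kind of URL exclusion logic involved, here is a minimal sketch in Python. It is not the GSA's own pattern syntax, and the patterns, host name and `should_crawl` helper are all hypothetical; they simply assume that view-navigation and edit URLs on a Domino site are the ones you would want to keep out of the index.

```python
import re

# Hypothetical exclusion patterns, written as ordinary Python regexes.
# They are illustrative only and do not use the GSA's pattern syntax.
EXCLUDE_PATTERNS = [
    # Expanded/collapsed variants of a Domino view duplicate the base view.
    re.compile(r"[?&](Expand|Collapse)(View)?\b", re.IGNORECASE),
    # Edit-mode URLs show the same document as the read-only URL.
    re.compile(r"\?EditDocument\b", re.IGNORECASE),
    # Paging parameters generate many near-identical view pages.
    re.compile(r"[?&]Start=\d+", re.IGNORECASE),
]


def should_crawl(url: str) -> bool:
    """Return True if no exclusion pattern matches the URL."""
    return not any(pattern.search(url) for pattern in EXCLUDE_PATTERNS)


# Made-up example URLs on an assumed intranet host.
urls = [
    "http://intranet.example.com/hr.nsf/policies?OpenView",
    "http://intranet.example.com/hr.nsf/policies?OpenView&ExpandView",
    "http://intranet.example.com/hr.nsf/policies?OpenView&Start=30",
    "http://intranet.example.com/hr.nsf/0/ABC123?OpenDocument",
    "http://intranet.example.com/hr.nsf/0/ABC123?EditDocument",
]

crawlable = [u for u in urls if should_crawl(u)]
for u in crawlable:
    print(u)
```

Running a list of known URLs through a filter like this before changing the appliance configuration is one way to check that a pattern excludes what you expect and nothing more.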

After we got the crawl parameters tuned and the first complete crawl done, we did some testing to see if the crawler grabbed all the content. Browsing our site and testing with some strings buried deep inside the taxonomy, we always found the GSA had crawled them accurately. We also did some testing with strings inside PDF documents, PowerPoint presentations and the like. When we did come across something that hadn’t been crawled, a careful analysis led us to discover that we needed to do some more tweaking of the crawl settings.

Other notable features

Google also gives you a KeyMatch tool that allows you to specify which indexed documents should appear at the top of the results page for a given query. These manifest themselves almost identically to the Sponsored Links at the top of the results page of the Google we all use. A Synonym tool allows you to specify alternate words or phrases for search queries. For example, if someone searches for WCM, you can suggest “Web Content Management” at the top of the results page.
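Conceptually, both tools behave like lookup tables keyed on the query. The Python sketch below only illustrates that idea; it is not how the GSA stores or applies these settings, and the table contents, URL and `decorate_results` function are hypothetical.

```python
# Hypothetical lookup tables: query -> promoted documents, query -> suggestion.
KEYMATCHES = {
    "benefits": [
        ("Employee Benefits Portal", "http://intranet.example.com/hr/benefits"),
    ],
}
SYNONYMS = {
    "wcm": "Web Content Management",
}


def decorate_results(query, organic_results):
    """Attach any promoted links and suggested phrase to the ordinary results."""
    key = query.strip().lower()
    return {
        "suggestion": SYNONYMS.get(key),        # None if no synonym is defined
        "keymatches": KEYMATCHES.get(key, []),  # shown above the ordinary results
        "results": organic_results,
    }


page = decorate_results("WCM", ["doc1.html", "doc2.html"])
print(page["suggestion"])
```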

An output format feature lets you control (via an XSLT stylesheet) the presentation of the search results. You can use this for changing the fonts, colors, logo, header, etc. of the results page. We were able to easily remove the “Cached” feature on the results page with some XSLT modifications. The Reporting tool lets you run reports on search queries over various time ranges. It will show you the number of searches per day, per hour, the top 100 keywords and top 100 queries for the time period specified.
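The difference between "top keywords" and "top queries" in such a report is easy to see in a small sketch. The Python below assumes a plain list of query strings as input, which is a made-up stand-in for whatever the appliance logs internally, and the `top_terms` function is hypothetical.

```python
from collections import Counter


def top_terms(queries, n=100):
    """Return the n most common whole queries and the n most common keywords."""
    # Normalize case and whitespace so "WCM" and "wcm " count as one query.
    normalized = [q.strip().lower() for q in queries if q.strip()]
    query_counts = Counter(normalized)
    # A keyword is a single word within a query.
    keyword_counts = Counter(word for q in normalized for word in q.split())
    return query_counts.most_common(n), keyword_counts.most_common(n)


# Made-up sample of search queries.
sample = ["WCM", "wcm", "vacation policy", "expense policy"]
top_queries, top_keywords = top_terms(sample)
print(top_queries)
print(top_keywords)
```

In this sample "policy" never appears as a query on its own, yet it ties for the most frequent keyword, which is the sort of pattern a keyword report surfaces and a query report hides.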

Downsides

The GSA is not for organizations looking to index their shared network drives, as the appliance has no facility for crawling file systems. This is really too bad, as many companies struggle with the huge quantities of unstructured content stored on their networks. Of course, there is a plethora of other products out there for exactly this issue.

Direct access to databases (e.g. SQL, Oracle, etc.) is another area that is off-limits for the GSA, as is any kind of integration with content or document management systems.

Conclusion

The Google Search Appliance (GSA) is an excellent search product for HTTP-accessible content. It gives great control over administrative features such as crawler configuration and results serving, and it offers sufficient reporting capabilities as well. Those looking for a solution that integrates directly with a content/document management system or databases, or that indexes network drives, should look to another product. However, if you have an intranet or Internet site with plenty of HTML-based content, the GSA may be just what you need.

Bryan Mjaanes is the creator/editor of Intranet101.com, a community-based forum for Intranet professionals.

