Search-engine software has two components: an indexer and the search engine itself. Indexers retrieve content, extract words and index them for fast retrieval. Engines interpret queries, locate words, concepts or phrases in the index that are relevant to the query, then format the output in HTML or XML and send it to the user or device that initiated the query.
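To make the indexer/engine split concrete, here is a minimal sketch of an inverted index in Python. It is an illustration only, not how any of the reviewed products work: the function names and sample pages are hypothetical, and real indexers also handle ranking, document formats and much larger data sets.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Indexer side: map each word to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Engine side: return IDs of documents that contain every query word."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical sample pages, used only for illustration.
documents = {
    "page1": "Enterprise search engines index content behind the firewall.",
    "page2": "An indexer retrieves content and extracts words.",
}
index = build_index(documents)
print(sorted(search(index, "index content")))
```

Because the lookup is a set intersection over precomputed word lists, query time depends on the index rather than on rescanning every document, which is the point of indexing ahead of time.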
We went looking for enterprise-class search engines--those that work behind a firewall or secure VPN. The vendors had to supply search-engine software or an appliance that supported it. We did not want it bundled with portal software or content-management software. Our contestants also had to be able to search both structured data in databases and unstructured data on Web servers and file stores. And we required support for a variety of document formats, including those produced by word processors, presentation software and graphics editors.
We required indexers to retrieve content from secure Web pages (HTTPS) and standard HTTP servers and file systems, and to remove duplicate pages. We also required them to extract words from HTML, XML, Microsoft Office and PDF documents, and index the content. Finally, they had to support ODBC or JDBC (Java Database Connectivity) connectors or gateways.
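One common way to remove duplicate pages is to fingerprint each page body and keep only the first page seen for each fingerprint. The sketch below assumes exact-duplicate detection after whitespace and case normalization; the function names and URLs are hypothetical, and the products we tested may use different (for example, near-duplicate) techniques.

```python
import hashlib


def fingerprint(text):
    """Hash the text after lowercasing it and collapsing whitespace."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def remove_duplicates(pages):
    """Keep the first URL seen for each distinct page body."""
    seen = set()
    unique = {}
    for url, body in pages.items():
        digest = fingerprint(body)
        if digest not in seen:
            seen.add(digest)
            unique[url] = body
    return unique


# Hypothetical crawl results: the same page reached over HTTP and HTTPS.
pages = {
    "http://example.com/a": "Search engine   review",
    "https://example.com/a": "search engine review",
    "http://example.com/b": "Indexer overview",
}
unique_pages = remove_duplicates(pages)
print(sorted(unique_pages))
```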
As for the search engines, we asked that they include a spellchecker and support for phrase searching and stemming (grammatical variations) in addition to keyword searching. We also required a prebuilt search form or user interface to test the indexers and search engines.
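The three query features named above can be sketched in a few lines of Python. This is a deliberately crude illustration under stated assumptions: the suffix-stripping stemmer is a toy (production engines use far more careful algorithms and will disagree with it on many words), the spelling suggestion relies on the standard library's difflib, and all names are hypothetical.

```python
import difflib
import re

# Naive suffix list, an assumption for illustration only.
SUFFIXES = ("ing", "ed", "es", "s")


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def stem(word):
    """Strip the first matching suffix, leaving at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def contains_phrase(text, phrase):
    """Phrase search: the stemmed phrase words must appear consecutively."""
    words = [stem(w) for w in tokenize(text)]
    target = [stem(w) for w in tokenize(phrase)]
    if not target:
        return False
    n = len(target)
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))


def suggest(word, vocabulary):
    """Spellchecker: return the closest known word, or None if none is close."""
    matches = difflib.get_close_matches(word.lower(), vocabulary, n=1)
    return matches[0] if matches else None


print(stem("searching"), stem("searched"), stem("searches"))
print(contains_phrase("We tested phrase searching on the site", "phrase searches"))
print(suggest("serch", ["search", "index", "spider"]))
```

Stemming is what lets a query for "searches" match a page that says "searching", while phrase matching additionally requires the words to appear in order and next to each other.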
We sent invitations to 11 vendors. Four stepped up to the plate: CSIRO (Commonwealth Scientific and Industrial Research Organisation), Kanisa, Mondosoft and dtSearch Corp. Each sent software products to our Syracuse University Real-World Labs®.
The companies that dropped out, declined or just didn't qualify ran the gamut from small to large. Copernic Technologies didn't qualify because its product doesn't support ODBC or JDBC. Autonomy Corp. and EasyAsk declined to participate but gave no reason. Convera, Dieselpoint and Fast Search & Transfer each declined, saying it was working on a new version of its software. Both Verity and Google declined to participate on the basis of company policy, though Verity was changing its policy as this article went to press.
As for our four contestants, we tested their ability to satisfy navigational searches by using Network Computing's production Web site (www.nwc.com), which contains almost 35,000 pages (see "How We Tested"). We also tested indexing and searching capabilities using informational searches taken directly from the log files on www.nwc.com. Three of the four products we tested performed above average. Only dtSearch came in under par.
We judged the search engines on their ability to retrieve content using an indexer, also called a spider or crawler. We put a heavy emphasis on the search process, assessing how much control the administrator could exert over it as well as overall performance in navigational searches. We also looked at each vendor's management console and how it handled installation, configuration and customization of the search-engine portion. And we considered log files and reporting capabilities. Finally, we compared prices across the board.
CSIRO's Panoptic Enterprise Search Engine won our Editor's Choice award. Its secure and easy-to-use administrative interface, navigational deftness and indexing prowess put it on top.