Network Computing Network Computing Powered by InformationWeek Business Technology Network


Internal Search Engines Get You Where You Want To Go


Free Web Search Tools

We found three search engines with attractive price tags: They're free. On the downside, these tools typically require more babysitting on your part, and you may even have to compile the source code yourself to produce executables that run on your own machines. The tools are WebGlimpse from the University of Arizona, Simple Web Indexing System for Humans (SWISH) from Enterprise Integration Technologies and freeWAIS from the European Microsoft Windows NT Academic Centre (EMWAC).

All three index only HTML and plain-text files, and they lack such niceties as stemming and proximity searching. They're appropriate if your budget for Web tools is small or nonexistent. They're also useful as learning environments if you're contemplating buying your first search engine and want some hands-on experience before deciding on a particular tool. Here's a closer look at the tools.

University of Arizona's WebGlimpse 4.0B1

Glimpse (which stands for GLobal IMPlicit SEarch) is a search tool similar to the Unix egrep command, and WebGlimpse is a Web site search engine that uses Glimpse. In the same tests the commercial products underwent, we found WebGlimpse to be basic but serviceable. It let us perform keyword searches of our intranet with Boolean operators, gave us a choice of how many misspellings to ignore, indexed only new and revised Web pages, and offered a flexible scheduling system.
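
To make "how many misspellings to ignore" concrete, here is a minimal Python sketch of error-tolerant word matching. It uses plain Levenshtein edit distance, which is an assumption made for illustration and is not necessarily the algorithm Glimpse uses; the function names `edit_distance` and `fuzzy_matches` are hypothetical.

```python
def edit_distance(a, b):
    """Count the single-character insertions, deletions and substitutions
    needed to turn string a into string b (Levenshtein distance)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def fuzzy_matches(query, words, max_errors):
    """Return the words within max_errors edits of the query (hypothetical helper)."""
    q = query.lower()
    return [w for w in words if edit_distance(q, w.lower()) <= max_errors]
```

With `max_errors` set to 1, a query of "serch" would accept "search" (one missing letter) but reject "server"; with `max_errors` set to 0, only exact matches survive.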

The word-based WebGlimpse index doesn't lend itself to phrase searching, however. WebGlimpse overcomes this weakness by first searching for individual words in a phrase. It then uses the intermediate results to iteratively search for the second word in the result set containing the first word, search for the first word in the result set containing the second word, and so on. The process is very CPU-intensive.
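
The narrowing approach described above can be sketched in Python. This is only an illustration of the general technique, not WebGlimpse's actual code: the in-memory `docs` dictionary and the names `tokenize`, `build_word_index` and `phrase_search` are assumptions introduced for the example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_word_index(docs):
    """Map each word to the set of document names that contain it."""
    index = defaultdict(set)
    for name, text in docs.items():
        for word in tokenize(text):
            index[word].add(name)
    return index


def phrase_search(docs, index, phrase):
    """Find documents containing the exact phrase using a word-only index."""
    words = tokenize(phrase)
    if not words:
        return set()
    # Start from the documents that contain the first word.
    candidates = set(index.get(words[0], set()))
    # Narrow the intermediate result set one word at a time.
    for word in words[1:]:
        candidates &= index.get(word, set())
    # The index stores no word positions, so every surviving document
    # must be rescanned to confirm the words appear side by side.
    n = len(words)
    matches = set()
    for name in candidates:
        tokens = tokenize(docs[name])
        if any(tokens[i:i + n] == words for i in range(len(tokens) - n + 1)):
            matches.add(name)
    return matches
```

The final rescan is where the CPU time goes: a page that contains both "search" and "engines" somewhere still has to be read in full before it can be accepted or rejected for the phrase "search engines".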

WebGlimpse executables are available for OSF/1 Digital Alpha, SPARC Solaris, Linux, HP-UX, SGI Irix 5.3 and UnixWare. You can download the WebGlimpse source code if you want to compile it for a different machine. Source and executable files are at glimpse.cs.arizona.edu.

SWISH 1.1

Useful for simple Web sites, SWISH did a good job of indexing small collections of HTML files in the lab, but it choked when fed thousands of Web pages.

Interestingly, WebIndex, the indexer that comes with O'Reilly & Associates' WebSite server for Windows NT and Windows 95, is a derivative of SWISH. You launch WebIndex from WebSite's GUI administration tool, and it prompts you graphically for URLs to include in the index; one mouse click and it begins indexing.

Using a subset of our intranet's test data in our lab, the SWISH search engine successfully let us use Boolean operators to search for words, phrases and fields (HTML metadata).

SWISH is available at www.eit.com.

freeWAIS 0.3

freeWAIS (Wide Area Information Servers) is another basic search engine with few features, but it's suitable for very small Web sites. Testing revealed freeWAIS offers keyword searching, and not much more. The only Boolean operator it offers is OR; entering multiple search words caused freeWAIS to look for each of them and return documents containing any of the search arguments. There's no phrase search or proximity search capability.

In lieu of phrase searching, however, freeWAIS orders its results by the frequency of matched words it finds in the index. In our tests, freeWAIS produced a ranking of result documents in which the first entries fit the query better and later entries didn't fit as well. So, we found documents containing all search terms near the beginning of the ranking list and documents containing few search terms near the end.
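
A rough Python sketch of this kind of ranking follows. The article does not describe freeWAIS's real weighting, so the raw word count used here is a simplifying assumption, and the function name `rank_by_matches` is hypothetical.

```python
import re
from collections import Counter


def rank_by_matches(docs, query):
    """Return (name, score) pairs, best match first, where score is the
    number of times any query word occurs in the document."""
    terms = set(re.findall(r"[a-z0-9]+", query.lower()))
    scored = []
    for name, text in docs.items():
        counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
        score = sum(counts[term] for term in terms)
        if score > 0:  # OR semantics: one matching word is enough
            scored.append((name, score))
    # Highest score first; ties are broken by name for a stable order.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))
```

Because every matching word adds to the score, documents that contain all of the search terms tend to rise to the top, while documents that match only one term sink toward the end of the list.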

Making freeWAIS available on our intranet was simple. After we created an index named "index," we only had to insert the HTML keyword "<isindex>" in a document called INDEX.HTM. The browser then displayed the standard, "This is a searchable index. Enter search keywords." When we entered a search term, the Web server passed it to waislook, the freeWAIS software component for searching the index and returning HTML-formatted results.
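
For readers curious about what the server receives, an `<isindex>` query arrives as the part of the URL after the question mark, with the keywords joined by plus signs and no "=" character. The Python sketch below shows one way to turn such a query string back into keywords. It is illustrative only: the function name `parse_isindex_query` is hypothetical, and how the Web server in our lab actually handed the terms to waislook is not shown here.

```python
from urllib.parse import unquote


def parse_isindex_query(query_string):
    """Split an <isindex> query string into decoded keywords."""
    if not query_string or "=" in query_string:
        # A query containing "=" is form data, not an isindex keyword list.
        return []
    # Keywords are separated by "+"; each one is percent-decoded on its own.
    return [unquote(part) for part in query_string.split("+") if part]
```

A request for `INDEX.HTM?search+engines` would therefore yield the two keywords "search" and "engines".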

While administering freeWAIS and indicating which files to include or exclude from the indexer, we found that it has only a feeble exclusion mechanism. You can exclude wild-carded file names, but not wild-carded directory paths.
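
The difference between the two kinds of exclusion can be illustrated with Python's standard `fnmatch` module. This sketch does not reproduce freeWAIS's option syntax, which the article does not give; the function names and sample paths are assumptions chosen for the example.

```python
import fnmatch
import posixpath


def excluded_by_name(path, patterns):
    """Match patterns against the file name only, ignoring directories."""
    name = posixpath.basename(path)
    return any(fnmatch.fnmatchcase(name, p) for p in patterns)


def excluded_by_path(path, patterns):
    """Match patterns against the whole path, directories included."""
    return any(fnmatch.fnmatchcase(path, p) for p in patterns)
```

A name-only rule such as `*.bak` can drop `docs/old.bak`, but it can never express "skip everything under a private directory". That requires a path rule such as `*/private/*`, which is the kind of exclusion we found missing.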

The EMWAC at the University of Edinburgh in Scotland has written various Internet tools for Windows NT and distributes them as freeware at emwac.ed.ac.uk. One of these is freeWAIS. Microsoft includes freeWAIS on the Windows NT Resource Kit CD. In addition, you can download it from Process Software's Web site at www.process.com. Versions of freeWAIS for many flavors of Unix are available from the Clearinghouse for Networked Information Discovery and Retrieval (CNIDR) at ftp://ftp.cnidr.org.





Updated October 15, 1997





Copyright © 2008 United Business Media LLC