Chapter 1: Retrieving Information Intelligently

Managing unstructured information

In many organizations, employees keep their research, financial projections, and presentations on local PCs or in team-shared space on the company network. Accessing such information – or even finding it – often proves difficult when staff, storage rules, or storage structures change.

Sybase Search technology

Sybase Search technology extracts and processes the text content from file systems and databases where the content is unstructured. The ability to automatically process this content removes the need to index or describe information manually and allows organizations to automate such common business operations as data capture, retrieval, and linking. Sybase Search technology offers an efficient and cost-effective solution for searching unstructured information, regardless of the format and the language in which the content is written.

The Internet has made keyword search, the most common type of search-and-retrieval technology, familiar to most people: you retrieve information by typing one or two relevant keywords into a search engine.

Keyword search technology requires a business to identify documents by associating keywords with each document; those keywords are then used for subsequent retrieval. This process, known as document “tagging,” can be costly and time-consuming.
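As a rough illustration of why tagging is costly, the following Python sketch models a keyword index in which a document can be found only through the keywords someone has assigned to it by hand. This is not Sybase Search code: the TaggedIndex class, the file names, and the tags are all hypothetical and exist only to show the general idea.

```python
from collections import defaultdict


class TaggedIndex:
    """Toy keyword index: documents are reachable only via hand-assigned tags."""

    def __init__(self):
        # Maps each lowercase keyword to the set of document ids tagged with it.
        self._by_tag = defaultdict(set)

    def tag(self, doc_id, keywords):
        """Manually associate keywords with a document (the costly step)."""
        for keyword in keywords:
            self._by_tag[keyword.lower()].add(doc_id)

    def search(self, query):
        """Return ids of documents tagged with every word in the query."""
        words = query.lower().split()
        if not words:
            return set()
        results = None
        for word in words:
            matches = self._by_tag.get(word, set())
            results = set(matches) if results is None else results & matches
        return results


# Hypothetical documents and tags, for illustration only.
index = TaggedIndex()
index.tag("q3_forecast.xls", ["finance", "forecast"])
index.tag("sales_deck.ppt", ["sales", "presentation"])

hits = index.search("forecast")              # matches a tag that was assigned
misses = index.search("revenue projection")  # related idea, but never tagged
```

In this sketch the query "forecast" finds the spreadsheet, while "revenue projection" finds nothing even though it describes the same kind of content, because nobody entered those words as tags. Every document must be tagged in advance, and retrieval is only as good as the tags chosen.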

Sybase Search provides the means to automatically capture and retrieve information based on concepts rather than keywords. Through the use of proprietary algorithms, Sybase Search delivers a language-independent product capable of operating without the costly overhead associated with tagging. This provides an essential tool for managing the proliferation of unstructured information in today’s business environment.

Sybase Search features

Sybase Search offers a number of features that help organizations reduce or eliminate the time and cost of managing their unstructured information. These features include:

The support of more than 250 different formats of data, including most types of document, presentation, spreadsheet, and Web content formats
The automatic capture and aggregation of all unstructured data
The elimination of preprocessing or manual tagging of files, greatly improving the accuracy and efficiency of document retrieval
The extraction of paragraphs from matching documents
The ability to find similar documents by automatically providing a set of relevant content that is conceptually related to each document
The ability to scale to millions of documents using a fully distributed architecture
The ability to query and process using natural language
Language independence
A well-defined Java and XML API that allows Sybase Search to be integrated easily into other applications