Developing and configuring customized filters

Sybase Search uses a third-party solution, Stellent, to parse many document formats. The Stellent document filter is a multi-filter: a single filter instance handles all supported MIME types. The Stellent filter is therefore configured to handle the MIME type */*, indicating that it can filter text from documents of any MIME type presented to it.

When Sybase Search obtains a filter for a document, it first identifies the document's MIME type from the file extension. For example, C:\document.pdf has the type “application” and the subtype “pdf”, which gives the MIME type application/pdf. Sybase Search then requests a filter for the identified MIME type from the Filter Factory.
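
The extension-based identification can be illustrated with a minimal Python sketch. This is not Sybase Search code: the EXTENSION_TO_MIME table and the mime_type_for function are hypothetical names invented for illustration, and the actual extension mapping used by the product is not documented in this section.

```python
import ntpath

# Hypothetical extension table (assumption): the real mapping used by
# Sybase Search is not shown in this guide.
EXTENSION_TO_MIME = {
    ".pdf": "application/pdf",
    ".txt": "text/plain",
    ".html": "text/html",
}


def mime_type_for(path):
    """Return (type, subtype) for a Windows-style path, or None if unknown."""
    # ntpath handles backslash-separated paths on any operating system.
    _, extension = ntpath.splitext(path)
    mime = EXTENSION_TO_MIME.get(extension.lower())
    if mime is None:
        return None
    main_type, subtype = mime.split("/", 1)
    return main_type, subtype
```

With this table, mime_type_for("C:\\document.pdf") yields the pair ("application", "pdf"), matching the example in the text, and an unrecognized extension yields None.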

The filter look-up is performed in this order:

  1. If a filter is configured to handle a specific MIME type, that filter instance is returned.

  2. If a multi-filter (*/*) is configured, that filter instance is returned.

  3. Otherwise, no filter is returned, which denotes that the document is “not indexable.”
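
The look-up order above can be sketched in a few lines of Python. This is an illustrative model only, not the Filter Factory implementation: the find_filter function, the dictionary of configured filters, and the filter names used in the usage note are all assumptions made for the example.

```python
MULTI_FILTER = "*/*"


def find_filter(configured_filters, mime_type):
    """Return the filter for mime_type, or None if the document is not indexable.

    configured_filters is a hypothetical dict that maps a MIME type string
    (or "*/*" for a multi-filter) to a filter instance.
    """
    # Step 1: a filter configured for this specific MIME type takes priority.
    if mime_type in configured_filters:
        return configured_filters[mime_type]
    # Step 2: fall back to the multi-filter, if one is configured.
    if MULTI_FILTER in configured_filters:
        return configured_filters[MULTI_FILTER]
    # Step 3: no filter is available, so the document is not indexable.
    return None
```

For example, with configured_filters = {"application/pdf": "pdf_filter", "*/*": "stellent_filter"}, a request for application/pdf returns "pdf_filter", while a request for text/plain falls through to "stellent_filter". If the */* entry is removed, the text/plain request returns None.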

You can add filters by editing the XML configuration file at <installLocation>\OmniQ\config\FilterFactory.default.xml. See “Configuring modules” for information about the FilterFactory.default.xml file.