PREFILTER clause - Specify the external prefilter algorithm

Prefiltering is the process of extracting text data from file types such as Word, PDF, HTML, and XML. In the context of text indexing, prefiltering allows you to extract only the data you want indexed, and avoid indexing unnecessary content such as HTML tags. For certain types of documents (for example, Microsoft Word documents), prefiltering is required to make full text indexes useful.

SQL Anywhere does not provide a built-in prefilter feature. However, you can create an external prefilter library to perform prefiltering according to your requirements, and then alter your text configuration object to point to it.
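As a minimal sketch of that last step, the statements below create a text configuration object and point it at an external prefilter. The configuration name my_text_config, the entry point my_prefilter_entry, and the library file my_prefilter.dll are hypothetical placeholders, not names shipped with SQL Anywhere; substitute the entry point and library you built.

```sql
-- Create a text configuration object from the default CHAR template.
-- (my_text_config is a hypothetical name.)
CREATE TEXT CONFIGURATION my_text_config FROM default_char;

-- Point the configuration at an external prefilter.
-- The string has the form 'entry-point@library'; both parts here are
-- hypothetical placeholders for your own prefilter library.
ALTER TEXT CONFIGURATION my_text_config
    PREFILTER EXTERNAL NAME 'my_prefilter_entry@my_prefilter.dll';

-- To stop using the external prefilter later:
-- ALTER TEXT CONFIGURATION my_text_config PREFILTER OFF;
```

Alter the text configuration object before you build text indexes on it, because a configuration that a text index already uses generally cannot be altered.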

The following table explains the impact that the value of PREFILTER EXTERNAL NAME has on text indexing and on how query strings are handled:

Text indexes
  • GENERIC and NGRAM text indexes   An external prefilter takes an input value (a document) and filters it according to the rules specified by the prefilter library. The resulting text is then passed to the term breaker before building or updating the text index.

Query strings
  • GENERIC and NGRAM text indexes   Query strings are not passed through a prefilter, so the setting of the PREFILTER EXTERNAL NAME clause has no impact on query strings.
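The difference between the two cases can be sketched as follows. This example assumes a text configuration named my_text_config that already has an external prefilter set, and a table named docs with a column named content; all three names are hypothetical.

```sql
-- Building the index: each value in docs.content is passed through the
-- external prefilter, then the term breaker, before it is indexed.
CREATE TEXT INDEX docs_content_idx ON docs ( content )
    CONFIGURATION my_text_config;

-- Querying: the query string 'database' is not passed through the
-- prefilter, so write it as plain text rather than in the document's
-- original format.
SELECT *
FROM docs
WHERE CONTAINS( content, 'database' );
```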

The ExternalLibrariesFullText directory, located under the Samples directory of your SQL Anywhere installation, contains prefilter and term breaker sample code for you to explore. For the location of your Samples directory, see Samples directory.
