Term Breaker Algorithm (TERM BREAKER)

The TERM BREAKER setting specifies the algorithm to use for breaking strings into terms.

SAP Sybase IQ supports two term breakers for storing terms: GENERIC (the default) and NGRAM.

Note: NGRAM term breakers store n-grams. An n-gram is a group of n characters, where n is the value of MAXIMUM TERM LENGTH.

Regardless of the term breaker you specify, the database server records in the TEXT index the original positional information for terms when they are inserted into the TEXT index. In the case of n-grams, the positional information of the n-grams is stored, not the positional information for the original terms.
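To illustrate that positions are attached to the n-grams rather than to the original terms, here is a minimal Python sketch. It is not the server's implementation: the function name ngram_positions is hypothetical, "alphanumeric" is simplified to ASCII letters and digits, and the positions are plain character offsets chosen for illustration, since the server's actual position numbering is not described here.

```python
import re

def ngram_positions(text, n):
    """Return (offset, ngram) pairs for every alphanumeric run in text.

    Assumption: offsets are character offsets into the original string,
    used only to show that each n-gram carries its own position.
    """
    entries = []
    # Simplification: a term is a run of ASCII letters and digits.
    for match in re.finditer(r"[A-Za-z0-9]+", text):
        term, start = match.group(), match.start()
        # Slide a window of width n over the term.
        # Terms shorter than n produce no n-grams at all.
        for i in range(len(term) - n + 1):
            entries.append((start + i, term[i:i + n]))
    return entries

print(ngram_positions("my red table", 3))
# [(3, 'red'), (7, 'tab'), (8, 'abl'), (9, 'ble')]
```

In this sketch the term 'table' never appears as a unit; only its three n-grams do, each with its own position.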

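TERM BREAKER impact
For each term breaker, the first item describes the impact on the TEXT index and the second describes the impact on query terms.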
  • GENERIC TEXT index – when building a GENERIC TEXT index (the default), groups of alphanumeric characters appearing between non-alphanumeric characters are processed as terms by the database server. After the terms have been defined, terms that exceed the term length settings, and terms found in the stoplist, are counted but not inserted in the TEXT index.

    Performance on GENERIC TEXT indexes can be faster than on NGRAM TEXT indexes. However, you cannot perform fuzzy searches on GENERIC TEXT indexes.

  • GENERIC TEXT index – when querying a GENERIC TEXT index, terms in the query string are processed in the same manner as if they were being indexed. Matching is performed by comparing query terms to terms in the TEXT index.
  • NGRAM TEXT index – when building an NGRAM TEXT index, the database server treats as a term any group of alphanumeric characters between non-alphanumeric characters. Once the terms are defined, the database server breaks the terms into n-grams. In doing so, terms shorter than n, and n-grams that are in the stoplist, are discarded.

    For example, for an NGRAM TEXT index with MAXIMUM TERM LENGTH 3, the string 'my red table' is represented in the TEXT index as these n-grams: red tab abl ble.

  • NGRAM TEXT index – when querying an NGRAM TEXT index, terms in the query string are processed in the same manner as if they were being indexed. Matching is performed by comparing n-grams from the query terms to n-grams from the indexed terms.
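
The behavior described above can be approximated with a short Python sketch. This is an illustration under stated assumptions, not the database server's code: the function names (split_terms, generic_break, ngram_break, shared_entries) are hypothetical, "alphanumeric" is simplified to ASCII letters and digits, the default length limits are placeholders, and the sketch only lists which entries a query shares with the indexed text. It does not model how the server ranks or scores matches.

```python
import re

def split_terms(text):
    # Simplification: a term is a run of ASCII letters and digits
    # between non-alphanumeric characters.
    return re.findall(r"[A-Za-z0-9]+", text)

def generic_break(text, min_len=1, max_len=20, stoplist=frozenset()):
    # GENERIC: keep whole terms, skipping those outside the length
    # limits (placeholder defaults) and those in the stoplist.
    return [t for t in split_terms(text)
            if min_len <= len(t) <= max_len and t not in stoplist]

def ngram_break(text, n=3, stoplist=frozenset()):
    # NGRAM: break each term into n-grams. Terms shorter than n yield
    # nothing, and n-grams found in the stoplist are discarded.
    grams = []
    for term in split_terms(text):
        for i in range(len(term) - n + 1):
            gram = term[i:i + n]
            if gram not in stoplist:
                grams.append(gram)
    return grams

def shared_entries(breaker, indexed_text, query, **settings):
    # The query is broken with the same breaker and settings as the
    # indexed text, then compared entry by entry.
    indexed = set(breaker(indexed_text, **settings))
    return [e for e in breaker(query, **settings) if e in indexed]

print(generic_break("my red table"))       # ['my', 'red', 'table']
print(ngram_break("my red table", n=3))    # ['red', 'tab', 'abl', 'ble']
print(shared_entries(generic_break, "my red table", "tables"))     # []
print(shared_entries(ngram_break, "my red table", "tables", n=3))  # ['tab', 'abl', 'ble']
```

The last two calls suggest why fuzzy searches need an NGRAM TEXT index: the query 'tables' shares no whole term with the indexed string, but it shares three of its four n-grams.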