Altering text configuration objects

Each text configuration object contains the following settings. You can alter these in Sybase Central, or by executing the ALTER TEXT CONFIGURATION statement.

Setting Description
Stoplist

The stoplist setting specifies the terms to ignore when building a text index. If a stoplist term is specified in a full text query, the term is ignored in the query as well.

A newly created text configuration object has the same stoplist as the text configuration object that was used as its template. If the template's stoplist is empty, the new stoplist is empty too.

To alter the stoplist, use the STOPLIST clause of the ALTER TEXT CONFIGURATION statement. For example:

ALTER TEXT CONFIGURATION config-name
STOPLIST string-expression;

To drop a stoplist, use the DROP STOPLIST clause of the ALTER TEXT CONFIGURATION statement. For example:

ALTER TEXT CONFIGURATION config-name
DROP STOPLIST;

Dropping a stoplist removes all terms from the stoplist, leaving it empty.

By default, the stoplist is empty in both the default_char and default_nchar text configuration objects.

The Samples directory contains sample SQL code that loads stoplists for several languages. For the location of the Samples directory, see Samples directory.
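The effect of a stoplist at index-build time can be modeled in a few lines of Python. This is only an illustrative sketch, not the database server's implementation: the function name index_terms is hypothetical, and the whitespace splitting, space-separated stoplist string, and exact (case-sensitive) comparison are all assumptions made for the example.

```python
def index_terms(text, stoplist=""):
    """Return the terms of `text` that would be kept for indexing.

    Assumptions for this sketch: terms are separated by whitespace, the
    stoplist is one space-separated string, and comparison is exact.
    """
    stop = set(stoplist.split())
    return [term for term in text.split() if term not in stop]


if __name__ == "__main__":
    doc = "the cat sat on the mat"
    print(index_terms(doc, "the on"))  # stoplist terms are dropped
    print(index_terms(doc))            # an empty stoplist keeps every term
```

With an empty stoplist, which is the default described above, every term is kept.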

Minimum term length

The minimum term length setting specifies the minimum length, in characters, of terms allowed in the text index. Terms that are shorter than this setting are ignored when building or refreshing the text index.

If you perform a full text search for a term that is shorter than the minimum term length, the term is not found because it does not exist in the text index.

To set the minimum term length, use the MINIMUM TERM LENGTH clause of the ALTER TEXT CONFIGURATION statement. For example:

ALTER TEXT CONFIGURATION config-name
MINIMUM TERM LENGTH 4;

The value of this option must be greater than 0. If you set it higher than the maximum term length, the maximum term length is automatically adjusted to be equal to the minimum term length. The default for this setting in the default_char and default_nchar text configuration objects is 1.
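The following Python sketch models why a search for a short term finds nothing: the term was filtered out before it ever reached the index. The names keep_terms and search are hypothetical, and the index is modeled as a plain set of strings, which is a simplification rather than how the server stores a text index.

```python
def keep_terms(terms, minimum_term_length=1):
    """Drop terms shorter than the minimum, as when the index is built."""
    if minimum_term_length <= 0:
        raise ValueError("minimum term length must be greater than 0")
    return [t for t in terms if len(t) >= minimum_term_length]


def search(index, term):
    """A term that was never indexed cannot be found."""
    return term in index


if __name__ == "__main__":
    # With MINIMUM TERM LENGTH 4, "sql" and "an" never enter the index.
    index = set(keep_terms(["sql", "text", "index", "an"], minimum_term_length=4))
    print(sorted(index))
    print(search(index, "sql"))
    print(search(index, "text"))
```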

Maximum term length

The maximum term length setting specifies the maximum length, in characters, of terms allowed in the text index. Terms that are longer than this setting are ignored when building or refreshing the text index.

If you perform a full text search for a term that is longer than the maximum term length, the term is not found because it does not exist in the text index.

To set the maximum term length, use the MAXIMUM TERM LENGTH clause of the ALTER TEXT CONFIGURATION statement. For example:

ALTER TEXT CONFIGURATION config-name
MAXIMUM TERM LENGTH 20;

The value of this option must be less than or equal to 60. If you set it lower than the minimum term length, the minimum term length is automatically adjusted to be equal to the maximum term length. The default for this setting in the default_char and default_nchar text configuration objects is 20.
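The way the two length settings adjust each other can be sketched as a small Python class. TermLengths and its methods are hypothetical names used only for illustration. The text states a lower bound for the minimum and an upper bound of 60 for the maximum; applying both bounds to both setters is an assumption of this sketch.

```python
from dataclasses import dataclass

MAX_ALLOWED = 60  # upper bound stated for MAXIMUM TERM LENGTH


def _check(value):
    # Assumption: both settings must lie in the range 1..60.
    if not 1 <= value <= MAX_ALLOWED:
        raise ValueError(f"term length must be between 1 and {MAX_ALLOWED}")


@dataclass
class TermLengths:
    minimum: int = 1   # default in default_char and default_nchar
    maximum: int = 20  # default in default_char and default_nchar

    def set_minimum(self, value):
        _check(value)
        self.minimum = value
        # Raising the minimum past the maximum drags the maximum up with it.
        if self.minimum > self.maximum:
            self.maximum = self.minimum

    def set_maximum(self, value):
        _check(value)
        self.maximum = value
        # Lowering the maximum below the minimum drags the minimum down with it.
        if self.maximum < self.minimum:
            self.minimum = self.maximum


if __name__ == "__main__":
    lengths = TermLengths()
    lengths.set_minimum(25)
    print(lengths)
```

In this model, setting the minimum to 25 on a default configuration leaves both values at 25.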

Term breaker

The term breaker setting specifies the name of the algorithm to use for separating column values into terms. The choices are GENERIC or NGRAM.

The GENERIC algorithm treats as a term any string of one or more alphanumerics, separated by non-alphanumerics.
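As a rough model of this rule, the Python sketch below extracts every maximal run of alphanumeric characters. The function name generic_terms is hypothetical, and the regular expression reflects Python's notion of an alphanumeric character, which is an assumption: the characters the database server treats as alphanumeric may differ.

```python
import re

# One or more letters or digits. "\w" also matches "_", so the character
# class excludes it explicitly.
_TERM = re.compile(r"[^\W_]+")


def generic_terms(text):
    """Split text into terms at every non-alphanumeric character."""
    return _TERM.findall(text)


if __name__ == "__main__":
    print(generic_terms("Full-text search, version 11!"))
```

Punctuation, spaces, and hyphens all act as separators here, so "Full-text" yields two terms.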

The NGRAM algorithm breaks the strings into n-grams. An n-gram is an n-character substring of a larger string. N-grams are useful for approximate matching or for documents that do not use whitespace to separate terms. The NGRAM algorithm still pays attention to whether characters are alphanumerics. However, it breaks terms by counting characters rather than by using non-alphanumerics as separators. For example, an n-gram where n is 3 is any sequence of 3 adjacent alphanumeric characters. The NGRAM algorithm also returns overlaps. For example, the string abcd efg separated into 3-grams gives the terms abc, bcd, and efg.
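The abcd efg example can be reproduced with a short Python sketch that slides an n-character window over each run of alphanumerics. The name ngram_terms is hypothetical. Runs shorter than n produce no terms in this sketch, which is an assumption: the text above does not say how the server handles them.

```python
import re

# Runs of letters or digits; "_" is excluded because "\w" would match it.
_RUN = re.compile(r"[^\W_]+")


def ngram_terms(text, n):
    """Return the overlapping n-grams of every alphanumeric run in text."""
    if n < 1:
        raise ValueError("n must be at least 1")
    terms = []
    for run in _RUN.findall(text):
        # Slide a window of width n across the run, one character at a time.
        for start in range(len(run) - n + 1):
            terms.append(run[start:start + n])
    return terms


if __name__ == "__main__":
    print(ngram_terms("abcd efg", 3))  # ['abc', 'bcd', 'efg']
```

No n-gram spans the space between abcd and efg, because the window never crosses a non-alphanumeric character.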

Use the NGRAM term breaker for any text where words aren't separated by spaces, or when fuzzy searching is required. The FUZZY clause of the CONTAINS search condition is only supported if the NGRAM term breaker is in use. See Using the FUZZY operator in full text searches.
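To see why n-grams lend themselves to fuzzy searching, consider counting how many n-grams a misspelled query shares with a candidate term. The Python sketch below shows that general idea only. The names ngrams and shared_ngrams are hypothetical, and this overlap count is not presented as how the FUZZY clause scores or ranks results.

```python
def ngrams(word, n=3):
    """Return the set of overlapping n-grams of a single word."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def shared_ngrams(query, candidate, n=3):
    """Count the n-grams that the query and the candidate have in common."""
    return len(ngrams(query, n) & ngrams(candidate, n))


if __name__ == "__main__":
    query = "configuraton"  # misspelling of "configuration"
    for candidate in ("configuration", "stoplist"):
        print(candidate, shared_ngrams(query, candidate))
```

An exact comparison of whole terms would treat the misspelled query as a complete miss, whereas most of its 3-grams still occur in the correctly spelled word.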

The MAXIMUM TERM LENGTH setting determines the length of the n-grams in characters. An appropriate choice of length depends on the language. Typical values are 4 or 5 characters for English, and 2 or 3 characters for Chinese. The MINIMUM TERM LENGTH setting is not meaningful for the NGRAM term breaker and is ignored.

To set the term breaker algorithm, use the TERM BREAKER clause of the ALTER TEXT CONFIGURATION statement. For example:

ALTER TEXT CONFIGURATION config-name
TERM BREAKER GENERIC;

The default for this setting in the default_char and default_nchar text configuration objects is GENERIC.

See also

Alter a text configuration object