For in-depth descriptions of text configuration object settings, and of how they affect both the contents of a text index and the results returned when you query it, see Text configuration object settings.
For a list of all text configuration objects in the database and the settings they contain, query the SYSTEXTCONFIG system view (for example, SELECT * FROM SYSTEXTCONFIG). See SYSTEXTCONFIG system view.
You can test how a text configuration object would break a string into terms using the sa_char_terms and sa_nchar_terms system procedures. See sa_char_terms system procedure, and sa_nchar_terms system procedure.
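As a minimal sketch of the two checks described above, the statements below list the existing text configuration objects and then show how a sample string is broken into terms. The sample string is the one used in the examples later in this topic. The sketch assumes that default_char and default_nchar already exist in the database, and that the configuration name is passed as the second argument of each procedure.

```sql
-- List every text configuration object and its settings.
SELECT * FROM SYSTEXTCONFIG;

-- Break a CHAR string into terms using the default_char configuration.
-- The embedded apostrophe is escaped by doubling it.
SELECT * FROM sa_char_terms( 'I''m not sure I understand', 'default_char' );

-- Same test for NCHAR data, using the default_nchar configuration.
SELECT * FROM sa_nchar_terms( 'I''m not sure I understand', 'default_nchar' );
```

Each procedure returns one row per term, so you can compare configurations without building a text index first.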
SQL Anywhere provides two default text configuration objects, default_nchar and default_char, for use with NCHAR and non-NCHAR data, respectively. These configurations are created the first time you attempt to create a text configuration object or text index. If you delete one by mistake, it is recreated the next time you attempt to create a text configuration object or text index.
The settings for default_char and default_nchar at the time of installation are shown in the table below. These settings were chosen because they were best suited for most character-based languages. It is strongly recommended that you do not change the settings in the default text configuration objects.
Setting | Installed value |
---|---|
TERM BREAKER | 0 (GENERIC) |
MINIMUM TERM LENGTH | 1 |
MAXIMUM TERM LENGTH | 20 |
STOPLIST | (empty) |
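Because changing the default objects is discouraged, one approach is to copy a default into a new text configuration object and alter the copy. The following is a sketch only; the name my_text_config and the chosen values are hypothetical and used purely for illustration.

```sql
-- Create a new configuration that starts with the settings of default_char.
-- my_text_config is a hypothetical name.
CREATE TEXT CONFIGURATION my_text_config FROM default_char;

-- Change one setting per ALTER statement; default_char itself is untouched.
ALTER TEXT CONFIGURATION my_text_config MINIMUM TERM LENGTH 2;
ALTER TEXT CONFIGURATION my_text_config STOPLIST 'not and';

-- Confirm the result.
SELECT * FROM SYSTEXTCONFIG;
```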
If you delete a default text configuration object, it is automatically recreated the next time you create a text index or text configuration object. See DROP TEXT CONFIGURATION statement.
When a default text configuration object is created by the database server, the database options that affect how date and time values are converted to strings are saved to the text configuration object from the current connection. See Text configuration objects and database options.
For a description of text configuration object settings, see Text configuration object settings.
The following table shows the settings for different text configuration objects and how the settings impact what is indexed and how a full text query string is interpreted. All the examples use the string 'I'm not sure I understand'.
Configuration settings | Terms that are indexed | Query interpretation |
---|---|---|
TERM BREAKER GENERIC MINIMUM TERM LENGTH 1 MAXIMUM TERM LENGTH 20 STOPLIST '' |
|
Note that the 'not' in the original string gets interpreted as an operator, not the word 'not'. |
TERM BREAKER GENERIC MINIMUM TERM LENGTH 2 MAXIMUM TERM LENGTH 20 STOPLIST 'not and' |
|
Note that 'sure' gets dropped because 'not' is interpreted as an operator (AND NOT) between the phrase "i m" (the two terms produced from 'I'm') and "sure". Since the phrase "i m" contains only terms that are shorter than the minimum term length and are dropped, the right side of the AND NOT condition ('sure') is also dropped. This leaves only 'understand'. |
TERM BREAKER NGRAM MAXIMUM TERM LENGTH 3 STOPLIST 'not and' |
|
For a fuzzy search: |
TERM BREAKER GENERIC MINIMUM TERM LENGTH 1 MAXIMUM TERM LENGTH 20 STOPLIST 'not and' |
|
|
TERM BREAKER NGRAM MAXIMUM TERM LENGTH 20 STOPLIST 'not and' | Nothing is indexed because no term is equal to or longer than 20 characters. This illustrates how differently MAXIMUM TERM LENGTH impacts GENERIC and NGRAM text indexes; on NGRAM text indexes, MAXIMUM TERM LENGTH sets the length of the n-grams inserted into the text index. | The search returns an empty result set because no n-grams of 20 characters can be formed from the query string. |
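To try a row of the table yourself, you could build a configuration with those settings, index a column with it, and run a query. The sketch below does this for the NGRAM row with MAXIMUM TERM LENGTH 3. The table Notes, its columns, and the names my_ngram_config and notes_body_idx are hypothetical, introduced only for this illustration.

```sql
-- Hypothetical NGRAM configuration matching one row of the table above.
CREATE TEXT CONFIGURATION my_ngram_config FROM default_char;
ALTER TEXT CONFIGURATION my_ngram_config TERM BREAKER NGRAM;
ALTER TEXT CONFIGURATION my_ngram_config MAXIMUM TERM LENGTH 3;
ALTER TEXT CONFIGURATION my_ngram_config STOPLIST 'not and';

-- Hypothetical table holding the sample string.
CREATE TABLE Notes ( id INTEGER PRIMARY KEY, body LONG VARCHAR );
INSERT INTO Notes VALUES ( 1, 'I''m not sure I understand' );

-- Index the column with the new configuration.
CREATE TEXT INDEX notes_body_idx ON Notes ( body )
    CONFIGURATION my_ngram_config
    IMMEDIATE REFRESH;

-- Inspect the n-grams that this configuration produces for the string.
SELECT * FROM sa_char_terms( 'I''m not sure I understand', 'my_ngram_config' );

-- Run a fuzzy search, which relies on an NGRAM text index.
SELECT id FROM Notes CONTAINS ( body, 'FUZZY "understand"' );
```

Changing MAXIMUM TERM LENGTH to 20 in a second copy of the configuration is a quick way to observe the empty-index behavior described in the last row.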
The following table provides examples of how strings are interpreted under different text configuration object settings.
The parenthetical numbers in the Interpreted string column reflect the position information stored for each term. The numbers are for illustration purposes in the documentation. The actual stored terms do not include the parenthetical numbers.
Configuration settings | String | Interpreted String |
---|---|---|
TERM BREAKER GENERIC MINIMUM TERM LENGTH 3 MAXIMUM TERM LENGTH 20 |
TERM BREAKER NGRAM MAXIMUM TERM LENGTH 3 |
TERM BREAKER NGRAM MAXIMUM TERM LENGTH 3 SKIPPED TOKENS IN TABLE AND IN QUERIES |
Copyright © 2010, iAnywhere Solutions, Inc. - SQL Anywhere 12.0.0