Perform a full text query using the CONTAINS clause in the FROM clause of a SELECT statement, or by using the CONTAINS search condition (predicate) in a WHERE clause. Both methods return the same rows; however, the CONTAINS clause also returns scores for the matching rows.
CONTAINS ( column-name [,...], contains-query-string )
contains-query-string: simple-expression | or-expression
simple-expression: primary-expression | and-expression
or-expression: simple-expression { OR | | } contains-query-string
primary-expression: basic-expression | FUZZY " fuzzy-expression " | and-not-expression
and-expression: primary-expression [ AND | & ] simple-expression
and-not-expression: primary-expression [ AND | & ] { NOT | - } basic-expression
basic-expression: term | phrase | ( contains-query-string ) | proximity-expression
fuzzy-expression: term | fuzzy-expression term
term: simple-term | prefix-term
prefix-term: simple-term*
phrase: " phrase-string "
proximity-expression: term ( BEFORE | NEAR) [minimum distance, | maximum distance ] term | term { BEFORE | NEAR | ~ } term
phrase-string: term | phrase-string term
simple-term A string separated by white space and special characters that represents a single indexed term (word) for which to search.
and-expression Use and-expression to specify that both primary-expression and simple-expression must
be found in the TEXT index. By default, if no
operator is specified between terms or expressions, an and-expression
is assumed. For example, 'a b'
'
is interpreted as 'a AND b'
. An
ampersand (&) can be used instead of AND, and can abut the expressions
or terms on either side (for example, 'a &
b'
).
and-not-expression Use and-not-expression to specify that primary-expression must
be present in the TEXT index, but that basic-expression must
not be found in the TEXT index. This is also
known as a negation. When you use a hyphen for negation, a space
must precede the hyphen, and the hyphen must abut the subsequent
term. For example, 'a -b'
is equivalent
to 'a AND NOT b'
; whereas
for 'a - b'
, the hyphen
is ignored and the string is equivalent to 'a
AND b'
. 'a-b'
is
equivalent to the phrase '"a b"'
.
or-expression Use or-expression to specify that at
least one of simple-expression or contains-query-string must
be present in the TEXT index. For example, 'a|b'
is
interpreted as 'a OR b'
.
fuzzy-expression Use fuzzy-expression to find terms that are similar to what you specify. Fuzzy matching is only supported on NGRAM TEXT indexes for IN SYSTEM tables.
proximity-expression Use proximity-expression to search for
terms that are near each other. For example, 'b
NEAR[2,5] c'
searches for instances
of b and c that are at most five and at least 2 terms away from each
other. The order of terms is not significant; 'b
NEAR c'
is equivalent to 'c
NEAR b'
. If NEAR is specified
without distance, a default of 10 terms is
applied. You can specify a tilde (~) instead of NEAR.
This is equivalent to specifying NEAR without
a distance so a default of 10 terms is applied. NEAR expressions
cannot be chained together (for example, 'a NEAR[1] b
NEAR[1] c'
).
BEFORE is like NEAR except
that the order of terms is significant. 'b BEFORE
c'
is not equivalent to 'c
BEFORE b'
; in the former, the term 'b'
must
precede 'c'
while in the
latter the term 'b'
must
follow 'c'
. BEFORE accepts
both minimum and maximum distances like NEAR.
The default minimum distance is 1. The minimum distance, if given,
must be less than or equal to the maximum distance; otherwise, an
error is returned.
prefix-term Use prefix-term to search for terms that
start with the specified prefix. For example, 'datab*'
searches
for any term beginning with datab. This is
also known as a prefix search. In a prefix search, matching is performed
for the portion of the term to the left of the asterisk.
The CONTAINS search condition takes a column list and contains-query-string as arguments. It can be used anywhere a search condition (also referred to as predicate) can be specified, and returns TRUE or FALSE. contains-query-string must be a constant string, or a variable, with a value that is known at query time.
If multiple columns are specified, then they must all refer to a single base table; a TEXT index cannot span multiple base tables. You can reference the base directly in the FROM clause, or use it in a view or derived table, provided that the view or derived table does not use DISTINCT, GROUP BY, ORDER BY, UNION, INTERSECT, EXCEPT, or a row limitation.
Queries using ANSI join syntax are supported (FULL OUTER JOIN, RIGHT OUTER JOIN, LEFT OUTER JOIN), but may have suboptimal performance. Use outer joins for CONTAINS in the FROM clause only if the score column from each of the CONTAINS clauses is required. Otherwise, move CONTAINS to an ON condition or WHERE clause.
These types of queries are unsupported:
Remote queries using a SQL Anywhere table with a full TEXT index that is joined to a remote table.
Queries using Sybase IQ and SQL Anywhere tables, where the full TEXT index to be used is on the SQL Anywhere table.
Queries using TSQL style outer join syntax (*=*
, =*
and *=
).
If you use a SQL variable less than 32KB in length as a search term and the type of variable is LONG VARCHAR, use CAST to convert the variable to VARCHAR data type. For example:
SELECT * FROM tab1 WHERE CONTAINS(c1, cast(v1 AS VARCHAR(64))
The following warnings apply to the use of non-alphanumeric characters in query strings:
An asterisk in the middle of a term returns an error.
Avoid using non-alphanumeric characters (including special characters) in fuzzy-expression, because they are treated as white space and serve as term breakers.
If possible, avoid using non-alphanumeric characters
that are not special characters in your query string. Any non-alphanumeric
character that is not a special character causes the term containing
it to be treated as a phrase, breaking the term at the location
of the character. For example, 'things we've
done'
is interpreted as 'things
"we ve" done'
.
Within phrases, the asterisk is the only special character that continues to be interpreted as a special character. All other special characters within phrases are treated as white space and serve as term breakers.
Interpretation of contains-query-string takes place in two main steps:
Step 1: Interpreting operators and precedence During this step, keywords are interpreted as operators, and rules of precedence are applied.
Step 2: Applying text configuration object settings During this step, the text configuration object settings are applied to terms. Any query terms that exceed the term length settings, or that are in the stop list, are dropped.
During query evaluation, expressions are evaluated using the following order of precedence:
FUZZY, NEAR
AND NOT
AND
OR
The asterisk is used for prefix searching. An asterisk can occur at the end of the query string, or be followed by a space, ampersand, vertical bar, closing bracket, or closing quotation mark. Any other usage of asterisk returns an error.
Table 4-1 shows allowable asterisk usage:
Query string |
Equivalent to |
Interpreted as |
---|---|---|
|
|
Find any term beginning with th, and the term best. |
|
|
Find either any term beginning with th, or the term best. |
|
|
Find the term very, and the term best or any term beginning with th. |
|
Find the term fast, immediately followed by a term beginning with auto. |
|
|
Find a term beginning with auto, immediately followed by the term price. |
Interpretation of query strings containing asterisks varies depending on the text configuration object settings.
The hyphen can be used for term or expression negation, and is equivalent to NOT. Whether a hyphen is interpreted as a negation depends on its location in the query string. For example, when a hyphen immediately precedes a term or expression, it is interpreted as a negation. If the hyphen is embedded within a term, it is interpreted as a hyphen.
A hyphen used for negation must be preceded by a white space, and followed immediately by an expression.
When used in a phrase of a fuzzy expression, the hyphen is treated as white space and used as a term breaker.
Table 4-2 shows the allowed syntax for hyphen:
Query string |
Equivalent to |
Interpreted as |
---|---|---|
|
|
Find the term the, and not the term best. |
|
|
Find the term the, and not the terms very and best. |
|
|
Find the term the, and not the phrase very best. |
|
'"alpha numerics"' |
Find the term alpha, immediately followed by the term numerics. |
|
|
Find the term wild, and the term west. |
Table 4-3 shows the allowed syntax for all special characters except asterisk and hyphen. These characters are not considered special characters, if they are found in a phrase, and are dropped.
The restrictions on specifying string literals also apply to the query string. For example, apostrophes must be within an escape sequence.
Character or syntax |
Usage examples and remarks |
---|---|
ampersand (&) |
The ampersand is equivalent to AND, and can be specified as follows:
|
vertical bar (|) |
The vertical bar is equivalent to OR, and can be specified as follows:
|
double quotes (") |
Double quotes are used to contain a sequence
of terms where order and relative distance are important. For example,
in the query string |
parentheses () |
Parentheses are used to specify the order
of evaluation of expressions, if different from the default order.
For example, |
tilde (~) |
The tilde is equivalent to NEAR,
and has no special syntax rules. The query string |
square brackets [ ] |
Square brackets are used in conjunction with the keyword NEAR to contain distance. Other uses of square brackets return an error. |
TEXT indexes are built according to the settings defined for the text configuration object used to create the TEXT index. ATEXT index excludes terms that meet any of the following conditions:
The term is included in the stop list.
The term is shorter than the minimum term length (GENERIC only).
The term is longer than the maximum term length.
The same rules apply to query strings. The dropped term can match zero or more terms at the end or beginning of the phrase. For example, suppose the term 'the' is in the stop list:
If the term appears on either side of an AND, OR,
or NEAR, then both the operator and the term
are removed. For example, searching for 'the
AND apple'
, 'the OR apple'
,
or 'the NEAR apple'
are
equivalent to searching for 'apple'
.
If the term appears on the right side of an AND
NOT, both the AND NOT and the term
are dropped. For example, searching for 'apple
AND NOT the'
is equivalent to searching for 'apple'
.
If the term appears on the left side of an AND
NOT, the entire expression is dropped. For example, searching
for 'the AND NOT apple'
returns
no rows. Another example: 'orange and the AND
NOT apple'
is the same as 'orange
AND (the AND NOT apple)'
which, after the AND NOT expression
is dropped, is equivalent to searching for 'orange'
. Contrast
this with the search expression '(orange and
the) and not apple'
, which is equivalent to
searching for 'orange and not apple'
.
If the term appears in a phrase, the phrase is allowed
to match with any term at the position of the dropped term. For
example, searching for 'feed the dog'
matches 'feed
the dog'
, 'feed my dog'
, 'feed any
dog'
, and so on.
If all of the terms for which you are searching are
dropped, Sybase IQ returns the error CONTAINS
has NULL search term
. SQL Anywhere reports
no error and returns zero rows.
When you include a CONTAINS clause in the FROM clause of a query, each match has a score associated with it. The score indicates how close the match is, and you can use score information to sort the data. Two main criteria determine score:
The number of times a term appears in the indexed row. The more times a term appears in an indexed row, the higher its score.
The number of times a term appears in the TEXT index. The more times a term appears in a TEXT index, the lower its score.
Depending on the type of full text search, other criteria
affect scoring. For example, in proximity searches, the proximity
of search terms impacts scoring. By default, the result set of a CONTAINS clause
has the correlation name contains that has
a single column in it called score. You can
refer to "contains".score in the SELECT list, ORDER
BY clause, or other parts of the query. However, because contains is
a SQL reserved word, you must remember to put it in double quotes.
Alternatively, you can specify another correlation name, for example, CONTAINS
( expression ) AS ct
. The
examples for full text search refer to the score column
as ct.score.
This statement searches MarketingInformation.Description for terms starting with ‘stretch” or terms starting with “comfort”:
SELECT ID, ct.score, Description FROM MarketingInformation CONTAINS ( MarketingInformation.Description, 'stretch* | comfort*' ) AS ct ORDER BY ct.score DESC;
The SQL Anywhere documentation provides full text search examples. Not all of these examples apply to Sybase IQ 15.2. For example, Sybase IQ does not support text searches that are part of the IF search condition and does not allow fuzzy or NGRAM searches.
in SQL Anywhere Server - SQL Usage.