The interface that an external term breaker library must implement to perform term breaking for text index operations.

    typedef struct a_word_source
    {
        a_sql_uint32 ( SQL_CALLBACK *begin_document )( a_word_source *This, a_sql_uint32 has_prefix );
        a_sql_uint32 ( SQL_CALLBACK *get_words )( a_word_source *This, a_term **words, a_sql_uint32 *num_words );
        a_sql_uint32 ( SQL_CALLBACK *end_document )( a_word_source *This );
        a_sql_uint32 ( SQL_CALLBACK *fini_all )( a_word_source *This );

        a_server_context *_context;

        // Only one of the following pointers can be valid
        // in any implementation.
        // For example: if the producer for this module is
        // a a_text_source, then only my_text_producer will
        // be a valid pointer whereas my_word_producer
        // should be assigned a NULL
        a_text_source *_my_text_producer;
        a_word_source *_my_word_producer;

        // Following members have been reserved for
        // future use ONLY
        a_text_source *_my_text_consumer;
        a_word_source *_my_word_consumer;
    } a_word_source;

Member | Type | Description |
---|---|---|
begin_document | a_sql_uint32 | Performs the necessary setup steps for processing a document. The has_prefix parameter is set to 1 (the integer value, not true or TRUE) if the document being tokenized is a prefix query term. If has_prefix is 1, the term breaker must return at least one term (possibly empty). has_prefix can be 1 only if the purpose of pipeline initialization is TERM_BREAKER_FOR_QUERY. The result of prefix tokenization is treated as a phrase, with the last term of the phrase being the actual prefix string. |
get_words | a_sql_uint32 | Returns a pointer to an array of a_term structures. This method is called in a loop for a given document until all the contents of the document have been broken into terms. The database server expects two immediately consecutive terms in a document to have positions that differ by 1. If the term breaker performs its own stoplist processing, the positions of two consecutive returned terms can differ by more than 1; this is expected and acceptable. In any other case where positions are not consecutive, the arbitrary positions can affect how full text queries are executed and can cause unexpected results for subsequent full text queries. |
end_document | a_sql_uint32 | Marks the completion of processing of the document by the pipeline, and performs document-specific cleanup. |
fini_all | a_sql_uint32 | Called by the database server after all documents have been processed and the pipeline is about to be closed. fini_all performs the final cleanup steps. |
_context | a_server_context * | The database server context that is provided to the entry point function within the a_init_term_breaker structure. The term breaker module uses this context to establish direct communication with the database server. |
_my_text_producer | a_text_source * | Pointer to the a_text_source producer of the term breaker that is provided to the entry point function within the a_init_term_breaker structure. The database server may replace this pointer after the entry point function has been executed if character set conversion is required. Therefore, the term breaker must access its text producer only through this pointer. |
_my_word_producer | a_word_source * | Reserved for future use and should be initialized to NULL. |
_my_text_consumer | a_text_source * | Reserved for future use and should be initialized to NULL. |
_my_word_consumer | a_word_source * | Reserved for future use and should be initialized to NULL. |
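
The position rule described for get_words can be illustrated with a small, self-contained sketch. It is not taken from the SDK: demo_term is a hypothetical stand-in for a_term, demo_is_stopword and demo_break_terms are illustrative helpers, and numbering positions from 1 is an assumption. The point it shows is that every word advances the position counter, so a gap larger than 1 appears only where a stopword was dropped.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a_term; the real definition is in exttbapiv1.h. */
typedef struct demo_term {
    const char   *word;  /* start of the term inside the source text */
    unsigned int  len;   /* length of the term in bytes */
    unsigned int  pos;   /* position of the term (assumed to start at 1) */
} demo_term;

/* Hypothetical stoplist lookup: returns 1 if the word is a stopword. */
static int demo_is_stopword(const char *w, size_t len)
{
    static const char *stoplist[] = { "the", "a", "of" };
    for (size_t i = 0; i < sizeof stoplist / sizeof stoplist[0]; i++) {
        if (strlen(stoplist[i]) == len && strncmp(stoplist[i], w, len) == 0)
            return 1;
    }
    return 0;
}

/* Splits text on whitespace and stores up to max_terms terms in out[].
 * Stopwords consume a position but are not emitted, so positions of
 * emitted terms differ by more than 1 only across a dropped stopword. */
static size_t demo_break_terms(const char *text, demo_term *out, size_t max_terms)
{
    size_t n = 0;
    unsigned int pos = 0;
    const char *p = text;

    assert(text != NULL && out != NULL);
    while (*p != '\0' && n < max_terms) {
        while (*p != '\0' && isspace((unsigned char)*p))
            p++;
        if (*p == '\0')
            break;
        const char *start = p;
        while (*p != '\0' && !isspace((unsigned char)*p))
            p++;
        size_t len = (size_t)(p - start);
        pos++;                              /* every word advances the position */
        if (demo_is_stopword(start, len))
            continue;                       /* dropped, leaving a gap in pos */
        out[n].word = start;
        out[n].len  = (unsigned int)len;
        out[n].pos  = pos;
        n++;
    }
    return n;
}
```

For the input "the quick brown fox", this sketch emits quick, brown, and fox at positions 2, 3, and 4. A term breaker that skipped positions for any reason other than stoplist processing would produce the arbitrary gaps the table warns about.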
The a_word_source interface is defined by a header file named exttbapiv1.h, in the SDK\Include subdirectory of your SQL Anywhere installation directory.
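
The call order implied by the member descriptions can be sketched as follows. This is a minimal illustration under stated assumptions, not the SDK's implementation: the type definitions at the top are stand-ins so that the block compiles alone (a real module would include exttbapiv1.h instead), the a_term layout is assumed, and everything prefixed with demo_ is hypothetical. Two conventions are also assumptions: a return value of 0 means success, and get_words signals the end of a document by setting *num_words to 0. The demo does not read from its text producer; it returns one fixed term per document.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins so the sketch compiles alone; a real module would include
 * exttbapiv1.h instead. The a_term layout here is an assumption. */
#define SQL_CALLBACK
typedef unsigned int a_sql_uint32;
typedef struct a_server_context a_server_context;
typedef struct a_text_source a_text_source;
typedef struct a_term { const char *word; a_sql_uint32 len; a_sql_uint32 pos; } a_term;
typedef struct a_word_source a_word_source;
struct a_word_source {
    a_sql_uint32 (SQL_CALLBACK *begin_document)(a_word_source *This, a_sql_uint32 has_prefix);
    a_sql_uint32 (SQL_CALLBACK *get_words)(a_word_source *This, a_term **words, a_sql_uint32 *num_words);
    a_sql_uint32 (SQL_CALLBACK *end_document)(a_word_source *This);
    a_sql_uint32 (SQL_CALLBACK *fini_all)(a_word_source *This);
    a_server_context *_context;
    a_text_source *_my_text_producer;
    a_word_source *_my_word_producer;
    a_text_source *_my_text_consumer;
    a_word_source *_my_word_consumer;
};

/* Hypothetical module state. The interface is the first member so that the
 * This pointer passed to each callback can be cast back to the module. */
typedef struct demo_breaker {
    a_word_source iface;
    a_term batch[1];   /* storage behind the array returned by get_words */
    int emitted;       /* set once the document's single demo term is returned */
    int docs_open;     /* begin_document calls not yet closed by end_document */
    int finished;      /* set by fini_all */
} demo_breaker;

static a_sql_uint32 SQL_CALLBACK demo_begin_document(a_word_source *This, a_sql_uint32 has_prefix)
{
    demo_breaker *b = (demo_breaker *)This;
    (void)has_prefix;  /* the demo returns one term whether or not it is a prefix */
    b->emitted = 0;
    b->docs_open++;
    return 0;          /* assumption: 0 means success */
}

static a_sql_uint32 SQL_CALLBACK demo_get_words(a_word_source *This, a_term **words, a_sql_uint32 *num_words)
{
    demo_breaker *b = (demo_breaker *)This;
    if (b->emitted) {  /* assumption: zero terms signals end of document */
        *words = NULL;
        *num_words = 0;
        return 0;
    }
    b->batch[0].word = "hello";
    b->batch[0].len = 5;
    b->batch[0].pos = 1;
    b->emitted = 1;
    *words = b->batch;
    *num_words = 1;
    return 0;
}

static a_sql_uint32 SQL_CALLBACK demo_end_document(a_word_source *This)
{
    ((demo_breaker *)This)->docs_open--;   /* document-specific cleanup */
    return 0;
}

static a_sql_uint32 SQL_CALLBACK demo_fini_all(a_word_source *This)
{
    ((demo_breaker *)This)->finished = 1;  /* final cleanup */
    return 0;
}

static void demo_init(demo_breaker *b, a_server_context *ctx, a_text_source *producer)
{
    b->iface.begin_document = demo_begin_document;
    b->iface.get_words = demo_get_words;
    b->iface.end_document = demo_end_document;
    b->iface.fini_all = demo_fini_all;
    b->iface._context = ctx;
    b->iface._my_text_producer = producer;
    b->iface._my_word_producer = NULL;     /* reserved members stay NULL */
    b->iface._my_text_consumer = NULL;
    b->iface._my_word_consumer = NULL;
    b->emitted = b->docs_open = b->finished = 0;
}

/* Stands in for the database server: one document, then pipeline shutdown. */
static a_sql_uint32 demo_drive(a_word_source *ws, a_sql_uint32 *total)
{
    a_term *words = NULL;
    a_sql_uint32 n = 0, rc;
    assert(ws != NULL && total != NULL);
    *total = 0;
    if ((rc = ws->begin_document(ws, 0)) != 0) return rc;
    for (;;) {
        if ((rc = ws->get_words(ws, &words, &n)) != 0) return rc;
        if (n == 0) break;
        *total += n;
    }
    if ((rc = ws->end_document(ws)) != 0) return rc;
    return ws->fini_all(ws);
}
```

Embedding the interface as the first member of the module's own structure is one common way to recover per-module state from the This pointer in C; the header does not prescribe it.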
The external library should not hold any operating system synchronization primitives across function calls.
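
One way to read this restriction is that any lock the library takes should be acquired and released inside a single interface function, never taken in one call and released in a later one. The sketch below illustrates that pattern with a POSIX mutex. The choice of pthreads is an assumption (on Windows an equivalent primitive would be used), and demo_cache_lock, demo_shared_stopword, demo_snapshot_stopword, and demo_get_words_step are hypothetical names, not part of the API.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical shared state protected by a mutex. */
static pthread_mutex_t demo_cache_lock = PTHREAD_MUTEX_INITIALIZER;
static char demo_shared_stopword[32] = "the";

/* Copies the shared value while holding the lock, then releases the lock
 * before returning, so the caller never holds it. */
static void demo_snapshot_stopword(char *dst, size_t cap)
{
    assert(dst != NULL && cap > 0);
    pthread_mutex_lock(&demo_cache_lock);
    snprintf(dst, cap, "%s", demo_shared_stopword);
    pthread_mutex_unlock(&demo_cache_lock);
}

/* Hypothetical body of one get_words step: returns 1 if the word should be
 * emitted and 0 if it matches the stopword. The mutex is taken and dropped
 * inside demo_snapshot_stopword, so nothing is held when this returns. */
static unsigned int demo_get_words_step(const char *word)
{
    char stop[32];
    demo_snapshot_stopword(stop, sizeof stop);
    return strcmp(word, stop) == 0 ? 0u : 1u;
}
```

Working on a private copy of the shared data keeps the critical section short and guarantees that the lock is released before control returns to the database server.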
Copyright © 2010, iAnywhere Solutions, Inc. - SQL Anywhere 12.0.0