Using filters on text that contains tags

To perform accurate searches on documents that contain tags (such as HTML or PostScript), the text index must use a filter to strip out the tags. The Enhanced Full-Text Search engine provides filters for a variety of document types (Microsoft Word, FrameMaker, WordPerfect, SGML, HTML, and others).

When you create a text index that uses a filter, the data for each type of tag in the document is placed into its own document zone. For example, if your documents use a tag called “chapter,” all text marked with that tag is placed into one document zone. You can then issue a query that searches the entire document, or one that searches only the data in the “chapter” zone (for more information, see “in”).

To create a text index that uses a filter, modify the style.dft file for that text index:

  1. Create the text index using sp_create_text_index. Use the word “empty” in the option_string parameter so that the style.dft file is created for the text index, but the Verity collections are not populated with data. For example, to create a text index for the copy column of the blurbs table, use the following syntax:

    sp_create_text_index "KRAZYKAT", "i_blurbs", "blurbs", "empty", "copy"
    
  2. Drop the text index that you created in step 1. This drops the text index, but not the style.dft file. For example, use the following command to drop the i_blurbs text index:

    sp_drop_text_index i_blurbs
    
  3. Edit the style.dft file. The style.dft file is in the directory $SYBASE/$SYBASE_FTS/collections/db.owner.index/style, where db, owner, and index are the database, the database owner, and the text index created with sp_create_text_index. For example, if you created a text index called i_blurbs in the pubs2 database, owned by dbo, the full path to the directory containing style.dft would be:

    $SYBASE/$SYBASE_FTS/collections/pubs2.dbo.i_blurbs/style

    After this line:

    field: f0


    add a line that specifies a filter. For all document types except SGML, use the following syntax:

    /filter="universal"

    For SGML documents, use the following syntax:

    /filter="zone -nocharmap"
    

    For example, your style.dft file for an SGML document will look like this:

    $control: 1
    dft:
    {
         field: f0
              /filter="zone -nocharmap"
         field: f1
         field: f2
         .
         .
         field: f15
    }
    

    Your style.dft file for any other document type will look like this:

    $control: 1
    dft:
    {
         field: f0
              /filter="universal"
         field: f1
         field: f2
         .
         .
         field: f15
    }
    

    Note: Use getsend to load the database with document data. getsend takes the following arguments: database, table, column, and row ID. Insert a null value for the row ID for each row of text you want to insert. getsend must insert into an image column for filtering to work. For more information on getsend, refer to the README.TXT and getsend.c files in the $SYBASE/$SYBASE_FTS/sample/source directory.

  4. Re-create the text index using sp_create_text_index, this time with an empty option_string so that the Verity collections are populated. For example:

    sp_create_text_index "KRAZYKAT", "i_blurbs", "blurbs", "", "copy"
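Step 3 can also be scripted. The Python sketch below is not part of the Enhanced Full-Text Search product: style_dft_path, add_filter, and edit_style_file are hypothetical helpers. It assumes that the file is named style.dft inside the style directory described in step 3, that each field entry sits on its own line as in the style.dft examples, and that placing the /filter line five spaces deeper than its field line is acceptable (this only mirrors the indentation of the examples). It adds the filter line once and leaves the file unchanged if field f0 already has a filter.

```python
import os


def style_dft_path(sybase: str, sybase_fts: str, db: str, owner: str, index: str) -> str:
    """Return $SYBASE/$SYBASE_FTS/collections/db.owner.index/style/style.dft."""
    collection = f"{db}.{owner}.{index}"
    return os.path.join(sybase, sybase_fts, "collections", collection, "style", "style.dft")


def add_filter(style_text: str, filter_value: str, field: str = "f0") -> str:
    """Insert /filter="<filter_value>" on the line after 'field: <field>'.

    Returns the text unchanged if that field already has a /filter line.
    Raises ValueError if the field line is missing.
    """
    lines = style_text.splitlines()
    target = f"field: {field}"
    for i, line in enumerate(lines):
        # Exact match, so "field: f10" is not mistaken for "field: f0".
        if line.strip() != target:
            continue
        # Already filtered: do nothing, so running the script twice is harmless.
        if i + 1 < len(lines) and lines[i + 1].strip().startswith("/filter="):
            return style_text
        # Indent five spaces deeper than the field line, mirroring the
        # style.dft examples (an assumption, not a documented rule).
        indent = line[: len(line) - len(line.lstrip())]
        lines.insert(i + 1, f'{indent}     /filter="{filter_value}"')
        trailing_newline = "\n" if style_text.endswith("\n") else ""
        return "\n".join(lines) + trailing_newline
    raise ValueError(f"no '{target}' line found in style.dft")


def edit_style_file(path: str, filter_value: str) -> None:
    """Read style.dft, add the filter line, and write it back only if it changed."""
    # latin-1 maps every byte to a character, so unexpected bytes round-trip unchanged.
    with open(path, encoding="latin-1") as f:
        original = f.read()
    updated = add_filter(original, filter_value)
    if updated != original:
        with open(path, "w", encoding="latin-1") as f:
            f.write(updated)
```

For example, after step 2 you might call edit_style_file(style_dft_path("/opt/sybase", "EFTS", "pubs2", "dbo", "i_blurbs"), "universal"), where "/opt/sybase" and "EFTS" are placeholders for your own $SYBASE and $SYBASE_FTS values. Keep a backup copy of style.dft before editing it.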
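Because step 3 is a manual edit that falls between dropping and re-creating the text index, the stored procedure calls in this procedure split into two batches: one to run before editing style.dft (steps 1 and 2) and one to run after (step 4). The Python sketch below generates both batches as text that could be saved to files and run through isql. before_edit_script and after_edit_script are hypothetical helpers; the sketch assumes isql as the client, where a line containing only go ends a batch, and it rejects names that contain double quotes rather than trying to escape them.

```python
from typing import Sequence


def _check(*names: str) -> None:
    # The calls wrap each value in double quotes, so refuse embedded quotes.
    for name in names:
        if '"' in name:
            raise ValueError(f"unexpected double quote in {name!r}")


def _create(server: str, index: str, table: str, option: str, column: str) -> str:
    # Same argument order as the sp_create_text_index examples in this section.
    return f'sp_create_text_index "{server}", "{index}", "{table}", "{option}", "{column}"'


def _batch(commands: Sequence[str]) -> str:
    # isql executes each batch when it reads a line containing only "go".
    return "".join(cmd + "\ngo\n" for cmd in commands)


def before_edit_script(server: str, index: str, table: str, column: str) -> str:
    """Steps 1 and 2: create the index with the "empty" option, then drop it."""
    _check(server, index, table, column)
    return _batch([
        _create(server, index, table, "empty", column),
        f"sp_drop_text_index {index}",
    ])


def after_edit_script(server: str, index: str, table: str, column: str) -> str:
    """Step 4: re-create the index with an empty option string."""
    _check(server, index, table, column)
    return _batch([_create(server, index, table, "", column)])
```

In this sketch the two sp_create_text_index calls differ only in the option string: "empty" in step 1 and "" in step 4.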