Tutorial: Performing a full text search on an NGRAM text index

Use the following procedure to perform a full text search on a text index that uses an NGRAM term breaker. This procedure can also be used to create a full text search of Chinese, Japanese, or Korean data.

In databases with multibyte character sets, some punctuation and space characters such as full width commas and full width spaces may be treated as alphanumeric characters.

 Perform a full text search on an NGRAM text index
  1. Start Interactive SQL and connect to the sample database using the SQL Anywhere 12 Demo data source.

  2. Execute the following statement to create an NCHAR text configuration object named myNcharNGRAMTextConfig:

    CREATE TEXT CONFIGURATION myNcharNGRAMTextConfig FROM default_nchar;
  3. Execute the following statements to change the TERM BREAKER algorithm to NGRAM, and to set MAXIMUM TERM LENGTH (N) to 2.

    ALTER TEXT CONFIGURATION myNcharNGRAMTextConfig 
       TERM BREAKER NGRAM;
    ALTER TEXT CONFIGURATION myNcharNGRAMTextConfig 
       MAXIMUM TERM LENGTH 2;

    For Chinese, Japanese, and Korean data, the recommended value for N is 2 or 3. For searches limited to one or two characters, set the N value to 1. Setting the N value to 1 can cause slower execution of long queries.

  4. Start Sybase Central and connect to the sample database using the SQL Anywhere 12 Demo data source.

  5. Create a copy of the MarketingInformation table.

    1. Expand the Tables folder.

    2. Right-click MarketingInformation and click Copy.

    3. Right-click the Tables folder and click Paste.

    4. In the Name field, type MarketingInformationNgram.

    5. Click OK.

  6. In Interactive SQL, execute the following statement to add data to the MarketingInformationNgram table:

    INSERT INTO MarketingInformationNgram
       SELECT *
          FROM MarketingInformation;
    COMMIT;
  7. Execute the following statement to create an IMMEDIATE REFRESH text index on the MarketingInformationNgram.Description column using the myNcharNGRAMTextConfig text configuration object:

    CREATE TEXT INDEX ncharNGRAMTextIndex 
       ON MarketingInformationNgram( Description ) 
          CONFIGURATION myNcharNGRAMTextConfig;
  8. Execute the following statements to test the text index.

    1. The following statement searches the 2-GRAM text index for terms containing sw. The results are sorted by score in descending order.

      SELECT M.Description, ct.*
         FROM MarketingInformationNgram AS M 
      CONTAINS( M.Description, 'sw' ) ct
         ORDER BY ct.score DESC;
      Description Score
      <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252"><title>Sweatshirt</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>Lightweight 100% organically grown cotton hooded Sweatshirt with taped neck seams. Comes pre-washed for softness and to lessen shrinkage.</span></p></body></html> 2.262071918398649
      <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252"><title>Sweatshirt</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>Top-notch construction includes durable topstitched seams for strength with low-bulk, resilient rib-knit collar, cuffs and bottom. An 80% cotton/20% polyester blend makes it easy to keep them clean.</span></p></body></html> 1.5556043490424176
    2. The following statement searches for terms containing ams. The results are sorted by score in descending order.

      SELECT M.Description, ct.*
         FROM MarketingInformationNgram AS M 
      CONTAINS( M.Description, 'ams' ) ct
         ORDER BY ct.score DESC;

      With the 2-GRAM text index, the previous statement is semantically equivalent to:

      SELECT M.Description, ct.*
         FROM MarketingInformationNgram AS M 
      CONTAINS( M.Description, '"am ms"' ) ct
         ORDER BY ct.score DESC;
      Description Score
      <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252"><title>Sweatshirt</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>Lightweight 100% organically grown cotton hooded sweatshirt with taped neck seams. Comes pre-washed for softness and to lessen shrinkage.</span></p></body></html> 1.6619019465461564
      <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252"><title>Sweatshirt</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>Top-notch construction includes durable topstitched seams for strength with low-bulk, resilient rib-knit collar, cuffs and bottom. An 80% cotton/20% polyester blend makes it easy to keep them clean.</span></p></body></html> 1.5556043490424176
    3. The following statement searches for terms with v followed by any alphanumeric character. Because ve occurs more frequently in the indexed data, rows that contain the 2-gram ve are assigned a lower score than rows containing vi. The results are sorted by score in descending order.

      SELECT M.ID, M.Description, ct.*
         FROM MarketingInformationNgram AS M 
      CONTAINS( M.Description, 'v*' ) ct
         ORDER BY ct.score DESC;
      
      ID Description Score
      901 <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252"><title>Tee Shirt</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>We've improved the design of this perennial favvorite. A sleek and technical shirt built for the trail, track, or sidewalk. UPF rating of 50+.</span></p></body></html> 3.3416789108071976
      907 <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252"><title>Visor</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>A polycarbonate visor with an abrasion-resistant coating on the outside. Great for jogging in the spring, summer, and early fall. The elastic headband has plenty of stretch to give you a snug yet comfortable fit every time you wear it.</span></p></body></html> 2.1123084896159376
      905 <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252"><title>Baseball Cap</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>A lightweight wool cap with mesh side vents for breathable comfort during aerobic activities. Moisture-absorbing headband liner.</span></p></body></html> 1.6750365447462499
      910 <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252"><title>Shorts</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>These quick-drying cotton shorts provide all day comfort on or off the trails. Now with a more comfortable and stretchy fabric and an adjustable drawstring waist.</span></p></body></html> 0.9244136988525732
      906 <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252"><title>Visor</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>Lightweight 100% organically grown cotton construction. Shields against sun and precipitation. Metallic ions in the fibers inhibit bacterial growth, and help neutralize odor.</span></p></body></html> 0.9134171046194403
      904 <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252"><title>Baseball Cap</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>This fashionable hat is ideal for glacier travel, sea-kayaking, and hiking. With concealed draw cord for windy days.</span></p></body></html> 0.7313071661212746
      903 <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252"><title>Tee Shirt</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>A sporty, casual shirt made of recycled water bottles. It will serve you equally well on trails or around town. The fabric has a wicking finish to pull perspiration away from your skin.</span></p></body></html> 0.6799436746197272
    4. The following statements search each row for any terms containing v. After the second statement, the variable contains the string av OR ev OR iv OR ov OR rv OR ve OR vi OR vo. The results are sorted by score in descending order. When an n-gram appears in all indexed rows, it is assigned a score of zero.

      This is the only method that allows a single character to be located if it appears before a whitespace or a non-alphanumeric character.

      CREATE VARIABLE query NVARCHAR (100);
      SELECT LIST (term, ' OR ' )
      INTO query
         FROM sa_text_index_vocab_nchar( 'ncharNGRAMTextIndex', 'MarketingInformationNgram', 'dba' )
         WHERE term LIKE '%v%';
      SELECT M.ID, M.Description, ct.*
         FROM MarketingInformationNgram AS M 
         CONTAINS( M.Description, query ) ct
         ORDER BY ct.score DESC;
      ID Description Score
      901 <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252"><title>Tee Shirt</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>We've improved the design of this perennial favorite. A sleek and technical shirt built for the trail, track, or sidewalk. UPF rating of 50+.</span></p></body></html> 6.654350268810443
      907 <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252"><title>Visor</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>A polycarbonate visor with an abrasion-resistant coating on the outside. Great for jogging in the spring, summer, and early fall. The elastic headband has plenty of stretch to give you a snug yet comfortable fit every time you wear it.</span></p></body></html> 4.265623837817126
      903 <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252"><title>Tee Shirt</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>A sporty, casual shirt made of recycled water bottles. It will serve you equally well on trails or around town. The fabric has a wicking finish to pull perspiration away from your skin.</span></p></body></html> 2.9386676702799504
      910 <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252"><title>Shorts</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>These quick-drying cotton shorts provide all day comfort on or off the trails. Now with a more comfortable and stretchy fabric and an adjustable drawstring waist.</span></p></body></html> 2.5481193655722336
      904 <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252"><title>Baseball Cap</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>This fashionable hat is ideal for glacier travel, sea-kayaking, and hiking. With concealed draw cord for windy days.</span></p></body></html> 2.4293498211307214
      905 <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252"><title>Baseball Cap</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>A lightweight wool cap with mesh side vents for breathable comfort during aerobic activities. Moisture-absorbing headband liner.</span></p></body></html> 1.6750365447462499
      906 <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252"><title>Visor</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>Lightweight 100% organically grown cotton construction. Shields against sun and precipitation. Metallic ions in the fibers inhibit bacterial growth, and help neutralize odor.</span></p></body></html> 0.9134171046194403
      902 <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252"><title>Tee Shirt</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>This simple, sleek, and lightweight technical shirt is designed for high-intensity workouts in hot and humid weather. The recycled polyester fabric is gentle on the earth and soft against your skin.</span></p></body></html> 0
      908 <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252"><title>Sweatshirt</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>Lightweight 100% organically grown cotton hooded sweatshirt with taped neck seams. Comes pre-washed for softness and to lessen shrinkage.</span></p></body></html> 0
      909 <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252"><title>Sweatshirt</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>Top-notch construction includes durable topstitched seams for strength with low-bulk, resilient rib-knit collar, cuffs and bottom. An 80% cotton/20% polyester blend makes it easy to keep them clean.</span></p></body></html> 0
    5. The following statement searches the Description column for rows that contain ea, ka, and ki.

      SELECT M.ID, M.Description, ct.*
         FROM MarketingInformationNgram AS M 
      CONTAINS( M.Description, 'ea ka ki' ) ct
        ORDER BY ct.score DESC;
      ID Description Score
      904 <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252"><title>Baseball Cap</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>This fashionable hat is ideal for glacier travel, sea-kayaking, and hiking. With concealed draw cord for windy days.</span></p></body></html> 3.4151032739119733
    6. The following statement searches the Description column for rows that contain ve and vi, but not gg.

      SELECT M.ID, M.Description, ct.*
         FROM MarketingInformationNgram AS M 
      CONTAINS( M.Description, 've & vi -gg' ) ct
         ORDER BY ct.score DESC;
      ID Description Score
      905 <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252"><title>Baseball Cap</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>A lightweight wool cap with mesh side vents for breathable comfort during aerobic activities. Moisture-absorbing headband liner.</span></p></body></html> 1.6750365447462499
  9. Disconnect from Interactive SQL and Sybase Central.

  10. (optional) Restore the sample database (demo.db) to its original state.

 See also