Use the following procedure to perform a full text search on a text index that uses an NGRAM term breaker. This procedure can also be used to create a full text search of Chinese, Japanese, or Korean data.
In databases with multibyte character sets, some punctuation and space characters such as full width commas and full width spaces may be treated as alphanumeric characters.
See also: Tutorial: Performing a fuzzy full text search.
Start Interactive SQL and connect to the sample database using the SQL Anywhere 12 Demo data source.
Execute the following statement to create an NCHAR text configuration object named myNcharNGRAMTextConfig:
CREATE TEXT CONFIGURATION myNcharNGRAMTextConfig FROM default_nchar; |
Execute the following statements to change the TERM BREAKER algorithm to NGRAM, and to set MAXIMUM TERM LENGTH (N) to 2.
ALTER TEXT CONFIGURATION myNcharNGRAMTextConfig TERM BREAKER NGRAM; |
ALTER TEXT CONFIGURATION myNcharNGRAMTextConfig MAXIMUM TERM LENGTH 2; |
For Chinese, Japanese, and Korean data, the recommended value for N is 2 or 3. For searches limited to one or two characters, set the N value to 1. Setting the N value to 1 can cause slower execution of long queries.
Start Sybase Central and connect to the sample database using the SQL Anywhere 12 Demo data source.
Create a copy of the MarketingInformation table.
Expand the Tables folder.
Right-click MarketingInformation and choose Copy.
Right-click the Tables folder and choose Paste.
In the Name field, type MarketingInformationNgram.
Click OK.
In Interactive SQL, execute the following statement to add data to the MarketingInformationNgram table:
INSERT INTO MarketingInformationNgram SELECT * FROM MarketingInformation; COMMIT; |
Execute the following statement to create an IMMEDIATE REFRESH text index on the MarketingInformationNgram.Description column using the myNcharNGRAMTextConfig text configuration object:
CREATE TEXT INDEX ncharNGRAMTextIndex ON MarketingInformationNgram( Description ) CONFIGURATION myNcharNGRAMTextConfig; |
Execute the following statements to test the text index.
The following statement searches the 2-GRAM text index for terms containing sw. The results are sorted by score in descending order.
SELECT M.Description, ct.* FROM MarketingInformationNgram AS M CONTAINS( M.Description, 'sw' ) ct ORDER BY ct.score DESC; |
Description | Score |
---|---|
<html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252"><title>Sweatshirt</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>Lightweight 100% organically grown
cotton hooded Sweatshirt with taped neck seams. Comes pre-washed for softness and to lessen shrinkage.</span></p></body></html>
|
2.262071918398649 |
<html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252"><title>Sweatshirt</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>Top-notch construction includes
durable topstitched seams for strength with low-bulk, resilient rib-knit collar, cuffs and bottom. An 80% cotton/20% polyester
blend makes it easy to keep them clean.</span></p></body></html>
|
1.5556043490424176 |
The following statement searches for terms containing ams. The results are sorted by score in descending order.
SELECT M.Description, ct.* FROM MarketingInformationNgram AS M CONTAINS( M.Description, 'ams' ) ct ORDER BY ct.score DESC; |
With the 2-GRAM text index, the previous statement is semantically equivalent to:
SELECT M.Description, ct.* FROM MarketingInformationNgram AS M CONTAINS( M.Description, '"am ms"' ) ct ORDER BY ct.score DESC; |
Description | Score |
---|---|
<html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252"><title>Sweatshirt</title></head><body
lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>Lightweight 100% organically grown cotton hooded sweatshirt
with taped neck seams. Comes pre-washed for softness and to lessen shrinkage.</span></p></body></html>
|
1.6619019465461564 |
<html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252"><title>Sweatshirt</title></head><body
lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>Top-notch construction includes durable topstitched seams for strength with low-bulk, resilient rib-knit collar, cuffs and bottom. An 80% cotton/20% polyester blend makes it easy
to keep them clean.</span></p></body></html>
|
1.5556043490424176 |
The following statement searches for terms with v followed by any alphanumeric character. Because ve occurs more frequently in the indexed data, rows that contain the 2-gram ve are assigned a lower score than rows containing vi. The results are sorted by score in descending order.
SELECT M.ID, M.Description, ct.* FROM MarketingInformationNgram AS M CONTAINS( M.Description, 'v*' ) ct ORDER BY ct.score DESC; |
ID | Description | Score |
---|---|---|
901 | <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252"><title>Tee Shirt</title></head><body lang=EN-US><p><span
style='font-size:10.0pt;font-family:Arial'>We've improved the design of this perennial favvorite. A sleek and technical shirt built for the trail, track, or sidewalk. UPF rating of 50+.</span></p></body></html>
|
3.3416789108071976 |
907 | <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252"><title>Visor</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>A polycarbonate visor with an abrasion-resistant coating on the outside. Great for jogging in the spring, summer, and early fall. The elastic
headband has plenty of stretch to give you a snug yet comfortable fit every time you wear it.</span></p></body></html>
|
2.1123084896159376 |
905 | <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252"><title>Baseball Cap</title></head><body
lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>A lightweight wool cap with mesh side vents for breathable comfort during aerobic activities. Moisture-absorbing headband liner.</span></p></body></html>
|
1.6750365447462499 |
910 | <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252"><title>Shorts</title></head><body lang=EN-US><p><span
style='font-size:10.0pt;font-family:Arial'>These quick-drying cotton shorts provide all day comfort on or off the trails. Now with a more comfortable and stretchy fabric and an adjustable drawstring waist.</span></p></body></html>
|
0.9244136988525732 |
906 | <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252"><title>Visor</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>Lightweight 100% organically grown
cotton construction. Shields against sun and precipitation. Metallic ions in the fibers inhibit bacterial growth, and help
neutralize odor.</span></p></body></html>
|
0.9134171046194403 |
904 | <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252"><title>Baseball Cap</title></head><body
lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>This fashionable hat is ideal for glacier travel, sea-kayaking, and hiking. With concealed draw cord for windy days.</span></p></body></html>
|
0.7313071661212746 |
903 | <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252"><title>Tee Shirt</title></head><body lang=EN-US><p><span
style='font-size:10.0pt;font-family:Arial'>A sporty, casual shirt made of recycled water bottles. It will serve you equally well on trails or around town. The fabric has a wicking finish to pull perspiration away from your skin.</span></p></body></html>
|
0.6799436746197272 |
The following statements search each row for any terms containing v. After the second statement, the variable contains the
string av OR ev OR iv OR ov OR rv OR ve OR vi OR vo
. The results are sorted by score in descending order. When an n-gram appears in all indexed rows, it is assigned a score
of zero.
This is the only method that allows a single character to be located if it appears before a whitespace or a non-alphanumeric character.
CREATE VARIABLE query NVARCHAR (100); SELECT LIST (term, ' OR ' ) INTO query FROM sa_text_index_vocab( 'ncharNGRAMTextIndex', 'MarketingInformationNgram', 'dba' ) WHERE term LIKE '%v%'; SELECT M.ID, M.Description, ct.* FROM MarketingInformationNgram AS M CONTAINS( M.Description, query ) ct ORDER BY ct.score DESC; |
ID | Description | Score |
---|---|---|
901 | <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252"><title>Tee Shirt</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>We've improved the design of this perennial favorite. A sleek and technical shirt built for the trail, track, or sidewalk. UPF rating of 50+.</span></p></body></html>
|
6.654350268810443 |
907 | <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252"><title>Visor</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>A polycarbonate visor with an abrasion-resistant coating on the outside. Great for jogging in the spring, summer, and early fall. The elastic
headband has plenty of stretch to give you a snug yet comfortable fit every time you wear it.</span></p></body></html>
|
4.265623837817126 |
903 | <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252"><title>Tee Shirt</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>A
sporty, casual shirt made of recycled water bottles. It will serve you equally well on trails or around town. The fabric has a wicking finish to pull perspiration away from your skin.</span></p></body></html>
|
2.9386676702799504 |
910 | <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252"><title>Shorts</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>These
quick-drying cotton shorts provide all day comfort on or off the trails. Now with a more comfortable and stretchy fabric and an adjustable drawstring waist.</span></p></body></html>
|
2.5481193655722336 |
904 | <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252"><title>Baseball Cap</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>This
fashionable hat is ideal for glacier travel, sea-kayaking, and hiking. With concealed draw cord for windy days.</span></p></body></html>
|
2.4293498211307214 |
905 | <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252"><title>Baseball Cap</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>A
lightweight wool cap with mesh side vents for breathable comfort during aerobic activities. Moisture-absorbing headband liner.</span></p></body></html>
|
1.6750365447462499 |
906 | <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252"><title>Visor</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>Lightweight 100% organically grown
cotton construction. Shields against sun and precipitation. Metallic ions in the fibers inhibit bacterial growth, and help
neutralize odor.</span></p></body></html>
|
0.9134171046194403 |
902 | <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252"><title>Tee Shirt</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>This
simple, sleek, and lightweight technical shirt is designed for high-intensity workouts in hot and humid weather. The recycled
polyester fabric is gentle on the earth and soft against your skin.</span></p></body></html>
|
0 |
908 | <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252"><title>Sweatshirt</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>Lightweight
100% organically grown cotton hooded sweatshirt with taped neck seams. Comes pre-washed for softness and to lessen shrinkage.</span></p></body></html>
|
0 |
909 | <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252"><title>Sweatshirt</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>Top-notch
construction includes durable topstitched seams for strength with low-bulk, resilient rib-knit collar, cuffs and bottom. An
80% cotton/20% polyester blend makes it easy to keep them clean.</span></p></body></html>
|
0 |
The following statement searches the Description column for rows that contain ea, ka, and ki.
SELECT M.ID, M.Description, ct.* FROM MarketingInformationNgram AS M CONTAINS( M.Description, 'ea ka ki' ) ct ORDER BY ct.score DESC; |
ID | Description | Score |
---|---|---|
904 | <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252"><title>Baseball Cap</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>This fashionable hat is ideal for glacier travel, sea-kayaking, and hiking. With concealed draw cord for windy days.</span></p></body></html>
|
3.4151032739119733 |
The following statement searches the Description column for rows that contain ve and vi, but not gg.
SELECT M.ID, M.Description, ct.* FROM MarketingInformationNgram AS M CONTAINS( M.Description, 've & vi -gg' ) ct ORDER BY ct.score DESC; |
ID | Description | Score |
---|---|---|
905 | <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252"><title>Baseball Cap</title></head><body
lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>A lightweight wool cap with mesh side vents for breathable comfort during aerobic activities. Moisture-absorbing headband liner.</span></p></body></html>
|
1.6750365447462499 |
Disconnect from Interactive SQL and Sybase Central.
(optional) Restore the sample database (demo.db) to its original state by following the steps found here: Recreate the sample database (demo.db).
Discuss this page in DocCommentXchange.
|
Copyright © 2010, iAnywhere Solutions, Inc. - SQL Anywhere 12.0.0 |