Scoring full text search results

When you include a CONTAINS clause in the FROM clause of a query, each match has a score associated with it. The score indicates how close the match is, and you can use score information to sort the data.

Scoring is based on two main criteria:

  • Number of times a term appears in the indexed row   The more times a term appears in an indexed row, the higher its score.

  • Number of times a term appears in the text index   The more times a term appears in a text index, the lower its score. In Sybase Central, you can view how many times a term appears in the text index by viewing the Vocabulary tab for the text index. Choose the term column to sort the terms alphabetically. The freq column tells you how many times the term appears in the text index.

Then, depending on the type of full text search, other criteria impact scoring. For example, in proximity searches, the proximity of search terms impacts scoring.

How to use scores

By default, the result set of a CONTAINS clause has the correlation name contains that has a single column in it called score. You can refer to "contains".score in the SELECT list, ORDER BY clause, or other parts of the query. However, because contains is a SQL reserved word, you must remember to put it in double quotes. Alternatively, you can specify another correlation name such (for example, CONTAINS ( expression ) AS ct). In the documentation examples for full text search, the score column is referred to as ct.score.

The following statement searches MarketingInformation.Description for terms starting with stretch or terms starting with comfort:

SELECT ID, ct.score, Description  
    FROM MarketingInformation CONTAINS ( MarketingInformation.Description, 'stretch* | comfort*' ) AS ct 
    ORDER BY ct.score DESC;
ID score Description
910 5.570408968026068 <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252"><title>Shorts</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>These quick-drying cotton shorts provide all day comfort on or off the trails. Now with a more comfortable and stretchy fabric and an adjustable drawstring waist.</span></p></body></html>
907 3.658418186470189 <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252"><title>Visor</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>A polycarbonate visor with an abrasion-resistant coating on the outside. Great for jogging in the spring, summer, and early fall. The elastic headband has plenty of stretch to give you a snug yet comfortable fit every time you wear it.</span></p></body></html>
905 1.6750365447462499 <html><head><meta http-equiv=Content-Type content="text/html; charset=windows-1252"><title>Baseball Cap</title></head><body lang=EN-US><p><span style='font-size:10.0pt;font-family:Arial'>A lightweight wool cap with mesh side vents for breathable comfort during aerobic activities. Moisture-absorbing headband liner.</span></p></body></html>

Item 910 has the highest score because it contains two instances of the prefix term comfort, whereas the others only have one instance. As well, item 910 has an instance of the prefix term stretch.

Example 2: Searching multiple columns

The following example shows you how to perform a full text search across multiple columns and score the results:

  1. Create an immediate text index on the Products table as follows:

    CREATE TEXT INDEX scoringExampleMult 
       ON Products ( Description, Name );
  2. Perform a full text search on the Description and Name columns for the terms cap or visor, as follows. The result of the CONTAINS clause is assigned the correlation name ct, and is referenced in the SELECT list so that it is included in the results. Also, the ct.score column is referenced in the ORDER BY clause to sort the results in descending order by score.

    SELECT Products.Description, Products.Name, ct.score 
        FROM Products CONTAINS ( Products.Description, Products.Name, 'cap OR visor' ) ct 
        ORDER BY ct.score DESC;
    Description Name score
    Cloth Visor Visor 3.5635154905713042
    Plastic Visor Visor 3.4507856451176244
    Wool cap Baseball Cap 3.2340501745357333
    Cotton Cap Baseball Cap 3.090467108972918

    The scores for a multi-column search are calculated as if the column values were concatenated together and indexed as a single value. Note, however, that phrases and NEAR operators never match across column boundaries, and that a search term that appears in more than one column increases the score more than it would in a single concatenated value.

  3. For other examples in the documentation to work properly, you must delete the text index you created on the Products table. To do so, execute the following statement:

    DROP TEXT INDEX scoringExampleMult ON Products;