Topic scoring and document importance

When you use a topic to perform a search, the search agent starts its analysis by considering the evidence topics for that topic. If an evidence topic is present, it is given a score of 1.00 and is considered relevant to the search. If an evidence topic is absent, it is given a score of 0.00 and is considered irrelevant to the search. If the evidence topics are weighted, the search agent multiplies the score of each evidence topic by its weight, then combines the resulting products in the manner specified by the operator of the parent topic. If this parent topic is, in turn, the child of another topic that is being searched, its score is multiplied by its assigned weight, and the resulting product is combined with the products of its siblings in the manner specified by the operator assigned to its own parent topic. This process continues until the topmost parent topic is reached.
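The roll-up described above can be sketched in a few lines of Python. This is only an illustrative sketch, not Verity's implementation: the TopicNode class, its field names, and the score_topic function are hypothetical, a node without children stands in for an evidence topic, and each operator is modelled as a plain function that combines the weighted scores of a node's children (here max, which matches how the OR operator is described later in this section). The weights 0.5 and 0.8 are made up for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class TopicNode:
    """Hypothetical topic-tree node; a node with no children is an evidence topic."""
    name: str
    weight: float = 1.0
    # The operator combines the weighted child scores; max models OR.
    operator: Callable[[List[float]], float] = max
    children: List["TopicNode"] = field(default_factory=list)


def score_topic(node: TopicNode, present: Set[str]) -> float:
    """Score a node bottom-up against the set of evidence topics found in a document."""
    if not node.children:
        # Evidence topic: 1.00 if present in the document, 0.00 if absent.
        return 1.0 if node.name in present else 0.0
    # Multiply each child's score by its weight, then let the operator combine the products.
    products = [child.weight * score_topic(child, present) for child in node.children]
    return node.operator(products)


# Made-up two-level tree with illustrative weights.
tree = TopicNode("parent", children=[
    TopicNode("child-a", weight=0.5, children=[TopicNode("alpha")]),
    TopicNode("child-b", weight=0.8, children=[TopicNode("beta")]),
])
print(score_topic(tree, {"alpha"}))  # 0.5
```

Modelling the operator as a function keeps the recursion independent of any particular operator; other operators would simply supply a different combining function.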

The operators you use determine how parent and child scores contribute to the importance of a selected document. As each child in the topic is given an importance score, the following calculations are performed:

Once the final calculations for the parent topic have been performed, a matched document becomes available to the Verity application so that users can view it with its highlights.

The following example breaks down how the scores of evidence topics and subtopics are calculated, to illustrate how importance is assigned to selected documents.

In the following illustration, the parent topic BOEINGCO is being used in a search.

The evidence topics of each subtopic are first checked against the documents to determine if they are present. Evidence topics that are present are assigned scores of 1.00; evidence topics that are absent are assigned a score of 0.00.

The operators at the next level of the topic structure are used to combine the scores of the evidence topics. Because the operators at this level are all proximity operators (and therefore have no weights assigned), they all produce scores that are either 0.00 or 1.00.
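As a rough illustration of why such an operator can only yield 0.00 or 1.00, the sketch below scores a phrase-style match. It assumes that the operator requires the evidence words to appear next to each other and in order; this section does not say which proximity operators are actually used. The phrase_score and tokenize helpers and the sample sentence are hypothetical, not part of Verity.

```python
import re
from typing import List


def tokenize(text: str) -> List[str]:
    """Lower-case the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def phrase_score(phrase: str, document: str) -> float:
    """Return 1.0 if the words of the phrase occur adjacently and in order, else 0.0."""
    words = tokenize(phrase)
    tokens = tokenize(document)
    if not words:
        return 0.0
    n = len(words)
    # Slide a window of len(words) tokens across the document.
    for start in range(len(tokens) - n + 1):
        if tokens[start:start + n] == words:
            return 1.0
    return 0.0


# Made-up sentence: one phrase is fully present, the other only partially.
doc = "Boeing Computer Services signed the contract; Paul later met Binder."
print(phrase_score("boeing computer services", doc))  # 1.0
print(phrase_score("paul binder", doc))               # 0.0
```

Both words of the second phrase occur in the sentence, but not together, so the match is all-or-nothing rather than partial.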

For example, assume that the following evidence topics appear within a given document:

The other evidence topics are only partially present, or are absent. Table 8-9 shows how the presence or absence of these evidence topics affects topic scores. The score for each topic reflects the presence of all related evidence topics, based on the operators that have been assigned to the parent topics.

Table 8-9: Evidence topics and scores

Topic                 Evidence topic                Evidence topic present / absent  Topic score
boeing-comp-services  boeing computer services      1 1 1                            1
boeing-aerospace      boeing aerospace electronics  1 1 1                            0
boeing-defense        boeing defense                1 1                              1
boeing-label          boeing company                1 1                              1
paul-binder           paul binder                   1 1                              0
frank-shrontz         frank shrontz                 1 1                              0
ron-woodard           ron woodard                   1 1                              1

Given the above topic scores, the operators at the next level of topics in the structure are calculated as follows:

Finally, the topic BOEINGCO, which uses the OR operator, compares the products of each child’s weight and score, and takes the highest product (the maximum) as its score. The selected document is thus scored as 0.50.

This process is repeated for each document. The documents are sorted by the scores of the BOEINGCO topic, and displayed in ranked order.
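The ranking step amounts to a descending sort on the score of the top-level topic. A minimal sketch follows; the rank_documents function, the document names, and the scores are made up purely for illustration, and whether documents that score 0.00 are displayed at all is not stated here.

```python
from typing import Dict, List, Tuple


def rank_documents(scores: Dict[str, float]) -> List[Tuple[str, float]]:
    """Return (document, score) pairs sorted from highest to lowest score."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical top-level topic scores, one per document.
scores = {"doc-a.txt": 0.50, "doc-b.txt": 0.00, "doc-c.txt": 0.85}
for name, score in rank_documents(scores):
    print(f"{score:.2f}  {name}")
```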