Understanding histogram output

A histogram is a set of cells in which each cell is assigned a weight. Each cell has an upper bound and a lower bound, which are distinct column values. The weight of the cell is a floating-point value between 0 and 1, representing either:

The fraction of rows in the table within the range of values, if the operator is <=, or,
The number of values that match the step, if the operator is =.

The optimizer uses the combination of ranges, weights, and density values to estimate the number of rows in the table that are to be returned for a query clause on the column.

Adaptive Server uses equiheight histograms, where the number of rows represented by each cell is approximately equal. For example, the following histogram on the city column on pubtune..authors has 20 steps; each step in the histogram represents approximately 5 percent of the table:

Step     Weight                Value
 
   1     0.00000000     <=     "APO Miami\377\377\377\377\377\377\377"
   2     0.05460000     <=     "Atlanta"
   3     0.05280000     <=     "Boston"
   4     0.05400000     <=     "Charlotte"
   5     0.05260000     <=     "Crown"
   6     0.05260000     <=     "Eddy"
   7     0.05260000     <=     "Fort Dodge"
   8     0.05260000     <=     "Groveton"
   9     0.05340000     <=     "Hyattsville"
  10     0.05260000     <=     "Kunkle"
  11     0.05260000     <=     "Luthersburg"
  12     0.05340000     <=     "Milwaukee"
  13     0.05260000     <=     "Newbern"
  14     0.05260000     <=     "Park Hill"
  15     0.05260000     <=     "Quicksburg"
  16     0.05260000     <=     "Saint David"
  17     0.05260000     <=     "Solana Beach"
  18     0.05260000     <=     "Thornwood"
  19     0.05260000     <=     "Washington"
  20     0.04800000      <=     "Zumbrota"

The first step in a histogram represents the proportion of null values in the table. Since there are no null values for city, the weight is 0. The value for the step that represents null values is represented by the highest value that is less than the minimum column value.

For character strings, the value for the first cell is the highest possible string value that is less than the minimum column value (in the output above, “APO Miami”), padded to the defined column length with the highest character in the character set used by the server. What you actually see in your output depends on the character set, type of terminal, and software that you are using to view optdiag output files.

In the above histogram, the value represented by each cell includes the upper bound, but excludes the lower bound. The cells in this histogram are called range cells, because each cell represents a range of values.

The range of values included in a range cell can be represented as follows:

lowerbound < (values for cell) <= upper bound

In optdiag output, the lower bound is the value of the previous step, and the upper bound is the value of the current step.

For example, in the histogram above, step 4 includes Charlotte (the upper bound), but excludes Boston (the lower bound). The weight for this step is .0540, indicating that 5.4 percent of the table matches the query clause:

where city > Boston and city <= "Charlotte"

The operator column in the optdiag histogram output shows the <= operator. Different operators are used for histograms with highly duplicated values.