When additional statistics may be useful

To determine when additional statistics are useful, run queries using set option commands and set statistics io. If there are significant discrepancies between the “rows to be returned” and I/O estimates displayed by set commands and the actual I/O displayed by statistics io, examine these queries for places where additional statistics can improve the estimates. Look especially for the use of default density values for search arguments and join columns.

The set option show_missing_stats command prints the names of columns that could have used histograms, and groups of columns that could have used multiattribute densities. This is particularly useful in pointing out where additional statistics can be useful.

Example 1

set option show_missing_stats long
go
dbcc traceon(3604)
go

DBCC execution completed. If DBCC printed error messages, contact a user with System Administrator (SA) role.

select * from part, partsupp
where p_partkey = ps_partkey and p_itemtype = ps_itemtype
go

NO STATS on column part.p_partkey
NO STATS on column part.p_itemtype
NO STATS on column partsupp.pa_itemtype
NO STATS on density set for E={p_partkey, p_itemtype}
NO STATS on density set for F={ps_partkey, ps_itemtype}
- - - - - - - - - - - - - - -
(200 rows affected)

You can get the same information using the show_final_plan_xml option. The set plan uses the client option and trace flag 3604 to get the output on the client side. This differs from the way the message option of set plan is used.

Example 2

dbcc traceon(3604)

DBCC execution completed. If DBCC printed error messages, contact a user with System Administrator (SA) role.

set plan for show_final_plan_xml to client on
go
select * from part, partsupp
where p_partkey = ps_partkey and p_itemtype = ps_itemtype
go

<?xml version=”1.0” encoding=”UTF-8”?>
<query>
		<planVersion> 1.0 </planVersion>
- - - - - - - - - - - - - - - - - -
<optimizerStatistics>
    <statInfo>
        <objName>part</objName>
        <missingHistogram>
            <column>p_partkey</column>
            <column>p_itemtype</column>
        </missingHistogram>
        <missingDensity>
            <column>p_partkey</column>
            <column>p_itemtype</column>
        </missingDensity>
    </statInfo>
    <statInfo>
<objName>partsupp</objName>
        <missingHistogram>
            <column>ps_partkey</column>
            <column>ps_itemtype</column>
        </missingHistogram>
        <missingDensity>
            <column>ps_partkey</column>
            <column>ps_itemtype</column>
        </missingDensity>
    </statInfo>
</optimizerStatistics>

Use update statistics on part and partsupp to create statistics on p_partkey and p_itemtype, thus creating a histogram on the leading column (p_partkey) and the density (p_partkey, p_itemtype). Create a histogram on p_itemtype as well. Use:

update statistics part(p_partkey, p_itemtype)
go
update statistics part(p_itemtype)
go

Since partsupp has a histogram on ps_partkey, you can create a histogram on ps_itemtype and a density on (ps_itemtype, ps_partkey). The columns used for density may be unordered.

update statistics partsupp(ps_itemtype, ps_partkey)

If this procedure is successful, you will not see the “NO STATS” messages shown in Example 1 when you run the query again.