Parallel join optimization and join orders

This example illustrates how the optimizer devises a query plan for a join query that is eligible for parallel execution. The configuration and table layout are as follows:

Configuration parameter values
Parameter	Setting
max parallel degree	15 worker processes
max scan parallel degree	3 worker processes

Table layout
Table name	Number of partitions	Number of pages	Number of rows
publishers	1 (not partitioned)	1,000	80,000
titles	10	10,000 (distributed evenly over partitions)	800,000

The example query involves a simple join between these two tables:

select * 
    from publishers, titles 
    where publishers.pub_id = titles.pub_id

In theory, the optimizer considers the costs of all the possible combinations:

titles as the outer table and publishers as the inner table, with titles accessed in parallel
titles as the outer table and publishers as the inner table, with titles accessed serially
publishers as the outer table and titles as the inner table, with titles accessed in parallel
publishers as the outer table and titles as the inner table, with titles accessed serially
publishers as the outer table and titles as the inner table, with publishers accessed in parallel

For example, the cost of a join order in which titles is the outer table and is accessed in parallel is calculated as follows:

The cost of having publishers as the outer table is calculated as follows:

However, other factors are often more important in determining the join order than whether a particular table is eligible for parallel access.

Scenario A: clustered index on publishers

The presence of a useful clustered index is often the most important factor in how the optimizer creates a query plan for a join query. If publishers has a clustered index on pub_id and titles has no useful index, the optimizer can choose the indexed table (publishers) as the inner table. With this join order, each access to the inner table takes only a few reads to find rows.

With publishers as the inner table, the optimizer costs the eligible access methods for each table. For titles, the outer table, it considers:

A parallel partition scan (cost is number of pages in the largest partition)
A serial table scan (cost is number of pages in the table)

For publishers, the inner table, the optimizer considers only a serial clustered index scan.

It also considers performing a merge join, sorting the worktable from titles into order on titles, either a right-merge or left-merge join.

The final cost of the query is the cost of accessing titles in parallel times the number of accesses of the clustered index on publishers.

Scenario B: clustered index on titles

If titles has a clustered index on pub_id, and publishers has no useful index, the optimizer chooses titles as the inner table in the query.

With the join order determined, the optimizer costs the eligible access methods for each table. For publishers, the outer table, it considers:

A hash-based table scan (the initial cost is the same as a serial table scan)

For titles, the inner table, the optimizer considers only aserial clustered index scan.

In this scenario, the optimizer chooses parallel over serial execution of publishers. Even though a hash-based table scan has the same cost as a serial scan, the processing time is cut by one-third, because each worker process can scan the inner table’s clustered index simultaneously.

Scenario C: neither table has a useful index

If neither table has a useful index, a merge join is a very likely choice for the access method. If merge joins are disabled, the table size and available cache space can be more important factors than potential parallel access for join order. The benefits of having a smaller table as the inner table outweigh the benefits of one parallel access method over the other. The optimizer chooses the publishers table as the inner table, because it is small enough to be read once and kept in cache, reducing costly physical I/O.

Then, the optimizer costs the eligible access methods for each table. For titles, the outer table, it considers:

A parallel partition scan (cost is number of pages in the largest partition)
A serial table scan (cost is number of pages in the table)

For publishers, the inner table, it considers only a serial table scan loaded into cache.

The optimizer chooses to access titles in parallel, because it reduces the cost of the query by a factor of 10.

In some cases where neither table has a useful index, the optimizer chooses the reformatting strategy, creating a temporary table and clustered index instead of repeatedly scanning the inner table.