MERGE JOIN

The MERGE JOIN operator is a binary operator. The left and right children are the outer and inner data streams, respectively. Both data streams must be sorted on the MERGE JOIN’s key values.

First, a row from the outer stream is fetched. This initializes the MERGE JOIN’s join key values. Then, rows from the inner stream are fetched until a row with key values that match or are greater than (less than if key column is descending) is encountered. If the join key matches, the qualifying row is passed on for additional processing, and a subsequent next call to the MERGE JOIN operator continues fetching from the currently active stream.

If the new values are greater than the current comparison key, these values are used as the new comparison join key while fetching rows from the other stream. This process continues until one of the data streams is exhausted.

Generally, the MERGE JOIN strategy is effective when a scan of the data streams requires that most of the rows must be processed, and that, if any of the input streams are large, they are already sorted on the join keys.

select ta.title_id
from titleauthor ta, authors a
where a.au_id = ta.au_id
and au_lname = "Bloom"
go

QUERY PLAN FOR STATEMENT 1 (at line 2).

    STEP 1
        The type of query is EXECUTE.
        Executing a newly cached statement.

QUERY PLAN FOR STATEMENT 1 (at line 1).

STEP 1
The type of query is SELECT.

3 operator(s) under root

ROOT:EMIT Operator (VA = 3)

	|MERGE JOIN Operator (Join Type: Inner Join)
	| Using Worktable2 for internal storage.
	|  Key Count: 1
	|  Key Ordering: ASC
	|
	|   |SORT Operator
	|   | Using Worktable1 for internal storage.
	|   |
	|   |   |SCAN Operator
	|   |   |  FROM TABLE
	|   |   |  authors
	|   |   |  a
	|   |   |  Index : aunmind
	|   |   |  Forward Scan.
	|   |   |  Positioning by key.
	|   |   |  Keys are:
	|   |   |    au_lname ASC
	|   |   |  Using I/O Size 2 Kbytes for index leaf pages.
	|   |   |  With LRU Buffer Replacement Strategy for index leaf pages.
	|   |   |  Using I/O Size 2 Kbytes for data pages.
	|   |   |  With LRU Buffer Replacement Strategy for data pages.
	|
	|   |SCAN Operator
	|   |  FROM TABLE
	|   |  titleauthor
	|   |  ta
	|   |  Index : auidind
	|   |  Forward Scan.
	|   |  Positioning at index start.
	|   |  Using I/O Size 2 Kbytes for index leaf pages.
	|   |  With LRU Buffer Replacement Strategy for index leaf pages.
	|   |  Using I/O Size 2 Kbytes for data pages.
	|   |  With LRU Buffer Replacement Strategy for data pages.

In this example, a sort operator is the left child, or outer stream. The data source for the sort operator is the authors table. The sort operator is required because the authors table has no index on au_id that would otherwise provide the necessary sorted order. A scan of the titleauthor table is the right child/inner stream. The scan uses the auidind index, which provides the necessary ordering for the MERGE JOIN strategy.

A row is fetched from the outer stream (the authors table is the original source) to establish an initial join key comparison value. Then rows are fetched from the titleauthor table until a row with a join key equal to or greater than the comparison key is found.

Inner stream rows with matching keys are stored in a cache in case they need to be refetched. These rows are refetched when the outer stream contains duplicate keys. When a titleauthor.au_id value that is greater than the current join key comparison value is fetched, the MERGE JOIN operator starts fetching from the outer stream until a join key value equal to or greater than the current titleauthor.au_id value is found. The scan of the inner stream resumes at that point.

The MERGE JOIN operator’s showplan output contains a message indicating the worktable to be used for the inner stream’s backing store. The worktable is written to if the inner rows with duplicate join keys no longer fits in cached memory. The width of a cached row is limited to 64 kilobytes.