GROUP SORTED Distinct operator

You can use the GROUP SORTED Distinct operator to apply distinctness. GROUP SORTED Distinct requires that the input stream is already sorted on the distinct columns. It reads a row from its child operator and initializes the current distinct columns’ values to be filtered.

The row is returned to the parent operator. When the GROUP SORTED operator is called again to fetch another row, it fetches another row from its child and compares the values to the current cached values. If the value is a duplicate, the row is discarded and the child is called again to fetch a new row.

This process continues until a new distinct row is found. The distinct columns’ values for this row are cached and are used later to eliminate nondistinct rows. The current row is returned to the parent operator for further processing.

The GROUP SORTED Distinct operator returns a sorted stream. The fact that it returns a sorted and distinct data stream are properties that the optimizer can use to improve performance in additional upstream processing. The GROUP SORTED Distinct operator is a nonblocking operator. It returns a distinct row to its parent as soon as it is fetched. It does not require the entire input stream to be processed before it can start returning rows. The following query collects distinct last and first author’s names:

select distinct au_lname, au_fname
from authors
where au_lname = "Bloom"

QUERY PLAN FOR STATEMENT 1 (at line 2).

STEP 1
The type of query is SELECT.

2 operator(s) under root

ROOT:EMIT Operator (VA = 2)

	|GROUP SORTED Operator (VA = 1)
	|Distinct
	|
	|   |SCAN Operator (VA = 0)
	|   |  FROM TABLE
	|   |  authors
	|   |  Index : aunmind
	|   |  Forward Scan.
	|   |  Positioning by key.
	|   |  Index contains all needed columns. Base table will not be read.
	|   |  Keys are:
	|   |    au_lname ASC
	|   |  Using I/O Size 2 Kbytes for index leaf pages.
	|   |  With LRU Buffer Replacement Strategy for index leaf pages.

The GROUP SORTED Distinct operator is chosen in this query plan to apply the distinct property because the scan operator is returning rows in sorted order for the distinct columns au_lname and au_fname. GROUP SORTED incurs no I/O and minimal CPU overhead.

You can use the GROUP SORTED Distinct operator to implement vector aggregation. See “Vector aggregation operators”. The showplan output prints the line Distinct to indicate that this GROUP SORTED Distinct operator is implementing the distinct property.