Vector aggregation operators

There are three unary operators used for vector aggregation. They are the GROUP SORTED COUNT AGGREGATE, the HASH VECTOR AGGREGATE, and the GROUP INSERTING operators.

See Table 1-3 for a list and description of all query processor operators.

GROUP SORTED COUNT AGGREGATE operator

The GROUP SORTED COUNT AGGREGATE nonblocking operator is a variant of the GROUP SORTED Distinct operator described in “GROUP SORTED Distinct operator”. The GROUP SORTED COUNT AGGREGATE operator requires that input set to be sorted on the group by columns. The algorithm is very similar to that of GROUP SORTED Distinct.

A row is read from the child operator. If the row is the start of a new vector, its grouping columns are cached and the aggregation results are initialized.

If the row belongs to the current group being processed, the aggregate functions are applied to the aggregate results. When the child operator returns a row that starts a new group or End Of Scan, the current vector and its aggregated values are returned to the parent operator.

The first row in the GROUP SORTED COUNT AGGREGATE operator is returned after an entire group is processed, where the first row in the GROUP SORTED Distinct operator is returned at the start of a new group. This example collects a list of all cities with the number of authors that live in each city.

select city, total_authors = count(*)
from authors
group by city
plan
"(group_sorted
(sort (scan authors))
)"

QUERY PLAN FOR STATEMENT 1 (at line 1).
Optimized using the Abstract Plan in the PLAN clause.

STEP 1
The type of query is SELECT.

3 operator(s) under root

ROOT:EMIT Operator (VA = 3)

    |GROUP SORTED Operator (VA = 2) 
    | Evaluate Grouped COUNT AGGREGATE.
    |
    |    |SORT Operator (VA = 1)
    |    | Using Worktable1 for internal storage.
    |    |
    |    |    | SCAN Operator (VA = 0)
    |    |    | FROM TABLE
    |    |    | authors
    |    |    | Table Scan.
    |    |    | Forward Scan.
    |    |    | Positioning at start of table.
    |    |    | Using I/O Size 2 Kbytes for data pages.
    |    |    | With LRU Buffer Replacement Strategy for data pages.

In this query plan, the scan of authors does not return rows in grouping order. A SORT operator is applied to order the stream based on the grouping column city. At this point, a GROUP SORTED COUNT AGGREGATE operator can be applied to evaluate the count aggregate.

The GROUP SORTED COUNT AGGREGATE operator showplan output reports the aggregate functions being applied as:

 |  Evaluate Grouped COUNT AGGREGATE.

HASH VECTOR AGGREGATE operator

The HASH VECTOR AGGREGATE operator is a blocking operator. All rows from the child operator must be processed before the first row from the HASH VECTOR AGGREGATE operator can be returned to its parent operator. Other than this, the algorithm is similar to the HASH Distinct operator’s algorithm.

Rows are fetched from the child operator. Each row is hashed on the query’s grouping columns. The bucket that is hashed is searched to see if the vector already exists.

If the group by values do not exist, the vector is added and the aggregate values are initialized using this first row. If the group by values do exist, the current row is aggregated to the existing values. This example collects a list of all cities with the number of authors that live in each city.

select city, total_authors = count(*)
from authors
group by city

QUERY PLAN FOR STATEMENT 1 (at line 1).

STEP 1
The type of query is SELECT.

2 operator(s) under root

ROOT:EMIT Operator (VA = 2)

    |HASH VECTOR AGGREGATE Operator (VA = 1)
    | GROUP BY
    | Evaluate Grouped COUNT AGGREGATE.
    | Using Worktable1 for internal storage.
    | Key Count: 1
    |
    |    |SCAN Operator (VA = 0)
    |    | FROM TABLE
    |    | authors
    |    | Table Scan.
    |    | Forward Scan.
    |    | Using I/O Size 2 Kbytes for data pages.
    |    | With LRU Buffer Replacement Strategy for data pages.

In this query plan, the HASH VECTOR AGGREGATE operator reads all of the rows from its child operator, which is scanning the authors table. Each row is checked to see if there is already an entry bucket entry for the current city value. If there is not, a hash entry row is added with the new city grouping value and the count result is initialized to 1. If there is already a hash entry for the new row’s city value, the aggregation function is applied. In this case, the count result is incriminated.

The showplan output prints a group by message specifically for the HASH VECTOR AGGREGATE operator, then prints the grouped aggregation messages:

 |  Evaluate Grouped COUNT AGGREGATE.

The showplan output reports used to store spilled groups and unprocessed rows:

 | Using Worktable1 for internal storage.

GROUP INSERTING

GROUP INSERTING is a blocking operator. All rows from the child operator must be processed before the first row can be returned from the GROUP INSERTING.

GROUP INSERTING is limited to 31 or fewer columns in the group by clause. The operator starts by creating a worktable with a clustered index of the grouping columns. As each row is fetched from the child, a lookup into the work table is done based on the grouping columns. If no row is found, then the row is inserted. This effectively creates a new group and initializes its aggregate values. If a row is found, the new aggregate values are updated based on evaluating the new values. The GROUP INSERTING operator returns rows ordered by the grouping columns.

select city, total_authors = count(*)
from authors
group by city
plan
'(group_inserting (i_scan auidind authors ))'

QUERY PLAN FOR STATEMENT 1 (at line 1).
Optimized using the Abstract Plan in the PLAN clause.

STEP 1
The type of query is SELECT.

2 operator(s) under root
   |ROOT:EMIT Operator (VA = 2)
|
|   |GROUP INSERTING Operator (VA = 1)
|   | GROUP BY
|   | Evaluate Grouped COUNT AGGREGATE
|   | Using Worktable1 for internal storage.
|   |
|   |   |SCAN Operator (VA = 0)
|   |   | FROM TABLE
|   |   | authors
|   |   | Table Scan.
|   |   | Forward Scan.
|   |   | Positioning at start of table.
|   |   | Using I/O Size 2 Kbytes for data pages.
|   |   | With LRU Buffer Replacement Strategy for data pages.

In this example, the group inserting operator starts by building a worktable with a clustered index keyed on the city column. The group inserting operator proceeds to drain the authors table. For each row, a lookup is done on the city value. If there is no row in the aggregation worktable with the current city value, then the row is inserted. This creates a new group for the current city value with an initialized count value. If the row for the current city value is found, then an evaluation is done to increment the COUNT AGGREGATE value.