Defining an Aggregate UDF

The C/C++ code for defining an aggregate user-defined function includes eight mandatory pieces.

  • extfnapiv3.h – the UDF interface definition header file. The file is extfnapiv4.h for the v4 API.

  • _start_extfn – an initialization function invoked once per SQL usage. All initialization functions take one argument: a pointer to the aggregate UDF context structure that is unique to each usage of an aggregate UDF. The context structure passed is the same one that is passed to all the supplied functions for that usage.

  • _finish_extfn – a shutdown function invoked once per SQL usage. All shutdown functions take one argument: a pointer to the aggregate UDF context structure that is unique to each usage of an aggregate UDF.

  • _reset_extfn – a reset function called once at the start of each new group, new partition, and if necessary, at the start of each window motion. All reset functions take one argument: a pointer to the aggregate UDF context structure that is unique to each usage of an aggregate UDF.

  • _next_value_extfn – a function called for each new set of input arguments. _next_value_extfn takes two arguments:

    • A pointer to the aggregate UDF context, and

    • An args_handle.

    As in scalar UDFs, the args_handle is used with the supplied callback function pointers to access the actual argument values.

  • _evaluate_extfn – an evaluation function similar to the scalar UDF evaluation function. All evaluation functions take two arguments:

    • A pointer to the aggregate UDF context structure, and

    • An args_handle.

  • a_v3_extfn_aggregate – an instance of the aggregate UDF descriptor structure that contains the pointers to all of the supplied functions for this UDF.

  • Descriptor function – a descriptor function that returns a pointer to that aggregate UDF descriptor structure.
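The mandatory pieces listed above can be sketched as a small SUM-style aggregate. This is a minimal sketch, not the real interface: the real types come from extfnapiv3.h, which is not reproduced here, so `demo_context`, `demo_aggregate`, `demo_args`, the `my_sum_*` functions, and the `demo_run_group` driver are all hypothetical stand-ins that keep only the members the sketch needs. The driver plays the role of the server for one group.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* HYPOTHETICAL stand-ins for the context and descriptor types declared in
   extfnapiv3.h. The real structures have more members and other types. */
typedef struct demo_context {
    void *user_data;                                  /* per-usage state */
    int  (*get_value)(void *args_handle, int arg_num, double *out);
    void (*set_value)(void *args_handle, double result);
} demo_context;

typedef struct demo_aggregate {
    void (*start)(demo_context *);
    void (*finish)(demo_context *);
    void (*reset)(demo_context *);
    void (*next_value)(demo_context *, void *args_handle);
    void (*evaluate)(demo_context *, void *args_handle);
} demo_aggregate;

/* Initialization: once per SQL usage. Allocate the running total. */
static void my_sum_start(demo_context *ctx) {
    ctx->user_data = calloc(1, sizeof(double));
    assert(ctx->user_data != NULL);
}
/* Shutdown: once per SQL usage. Release what start allocated. */
static void my_sum_finish(demo_context *ctx) {
    free(ctx->user_data);
    ctx->user_data = NULL;
}
/* Reset: at the start of each new group or partition. */
static void my_sum_reset(demo_context *ctx) {
    *(double *)ctx->user_data = 0.0;
}
/* Next value: fetch argument 1 through the callback and accumulate it. */
static void my_sum_next_value(demo_context *ctx, void *args_handle) {
    double v;
    if (ctx->get_value(args_handle, 1, &v))
        *(double *)ctx->user_data += v;
}
/* Evaluate: hand the result back through the set_value callback. */
static void my_sum_evaluate(demo_context *ctx, void *args_handle) {
    ctx->set_value(args_handle, *(double *)ctx->user_data);
}

/* Descriptor instance holding pointers to all supplied functions. */
static demo_aggregate my_sum_descriptor = {
    my_sum_start, my_sum_finish, my_sum_reset,
    my_sum_next_value, my_sum_evaluate
};
/* Descriptor function: returns a pointer to the descriptor. */
demo_aggregate *my_sum(void) { return &my_sum_descriptor; }

/* HYPOTHETICAL server side: an args_handle and the two callbacks. */
typedef struct demo_args { double input; double result; } demo_args;
static int demo_get_value(void *h, int arg_num, double *out) {
    if (arg_num != 1) return 0;
    *out = ((demo_args *)h)->input;
    return 1;
}
static void demo_set_value(void *h, double r) { ((demo_args *)h)->result = r; }

/* Drives one usage with a single group, in the order described above. */
double demo_run_group(const double *rows, size_t n) {
    demo_aggregate *d = my_sum();
    demo_context ctx = { NULL, demo_get_value, demo_set_value };
    demo_args args = { 0.0, 0.0 };
    d->start(&ctx);
    d->reset(&ctx);
    for (size_t i = 0; i < n; i++) {
        args.input = rows[i];
        d->next_value(&ctx, &args);
    }
    d->evaluate(&ctx, &args);
    d->finish(&ctx);
    return args.result;
}
```

Calling `demo_run_group` on an array of doubles returns their sum. The point of the sketch is the shape: every function receives the same context pointer, and argument values and results travel only through the callbacks.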

In addition to the mandatory pieces, there are several optional pieces that enable more optimized access for specific usage situations:

  • _drop_value_extfn – an optional function pointer that is called for each input set of argument values that has fallen out of a moving window frame. This function should not set the result of the aggregation. Access the input argument values through the get_value callback function and, if necessary, through repeated calls to the get_piece callback function.

    Set the function pointer to the null pointer if:

    • This aggregate cannot be used with a window frame,

    • The aggregate is not reversible in some way, or

    • The user is not interested in optimal performance.

    If _drop_value_extfn is not supplied and the user has specified a moving window, then each time the window frame moves, the reset function is called, the next_value function is called for each row within the window, and finally the evaluate function is called.

    If _drop_value_extfn is supplied, then each time the window frame moves, this drop value function is called for each row falling out of the window frame, then the next_value function is called for each row that has just been added into the window frame, and finally the evaluate function is called to produce the aggregate result.
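
    The two calling patterns for a moving window can be compared with a small simulation. This is a sketch under stated assumptions: `win_state`, the `win_*` functions, and the `demo_slide` driver are hypothetical names, the callbacks are reduced to plain function arguments, and the frame is assumed to be a row-based window covering the current row and the `width - 1` rows before it.

```c
#include <assert.h>
#include <stddef.h>

/* HYPOTHETICAL per-usage state for a reversible SUM aggregate. */
typedef struct win_state {
    double sum;         /* SUM over the rows currently in the frame */
    size_t next_calls;  /* number of next_value calls so far */
    size_t drop_calls;  /* number of drop_value calls so far */
} win_state;

static void win_reset(win_state *s) { s->sum = 0.0; }
static void win_next_value(win_state *s, double v) { s->sum += v; s->next_calls++; }
/* Reverses next_value for a row leaving the frame; sets no result. */
static void win_drop_value(win_state *s, double v) { s->sum -= v; s->drop_calls++; }
static double win_evaluate(const win_state *s) { return s->sum; }

/* Slides the frame over rows[0..n-1] and writes one result per row.
   have_drop == 0: reset and re-add every row in the frame on each move.
   have_drop != 0: drop the row that left, add the row that entered.
   Returns the total number of next_value calls. */
size_t demo_slide(const double *rows, size_t n, size_t width,
                  int have_drop, double *out) {
    win_state s = { 0.0, 0, 0 };
    assert(width >= 1);
    win_reset(&s);  /* start of the partition */
    for (size_t i = 0; i < n; i++) {
        /* index of the first row in the frame that ends at row i */
        size_t first = (i + 1 > width) ? i + 1 - width : 0;
        if (have_drop) {
            if (first > 0)
                win_drop_value(&s, rows[first - 1]);
            win_next_value(&s, rows[i]);
        } else {
            win_reset(&s);
            for (size_t j = first; j <= i; j++)
                win_next_value(&s, rows[j]);
        }
        out[i] = win_evaluate(&s);
    }
    return s.next_calls;
}
```

    Both modes produce the same results. For five rows and a frame of three rows, the pattern without a drop function makes 12 next_value calls, while the pattern with one makes 5, which is why a null pointer here costs performance but not correctness.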

  • _evaluate_cumulative_extfn – an optional function pointer that may be called for each new input set of argument values. If this function is supplied, and the usage is in a row-based window frame that spans UNBOUNDED PRECEDING to CURRENT ROW, then this function is called instead of the next_value function immediately followed by the evaluate function.

    _evaluate_cumulative_extfn must set the result of the aggregation through the set_value callback. Access to its set of input argument values is through the usual get_value callback function. This function pointer should be set to the null pointer if:

    • This aggregate will never be used in this manner, or

    • The user is not worried about optimal performance.
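
    The saving from a cumulative evaluation function can be sketched for a running SUM over UNBOUNDED PRECEDING to CURRENT ROW. The names `cum_state`, `cum_*`, and `demo_running` are hypothetical, and the get_value and set_value callbacks are simplified to a plain argument and a return value.

```c
#include <assert.h>
#include <stddef.h>

/* HYPOTHETICAL per-usage state for a running SUM. */
typedef struct cum_state { double sum; } cum_state;

static void cum_reset(cum_state *s) { s->sum = 0.0; }
static void cum_next_value(cum_state *s, double v) { s->sum += v; }
static double cum_evaluate(const cum_state *s) { return s->sum; }

/* Combined form: folds in the new row AND produces the result in one
   call, standing in for next_value immediately followed by evaluate. */
static double cum_evaluate_cumulative(cum_state *s, double v) {
    s->sum += v;
    return s->sum;
}

/* Produces one running total per row in out[0..n-1].
   Returns how many per-row UDF calls were made (reset not counted). */
size_t demo_running(const double *rows, size_t n,
                    int have_cumulative, double *out) {
    cum_state s;
    size_t calls = 0;
    assert(rows != NULL || n == 0);
    cum_reset(&s);
    for (size_t i = 0; i < n; i++) {
        if (have_cumulative) {
            out[i] = cum_evaluate_cumulative(&s, rows[i]);
            calls += 1;
        } else {
            cum_next_value(&s, rows[i]);
            out[i] = cum_evaluate(&s);
            calls += 2;
        }
    }
    return calls;
}
```

    The results are identical either way; the cumulative form halves the number of per-row calls in this sketch.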

  • _next_subaggregate_extfn – an optional callback function pointer that works together with an _evaluate_superaggregate_extfn to enable some usages of this aggregate to be optimized by running in parallel.

    Some aggregates, when used as simple aggregates (in other words, not OLAP-style aggregates with an OVER clause), can be partitioned by first producing a set of intermediate aggregate results, where each intermediate result is computed from a disjoint subset of the input rows.

    Examples of such partitionable aggregates include:

    • SUM, where the final SUM can be computed by performing a SUM for each disjoint subset of the input rows and then performing a SUM over the sub-SUMs; and

    • COUNT(*), where the final COUNT can be computed by performing a COUNT for each disjoint subset of the input rows and then performing a SUM over the COUNTs from each partition.

    When an aggregate satisfies the above conditions, the server may choose to make the computation of that aggregate parallel. For aggregate UDFs, this parallel optimization can be applied only if both the _next_subaggregate_extfn function pointer and the _evaluate_superaggregate_extfn pointer are supplied.

    The _next_subaggregate_extfn function does not set the final result of the aggregation, and by definition, has exactly one input argument value that is the same data type as the defined return value of the aggregate UDF.

    Access to the subaggregate input value is through the normal get_value callback function. Direct communication between subaggregates and the superaggregate is impossible; the server handles all such communication. The subaggregates and the superaggregate do not share a context structure. Instead, individual subaggregates are treated exactly the same as nonpartitioned aggregates. The independent superaggregate sees a calling pattern that looks like this:

    _start_extfn
    _reset_extfn
    _next_subaggregate_extfn (repeated 0 to N times)
    _evaluate_superaggregate_extfn
    _finish_extfn

    Or like this:

    _start_extfn
    _reset_extfn
    _next_subaggregate_extfn (repeated 0 to N times)
    _evaluate_superaggregate_extfn
    _reset_extfn
    _next_subaggregate_extfn (repeated 0 to N times)
    _evaluate_superaggregate_extfn
    _reset_extfn
    _next_subaggregate_extfn (repeated 0 to N times)
    _evaluate_superaggregate_extfn
    _finish_extfn

    If neither _evaluate_superaggregate_extfn nor _next_subaggregate_extfn is supplied, then the aggregate UDF is restricted: it is not allowed as a simple aggregate within a query block containing GROUP BY CUBE or GROUP BY ROLLUP.

  • _evaluate_superaggregate_extfn – the optional callback function pointer that works with the _next_subaggregate_extfn to enable some usages as a simple aggregate to be optimized through parallelization. _evaluate_superaggregate_extfn is called to return the result of a partitioned aggregate. The result value is sent to the server using the normal set_value callback function from the a_v3_extfn_aggregate_context structure.
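
The cooperation between _next_subaggregate_extfn and _evaluate_superaggregate_extfn can be sketched for a COUNT(*)-style aggregate. This is a sequential simulation under stated assumptions, not the server's behavior: `agg_state`, the `cnt_*` functions, and `demo_partitioned_count` are hypothetical names, the callbacks are reduced to plain arguments and return values, and the driver stands in for the server, which in practice may run the subaggregates in parallel and carries each intermediate result to the superaggregate.

```c
#include <assert.h>
#include <stddef.h>

/* HYPOTHETICAL per-usage state. Each subaggregate and the superaggregate
   gets its own instance, because no context is shared between them. */
typedef struct agg_state { long total; } agg_state;

/* Subaggregate side: behaves like an ordinary, nonpartitioned COUNT(*). */
static void cnt_reset(agg_state *s) { s->total = 0; }
static void cnt_next_value(agg_state *s) { s->total += 1; }
static long cnt_evaluate(const agg_state *s) { return s->total; }

/* Superaggregate side: takes exactly one input value of the aggregate's
   own return type and folds it in. It does not set the final result. */
static void cnt_next_subaggregate(agg_state *s, long sub_result) {
    s->total += sub_result;
}
/* Returns the result of the partitioned aggregate. */
static long cnt_evaluate_superaggregate(const agg_state *s) {
    return s->total;
}

/* Counts rows split across n_subsets disjoint subsets, where
   subset_sizes[k] is the number of rows in subset k. */
long demo_partitioned_count(const size_t *subset_sizes, size_t n_subsets) {
    agg_state super;
    assert(subset_sizes != NULL || n_subsets == 0);
    cnt_reset(&super);
    for (size_t k = 0; k < n_subsets; k++) {
        agg_state sub;  /* separate state per subaggregate */
        cnt_reset(&sub);
        for (size_t r = 0; r < subset_sizes[k]; r++)
            cnt_next_value(&sub);
        /* the driver, not the UDF, passes the intermediate result on */
        cnt_next_subaggregate(&super, cnt_evaluate(&sub));
    }
    return cnt_evaluate_superaggregate(&super);
}
```

The superaggregate here sees reset, then zero or more next_subaggregate calls, then evaluate_superaggregate, matching the first calling pattern shown earlier. The final COUNT is a SUM over the per-subset COUNTs, so subsets of 3, 0, and 4 rows yield 7.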
