Statistical aggregate functions

The ANSI SQL/OLAP extensions provide a number of additional aggregate functions that permit statistical analysis of numeric data. This support includes functions to compute variance, standard deviation, correlation, and linear regression.

Standard deviation and variance

The SQL/OLAP general set functions that take one argument include those appearing in bold in this syntax statement:

<SIMPLE WINDOW AGGREGATE FUNCTION TYPE> ::=
  <BASIC AGGREGATE FUNCTION TYPE>
  | STDDEV | STDDEV_POP | STDDEV_SAMP
  | VARIANCE | VARIANCE_POP | VARIANCE_SAMP

These functions, including STDDEV and VARIANCE, are true aggregate functions in that they can compute values for a partition of rows as determined by the query’s ORDER BY clause. As with other basic aggregate functions such as MAX or MIN, their computation ignores NULL values in the input. Also, regardless of the domain of the expression being analyzed, all variance and standard deviation computation uses IEEE double-precision floating point. If the input to any variance or standard deviation function is the empty set, then each function returns NULL as its result. If VAR_SAMP is computed for a single row, it returns NULL, while VAR_POP returns the value 0.

Correlation

The SQL/OLAP function that computes a correlation coefficient is:

You can use the CORR function either as a windowing aggregate function (where you specify a window function type over a window name or specification) or as a simple aggregate function with no OVER clause.

Covariance

The SQL/OLAP functions that compute covariances include:

The covariance functions eliminate all pairs where expression1 or expression2 has a null value.

You can use the covariance functions either as windowing aggregate functions (where you specify a window function type over a window name or specification) or as simple aggregate functions with no OVER clause.

Cumulative distribution

The SQL/OLAP function that calculates the relative position of a single value among a group of rows is CUME_DIST.

The window specification must contain an ORDER_BY clause.

Composite sort keys are not allowed in the CUME_DIST function.

Regression analysis

The regression analysis functions calculate the relationship between an independent variable and a dependent variable using a linear regression equation. The SQL/OLAP linear regression functions include:

You can use the regression analysis functions either as windowing aggregate functions (where you specify a window function type over a window name or specification) or as simple aggregate functions with no OVER clause.

Weighted OLAP aggregates

The weighted OLAP aggregate functions calculate weighted moving averages:

The window specification must contain an ORDER_BY clause.

Nonstandard database industry extensions

Non-ANSI SQL/OLAP aggregate function extensions used in the database industry include FIRST_VALUE, MEDIAN, and LAST_VALUE.

The FIRST_VALUE and LAST_VALUE functions require a window specification. You can use the MEDIAN function either as windowing aggregate function (where you specify a window function type over a window name or specification) or as a simple aggregate function with no OVER clause.