Standard deviation and variance functions

SQL Anywhere supports two versions of variance and standard deviation functions: a sampling version, and a population version. Choosing between the two versions depends on the statistical context in which the function is to be used.

All the variance and standard deviation functions are true aggregate functions in that they can compute values for a partition of rows as determined by the query's GROUP BY clause. As with other basic aggregate functions such as MAX or MIN, their computation also ignores NULL values in the input.

For improved performance, SQL Anywhere calculates the mean, and the deviation from mean, in one step. This means that only one pass over the data is required.

Also, regardless of the domain of the expression being analyzed, all variance and standard deviation computation is done using IEEE double-precision floating point. If the input to any variance or standard deviation function is the empty set, then each function returns NULL as its result. If VAR_SAMP is computed for a single row, then it returns NULL, while VAR_POP returns the value 0.

Following are the standard deviation and variance functions offered in SQL Anywhere:

To review the mathematical formulas represented by these functions see Mathematical formulas for the aggregate functions.

STDDEV function

This function is an alias for the STDDEV_SAMP function. See STDDEV_SAMP function [Aggregate].

STDDEV_POP function

This function computes the standard deviation of a population consisting of a numeric expression, as a DOUBLE.

Example 1

The following query returns a result set that shows the employees whose salary is one standard deviation greater than the average salary of their department. Standard deviation is a measure of how much the data varies from the mean.

SELECT *
FROM ( SELECT 
    Surname AS Employee, 
    DepartmentID AS Department, 
    CAST( Salary as DECIMAL( 10, 2 ) ) 
        AS Salary,
    CAST( AVG( Salary ) 
        OVER ( PARTITION BY DepartmentID ) AS DECIMAL ( 10, 2 ) ) 
        AS Average,
    CAST( STDDEV_POP( Salary )
        OVER ( PARTITION BY DepartmentID ) AS DECIMAL ( 10, 2 ) ) 
        AS StandardDeviation
    FROM Employees
    GROUP BY Department, Employee, Salary )
    AS DerivedTable
WHERE Salary > Average + StandardDeviation
ORDER BY Department, Salary, Employee;

The table that follows represents the result set from the query. Every department has at least one employee whose salary significantly deviates from the mean.

Employee Department Salary Average StandardDeviation
1 Lull 100 87900.00 58736.28 16829.60
2 Scheffield 100 87900.00 58736.28 16829.60
3 Scott 100 96300.00 58736.28 16829.60
4 Sterling 200 64900.00 48390.95 13869.60
5 Savarino 200 72300.00 48390.95 13869.60
6 Kelly 200 87500.00 48390.95 13869.60
7 Shea 300 138948.00 59500.00 30752.40
8 Blaikie 400 54900.00 43640.67 11194.02
9 Morris 400 61300.00 43640.67 11194.02
10 Evans 400 68940.00 43640.67 11194.02
11 Martinez 500 55500.00 33752.20 9084.50

Employee Scott earns $96,300.00, while the departmental average is $58,736.28. The standard deviation for that department is $16,829.00, which means that salaries less than $75,565.88 (58736.28 + 16829.60 = 75565.88) fall within one standard deviation of the mean. At $96,300.00, employee Scott is well above that figure.

This example assumes that Surname and Salary are unique for each employee, which isn't necessarily true. To ensure uniqueness, you could add EmployeeID to the GROUP BY clause.

Example 2

The following statement lists the average and variance in the number of items per order in different time periods:

SELECT YEAR( ShipDate ) AS Year,
    QUARTER( ShipDate ) AS Quarter,
    AVG( Quantity ) AS Average,
    STDDEV_POP( Quantity ) AS Variance
FROM SalesOrderItems
GROUP BY Year, Quarter
ORDER BY Year, Quarter;

This query returns the following result:

Year Quarter Average Variance
2000 1 25.775148 14.2794...
2000 2 27.050847 15.0270...
... ... ... ...

For more information about the syntax for this function, see STDDEV_SAMP function [Aggregate].

STDDEV_SAMP function

This function computes the standard deviation of a sample consisting of a numeric expression, as a DOUBLE. For example, the following statement returns the average and variance in the number of items per order in different quarters:

SELECT YEAR( ShipDate ) AS Year,
    QUARTER( ShipDate ) AS Quarter,
    AVG( Quantity ) AS Average,
    STDDEV_SAMP( Quantity ) AS Variance
FROM SalesOrderItems
GROUP BY Year, Quarter
ORDER BY Year, Quarter;

This query returns the following result:

Year Quarter Average Variance
2000 1 25.775148 14.3218...
2000 2 27.050847 15.0696...
... ... ... ...

For more information about the syntax for this function, see STDDEV_POP function [Aggregate].

VARIANCE function

This function is an alias for the VAR_SAMP function. See VAR_SAMP function [Aggregate].

VAR_POP function

This function computes the statistical variance of a population consisting of a numeric expression, as a DOUBLE. For example, the following statement lists the average and variance in the number of items per order in different time periods:

SELECT YEAR( ShipDate ) AS Year,
    QUARTER( ShipDate ) AS Quarter,
    AVG( Quantity ) AS Average,
    VAR_POP( quantity ) AS Variance
FROM SalesOrderItems
GROUP BY Year, Quarter
ORDER BY Year, Quarter;

This query returns the following result:

Year Quarter Average Variance
2000 1 25.775148 203.9021...
2000 2 27.050847 225.8109...
... ... ... ...

If VAR_POP is computed for a single row, then it returns the value 0.

For more information about the syntax for this function, see VAR_POP function [Aggregate].

VAR_SAMP function

This function computes the statistical variance of a sample consisting of a numeric expression, as a DOUBLE.

For example, the following statement lists the average and variance in the number of items per order in different time periods:

SELECT YEAR( ShipDate ) AS Year,
    QUARTER( ShipDate ) AS Quarter,
    AVG( Quantity ) AS Average,
    VAR_SAMP( Quantity ) AS Variance
FROM SalesOrderItems
GROUP BY Year, Quarter
ORDER BY Year, Quarter;

This query returns the following result:

Year Quarter Average Variance
2000 1 25.775148 205.1158...
2000 2 27.050847 227.0939...
... ... ... ...

If VAR_SAMP is computed for a single row, then it returns NULL.

For more information about the syntax for this function, see VAR_SAMP function [Aggregate].