SQL Anywhere supports two versions of variance and standard deviation functions: a sampling version, and a population version. Choosing between the two versions depends on the statistical context in which the function is to be used.
All the variance and standard deviation functions are true aggregate functions in that they can compute values for a partition of rows as determined by the query's GROUP BY clause. As with other basic aggregate functions such as MAX or MIN, their computation also ignores NULL values in the input.
For improved performance, SQL Anywhere calculates the mean, and the deviation from mean, in one step. This means that only one pass over the data is required.
Also, regardless of the domain of the expression being analyzed, all variance and standard deviation computation is done using IEEE double-precision floating point. If the input to any variance or standard deviation function is the empty set, then each function returns NULL as its result. If VAR_SAMP is computed for a single row, then it returns NULL, while VAR_POP returns the value 0.
Following are the standard deviation and variance functions offered in SQL Anywhere:
To review the mathematical formulas represented by these functions see Mathematical formulas for the aggregate functions.
This function is an alias for the STDDEV_SAMP function. See STDDEV_SAMP function [Aggregate].
This function computes the standard deviation of a population consisting of a numeric expression, as a DOUBLE.
The following query returns a result set that shows the employees whose salary is one standard deviation greater than the average salary of their department. Standard deviation is a measure of how much the data varies from the mean.
SELECT * FROM ( SELECT Surname AS Employee, DepartmentID AS Department, CAST( Salary as DECIMAL( 10, 2 ) ) AS Salary, CAST( AVG( Salary ) OVER ( PARTITION BY DepartmentID ) AS DECIMAL ( 10, 2 ) ) AS Average, CAST( STDDEV_POP( Salary ) OVER ( PARTITION BY DepartmentID ) AS DECIMAL ( 10, 2 ) ) AS StandardDeviation FROM Employees GROUP BY Department, Employee, Salary ) AS DerivedTable WHERE Salary > Average + StandardDeviation ORDER BY Department, Salary, Employee; |
The table that follows represents the result set from the query. Every department has at least one employee whose salary significantly deviates from the mean.
Employee | Department | Salary | Average | StandardDeviation | |
---|---|---|---|---|---|
1 | Lull | 100 | 87900.00 | 58736.28 | 16829.60 |
2 | Scheffield | 100 | 87900.00 | 58736.28 | 16829.60 |
3 | Scott | 100 | 96300.00 | 58736.28 | 16829.60 |
4 | Sterling | 200 | 64900.00 | 48390.95 | 13869.60 |
5 | Savarino | 200 | 72300.00 | 48390.95 | 13869.60 |
6 | Kelly | 200 | 87500.00 | 48390.95 | 13869.60 |
7 | Shea | 300 | 138948.00 | 59500.00 | 30752.40 |
8 | Blaikie | 400 | 54900.00 | 43640.67 | 11194.02 |
9 | Morris | 400 | 61300.00 | 43640.67 | 11194.02 |
10 | Evans | 400 | 68940.00 | 43640.67 | 11194.02 |
11 | Martinez | 500 | 55500.00 | 33752.20 | 9084.50 |
Employee Scott earns $96,300.00, while the departmental average is $58,736.28. The standard deviation for that department
is $16,829.00, which means that salaries less than $75,565.88 (58736.28 + 16829.60 = 75565.88
) fall within one standard deviation of the mean. At $96,300.00, employee Scott is well above that figure.
This example assumes that Surname and Salary are unique for each employee, which isn't necessarily true. To ensure uniqueness, you could add EmployeeID to the GROUP BY clause.
The following statement lists the average and variance in the number of items per order in different time periods:
SELECT YEAR( ShipDate ) AS Year, QUARTER( ShipDate ) AS Quarter, AVG( Quantity ) AS Average, STDDEV_POP( Quantity ) AS Variance FROM SalesOrderItems GROUP BY Year, Quarter ORDER BY Year, Quarter; |
This query returns the following result:
Year | Quarter | Average | Variance |
---|---|---|---|
2000 | 1 | 25.775148 | 14.2794... |
2000 | 2 | 27.050847 | 15.0270... |
... | ... | ... | ... |
For more information about the syntax for this function, see STDDEV_SAMP function [Aggregate].
This function computes the standard deviation of a sample consisting of a numeric expression, as a DOUBLE. For example, the following statement returns the average and variance in the number of items per order in different quarters:
SELECT YEAR( ShipDate ) AS Year, QUARTER( ShipDate ) AS Quarter, AVG( Quantity ) AS Average, STDDEV_SAMP( Quantity ) AS Variance FROM SalesOrderItems GROUP BY Year, Quarter ORDER BY Year, Quarter; |
This query returns the following result:
Year | Quarter | Average | Variance |
---|---|---|---|
2000 | 1 | 25.775148 | 14.3218... |
2000 | 2 | 27.050847 | 15.0696... |
... | ... | ... | ... |
For more information about the syntax for this function, see STDDEV_POP function [Aggregate].
This function is an alias for the VAR_SAMP function. See VAR_SAMP function [Aggregate].
This function computes the statistical variance of a population consisting of a numeric expression, as a DOUBLE. For example, the following statement lists the average and variance in the number of items per order in different time periods:
SELECT YEAR( ShipDate ) AS Year, QUARTER( ShipDate ) AS Quarter, AVG( Quantity ) AS Average, VAR_POP( quantity ) AS Variance FROM SalesOrderItems GROUP BY Year, Quarter ORDER BY Year, Quarter; |
This query returns the following result:
Year | Quarter | Average | Variance |
---|---|---|---|
2000 | 1 | 25.775148 | 203.9021... |
2000 | 2 | 27.050847 | 225.8109... |
... | ... | ... | ... |
If VAR_POP is computed for a single row, then it returns the value 0.
For more information about the syntax for this function, see VAR_POP function [Aggregate].
This function computes the statistical variance of a sample consisting of a numeric expression, as a DOUBLE.
For example, the following statement lists the average and variance in the number of items per order in different time periods:
SELECT YEAR( ShipDate ) AS Year, QUARTER( ShipDate ) AS Quarter, AVG( Quantity ) AS Average, VAR_SAMP( Quantity ) AS Variance FROM SalesOrderItems GROUP BY Year, Quarter ORDER BY Year, Quarter; |
This query returns the following result:
Year | Quarter | Average | Variance |
---|---|---|---|
2000 | 1 | 25.775148 | 205.1158... |
2000 | 2 | 27.050847 | 227.0939... |
... | ... | ... | ... |
If VAR_SAMP is computed for a single row, then it returns NULL.
For more information about the syntax for this function, see VAR_SAMP function [Aggregate].
Discuss this page in DocCommentXchange. Send feedback about this page using email. |
Copyright © 2009, iAnywhere Solutions, Inc. - SQL Anywhere 11.0.1 |