Two versions of variance and standard deviation functions are supported: a sampling version, and a population version. Choosing between the two versions depends on the statistical context in which the function is to be used.
All the variance and standard deviation functions are true aggregate functions in that they can compute values for a partition of rows as determined by the query's GROUP BY clause. As with other basic aggregate functions such as MAX or MIN, their computation also ignores NULL values in the input.
For improved performance, the database server calculates the mean, and the deviation from mean, in one step. This means that only one pass over the data is required.
Also, regardless of the domain of the expression being analyzed, all variance and standard deviation computation is done using IEEE double-precision floating-point arithmetic. If the input to any variance or standard deviation function is the empty set, then each function returns NULL as its result. If VAR_SAMP is computed for a single row, then it returns NULL, while VAR_POP returns the value 0.
Following are the supported standard deviation and variance functions:
This function is an alias for the STDDEV_SAMP function.
This function computes the standard deviation of a population consisting of a numeric expression, as a DOUBLE.
The following query returns a result set that shows the employees whose salary is one standard deviation greater than the average salary of their department. Standard deviation is a measure of how much the data varies from the mean.
SELECT * FROM ( SELECT Surname AS Employee, DepartmentID AS Department, CAST( Salary as DECIMAL( 10, 2 ) ) AS Salary, CAST( AVG( Salary ) OVER ( PARTITION BY DepartmentID ) AS DECIMAL ( 10, 2 ) ) AS Average, CAST( STDDEV_POP( Salary ) OVER ( PARTITION BY DepartmentID ) AS DECIMAL ( 10, 2 ) ) AS StandardDeviation FROM Employees GROUP BY Department, Employee, Salary ) AS DerivedTable WHERE Salary > Average + StandardDeviation ORDER BY Department, Salary, Employee; |
The table that follows represents the result set from the query. Every department has at least one employee whose salary significantly deviates from the mean.
Employee | Department | Salary | Average | StandardDeviation | |
---|---|---|---|---|---|
1 | Lull | 100 | 87900.00 | 58736.28 | 16829.60 |
2 | Scheffield | 100 | 87900.00 | 58736.28 | 16829.60 |
3 | Scott | 100 | 96300.00 | 58736.28 | 16829.60 |
4 | Sterling | 200 | 64900.00 | 48390.95 | 13869.60 |
5 | Savarino | 200 | 72300.00 | 48390.95 | 13869.60 |
6 | Kelly | 200 | 87500.00 | 48390.95 | 13869.60 |
7 | Shea | 300 | 138948.00 | 59500.00 | 30752.40 |
8 | Blaikie | 400 | 54900.00 | 43640.67 | 11194.02 |
9 | Morris | 400 | 61300.00 | 43640.67 | 11194.02 |
10 | Evans | 400 | 68940.00 | 43640.67 | 11194.02 |
11 | Martinez | 500 | 55500.00 | 33752.20 | 9084.50 |
Employee Scott earns $96,300.00, while the departmental average is $58,736.28. The standard deviation for that department
is $16,829.00, which means that salaries less than $75,565.88 (58736.28 + 16829.60 = 75565.88
) fall within one standard deviation of the mean. At $96,300.00, employee Scott is well above that figure.
This example assumes that Surname and Salary are unique for each employee, which isn't necessarily true. To ensure uniqueness, you could add EmployeeID to the GROUP BY clause.
The following statement lists the average and variance in the number of items per order in different time periods:
SELECT YEAR( ShipDate ) AS Year, QUARTER( ShipDate ) AS Quarter, AVG( Quantity ) AS Average, STDDEV_POP( Quantity ) AS Variance FROM SalesOrderItems GROUP BY Year, Quarter ORDER BY Year, Quarter; |
This query returns the following result:
Year | Quarter | Average | Variance |
---|---|---|---|
2000 | 1 | 25.775148 | 14.2794... |
2000 | 2 | 27.050847 | 15.0270... |
... | ... | ... | ... |
This function computes the standard deviation of a sample consisting of a numeric expression, as a DOUBLE. For example, the following statement returns the average and variance in the number of items per order in different quarters:
SELECT YEAR( ShipDate ) AS Year, QUARTER( ShipDate ) AS Quarter, AVG( Quantity ) AS Average, STDDEV_SAMP( Quantity ) AS Variance FROM SalesOrderItems GROUP BY Year, Quarter ORDER BY Year, Quarter; |
This query returns the following result:
Year | Quarter | Average | Variance |
---|---|---|---|
2000 | 1 | 25.775148 | 14.3218... |
2000 | 2 | 27.050847 | 15.0696... |
... | ... | ... | ... |
This function is an alias for the VAR_SAMP function.
This function computes the statistical variance of a population consisting of a numeric expression, as a DOUBLE. For example, the following statement lists the average and variance in the number of items per order in different time periods:
SELECT YEAR( ShipDate ) AS Year, QUARTER( ShipDate ) AS Quarter, AVG( Quantity ) AS Average, VAR_POP( quantity ) AS Variance FROM SalesOrderItems GROUP BY Year, Quarter ORDER BY Year, Quarter; |
This query returns the following result:
Year | Quarter | Average | Variance |
---|---|---|---|
2000 | 1 | 25.775148 | 203.9021... |
2000 | 2 | 27.050847 | 225.8109... |
... | ... | ... | ... |
If VAR_POP is computed for a single row, then it returns the value 0.
This function computes the statistical variance of a sample consisting of a numeric expression, as a DOUBLE.
For example, the following statement lists the average and variance in the number of items per order in different time periods:
SELECT YEAR( ShipDate ) AS Year, QUARTER( ShipDate ) AS Quarter, AVG( Quantity ) AS Average, VAR_SAMP( Quantity ) AS Variance FROM SalesOrderItems GROUP BY Year, Quarter ORDER BY Year, Quarter; |
This query returns the following result:
Year | Quarter | Average | Variance |
---|---|---|---|
2000 | 1 | 25.775148 | 205.1158... |
2000 | 2 | 27.050847 | 227.0939... |
... | ... | ... | ... |
If VAR_SAMP is computed for a single row, then it returns NULL.
Discuss this page in DocCommentXchange.
|
Copyright © 2012, iAnywhere Solutions, Inc. - SQL Anywhere 12.0.1 |