Ranking functions let you compile a list of values from the data set in ranked order, as well as compose single-statement SQL queries that fulfil requests such as, “Name the top 10 products shipped this year by total sales,” or “Give the top 5% of salespersons who sold orders to at least 15 different companies.”
SQL/OLAP defines five functions that are categorized as ranking functions:
<RANK FUNCTION TYPE> ::= RANK | DENSE_RANK | PERCENT_RANK | ROW_NUMBER | NTILE
Ranking functions let you compute a rank value for each row in a result set based on the order specified in the query. For example, a sales manager might need to identify the top or bottom sales people in the company, the highest- or lowest-performing sales region, or the best- or worst-selling products. Ranking functions can provide this information.
The RANK function returns a number that indicates the rank of the current row among the rows in the row’s partition, as defined by the ORDER BY clause. The first row in a partition has a rank of 1, and the last rank in a partition containing 25 rows is 25. RANK is specified as a syntax transformation, which means that an implementation can choose to actually transform RANK into its equivalent, or it can merely return a result equivalent to the result that transformation would return.
In the following example, ws1 indicates the window specification that defines the window named w1.
RANK() OVER ws
is equivalent to:
( COUNT (*) OVER ( ws RANGE UNBOUNDED PRECEDING ) - COUNT (*) OVER ( ws RANGE CURRENT ROW ) + 1 )
The transformation of the RANK function uses logical aggregation (RANGE). As a result, two or more records that are tied—or have equal values in the ordering column—have the same rank.The next group in the partition that has a different value has a rank that is more than one greater than the rank of the tied rows. For example, if there are rows whose ordering column values are 10, 20, 20, 20, 30, the rank of the first row is 1 and the rank of the second row is 2. The rank of the third and fourth row is also 2, but the rank of the fifth row is 5. There are no rows whose rank is 3 or 4. This algorithm is sometimes known as sparse ranking.
See also “RANK function [Analytical],” Chapter 4, “SQL Functions,” in Reference: Building Blocks, Tables, and Procedures.
Although RANK returns duplicate values in the ranking sequence when there are ties between values, DENSE_RANK returns ranking values without gaps. The values for rows with ties are still equal, but the ranking of the rows represents the positions of the clusters of rows having equal values in the ordering column, rather than the positions of the individual rows. As in the RANK example, where rows ordering column values are 10, 20, 20, 20, 30, the rank of the first row is still 1 and the rank of the second row is still 2, as are the ranks of the third and fourth rows. The last row, however, is 3, not 5.
DENSE_RANK is computed through a syntax transformation, as well.
DENSE_RANK() OVER ws
is equivalent to:
COUNT ( DISTINCT ROW ( expr_1, . . ., expr_n ) ) OVER ( ws RANGE UNBOUNDED PRECEDING )
In the above example, expr_1 through expr_n represent the list of value expressions in the sort specification list of window w1.
See also “DENSE_RANK function [Analytical],” Chapter 4, “SQL Functions,” in Reference: Building Blocks, Tables, and Procedures.
The PERCENT_RANK function calculates a percentage for the rank, rather than a fractional amount, and returns a decimal value between 0 and 1. In other words, PERCENT_RANK returns the relative rank of a row, which is a number that indicates the relative position of the current row within the window partition in which it appears. For example, in a partition that contains 10 rows having different values in the ordering columns, the third row is given a PERCENT_RANK value of 0.222 …, because you have covered 2/9 (22.222...%) of rows following the first row of the partition. PERCENT_RANK of a row is defined as one less than the RANK of the row divided by one less than the number of rows in the partition, as seen in the following example (where “ANT” stands for an approximate numeric type, such as REAL or DOUBLE PRECISION).
PERCENT_RANK() OVER ws
is equivalent to:
CASE WHEN COUNT (*) OVER ( ws RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ) = 1 THEN CAST (0 AS ANT) ELSE ( CAST ( RANK () OVER ( ws ) AS ANT ) -1 / ( COUNT (*) OVER ( ws RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ) - 1 ) END
See also PERCENT_RANK function [Analytical] in Chapter 4, “SQL Functions,” in Reference: Building Blocks, Tables, and Procedures.
The ROW_NUMBER function returns a unique row number for each row. If you define window partitions, ROW_NUMBER starts the row numbering in each partition at 1, and increments each row by 1. If you do not specify a window partition, ROW_NUMBER numbers the complete result set from 1 to the total cardinality of the table.
The ROW_NUMBER function syntax is:
ROW_NUMBER() OVER ([PARTITION BY window partition] ORDER BY window ordering)
ROW_NUMBER does not require an argument, but you must specify the parentheses.
The PARTITION BY clause is optional. The OVER (ORDER_BY) clause cannot contain a window frame ROWS/RANGE specification. See “Window framing”.
Ranking example 1 The SQL query that follows finds the male and female employees from California, and ranks them in descending order according to salary.
SELECT Surname, Sex, Salary, RANK() OVER ( ORDER BY Salary DESC) as RANK FROM Employees WHERE State IN ('CA') AND DepartmentID =200 ORDER BY Salary DESC;
The results from the above query:
Surname Sex Salary RANK ------- --- ------ ---- Savarino F 72300.000 1 Clark F 45000.000 2 Overbey M 39300.000 3
Ranking example 2 Using the query from Ranking example 1, you can change the data by partitioning it by gender. The following example ranks employees in descending order by salary and partitions by gender:
SELECT Surname, Sex, Salary, RANK() OVER (PARTITION BY Sex ORDER BY Salary DESC) AS RANK FROM Employees WHERE State IN ('CA', 'AZ') AND DepartmentID IN (200, 300) ORDER BY Sex, Salary DESC;
The results from the above query:
Surname Sex Salary RANK ------- --- --------- ---- Savarino F 72300.000 1 Jordan F 51432.000 2 Clark F 45000.000 3 Coleman M 42300.000 1 Overbey M 39300.000 2
Ranking example 3 This example ranks a list of female employees in California and Texas in descending order according to salary. The PERCENT_RANK function provides the cumulative total in descending order.
SELECT Surname, Salary, Sex, CAST(PERCENT_RANK() OVER (ORDER BY Salary DESC) AS numeric (4, 2)) AS RANK FROM Employees WHERE State IN ('CA', 'TX') AND Sex ='F' ORDER BY Salary DESC;
The results from the above query:
Surname salary sex RANK --------- --------- --- ---------- Savarino 72300.000 F 0.00 Smith 51411.000 F 0.33 Clark 45000.000 F 0.66 Garcia 39800.000 F 1.00
Ranking example 4 You can use the PERCENT_RANK function to find the top or bottom percentiles in the data set. This query returns male employees whose salary is in the top five percent of the data set.
SELECT * FROM (SELECT Surname, Salary, Sex, CAST(PERCENT_RANK() OVER (ORDER BY salary DESC) as numeric (4, 2)) AS percent FROM Employees WHERE State IN ('CA') AND sex ='F' ) AS DT where percent > 0.5 ORDER BY Salary DESC;
The results from the above query:
Surname salary sex percent --------- ---------- --- --------- Clark 45000.000 F 1.00
Ranking example 5 This example uses the ROW_NUMBER function to return row numbers for each row in all window partitions. The query partitions the Employees table by department ID, and orders the rows in each partition by start date.
SELECT DepartmentID dID, StartDate, Salary , ROW_NUMBER()OVER(PARTITION BY dID ORDER BY StartDate) FROM Employees ORDER BY 1,2;
The results from the above query are:
dID StartDate Salary Row_number() ======== =========== ========== ============= 100 1984-08-28 47500.000 1 100 1985-01-01 62000.500 2 100 1985-06-17 57490.000 3 100 1986-06-07 72995.000 4 100 1986-07-01 48023.690 5 ... ... ... ... 200 1985-02-03 38500.000 1 200 1985-12-06 54800.000 2 200 1987-02-19 39300.000 3 200 1987-07-10 49500.000 4 ... ... ... ... 500 1994-02-27 24903.000 9