Ranking functions let you compile a list of values from the data set in ranked order, as well as compose single-statement SQL queries that answer questions such as, “Name the top 10 products shipped this year by total sales,” or “Give the top 5% of salespersons who sold orders to at least 15 different companies.” The functions include RANK(), DENSE_RANK(), PERCENT_RANK(), and NTILE() with a PARTITION BY clause. See “Ranking functions”.
SQL/OLAP defines four functions that are categorized as ranking functions:
<RANK FUNCTION TYPE> ::= RANK | DENSE RANK | PERCENT RANK | NTILE
Ranking functions let you compute a rank value for each row in a result set based on the order specified in the query. For example, a sales manager might need to identify the top or bottom sales people in the company, the highest- or lowest-performing sales region, or the best- or worst-selling products. Ranking functions can provide this information.
The RANK function returns a number that indicates the rank of the current row among the rows in the row’s partition, as defined by the ORDER BY clause. The first row in a partition has a rank of 1, and the last rank in a partition containing 25 rows is 25. RANK is specified as a syntax transformation, which means that an implementation can choose to actually transform RANK into its equivalent, or it can merely return a result equivalent to the result that transformation would return.
In the following example, ws1 indicates the window specification that defines the window named w1.
RANK() OVER ws
is equivalent to:
( COUNT (*) OVER ( ws RANGE UNBOUNDED PRECEDING ) - COUNT (*) OVER ( ws RANGE CURRENT ROW ) + 1 )
The transformation of the RANK function uses logical aggregation (RANGE). As a result, two or more records that are tied—or have equal values in the ordering column—have the same rank.The next group in the partition that has a different value has a rank that is more than one greater than the rank of the tied rows. For example, if there are rows whose ordering column values are 10, 20, 20, 20, 30, the rank of the first row is 1 and the rank of the second row is 2. The rank of the third and fourth row is also 2, but the rank of the fifth row is 5. There are no rows whose rank is 3 or 4. This algorithm is sometimes known as sparse ranking.
See also “RANK function [Analytical],” Chapter 4, “SQL Functions,” in Reference: Building Blocks, Tables, and Procedures.
Although RANK returns duplicate values in the ranking sequence when there are ties between values, DENSE_RANK returns ranking values without gaps. The values for rows with ties are still equal, but the ranking of the rows represents the positions of the clusters of rows having equal values in the ordering column, rather than the positions of the individual rows. As in the RANK example, where rows ordering column values are 10, 20, 20, 20, 30, the rank of the first row is still 1 and the rank of the second row is still 2, as are the ranks of the third and fourth rows. The last row, however, is 3, not 5.
DENSE_RANK is computed through a syntax transformation, as well.
DENSE_RANK() OVER ws
is equivalent to:
COUNT ( DISTINCT ROW ( expr_1, . . ., expr_n ) ) OVER ( ws RANGE UNBOUNDED PRECEDING )
In the above example, expr_1 through expr_n represent the list of value expressions in the sort specification list of window w1.
See also “DENSE_RANK function [Analytical],” Chapter 4, “SQL Functions,” in Reference: Building Blocks, Tables, and Procedures.
The PERCENT_RANK function calculates a percentage for the rank, rather than a fractional amount, and returns a decimal value between 0 and 1. In other words, PERCENT_RANK returns the relative rank of a row, which is a number that indicates the relative position of the current row within the window partition in which it appears. For example, in a partition that contains 10 rows having different values in the ordering columns, the third row is given a PERCENT_RANK value of 0.222 …, because you have covered 2/9 (22.222...%) of rows following the first row of the partition. PERCENT_RANK of a row is defined as one less than the RANK of the row divided by one less than the number of rows in the partition, as seen in the following example (where “ANT” stands for an approximate numeric type, such as REAL or DOUBLE PRECISION).
PERCENT_RANK() OVER ws
is equivalent to:
CASE WHEN COUNT (*) OVER ( ws RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ) = 1 THEN CAST (0 AS ANT) ELSE ( CAST ( RANK () OVER ( ws ) AS ANT ) -1 / ( COUNT (*) OVER ( ws RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ) - 1 ) END
See also PERCENT_RANK function [Analytical] in Chapter 4, “SQL Functions,” in Reference: Building Blocks, Tables, and Procedures.
Ranking example 1 The SQL query that follows finds the male and female employees from Utah, and ranks them in descending order according to salary.
SELECT Surname, Salary, Sex, RANK() OVER (ORDER BYSalary DESC) AS RankFROM Employees WHERE State IN ('CA') AND DepartmentID =200ORDER BY Salary DESC;
The results from the above query:
Surname salary sex rank --------- -------- --- ---- Scott 96300.000 M 1 Lull 87900.000 M 2 Pastor 74500.000 F 3 Shishov 72995.000 F 4 Samuels 37400.000 M 19
Ranking example 2 Using the query from Ranking example 1, you can change the data by partitioning it by gender. The following example ranks employees in descending order by salary and partitions by gender:
SELECT Surname, Salary, Sex, RANK() OVER (PARTITIONBY Sex ORDER BY Salary DESC) AS RANKFROM Employees WHERE State IN ('CA', 'AZ') AND DepartmentIDIN (200, 300)ORDER BY Sex, Salary DESC;
The results from the above query:
Surname salary sex rank --------- --------- --- ---- Savarino 72300.000 F 1 Clark 45000.000 F 2 Overbey 39300.000 M 3
Ranking example 3 This example ranks a list of female employees in California and Texas in descending order according to salary. The PERCENT_RANK function provides the cumulative total in descending order.
SELECT Surname, Salary, Sex, CAST(PERCENT_RANK() OVER(ORDER BY Salary DESC) AS numeric (4, 2)) AS RANKFROM Employees WHERE State IN ('CA', 'TX') AND Sex ='F'ORDER BY Salary DESC;
The results from the above query:
Surname salary sex percent --------- --------- --- ---------- Savarino 72300.000 F 0.00 Smith 51411.000 F 0.33 Clark 45000.000 F 0.66 Garcia 39800.000 F 1.00
Ranking example 4 You can use the PERCENT_RANK function to find the top or bottom percentiles in the data set. This query returns male employees whose salary is in the top five percent of the data set.
SELECT * FROM (SELECT Surname, Salary, Sex,CAST(PERCENT_RANK() OVER (ORDER BY salary DESC) asnumeric (4, 2)) AS percentFROM Employees WHERE State IN ('CA') AND sex ='F' ) ASDT where percent > 0.5ORDER BY Salary DESC;
The results from the above query:
Surname salary sex percent --------- ---------- --- --------- Clark 45000.000 F 1.00