Ranking functions

Ranking functions let you compile a list of values from the data set in ranked order, as well as compose single-statement SQL queries that fulfil requests such as, “Name the top 10 products shipped this year by total sales,” or “Give the top 5% of salespersons who sold orders to at least 15 different companies.”

SQL/OLAP defines five functions that are categorized as ranking functions:

<RANK FUNCTION TYPE> ::=
  RANK | DENSE_RANK | PERCENT_RANK | ROW_NUMBER | NTILE

Ranking functions let you compute a rank value for each row in a result set based on the order specified in the query. For example, a sales manager might need to identify the top or bottom sales people in the company, the highest- or lowest-performing sales region, or the best- or worst-selling products. Ranking functions can provide this information.


RANK() function

The RANK function returns a number that indicates the rank of the current row among the rows in the row’s partition, as defined by the ORDER BY clause. The first row in a partition has a rank of 1, and the last rank in a partition containing 25 rows is 25. RANK is specified as a syntax transformation, which means that an implementation can choose to actually transform RANK into its equivalent, or it can merely return a result equivalent to the result that transformation would return.

In the following example, ws1 indicates the window specification that defines the window named w1.

RANK() OVER ws

is equivalent to:

( COUNT (*) OVER ( ws RANGE UNBOUNDED PRECEDING )
- COUNT (*) OVER ( ws RANGE CURRENT ROW ) + 1 )

The transformation of the RANK function uses logical aggregation (RANGE). As a result, two or more records that are tied—or have equal values in the ordering column—have the same rank.The next group in the partition that has a different value has a rank that is more than one greater than the rank of the tied rows. For example, if there are rows whose ordering column values are 10, 20, 20, 20, 30, the rank of the first row is 1 and the rank of the second row is 2. The rank of the third and fourth row is also 2, but the rank of the fifth row is 5. There are no rows whose rank is 3 or 4. This algorithm is sometimes known as sparse ranking.

See also “RANK function [Analytical],” Chapter 4, “SQL Functions,” in Reference: Building Blocks, Tables, and Procedures.


DENSE_RANK() function

Although RANK returns duplicate values in the ranking sequence when there are ties between values, DENSE_RANK returns ranking values without gaps. The values for rows with ties are still equal, but the ranking of the rows represents the positions of the clusters of rows having equal values in the ordering column, rather than the positions of the individual rows. As in the RANK example, where rows ordering column values are 10, 20, 20, 20, 30, the rank of the first row is still 1 and the rank of the second row is still 2, as are the ranks of the third and fourth rows. The last row, however, is 3, not 5.

DENSE_RANK is computed through a syntax transformation, as well.

DENSE_RANK() OVER ws

is equivalent to:

COUNT ( DISTINCT ROW ( expr_1, . . ., expr_n ) )
 OVER ( ws RANGE UNBOUNDED PRECEDING )

In the above example, expr_1 through expr_n represent the list of value expressions in the sort specification list of window w1.

See also “DENSE_RANK function [Analytical],” Chapter 4, “SQL Functions,” in Reference: Building Blocks, Tables, and Procedures.


PERCENT_RANK() function

The PERCENT_RANK function calculates a percentage for the rank, rather than a fractional amount, and returns a decimal value between 0 and 1. In other words, PERCENT_RANK returns the relative rank of a row, which is a number that indicates the relative position of the current row within the window partition in which it appears. For example, in a partition that contains 10 rows having different values in the ordering columns, the third row is given a PERCENT_RANK value of 0.222 …, because you have covered 2/9 (22.222...%) of rows following the first row of the partition. PERCENT_RANK of a row is defined as one less than the RANK of the row divided by one less than the number of rows in the partition, as seen in the following example (where “ANT” stands for an approximate numeric type, such as REAL or DOUBLE PRECISION).

PERCENT_RANK() OVER ws

is equivalent to:

CASE
  WHEN COUNT (*) OVER ( ws RANGE BETWEEN UNBOUNDED
  PRECEDING AND UNBOUNDED FOLLOWING ) = 1
  THEN CAST (0 AS ANT)
  ELSE
   ( CAST ( RANK () OVER ( ws ) AS ANT ) -1 /
   ( COUNT (*) OVER ( ws RANGE BETWEEN UNBOUNDED
   PRECEDING AND UNBOUNDED FOLLOWING ) - 1 )
END

See also PERCENT_RANK function [Analytical] in Chapter 4, “SQL Functions,” in Reference: Building Blocks, Tables, and Procedures.


ROW_NUMBER() function

The ROW_NUMBER function returns a unique row number for each row. If you define window partitions, ROW_NUMBER starts the row numbering in each partition at 1, and increments each row by 1. If you do not specify a window partition, ROW_NUMBER numbers the complete result set from 1 to the total cardinality of the table.

The ROW_NUMBER function syntax is:

ROW_NUMBER() OVER ([PARTITION BY window partition] ORDER BY window ordering)

ROW_NUMBER does not require an argument, but you must specify the parentheses.

The PARTITION BY clause is optional. The OVER (ORDER_BY) clause cannot contain a window frame ROWS/RANGE specification. See “Window framing”.


Ranking examples

Ranking example 1 The SQL query that follows finds the male and female employees from California, and ranks them in descending order according to salary.

SELECT Surname, Sex, Salary, RANK() OVER ( 
ORDER BY Salary DESC) as RANK FROM Employees 
WHERE State IN ('CA') AND DepartmentID =200 
ORDER BY Salary DESC;

The results from the above query:

Surname      Sex        Salary    RANK
-------      ---        ------    ----
Savarino      F          72300.000  1
Clark         F          45000.000  2
Overbey       M          39300.000  3

Ranking example 2 Using the query from Ranking example 1, you can change the data by partitioning it by gender. The following example ranks employees in descending order by salary and partitions by gender:

SELECT Surname, Sex, Salary, RANK() OVER (PARTITION BY Sex 
ORDER BY Salary DESC) AS RANK FROM Employees 
WHERE State IN ('CA', 'AZ') AND DepartmentID IN (200, 300)
ORDER BY Sex, Salary DESC;

The results from the above query:

Surname           Sex      Salary     RANK
-------           ---      ---------  ----
Savarino           F        72300.000  1
Jordan             F        51432.000  2
Clark              F        45000.000  3
Coleman            M        42300.000  1
Overbey            M        39300.000  2

Ranking example 3 This example ranks a list of female employees in California and Texas in descending order according to salary. The PERCENT_RANK function provides the cumulative total in descending order.

SELECT Surname, Salary, Sex, CAST(PERCENT_RANK() OVER
(ORDER BY Salary DESC) AS numeric (4, 2)) AS RANK
FROM Employees WHERE State IN ('CA', 'TX') AND Sex ='F'
ORDER BY Salary DESC;

The results from the above query:

Surname         salary   sex           RANK
---------      ---------   ---   ----------
Savarino       72300.000   F           0.00 
Smith          51411.000   F           0.33
Clark          45000.000   F           0.66
Garcia         39800.000   F           1.00

Ranking example 4 You can use the PERCENT_RANK function to find the top or bottom percentiles in the data set. This query returns male employees whose salary is in the top five percent of the data set.

SELECT * FROM (SELECT Surname, Salary, Sex, 
CAST(PERCENT_RANK() OVER (ORDER BY salary DESC) as
numeric (4, 2)) AS percent
FROM Employees WHERE State IN ('CA') AND sex ='F' ) AS
DT where percent > 0.5
ORDER BY Salary DESC;

The results from the above query:

Surname          salary    sex     percent
---------      ----------    ---   ---------
Clark         			 	45000.000     F      1.00

Ranking example 5 This example uses the ROW_NUMBER function to return row numbers for each row in all window partitions. The query partitions the Employees table by department ID, and orders the rows in each partition by start date.

SELECT DepartmentID dID, StartDate, Salary ,
ROW_NUMBER()OVER(PARTITION BY dID ORDER BY StartDate)
FROM  Employees ORDER BY 1,2;

The results from the above query are:

dID        StartDate     Salary      Row_number()
========   ===========   ==========  =============
  100      1984-08-28    47500.000       1
  100      1985-01-01    62000.500       2
  100      1985-06-17    57490.000       3
  100      1986-06-07    72995.000       4
  100      1986-07-01    48023.690       5
  ...      ...           ...             ...
  200      1985-02-03    38500.000       1
  200      1985-12-06    54800.000       2
  200      1987-02-19    39300.000       3
  200      1987-07-10    49500.000       4
  ...      ...           ...             ...
  500      1994-02-27     24903.000      9