SUM function example

The following example shows the SUM function used as a window function. The query returns a result set that partitions the data by DepartmentID, and then provides a cumulative summary (Sum_Salary) of employees' salaries, starting with the employee who has been at the company the longest. The result set includes only those employees who reside in California, Utah, New York, or Arizona. The column Sum_Salary provides the cumulative total of employees' salaries.

SELECT DepartmentID, Surname, StartDate, Salary,
SUM( Salary ) OVER ( PARTITION BY DepartmentID
    ORDER BY StartDate
    RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) 
AS "Sum_Salary"
FROM Employees
WHERE State IN ( 'CA', 'UT', 'NY', 'AZ' ) 
   AND DepartmentID IN ( '100', '200' )
ORDER BY DepartmentID, StartDate;

The table that follows represents the result set from the query. The result set is partitioned by DepartmentID.

DepartmentID Surname StartDate Salary Sum_Salary
1 100 Whitney 1984-08-28 45700.00 45700.00
2 100 Cobb 1985-01-01 62000.00 107700.00
3 100 Shishov 1986-06-07 72995.00 180695.00
4 100 Driscoll 1986-07-01 48023.69 228718.69
5 100 Guevara 1986-10-14 42998.00 271716.69
6 100 Wang 1988-09-29 68400.00 340116.69
7 100 Soo 1990-07-31 39075.00 379191.69
8 100 Diaz 1990-08-19 54900.00 434091.69
9 200 Overbey 1987-02-19 39300.00 39300.00
10 200 Martel 1989-10-16 55700.00 95000.00
11 200 Savarino 1989-11-07 72300.00 167300.00
12 200 Clark 1990-07-21 45000.00 212300.00
13 200 Goggin 1990-08-05 37900.00 250200.00

For DepartmentID 100, the cumulative total of salaries from employees in California, Utah, New York, and Arizona is $434,091.69 and the cumulative total for employees in department 200 is $250,200.00.

For more information about the exact syntax of the SUM function, see SUM function [Aggregate].

Computing deltas between adjacent rows

Using two windows—one window over the current row, the other over the previous row—you can compute deltas, or changes, between adjacent rows. For example, the following query computes the delta (Delta) between the salary for one employee and the previous employee in the results:

SELECT EmployeeID AS EmployeeNumber, 
   Surname AS LastName,
   SUM( Salary ) OVER ( ORDER BY BirthDate 
      ROWS BETWEEN CURRENT ROW AND CURRENT ROW ) 
      AS CurrentRow,
   SUM( Salary ) OVER ( ORDER BY BirthDate
      ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING ) 
      AS PreviousRow,
   ( CurrentRow - PreviousRow ) AS Delta
FROM Employees
WHERE State IN ( 'NY' );
EmployeeNumber LastName CurrentRow PreviousRow Delta
1 913 Martel 55700.000 (NULL) (NULL)
2 1062 Blaikie 54900.000 55700.000 -800.000
3 249 Guevara 42998.000 54900.000 -11902.000
4 390 Davidson 57090.000 42998.000 14092.000
5 102 Whitney 45700.000 57090.000 -11390.000
6 1507 Wetherby 35745.000 45700.000 -9955.000
7 1751 Ahmed 34992.000 35745.000 -753.000
8 1157 Soo 39075.000 34992.000 4083.000

Note that SUM is performed only on the current row for the CurrentRow window because the window size was set to ROWS BETWEEN CURRENT ROW AND CURRENT ROW. Likewise, SUM is performed only over the previous row for the PreviousRow window, because the window size was set to ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING. Also, the value of PreviousRow is NULL in the first row since it has no predecessor, and hence the Delta value is NULL as well.

Complex analytics

Consider the following query, which lists the top salespeople (defined by total sales) for each product in the database:

SELECT s.ProductID AS Products, o.SalesRepresentative, 
     SUM( s.Quantity ) AS total_quantity,
     SUM( s.Quantity * p.UnitPrice ) AS total_sales
  FROM SalesOrders o KEY JOIN SalesOrderItems s 
    KEY JOIN Products p
  GROUP BY s.ProductID, o.SalesRepresentative
  HAVING total_sales = ( 
    SELECT First SUM( s2.Quantity * p2.UnitPrice ) 
        AS sum_sales
      FROM SalesOrders o2 KEY JOIN
        SalesOrderItems s2 KEY JOIN Products p2
      WHERE s2.ProductID = s.ProductID
      GROUP BY o2.SalesRepresentative
      ORDER BY sum_sales DESC )
  ORDER BY s.ProductID;
Graphical plan

This query returns the result:

Products SalesRepresentative total_quantity total_sales
1 300 299 660 5940.00
2 301 299 516 7224.00
3 302 299 336 4704.00
4 400 299 458 4122.00
5 401 902 360 3600.00
6 500 949 360 2520.00
7 501 690 360 2520.00
8 501 949 360 2520.00
9 600 299 612 14688.00
10 601 299 636 15264.00
11 700 299 1008 15120.00

The original query is formed using a correlated subquery that determines the highest sales for any particular product, as ProductID is the subquery's correlated outer reference. Using a nested query, however, is often an expensive option, as in this case. This is because the subquery involves not only a GROUP BY clause, but also an ORDER BY clause within the GROUP BY clause. This makes it impossible for the query optimizer to rewrite this nested query as a join while retaining the same semantics.

Graphical plan showing how expensive nested queries are to use

Consequently, during query execution the subquery is evaluated for each derived row computed in the outer block. Note the expensive Filter predicate in the graphical plan: the optimizer estimates that 99% of the query's execution cost is because of this plan operator. The plan for the subquery clearly illustrates why the filter operator in the main block is so expensive: the subquery involves two nested loops joins, a hashed GROUP BY operation, and a sort.

Rewriting using a ranking function

A rewrite of the same query, using a ranking function, computes the identical result much more efficiently:

SELECT v.ProductID, v.SalesRepresentative, 
  v.total_quantity, v.total_sales
  FROM ( SELECT o.SalesRepresentative, s.ProductID,
            SUM( s.Quantity ) AS total_quantity,
            SUM( s.Quantity * p.UnitPrice ) AS total_sales,
            RANK() OVER ( PARTITION BY s.ProductID 
              ORDER BY SUM( s.Quantity * p.UnitPrice ) DESC ) 
              AS sales_ranking
           FROM SalesOrders o KEY JOIN SalesOrderItems s KEY JOIN Products p
           GROUP BY o.SalesRepresentative, s.ProductID ) 
           AS v
  WHERE sales_ranking = 1
  ORDER BY v.ProductID;

This rewritten query results in a simpler plan:

Graphical plan showing a more efficient plan due to the ranking function

Recall that a window operator is computed after the processing of a GROUP BY clause and prior to the evaluation of the select list items and the query's ORDER BY clause. As seen in the graphical plan, after the join of the three tables, the joined rows are grouped by the combination of the SalesRepresentative and ProductID attributes. Consequently, the SUM aggregate functions of total_quantity and total_sales can be computed for each combination of SalesRepresentative and ProductID.

Following the evaluation of the GROUP BY clause, the RANK function is then computed to rank the rows in the intermediate result in descending sequence by total_sales, using a window. Note that the WINDOW specification involves a PARTITION BY clause. By doing so, the result of the GROUP BY clause is repartitioned (or regrouped)—this time by ProductID. Hence, the RANK function ranks the rows for each product—in descending order of total sales—but for all sales representatives that have sold that product. With this ranking, determining the top salespeople simply requires restricting the derived table's result to reject those rows where the rank is not 1. In the case of ties (rows 7 and 8 in the result set), RANK returns the same value. Consequently, both salespeople 690 and 949 appear in the final result.

See also