group by and having clauses

Description

Used in select statements to divide a table into groups and to return only groups that match conditions in the having clause. group by is typically used in conjunction with aggregates to specify how to group the unaggregated columns of a select query. having clauses are applied to these groups.

Syntax

Start of select statement
[group by [all] aggregate_free_expression
	[, aggregate_free_expression]...]
[having search_conditions]
End of select statement

Parameters

group by

specifies the groups into which the table is divided, and if aggregate functions are included in the select list, finds a summary value for each group. These summary values appear as columns in the results, one for each group. You can refer to these summary columns in the having clause.

You can use the avg, count, count_big, max, min, and sum aggregate functions in the select list before group by (the expression is usually a column name). See Chapter 2, “Transact-SQL Functions” of Reference Manual: Building Blocks.

A table can be grouped by any combination of columns—that is, groups can be nested within each other, as in Example 2.

all

is a Transact-SQL extension that includes all groups in the results, even those excluded by a where clause. For example:

select type, avg (price) 
from titles 
where advance > 7000 
group by all type

type 
-----------------  ---------- 
UNDECIDED                NULL
business                 2.99
mod_cook                 2.99
popular_comp            20.00
psychology               NULL
trad_cook               14.99

 (6 rows affected)

“NULL” in the aggregate column indicates groups that would be excluded by the where clause. A having clause negates the meaning of all.

aggregate_free_expression

is an expression that includes no aggregates. A Transact-SQL extension allows grouping by an aggregate-free expression as well as by a column name.

You cannot group by column heading or alias. This example is correct:

select Price=avg (price), Pay=avg (advance), 
Total=price * $1.15 
from titles 
group by price * $1.15
having

sets conditions for the group by clause, similar to the way in which where sets conditions for the select clause.

having search conditions can include aggregate expressions; otherwise, having search conditions are identical to where search conditions. This is an example of a having clause with aggregates:

select pub_id, total = sum (total_sales) 
from titles 
where total_sales is not null 
group by pub_id 
having count (*)>5

When Adaptive Server optimizes queries, it evaluates the search conditions in where and having clauses, and determines which conditions are search arguments (SARGs) that can be used to choose the best indexes and query plan. All of the search conditions are used to qualify the rows. For more information on search arguments, see the Performance and Tuning Guide: Optimizer and Abstract Plans.

Examples

Example 1

Calculates the average advance and the sum of the sales for each type of book:

select type, avg (advance), sum (total_sales) 
from titles 
group by type

Example 2

Groups the results by type, then by pub_id within each type:

select type, pub_id, avg (advance), sum (total_sales) 
from titles 
group by type, pub_id

Example 3

Calculates results for all groups, but displays only groups whose type begins with “p”:

select type, avg (price) 
from titles 
group by type 
having type like 'p%'

Example 4

Calculates results for all groups, but displays results for groups matching the multiple conditions in the having clause:

select pub_id, sum (advance), avg (price) 
from titles 
group by pub_id 
having sum (advance) > $15000 
and avg (price) < $10 
and pub_id > "0700"

Example 5

Calculates the total sales for each group (publisher) after joining the titles and publishers tables:

select p.pub_id, sum (t.total_sales)
from publishers p, titles t
where p.pub_id = t.pub_id
group by p.pub_id

Example 6

Displays the titles that have an advance of more than $1000 and a price that is more than the average price of all titles:

select title_id, advance, price
from titles
where advance > 1000
having price > avg (price)

Usage


How group by works with the optimizer

In Adaptive Server version 15.0, there are two possible algorithms (implemened as operators) for doing group by: GroupHashing and GroupSorted. The optimizer chooses which operator to use based on factors such as the requirements these operators place on the input data streams.

The GroupSorted operator requires that the input rows to be aggregated are already sorted on the group by columns. Since the input rows must be sorted, the optimizer uses either of the following:

The optimizer uses the GroupHashing operator if ordering on the group by columns is not available or the row size limitations of the GroupSorted operator are exceeded. The GroupHashing operator applies a hashing function to the values of the group by columns to be able to put rows with the same group by column values into the same hash bucket. Once the input rows have all been hashed into buckets, the rows in the buckets are aggregated to generate the group by results. The only limitation of the GroupHashing operator is that the total row size of group by columns and aggregate results cannot be larger than 64K. There is no limitation on the number of group by columns or the number of aggregation operations, just the total row width.


How group by and having queries with aggregates work

The where clause excludes rows that do not meet its search conditions; its function remains the same for grouped or nongrouped queries.

The group by clause collects the remaining rows into one group for each unique value in the group by expression. Omitting group by creates a single group for the whole table.

Aggregate functions specified in the select list calculate summary values for each group. For scalar aggregates, there is only one value for the table. Vector aggregates calculate values for the distinct groups.

The having clause excludes groups from the results that do not meet its search conditions. Even though the having clause tests only rows, the presence or absence of a group by clause may make it appear to be operating on groups:


Standard group by and having queries

All group by and having queries in the Examples section adhere to the SQL standard, which dictates that queries using group by, having, and vector aggregate functions produce one row and one summary value per group, using these guidelines:

In nongrouped queries, the principle that “where excludes rows” seems straightforward. In grouped queries, the principle expands to “where excludes rows before group by, and having excludes rows from the display of results.”

The SQL standard allows queries that join two or more tables to use group by and having, if they also adhere to the above guidelines. When specifying joins or other complex queries, use the standard syntax of group by and having until you fully comprehend the effect of the Transact-SQL extensions to both clauses.

To help you avoid problems with extensions, Adaptive Server provides the fipsflagger option to the set command that issues a nonfatal warning for each occurrence of a Transact-SQL extension in a query. See set for more information.


Transact-SQL extensions to group by and having

Transact-SQL extensions to standard SQL make displaying data more flexible, by allowing references to columns and expressions that are not used for creating groups or summary calculations:

When the Transact-SQL extensions add rows and columns to a display, or if group by is omitted, query results may be hard to interpret. The examples that follow can help you understand how Transact-SQL extensions can affect query results.

The following examples illustrate the differences between queries that use standard group by and having clauses and queries that use the Transact-SQL extensions:

  1. An example of a standard grouping query:

    select type, avg (price) 
    from titles 
    group by type
    
    type                
    ----------------------  ---------- 
    UNDECIDED                 NULL 
    business                 13.73 
    mod_cook                 11.49 
    popular_comp             21.48 
    psychology               13.50 
    trad_cook                15.96 
    
     (6 rows affected)
    
  2. The Transact-SQL extended column, price (in the select list, but not an aggregate and not in the group by clause), causes all qualified rows to display in each qualified group, even though a standard group by clause produces a single row per group. The group by still affects the vector aggregate, which computes the average price per group displayed on each row of each group (they are the same values that were computed for example a):

    select type, price, avg (price) 
    from titles 
    group by type
    
    type           price
    ------------ ---------------- --------------
    business       19.99            13.73
    business       11.95            13.73
    business        2.99            13.73
    business       19.99            13.73
    mod_cook       19.99            11.49
    mod_cook        2.99            11.49
    UNDECIDED       NULL             NULL
    popular_comp   22.95            21.48
    popular_comp   20.00            21.48
    popular_comp    NULL            21.48
    psychology     21.59            13.50
    psychology     10.95            13.50
    psychology      7.00            13.50
    psychology     19.99            13.50
    psychology      7.99            13.50
    trad_cook      20.95            15.96
    trad_cook      11.95            15.96
    trad_cook      14.99            15.96
    
     (18 rows affected)
    
  3. The way Transact-SQL extended columns are handled can make it look as if a query is ignoring a where clause. This query computes the average prices using only those rows that satisfy the where clause, but it also displays rows that do not match the where clause.

    Adaptive Server first builds a worktable containing only the type and aggregate values using the where clause. This worktable is joined back to the titles table in the grouping column type to include the price column in the results, but the where clause is not used in the join.

    The only row in titles that is not in the results is the lone row with type = “UNDECIDED” and a NULL price, that is, a row for which there were no results in the worktable. If you also want to eliminate the rows from the displayed results that have prices of less than $10.00, you must add a having clause that repeats the where clause, as shown in Example 4:

    select type, price, avg (price) 
    from titles 
    where price > 10.00 
    group by type 
    
    type           price
    ------------ ---------------- --------------
    business        19.99            17.31
    business        11.95            17.31
    business         2.99            17.31
    business        19.99            17.31
    mod_cook        19.99            19.99
    mod_cook         2.99            19.99
    popular_comp    22.95            21.48
    popular_comp    20.00            21.48
    popular_comp     NULL            21.48
    psychology      21.59            17.51
    psychology      10.95            17.51
    psychology       7.00            17.51
    psychology      19.99            17.51
    psychology       7.99            17.51
    trad_cook       20.95            15.96
    trad_cook       11.95            15.96
    trad_cook       14.99            15.96
    
     (17 rows affected)
    
  4. If you are specifying additional conditions, such as aggregates, in the having clause, also include all conditions specified in the where clause. Adaptive Server appears to ignore any where clause conditions that are missing from the having clause:

    select type, price, avg (price)
    from titles
    where price > 10.00
    group by type
    having price > 10.00
    
    type           price
    ----------- ----------------  --------------
    business       19.99           17.31
    business       11.95           17.31
    business       19.99           17.31
    mod_cook       19.99           19.99
    popular_comp   22.95           21.48
    popular_comp   20.00           21.48
    psychology     21.59           17.51
    psychology     10.95           17.51
    psychology     19.99           17.51
    trad_cook      20.95           15.96
    trad_cook      11.95           15.96
    trad_cook      14.99           15.96 
    
     (12 rows affected)
    
  5. This is an example of a standard grouping query using a join between two tables. It groups by pub_id, then by type within each publisher ID, to calculate the vector aggregate for each row:

    select p.pub_id, t.type, sum (t.total_sales)
    from publishers p, titles t
    where p.pub_id = t.pub_id
    group by p.pub_id, t.type
    
    pub_id  type
    ------  ------------  --------
    0736    business       18722
    0736    psychology      9564
    0877    UNDECIDED       NULL
    0877    mod_cook       24278
    0877    psychology       375
    0877    trad_cook      19566
    1389    business       12066
    1389    popular_comp   12875
    
     (8 rows affected)
    

    It may seem that it is only necessary to specify group by for the pub_id and type columns to produce the results, and add extended columns as follows:

    select p.pub_id, p.pub_name, t.type,
        sum (t.total_sales)
    from publishers p, titles t
    where p.pub_id = t.pub_id
    group by p.pub_id, t.type
    

    However, the results for the above query are much different from the results for the first query in this example. After joining the two tables to determine the vector aggregate in a worktable, Adaptive Server joins the worktable to the table (publishers) of the extended column for the final results. Each extended column from a different table invokes an additional join.

    As you can see, using the extended column extension in queries that join tables can easily produce results that are difficult to comprehend. In most cases, use the standard group by to join tables in your queries.

  6. This example uses the Transact-SQL extension to group by to include columns that are not in the select list. Both the pub_id and type columns are used to group the results for the vector aggregate. However, the final results do not include the type within each publisher. In this case, you may only want to know how many distinct title types are sold for each publisher:

    select p.pub_id, sum (t.total_sales)
    from publishers p, titles t
    where p.pub_id = t.pub_id
    group by p.pub_id, t.type
    
    pub_id
    ------  --------
    0736       18722
    0736        9564
    0877        NULL
    0877        24278
    0877          375
    0877        19566
    1389        12066
    1389        12875
    
     (8 rows affected)
    
  7. This example combines two Transact-SQL extension effects. First, it omits the group by clause while including an aggregate in the select list. Second, it includes an extended column. By omitting the group by clause:

    • The table becomes a single group. The scalar aggregate counts three qualified rows.

    • pub_id becomes a Transact-SQL extended column because it does not appear in a group by clause. No having clause is present, so all rows in the group are qualified to be displayed.

    select pub_id, count (pub_id) 
    from publishers
    
    pub_id 
    ---------- --------- 
    0736               3 
    0877               3 
    1389               3 
    
     (3 rows affected)
    
  8. The where clause excludes publishers with a pub_id of 1000 or more from the single group, so the scalar aggregate counts two qualified rows. The extended column pub_id displays all qualified rows from the publishers table:

    select pub_id, count (pub_id) 
    from publishers 
    where pub_id < "1000"
    
    pub_id 
    -------------- ----------- 
    0736                    2 
    0877                    2 
    1389                    2 
    
     (3 rows affected)
    
  9. This example illustrates an effect of a having clause used without a group by clause.

    • The table is considered a single group. No where clause excludes rows, so all the rows in the group (table) are qualified to be counted.

    • The rows in this single-group table are tested by the having clause.

    • These combined effects display the two qualified rows.

    select pub_id, count (pub_id) 
    from publishers 
    having pub_id < "1000"
    
    pub_id 
    -------------- --------- 
    0736                   3 
    0877                   3 
     (2 rows affected)
    
  10. This example uses the extension to having that allows columns or expressions not in the select list and not in the group by clause. It determines the average price for each title type, but it excludes those types that do not have more than $10,000 in total sales, even though the sum aggregate does not appear in the results:

    select type, avg (price)
    from titles
    group by type
    having sum (total_sales) > 10000
    
    type
    ------------  ----------
    business           13.73
    mod_cook           11.49
    popular_comp       21.48
    trad_cook          15.96
    
     (4 rows affected)
    

group by and having and sort orders

If your server has a case-insensitive sort order, group by ignores the case of the grouping columns. For example, given this data on a case-insensitive server:

select lname, amount 
from groupdemo
lname           amount 
---------- ------------------ 
Smith                10.00 
smith                 5.00 
SMITH                 7.00 
Levi                  9.00 
Lévi                 20.00

grouping by lname produces these results:

select lname, sum (amount)
from groupdemo

lname 
---------- ---------------- 
Levi                  9.00
Lévi                 20.00
Smith                22.00

The same query on a case- and accent-insensitive server produces these results:

 lname 
 ---------- ------------------ 
 Levi                  29.00 
 Smith                 22.00

Standards

ANSI SQL – Compliance level: Entry-level compliant.

The use of columns within the select list that are not in the group by list and have no aggregate functions is a Transact-SQL extension.

The use of the all keyword is a Transact-SQL extension.

See also

Commands compute clause, declare, select, where clause

Documentation Chapter 2, “Transact-SQL Functions” of Reference Manual: Building Blocks