As Published In
Oracle Magazine
January/February 2013

TECHNOLOGY: SQL 101

  

Having Sums, Averages, and Other Grouped Data

By Melanie Caffrey

 

Part 9 in a series on the basics of the relational database and SQL

Part 8 in this series, “Selecting a Type That Is Right for You” (Oracle Magazine, November/December 2012), introduced common SQL date functions and showed how your queries can use them to modify the appearance of date result set data. It also introduced the SYSDATE function and date arithmetic and showed how they can be used to manipulate result set data to convey more-meaningful results. So far, all of the functions discussed in this series operate on single-row results. Aggregate functions (also called group functions) operate on multiple rows, enabling you to manipulate data so that it displays differently from how it is stored in the database. This article introduces you to some of the more commonly used SQL group functions, along with the GROUP BY and HAVING clauses.

Next Steps


 READ SQL 101, Parts 1–8  

READ more about
relational database design and concepts
 Oracle Database Concepts 11g Release 2 (11.2)
 Oracle Database SQL Language Reference 11g Release 1 (11.1)
 Oracle SQL Developer User’s Guide Release 3.1
 aggregate functions
 

 DOWNLOAD the sample script for this article

To try out the examples in this series, you need access to an Oracle Database instance. If necessary, download and install an Oracle Database edition for your operating system from. I recommend installing Oracle Database, Express Edition 11g Release 2. If you install the Oracle Database software, choose the installation option that enables you to create and configure a database. A new database, including sample user accounts and their associated schemas, will be created for you. (Note that SQL_101 is the user account to use for the examples in this series; it’s also the schema in which you’ll create database tables and other objects.) When the installation process prompts you to specify schema passwords, enter and confirm passwords for SYS and SYSTEM and make a note of them.

Finally—whether you installed the database software from scratch or have access to an existing Oracle Database instance—download, unzip, and execute the SQL script to create the tables for the SQL_101 schema that are required for this article’s examples. (View the script in a text editor for execution instructions.)

The Sum of All Rows

All aggregate functions group data to ultimately produce a single result value. Because aggregate functions operate on multiple row values, you can use them to generate summary data such as totals. For example, you can answer budget planning questions such as “What is the total allotted amount of annual salary paid to all employees”? The query in Listing 1 demonstrates use of the SUM aggregate function to answer this question. It adds together all the values in the EMPLOYEE table’s SALARY column, resulting in the total value 970000.

Code Listing 1: Display the sum of all salary values in the EMPLOYEE table 

SQL> set feedback on
SQL> select SUM(salary) from employee;

SUM(SALARY)
—————————————————
     970000

1 row selected.
 

When You Strive for Average

Another example of a business question you can answer by using an aggregate function is, “What is currently the average annual salary for all employees?” Like the query in Listing 1, the query in Listing 2 applies an aggregate function to the EMPLOYEE table’s SALARY column. The AVG function in Listing 2 sums up the salary values and then divides the total by the number of employee records with non-null salary values. With the total of 970000 paid annually divided by 10 employees, the average annual salary value is 97000.

Code Listing 2: Compute the average salary value across all non-null salary values 

SQL> select AVG(salary) from employee;

AVG(SALARY)
—————————————————
      97000

1 row selected.
 

The EMPLOYEE table holds 11 records, but the average-salary computation in Listing 2 considers only 10 records. This happens because the AVG aggregate function ignores null values. (In the EMPLOYEE table, the null salary value is for the employee Lori Dovichi.) To substitute a non-null value for any null values, you can nest a NVL function call (introduced in Part 7 of this series) inside the call to the AVG function, as demonstrated in Listing 3. The average salary value returned in Listing 3 is lower than the value returned in Listing 2, because the null salary value for Lori Dovichi has been replaced with a 0 and evaluated along with all other non-null salary values. Substitute non-null values for null values only when it makes sense to do so from a business perspective.

Code Listing 3: Substitute a non-null value for any null values 

SQL> select AVG(NVL(salary, 0)) avg_salary
  2    from employee;

AVG_SALARY
———————————————
88181.8182

1 row selected.
 

Keeping Count

As you know from previous articles in this series, the SQL*Plus set feedback on command displays a count of records that satisfy your query criteria. This method works well when the database quickly returns a small number of records that can display easily on your screen. But it’s unwieldy when you evaluate hundreds, thousands, or millions of records, because you must wait for all the records in the result set to be fetched from the database and returned to your client. A more efficient alternative is to use the COUNT aggregate function, demonstrated in Listing 4.

Code Listing 4: Obtain a count of all employees by using COUNT(*) 

SQL> select COUNT(*)
  2    from employee;

  COUNT(*)
———————————————
        11

1 row selected.
 

The COUNT aggregate function counts the number of records that satisfy the query condition. The query in Listing 4 uses COUNT(*)—which returns the count of all rows that satisfy the query condition—to obtain a count of all the records in the EMPLOYEE table. COUNT(*) does not ignore null values, whereas COUNT with a column input does. By comparing Listings 4 and 5, you can see that the returned value is the same whether you count all columns in the EMPLOYEE table with COUNT(*) or count just the primary key column with COUNT(employee_id).

Code Listing 5: Obtain a count of all employees by applying COUNT to the primary key column

SQL> select COUNT(employee_id)
  2    from employee;

COUNT(EMPLOYEE_ID)
———————————————————————————
                11

1 row selected.


However, contrast a call to COUNT(*) or COUNT(employee_id) with a call to COUNT(manager), demonstrated in Listing 6. In the EMPLOYEE table, 5 of the 11 records have no value for the MANAGER column, so they are not included in the single count value that is returned.

Code Listing 6: Apply COUNT to a column that contains null values 

SQL> select COUNT(manager)
  2    from employee;

COUNT(MANAGER)
—————————————————————
             7

1 row selected.

 
The query in Listing 7 demonstrates that a call to COUNT(*) or COUNT(column_name) returns a result value of 0 if no rows match the query condition. The query requests a count of all rows and a count of all manager values for all employee records with a hire date matching the current day’s system date, SYSDATE. Because no one was hired on the date the query was run, a count value of 0 is returned.

Code Listing 7: COUNT(*) and COUNT(column_name) both return 0 when no rows match 

SQL> select COUNT(*), COUNT(manager)
  2    from employee
  3   where hire_date > TRUNC(SYSDATE);

  COUNT(*)  COUNT(MANAGER)
——————————— ——————————————
         0               0

1 row selected.

 
If your goal is to determine a count of distinct values, you can combine the COUNT aggregate function with the DISTINCT keyword. Listing 8 shows a query that determines the count of DISTINCT or UNIQUE (a keyword you can use instead of DISTINCT) MANAGER column values in the EMPLOYEE table. A count of 3 is returned. The null value that exists in the MANAGER column for several employees is not included in the count of distinct manager values by the call to the COUNT aggregate function.

Code Listing 8: COUNT and DISTINCT obtain a count of distinct values 

SQL> select COUNT(DISTINCT manager) num_distinct_managers
  2    from employee;

NUM_DISTINCT_MANAGERS
—————————————————————
                    3

1 row selected.
 

Maximizing and Minimizing

You can certainly locate the maximum and minimum values within a set of row values you’ve fetched with a well-ordered SQL statement. But if your result set is voluminous and all you want is the maximum or the minimum result, you don’t want to scroll to the top or the bottom of the result set to see it. You can use the MIN and MAX aggregate functions instead. The query in Listing 9 uses them to display the EMPLOYEE table’s maximum and minimum salary values.

Code Listing 9: MAX and MIN obtain maximum and minimum column values 

SQL> select MAX(salary), MIN(salary)
  2    from employee;

MAX(SALARY)  MIN(SALARY)
———————————  ———————————
     300000        60000

1 row selected.
 

One by One and Group by Group

So far the examples in this article have discussed aggregate functions working on all rows for a particular aggregation criterion. But you might want to do further categorizations and aggregations within your data. The GROUP BY clause enables you to collect data across multiple records and group the results by one or more columns. Aggregate functions and GROUP BY clauses are used in tandem to determine and return an aggregate value for every group. For example, the query in Listing 10 obtains a count of employees in each department.

In Listing 10, note that one employee has not been assigned to a department in the EMPLOYEE table and that person is included as a group in the results. Note also that the query uses an ORDER BY clause. Although the GROUP BY clause groups data, it does not sort the results in any particular order. The query in Listing 11 shows the query from Listing 10 without the ORDER BY clause.

Code Listing 10: GROUP BY creates grouped categorizations 

SQL> select COUNT(employee_id), department_id
  2    from employee
  3  GROUP BY department_id
  4  ORDER BY department_id;

COUNT(EMPLOYEE_ID)  DEPARTMENT_ID
——————————————————  —————————————
                 6             10
                 2             20
                 2             30
                 1 

4 rows selected.
 

Code Listing 11: No ORDER BY clause with GROUP BY clause 

SQL> select COUNT(employee_id), department_id
  2    from employee
  3  GROUP BY department_id;

COUNT(EMPLOYEE_ID)  DEPARTMENT_ID
——————————————————  —————————————
                 2             30
                 1 
                 2             20
                 6             10

4 rows selected.

 
When a GROUP BY clause is followed by an ORDER BY clause, the columns listed in the ORDER BY clause must also be included in the SELECT list. The query in Listing 12 demonstrates the error that will occur if the SELECT list column list and the ORDER BY column list do not match. Similarly, an error will occur if you do not use GROUP BY for every column in the SELECT list that is not part of an aggregation operation, as shown by the query in Listing 13. This query is the same as the query in Listing 12, minus the GROUP BY clause.

Code Listing 12: Error when ORDER BY clause column list doesn’t also appear in the SELECT list 

SQL> select COUNT(employee_id), department_id
  2    from employee
  3  GROUP BY department_id
  4  ORDER BY hire_date DESC;
ORDER BY hire_date DESC
         *
ERROR at line 4:
ORA-00979: not a GROUP BY expression
 

Code Listing 13: Error when GROUP BY does not list required column 

SQL> select COUNT(employee_id), department_id
  2    from employee
  3  ORDER BY department_id;
select COUNT(employee_id), department_id
                           *
ERROR at line 1:
ORA-00937: not a single-group group function

 
The GROUP BY clause is necessary if your intent is to return multiple groups. And your intent to return multiple groups is determined by the inclusion in the SELECT list of any column that is not part of an aggregation operation. In the query in Listing 13, that column is DEPARTMENT_ID in the EMPLOYEE table. A query that uses aggregate functions and no GROUP BY clause always returns exactly one row, even if the table you query contains no rows at the time you query it.

Having the Last Word

Just as a SELECT list can use a WHERE clause to filter the result set to include only records that meet certain criteria, the GROUP BY clause can use a similar clause to filter groups. The HAVING clause works with the GROUP BY clause to limit the results to groups that meet the criteria you specify. Listing 14 expands on the query in Listing 10. Inclusion of the HAVING clause in this query eliminates any groups with fewer than two employees from the result set. As you can see, the group with no assigned department is not returned in Listing 14’s result set, because that group contains only one employee.

Code Listing 14: HAVING clause filters groups 

SQL> select COUNT(employee_id), department_id
  2    from employee
  3  GROUP BY department_id
  4  HAVING COUNT(employee_id) > 1
  5  ORDER BY department_id;

COUNT(EMPLOYEE_ID)  DEPARTMENT_ID
——————————————————  —————————————
                 6             10
                 2             20
                 2             30

3 rows selected.

 
The HAVING clause works primarily on aggregate function columns, whereas the WHERE clause works on columns and other expressions without an aggregation operation. I say primarily because the HAVING clause can use multiple operators in its filtering operation. For example, the query in Listing 15 displays a count of employees-per-department-and-salary groups that satisfy one of two criteria, only the first of which uses an aggregate function:

  • The department has two or more employees.

  • The salaries of the employees in the department are less than 100000. 

Code Listing 15: HAVING clause can use multiple operators 

SQL> select COUNT(employee_id), department_id, salary
  2    from employee
  3  GROUP BY department_id, salary
  4  HAVING (COUNT(employee_id) > 1
  5             OR salary < 100000)
  6  ORDER BY department_id, salary desc;

COUNT(EMPLOYEE_ID)  DEPARTMENT_ID      SALARY
——————————————————  —————————————  ——————————
                 1             10       80000
                 1             10       70000
                 2             10       60000
                 1             20       90000
                 1             20       65000
                 1             30       70000
                 1                      75000

7 rows selected.
 

Odds and Ends

Although every column included in the SELECT list must also be listed in a GROUP BY clause, this restriction doesn’t apply to number and string literals, constant expressions (expressions that do not use column values), and functions such as SYSDATE. Listing 16 shows a query that, for demonstration purposes, expands on the query in Listing 15. It includes a literal, a constant expression, and the SYSDATE function in the SELECT list, but it doesn’t need to list these items in the GROUP BY or ORDER BY clause.

Code Listing 16: Literals, expressions, and functions not listed in GROUP BY or ORDER BY 

SQL> select COUNT(employee_id), department_id, salary, 
  2  SYSDATE, ‘String Literal’, 42*37 Expression
  3    from employee
  4  GROUP BY department_id, salary
  5  HAVING (COUNT(employee_id) > 1
  6             OR salary < 100000)
  7  ORDER BY department_id, salary desc;

COUNT(EMPLOYEE_ID) DEPARTMENT_ID  SALARY SYSDATE   ‘STRINGLITERAL' EXPRESSION
—————————————————— —————————————  —————— —————————  —————————————— ——————————
                 1            10   80000 29-SEP-12  String Literal       1554
                 1            10   70000 29-SEP-12  String Literal       1554
                 2            10   60000 29-SEP-12  String Literal       1554
                 1            20   90000 29-SEP-12  String Literal       1554
                 1            20   65000 29-SEP-12  String Literal       1554
                 1            30   70000 29-SEP-12  String Literal       1554
                 1                 75000 29-SEP-12  String Literal       1554

7 rows selected.
 

As with the other types of functions you’ve learned about in this article series, you can nest aggregate functions inside one another. The query in Listing 17 obtains a sum of all salaries per department. Then it applies the MIN aggregate function to each department salary summary value to obtain the lowest department salary summary value. This query also demonstrates that you don’t have to list every column in the SELECT list that you list in the GROUP BY clause, even if it’s mandatory to do the reverse.

Code Listing 17: Nested aggregate functions 

SQL> select MIN(SUM(salary)) min_department_salary_sum
  2    from employee
  3   where department_id is not null
  4  GROUP by department_id;

MIN_DEPARTMENT_SALARY_SUM
———————————————————————————
                   155000

1 row selected.
 

Conclusion and Anticipation

This article has shown you a few of the most common aggregate functions and how you can use them to manipulate how your data is displayed. You’ve seen how to use single-group aggregate functions such as MAX, MIN, and AVG as well as multigroup functions such as COUNT and SUM. You now know how these functions operate when null values are present in your data and how such operations can affect your results. You’ve been introduced to the GROUP BY and HAVING clauses and have been shown how these clauses can help you further filter and categorize your summary data. Last, but not least, you know what pitfalls to look for when using an ORDER BY clause with a GROUP BY clause and that column values listed in the SELECT list must also appear in any GROUP BY clause. By no means does this article provide an exhaustive list of the Oracle Database aggregate functions. Review the documentation for more details: bit.ly/WxKZFu.The next installment of SQL 101 will discuss analytic functions. 


Melanie Caffrey Headshot

Melanie Caffrey
is a senior development manager at Oracle. She is a coauthor of Expert PL/SQL Practices for Oracle Developers and DBAs (Apress, 2011) and Expert Oracle Practices: Oracle Database Administration from the Oak Table (Apress, 2010).

Send us your comments