TECHNOLOGY: SQL 101
Having Sums, Averages, and Other Grouped DataBy Melanie Caffrey
Part 9 in a series on the basics of the relational database and SQL
Part 8 in this series, “Selecting a Type That Is Right for You” (Oracle Magazine, November/December 2012), introduced common SQL date functions and showed how your queries can use them to modify the appearance of date result set data. It also introduced the SYSDATE function and date arithmetic and showed how they can be used to manipulate result set data to convey more-meaningful results. So far, all of the functions discussed in this series operate on single-row results. Aggregate functions (also called group functions) operate on multiple rows, enabling you to manipulate data so that it displays differently from how it is stored in the database. This article introduces you to some of the more commonly used SQL group functions, along with the GROUP BY and HAVING clauses.
To try out the examples in this series, you need access to an Oracle Database instance. If necessary, download and install an Oracle Database edition for your operating system from. I recommend installing Oracle Database, Express Edition 11g Release 2. If you install the Oracle Database software, choose the installation option that enables you to create and configure a database. A new database, including sample user accounts and their associated schemas, will be created for you. (Note that SQL_101 is the user account to use for the examples in this series; it’s also the schema in which you’ll create database tables and other objects.) When the installation process prompts you to specify schema passwords, enter and confirm passwords for SYS and SYSTEM and make a note of them.
Finally—whether you installed the database software from scratch or have access to an existing Oracle Database instance—download, unzip, and execute the SQL script to create the tables for the SQL_101 schema that are required for this article’s examples. (View the script in a text editor for execution instructions.)
The Sum of All Rows
All aggregate functions group data to ultimately produce a single result value. Because aggregate functions operate on multiple row values, you can use them to generate summary data such as totals. For example, you can answer budget planning questions such as “What is the total allotted amount of annual salary paid to all employees”? The query in Listing 1 demonstrates use of the SUM aggregate function to answer this question. It adds together all the values in the EMPLOYEE table’s SALARY column, resulting in the total value 970000.
Code Listing 1: Display the sum of all salary values in the EMPLOYEE table
SQL> set feedback on SQL> select SUM(salary) from employee; SUM(SALARY) ————————————————— 970000 1 row selected.
When You Strive for Average
Another example of a business question you can answer by using an aggregate function is, “What is currently the average annual salary for all employees?” Like the query in Listing 1, the query in Listing 2 applies an aggregate function to the EMPLOYEE table’s SALARY column. The AVG function in Listing 2 sums up the salary values and then divides the total by the number of employee records with non-null salary values. With the total of 970000 paid annually divided by 10 employees, the average annual salary value is 97000.
Code Listing 2: Compute the average salary value across all non-null salary values
SQL> select AVG(salary) from employee; AVG(SALARY) ————————————————— 97000 1 row selected.
The EMPLOYEE table holds 11 records, but the average-salary computation in Listing 2 considers only 10 records. This happens because the AVG aggregate function ignores null values. (In the EMPLOYEE table, the null salary value is for the employee Lori Dovichi.) To substitute a non-null value for any null values, you can nest a NVL function call (introduced in Part 7 of this series) inside the call to the AVG function, as demonstrated in Listing 3. The average salary value returned in Listing 3 is lower than the value returned in Listing 2, because the null salary value for Lori Dovichi has been replaced with a 0 and evaluated along with all other non-null salary values. Substitute non-null values for null values only when it makes sense to do so from a business perspective.
Code Listing 3: Substitute a non-null value for any null values
SQL> select AVG(NVL(salary, 0)) avg_salary 2 from employee; AVG_SALARY ——————————————— 88181.8182 1 row selected.
As you know from previous articles in this series, the SQL*Plus set feedback on command displays a count of records that satisfy your query criteria. This method works well when the database quickly returns a small number of records that can display easily on your screen. But it’s unwieldy when you evaluate hundreds, thousands, or millions of records, because you must wait for all the records in the result set to be fetched from the database and returned to your client. A more efficient alternative is to use the COUNT aggregate function, demonstrated in Listing 4.
Code Listing 4: Obtain a count of all employees by using COUNT(*)
SQL> select COUNT(*) 2 from employee; COUNT(*) ——————————————— 11 1 row selected.
The COUNT aggregate function counts the number of records that satisfy the query condition. The query in Listing 4 uses COUNT(*)—which returns the count of all rows that satisfy the query condition—to obtain a count of all the records in the EMPLOYEE table. COUNT(*) does not ignore null values, whereas COUNT with a column input does. By comparing Listings 4 and 5, you can see that the returned value is the same whether you count all columns in the EMPLOYEE table with COUNT(*) or count just the primary key column with COUNT(employee_id).Code Listing 5: Obtain a count of all employees by applying COUNT to the primary key column
SQL> select COUNT(employee_id) 2 from employee; COUNT(EMPLOYEE_ID) ——————————————————————————— 11 1 row selected.
Code Listing 6: Apply COUNT to a column that contains null values
SQL> select COUNT(manager) 2 from employee; COUNT(MANAGER) ————————————————————— 7 1 row selected.
Code Listing 7: COUNT(*) and COUNT(column_name) both return 0 when no rows match
SQL> select COUNT(*), COUNT(manager) 2 from employee 3 where hire_date > TRUNC(SYSDATE); COUNT(*) COUNT(MANAGER) ——————————— —————————————— 0 0 1 row selected.
Code Listing 8: COUNT and DISTINCT obtain a count of distinct values
SQL> select COUNT(DISTINCT manager) num_distinct_managers 2 from employee; NUM_DISTINCT_MANAGERS ————————————————————— 3 1 row selected.
Maximizing and Minimizing
You can certainly locate the maximum and minimum values within a set of row values you’ve fetched with a well-ordered SQL statement. But if your result set is voluminous and all you want is the maximum or the minimum result, you don’t want to scroll to the top or the bottom of the result set to see it. You can use the MIN and MAX aggregate functions instead. The query in Listing 9 uses them to display the EMPLOYEE table’s maximum and minimum salary values.
Code Listing 9: MAX and MIN obtain maximum and minimum column values
SQL> select MAX(salary), MIN(salary) 2 from employee; MAX(SALARY) MIN(SALARY) ——————————— ——————————— 300000 60000 1 row selected.
One by One and Group by Group
So far the examples in this article have discussed aggregate functions working on all rows for a particular aggregation criterion. But you might want to do further categorizations and aggregations within your data. The GROUP BY clause enables you to collect data across multiple records and group the results by one or more columns. Aggregate functions and GROUP BY clauses are used in tandem to determine and return an aggregate value for every group. For example, the query in Listing 10 obtains a count of employees in each department.
In Listing 10, note that one employee has not been assigned to a department in the EMPLOYEE table and that person is included as a group in the results. Note also that the query uses an ORDER BY clause. Although the GROUP BY clause groups data, it does not sort the results in any particular order. The query in Listing 11 shows the query from Listing 10 without the ORDER BY clause.
Code Listing 10: GROUP BY creates grouped categorizations
SQL> select COUNT(employee_id), department_id 2 from employee 3 GROUP BY department_id 4 ORDER BY department_id; COUNT(EMPLOYEE_ID) DEPARTMENT_ID —————————————————— ————————————— 6 10 2 20 2 30 1 4 rows selected.
Code Listing 11: No ORDER BY clause with GROUP BY clause
SQL> select COUNT(employee_id), department_id 2 from employee 3 GROUP BY department_id; COUNT(EMPLOYEE_ID) DEPARTMENT_ID —————————————————— ————————————— 2 30 1 2 20 6 10 4 rows selected.
Code Listing 12: Error when ORDER BY clause column list doesn’t also appear in the SELECT list
SQL> select COUNT(employee_id), department_id 2 from employee 3 GROUP BY department_id 4 ORDER BY hire_date DESC; ORDER BY hire_date DESC * ERROR at line 4: ORA-00979: not a GROUP BY expression
Code Listing 13: Error when GROUP BY does not list required column
SQL> select COUNT(employee_id), department_id 2 from employee 3 ORDER BY department_id; select COUNT(employee_id), department_id * ERROR at line 1: ORA-00937: not a single-group group function
Having the Last Word
Just as a SELECT list can use a WHERE clause to filter the result set to include only records that meet certain criteria, the GROUP BY clause can use a similar clause to filter groups. The HAVING clause works with the GROUP BY clause to limit the results to groups that meet the criteria you specify. Listing 14 expands on the query in Listing 10. Inclusion of the HAVING clause in this query eliminates any groups with fewer than two employees from the result set. As you can see, the group with no assigned department is not returned in Listing 14’s result set, because that group contains only one employee.
Code Listing 14: HAVING clause filters groups
SQL> select COUNT(employee_id), department_id 2 from employee 3 GROUP BY department_id 4 HAVING COUNT(employee_id) > 1 5 ORDER BY department_id; COUNT(EMPLOYEE_ID) DEPARTMENT_ID —————————————————— ————————————— 6 10 2 20 2 30 3 rows selected.
Code Listing 15: HAVING clause can use multiple operators
SQL> select COUNT(employee_id), department_id, salary 2 from employee 3 GROUP BY department_id, salary 4 HAVING (COUNT(employee_id) > 1 5 OR salary < 100000) 6 ORDER BY department_id, salary desc; COUNT(EMPLOYEE_ID) DEPARTMENT_ID SALARY —————————————————— ————————————— —————————— 1 10 80000 1 10 70000 2 10 60000 1 20 90000 1 20 65000 1 30 70000 1 75000 7 rows selected.
Odds and Ends
Although every column included in the SELECT list must also be listed in a GROUP BY clause, this restriction doesn’t apply to number and string literals, constant expressions (expressions that do not use column values), and functions such as SYSDATE. Listing 16 shows a query that, for demonstration purposes, expands on the query in Listing 15. It includes a literal, a constant expression, and the SYSDATE function in the SELECT list, but it doesn’t need to list these items in the GROUP BY or ORDER BY clause.
Code Listing 16: Literals, expressions, and functions not listed in GROUP BY or ORDER BY
SQL> select COUNT(employee_id), department_id, salary, 2 SYSDATE, ‘String Literal’, 42*37 Expression 3 from employee 4 GROUP BY department_id, salary 5 HAVING (COUNT(employee_id) > 1 6 OR salary < 100000) 7 ORDER BY department_id, salary desc; COUNT(EMPLOYEE_ID) DEPARTMENT_ID SALARY SYSDATE ‘STRINGLITERAL' EXPRESSION —————————————————— ————————————— —————— ————————— —————————————— —————————— 1 10 80000 29-SEP-12 String Literal 1554 1 10 70000 29-SEP-12 String Literal 1554 2 10 60000 29-SEP-12 String Literal 1554 1 20 90000 29-SEP-12 String Literal 1554 1 20 65000 29-SEP-12 String Literal 1554 1 30 70000 29-SEP-12 String Literal 1554 1 75000 29-SEP-12 String Literal 1554 7 rows selected.
As with the other types of functions you’ve learned about in this article series, you can nest aggregate functions inside one another. The query in Listing 17 obtains a sum of all salaries per department. Then it applies the MIN aggregate function to each department salary summary value to obtain the lowest department salary summary value. This query also demonstrates that you don’t have to list every column in the SELECT list that you list in the GROUP BY clause, even if it’s mandatory to do the reverse.
Code Listing 17: Nested aggregate functions
SQL> select MIN(SUM(salary)) min_department_salary_sum 2 from employee 3 where department_id is not null 4 GROUP by department_id; MIN_DEPARTMENT_SALARY_SUM ——————————————————————————— 155000 1 row selected.
Conclusion and Anticipation
This article has shown you a few of the most common aggregate functions and how you can use them to manipulate how your data is displayed. You’ve seen how to use single-group aggregate functions such as MAX, MIN, and AVG as well as multigroup functions such as COUNT and SUM. You now know how these functions operate when null values are present in your data and how such operations can affect your results. You’ve been introduced to the GROUP BY and HAVING clauses and have been shown how these clauses can help you further filter and categorize your summary data. Last, but not least, you know what pitfalls to look for when using an ORDER BY clause with a GROUP BY clause and that column values listed in the SELECT list must also appear in any GROUP BY clause. By no means does this article provide an exhaustive list of the Oracle Database aggregate functions. Review the documentation for more details: bit.ly/WxKZFu.The next installment of SQL 101 will discuss analytic functions.
Melanie Caffrey is a senior development manager at Oracle. She is a coauthor of Expert PL/SQL Practices for Oracle Developers and DBAs (Apress, 2011) and Expert Oracle Practices: Oracle Database Administration from the Oak Table (Apress, 2010).
Send us your comments