TECHNOLOGY: SQL 101
Pivotal Access to Your Data: Analytic Functions, ConcludedBy Melanie Caffrey
Part 12 in a series on the basics of the relational database and SQL
Part 11 in this series, “Leading Ranks and Lagging Percentages: Analytic Functions, Continued” (Oracle Magazine, May/June 2013), continued the discussion of analytic functions that began in Part 10. It demonstrated analytic functions that enable you to obtain results for top-N queries, evaluate data comparisons, and calculate percentages within a group, among other actions. This article wraps up the series’ coverage of analytic functions by showing
To try out the examples in this series, you need access to an Oracle Database instance. If necessary, download and install an Oracle Database edition for your operating system. I recommend installing Oracle Database, Express Edition 11g Release 2. If you install the Oracle Database software, choose the installation option that enables you to create and configure a database. A new database, including sample user accounts and their associated schemas, will be created for you. (Note that SQL_101 is the user account to use for the examples in this series; it’s also the schema in which you’ll create database tables and other objects.) When the installation process prompts you to specify schema passwords, enter and confirm passwords for SYS and SYSTEM and make a note of them.
Finally—whether you installed the database software from scratch or have access to an existing Oracle Database instance—download, unzip, and execute the SQL script to create the tables for the SQL_101 schema that are required for this article’s examples. (View the script in a text editor for execution instructions.)
Turning Your Data on Its Side
A common business reporting requirement is that data in a column be displayed horizontally rather than vertically, for better readability. For example, compare the result set in Listing 1 to the one in Listing 2. The query in Listing 1 lists all employees alongside their respective department IDs, sorted by department and employee name. The data returned in Listing 2 is the same as that returned in Listing 1, but it is displayed differently.
Code Listing 1: Obtaining a list of employees, sorted by department and employee name
SQL> set feedback on SQL> set lines 32000 SQL> select department_id, last_name, first_name 2 from employee 3 order by department_id, last_name, first_name; DEPARTMENT_ID LAST_NAME FIRST_NAME —————————————— ————————————— ——————————— 10 Dovichi Lori 10 Eckhardt Emily 10 Friedli Roger 10 James Betsy 10 Michaels Matthew 10 Newton Donald 20 leblanc mark 20 peterson michael 30 Jeffrey Thomas 30 Wong Theresa Newton Frances
Code Listing 2: Employees, sorted by department and name, displaying one row per department
SQL> select department_id, 2 LISTAGG(first_name||' '||last_name, ', ') 3 WITHIN GROUP 4 (order by last_name, first_name) employees 5 from employee 6 group by department_id 7 order by department_id; DEPARTMENT_ID EMPLOYEES ————————————— ————————————————————————————————————————————————————————— 10 Lori Dovichi, Emily Eckhardt, Roger Friedli, Betsy James, Matthew Michaels, Donald Newton 20 mark leblanc, michael peterson 30 Thomas Jeffrey, Theresa Wong Frances Newton 4 rows selected.
Listing 2 uses the LISTAGG function (introduced in Oracle Database 11g) to construct a comma-delimited list of employees per department, thereby pivoting the more traditionally displayed result set in Listing 1. You can use LISTAGG as a single-group aggregate function, a multigroup aggregate function, or an analytic function.
When LISTAGG is invoked as a single-group aggregate function, it operates on all rows that satisfy any WHERE clause condition and—like all other single-group aggregate functions—returns a single output row. The example in Listing 2 demonstrates the use of LISTAGG as a multigroup aggregate function, returning a row for each group defined by the GROUP BY clause.
The syntax for the LISTAGG function is
LISTAGG ( column | expression, delimiter ) WITHIN GROUP (ORDER BY column | expression)
LISTAGG performs as an analytic function if you add an OVER clause:
OVER (PARTITION BY column | expression)
The column or expression to be aggregated, the WITHIN GROUP keywords, and the ORDER BY clause that immediately follows the WITHIN GROUP keywords (that is, the sort that takes place within the grouping) are mandatory in all three LISTAGG use cases.
The query in Listing 3 uses LISTAGG as an analytic function. It obtains a list of salaries, from highest to lowest, per department. Alongside each salary value is the name, in last-name/first-name alphabetical order, of the employee who earns that salary value. In addition, every employee for the current row’s listed department is returned in salary order, from highest to lowest, and in last-name/first-name alphabetical order.
Code Listing 3: Invoking the LISTAGG function as an analytic function
SQL> select department_id, salary, first_name||’ ‘||last_name earned_by, 2 listagg(first_name||’ ‘||last_name, ‘, ‘) 3 within group 4 (order by salary desc nulls last, last_name, first_name) 5 over (partition by department_id) employees 6 from employee 7 order by department_id, salary desc nulls last, last_name, first_name; DEPARTMENT_ID SALARY EARNED_BY EMPLOYEES ————————————— —————— —————————————— ——————————————————————————————— 10 100000 Emily Eckhardt Emily Eckhardt, Donald Newton, Matthew Michaels, Roger Friedli, Betsy James, Lori Dovichi 10 70000 Matthew Michaels Emily Eckhardt, Donald Newton, Matthew Michaels, Roger Friedli, Betsy James, Lori Dovichi 10 60000 Roger Friedli Emily Eckhardt, Donald Newton, Matthew Michaels, Roger Friedli, Betsy James, Lori Dovichi 10 60000 Betsy James Emily Eckhardt, Donald Newton, Matthew Michaels, Roger Friedli, Betsy James, Lori Dovichi 10 Lori Dovichi Emily Eckhardt, Donald Newton, Matthew Michaels, Roger Friedli, Betsy James, Lori Dovichi 20 90000 michael peterson michael peterson, mark leblanc 20 65000 mark leblanc michael peterson, mark leblanc 30 300000 Thomas Jeffrey Thomas Jeffrey, Theresa Wong 30 70000 Theresa Wong Thomas Jeffrey, Theresa Wong 75000 Frances Newton Frances Newton 11 rows selected.
Twisting and Turning into Fewer and Wider
A PIVOT clause enables you to turn rows into columns and present your data in a cross-tabular format. The syntax of the PIVOT clause is
SELECT … FROM … PIVOT ( aggregate- function column | expression ) FOR column | expression to be pivoted IN (value1, … valueN) ) AS alias
Compare the result sets in Listings 4 and 5. The query in Listing 4 displays a summary of each department’s total employee salary amount in a cross-tabular report. The query in Listing 5 returns the same departmental salary summaries as those in Listing 4 but in columnar format, which your users might consider less readable.
Code Listing 4: Using the PIVOT function to obtain cross-tabular results
SQL> select * 2 from (select department_id, salary 3 from employee) total_department_sals 4 PIVOT (SUM(salary) 5 FOR department_id IN (10 AS Accounting, 20 AS Payroll, 30 AS IT, 6 NULL AS Unassigned_Department)); ACCOUNTING PAYROLL IT UNASSIGNED_DEPARTMENT —————————— ——————— —————————— ————————————————————— 370000 155000 370000 75000 1 row selected.
Code Listing 5: Traditional columnar display of summarized salaries, grouped by department
SQL> select department_id, sum(salary) 2 from employee 3 group by department_id 4 order by department_id nulls last; DEPARTMENT_ID SUM(SALARY) ————————————— ——————————— 10 370000 20 155000 30 370000 75000 4 rows selected.
The query in Listing 6 demonstrates that it’s possible to pivot on more than one column. The results from this query display the sum total of salaries per department, but only for employees who were hired in a particular year. You can also pivot on and display multiple aggregate values, as Listing 7 demonstrates. The query in Listing 7 obtains the sum of all salaries, alongside the latest date of hire for an employee, per department.
Code Listing 6: Displaying the sum total salaries of employees per department for a particular year
SQL> select * 2 from (select department_id, 3 to_char(trunc(hire_date, 'YYYY'), 'YYYY') hire_date, salary 4 from employee) 5 PIVOT (SUM(salary) 6 FOR (department_id, hire_date) IN 7 ((10, '2007') AS Accounting_2007, 8 (20, '2008') AS Payroll_2008, 9 (30, '2010') AS IT_2010 10 ) 11 ); ACCOUNTING_2007 PAYROLL_2008 IT_2010 ——————————————— ———————————— ———————— 190000 90000 370000 1 row selected.
SQL> select * 2 from (select department_id, hire_date, salary 3 from employee) 4 PIVOT (SUM(salary) AS sals, 5 MAX(hire_date) AS latest_hire 6 FOR department_id IN (10, 20, 30, NULL)); 10_SALS 10_LATEST 20_SALS 20_LATEST 30_SALS 30_LATEST NULL_SALS NULL_LATE ——————— ————————— ——————— ————————— ——————— —————————— ————————— ————————— 370000 07-JUL-11 155000 06-MAR-09 370000 27-FEB-10 75000 14-SEP-05 1 row selected.
When you use multiple aggregate functions, it’s advisable to supply an alias for each of them. The resultant column headings are a concatenation of the pivot values (or pivot aliases), an underscore, and (if you’ve supplied them) the aliases of the aggregate functions. For example, some of Listing 7’s columns are 10_SALS and 10_LATEST. Note that the columns for the latest hire dates per department, such as 10_LATEST, are actually columns using the LATEST_HIRE alias. When you prepend the LATEST_HIRE alias with the department ID, the query should return a column that reads, for example, 10_LATEST_HIRE.
However, with SQL*Plus, the column heading displayed for a column with a DATE datatype is never longer than the default format for the value returned. The HIRE_DATE column’s format is DD-MON-RR, so only the first nine characters of the heading are displayed. To display a full heading, such as 10_LATEST_HIRE, consider using TO_CHAR to apply a date format mask to the column.
If you don’t supply an alias for your aggregate functions, you might get an error message, as shown in the example in Listing 8. Because neither of the aggregate functions in Listing 8 is aliased, the PIVOT operator doesn’t know to which one to apply the column heading for the pivot value (in this case, the DEPARTMENT_ID value). As a result, the PIVOT operator can’t simply use its default column headings and the query fails with a “column ambiguously defined” error message. Avoid this error by creating an alias for each aggregate function; don’t rely solely on the default column headings that result from use of the PIVOT operation.
Code Listing 8: A “column ambiguously defined” error occurs
SQL> select * 2 from (select department_id, hire_date, salary 3 from employee) 4 PIVOT (SUM(salary), 5 MAX(hire_date) 6 FOR department_id IN (10, 20, 30, NULL)); select * * ERROR at line 1: ORA-00918: column ambiguously defined
A Horizontal View of the Vertical
Just as you might have a reporting need to turn rows into columns, you might also need to turn columns into rows. You’ve seen one way to do this with the LISTAGG function. You can also use the UNPIVOT operator for this purpose. Note that the UNPIVOT operator does not reverse an action performed with the PIVOT operator. Rather, it works on data that is already stored as pivoted.
Consider the CREATE TABLE statement in Listing 9. It creates a table with pivoted data, using a query similar to the one in Listing 7. Now you can query this data by using the UNPIVOT operator, as Listing 10 illustrates. Compare the values returned from the query in Listing 10 with the values returned from the query in Listing 7. As you can see, they are the same but are displayed differently.
Code Listing 9: Creating a table with pivoted data
SQL> CREATE TABLE pivoted_emp_data AS 2 select * 3 from (select department_id, hire_date, salary 4 from employee) 5 PIVOT (SUM(salary) sum_sals, 6 MAX(hire_date) latest_hire 7 FOR department_id IN (10 AS Acc, 20 AS Pay, 30 AS IT, NULL)); Table created.
Code Listing 10: Using the UNPIVOT operator to turn rows into columns
SQL> select hire_date, salary 2 from pivoted_emp_data 3 UNPIVOT INCLUDE NULLS 4 ((hire_date, salary) 5 FOR department_id IN ( 6 (acc_latest_hire, acc_sum_sals) AS 'Accounting', 7 (pay_latest_hire, pay_sum_sals) AS 'Payroll', 8 (it_latest_hire, it_sum_sals) AS 'IT', 9 (null_latest_hire, null_sum_sals) AS 'Unassigned' 10 )) 11 order by hire_date, salary; HIRE_DATE SALARY —————————— —————— 14-SEP-05 75000 06-MAR-09 155000 27-FEB-10 370000 07-JUL-11 370000 4 rows selected.
Code Listing 11: Using aliases for value pairs of different datatypes
SQL> select hire_date, salary 2 from pivoted_emp_data 3 UNPIVOT INCLUDE NULLS 4 ((hire_date, salary) 5 FOR department_id IN ( 6 (acc_latest_hire, acc_sum_sals) AS Accounting, 7 (pay_latest_hire, pay_sum_sals) AS Payroll, 8 (it_latest_hire, it_sum_sals) AS IT, 9 (null_latest_hire, null_sum_sals) AS Unassigned 10 )) 11 order by hire_date, salary; (acc_latest_hire, acc_sum_sals) AS Accounting, * ERROR at line 6: ORA-56901: non-constant expression is not allowed for pivot|unpivot values
When and How to Predicate Analytically
Other than the final ORDER BY clause, analytic functions are the last set of operations performed in a query. Because they can appear only in the SELECT list or the ORDER BY clause, you cannot use them directly in any predicates, including in a WHERE or HAVING clause. If you need to select from a result set based on the outcome of applying an analytic function, you can use an inline view. An inline view is a SELECT statement in the FROM clause of another SELECT statement. It acts as a TABLE (otherwise known as a FROM) clause. You have already seen examples of inline view capability in this article in Listings 4, 6, 7, 8, and 9.
Code Listing 12: Using an inline view to enable use of an analytic function as a predicate
SQL> select * 2 from (select department_id, last_name||', '||first_name, salary, 3 dense_rank() over (partition by department_id 4 order by salary desc nulls last) d_rank 5 from employee) 6 where d_rank < 3 7 order by department_id, salary desc nulls last; DEPARTMENT_ID LAST_NAME||','||FIRST_NAME SALARY D_RANK ————————————— —————————————————————————————————————— —————— —————— 10 Eckhardt, Emily 100000 1 10 Newton, Donald 80000 2 20 peterson, michael 90000 1 20 leblanc, mark 65000 2 30 Jeffrey, Thomas 300000 1 30 Wong, Theresa 70000 2 Newton, Frances 75000 1 7 rows selected.
Suppose you want to use an analytic function to obtain the top two salary earners by department. As Listing 12 illustrates, you can place the analytic function operation in an inline view and alias it. The alias provided to the inline view in Listing 12 is D_RANK (this is named for the result of applying the DENSE_RANK analytic function). The query in the inline view (the inner query) must be resolved before it can be used by the query that encompasses it (the outer query). After the inline view completes, the outer query can use its result in a predicate. The predicate clause in the outer query for Listing 12 is
WHERE d_rank < 3
Striving to Perform Well
Although analytic functions help you form a more elegant and less convoluted SQL solution to a reporting requirement, they are not a replacement for writing good code. Your goal should be to constantly and consistently write good SQL that’s easy to maintain and that will perform well over time. It’s all too easy to abuse SQL techniques that make processes easier. Used incorrectly, any SQL technique can be written poorly and become a system inhibitor.
In particular, sorting and sifting data can exhaust system resources. (The query in Listing 13, for example, includes three potential sort operations.) This shouldn’t necessarily deter you from using analytic functions, but keep in mind that you can write a query that brings a system to its knees just as easily as you can write one that provides you with an efficient, elegant, and easy-to-maintain solution. With the power of analytic functions comes responsibility.
Code Listing 13: Query with analytic functions that may cause system performance problems
SQL> select first_name||' '||last_name, department_id, hire_date, 2 sum(salary) over (order by department_id, 3 first_name||' '||last_name) sum_dept_emp, 4 avg(salary) over (order by hire_date, department_id) avg_dept_hire_dt 5 from employee 6 order by department_id, hire_date, first_name||' '||last_name; FIRST_NAME||''||LAST_NAME D...MENT_ID HIRE_DATE SUM_DEPT_EMP AVG_DEPT_HIRE_DT ————————————————————————— ——————————— ————————— ———————————— ———————————————— Emily Eckhardt 10 07-JUL-04 240000 100000 Donald Newton 10 24-SEP-06 140000 85000 Betsy James 10 16-MAY-07 60000 74166.6667 Matthew Michaels 10 16-MAY-07 310000 74166.6667 Roger Friedli 10 16-MAY-07 370000 74166.6667 Lori Dovichi 10 07-JUL-11 240000 97000 michael peterson 20 03-NOV-08 525000 76428.5714 mark leblanc 20 06-MAR-09 435000 75000 Thomas Jeffrey 30 27-FEB-10 895000 100000 Theresa Wong 30 27-FEB-10 595000 97000 Frances Newton 14-SEP-05 970000 87500 11 rows selected.