As Published In

Oracle Magazine
March/April 2002
SQL

Business-Savvy SQL

By Ganesh Variar

Business-Savvy SQL Develop and execute complex business-intelligence queries quickly and efficiently in SQL, using Oracle9i's analytic SQL features.

Have you ever tried listing the top 20 percent of your firm's products using SQL? How about writing a query to compare the sales amount for this month with that for the same month last year? If you've tried to develop these or other popular business queries, you're probably quite aware of SQL's limitations as a language for data analysis.

Fortunately for the business world, SQL has made some major strides with the analytic SQL features introduced in Oracle8i Release 2 and enhanced in Oracle9i. These features reduce the coding effort for complex analysis and provide improvements in query performance.

This article introduces key analytic SQL features, highlights enhancements in Oracle9i, and provides examples of how you can use these functions to answer complex business questions with SQL queries.

Key Benefits of Analytic SQL

Oracle9i's analytic SQL provides several significant benefits, including:

  • Reduced development effort. Analytic SQL replaces lengthy PL/SQL blocks with single queries.
  • Faster table access. In many cases, analytic SQL reduces the number of data passes required, producing up to a tenfold reduction in table-access times—particularly if your query performs large table scans.
  • Leverage cached query results. When an analytic SQL query generates results, the results remain in the cache, so that similar queries can later leverage the existing data.

Complex Tasks Made Simple

Here are some of the key tasks made simpler and more efficient with Oracle analytic SQL (and related expressions):

  • Multidimensional analysis and cross-tabulation. The GROUPING function and the CUBE and ROLLUP extensions to GROUP BY let you calculate multilevel aggregations for groups of columns, generating the type of data typically required for cross-tabular reports. Oracle9i introduces the concepts of composite columns and concatenated groupings, as well as the GROUPING SETS expression and the GROUPING_ID and GROUP_ID functions.
  • Ranking. The ranking functions let you calculate ranks, percentiles, and n_tiles of the values in a result set, as well as assign unique sequential numbers to rows. Oracle9i adds two significant new ranking capabilities: inverse percentile functions and hypothetical-ranking functions.
  • Cumulative and moving aggregations (windowing). The windowing clause and functions let you specify a window of multiple sequential rows in which to apply aggregate functions, so you can calculate cumulative and moving averages and other statistics.
  • Partitioned result sets. You can use the PARTITION BY clause with most analytic functions to divide the result set into groups based on one or more values.

There are two important points to consider when using the analytic functions:

  • You can use the analytic functions only in the SELECT and ORDER BY clauses in a query, because these functions operate on the result set generated after the joins and the WHERE, GROUP BY, and HAVING clauses are executed. Note that the GROUP BY extensions are used only in the GROUP BY clause.
  • You cannot apply one analytic function over another directly. You can do so indirectly, however, by using multiple nested SELECT statements. You can also use nested SELECT statements to constrain the results of an analytic function.

Real-World Examples

To explore the power and flexibility of analytic SQL features, let's look at some example queries that answer real-world business questions. These example queries are based on a sales-transaction table with the following structure:

SALES
trans_date	date
product_id 	varchar(10)

customer_id 	varchar(10)
sales_amt   	number

Each customer purchase inserts a record into this table. For simplicity, assume that none of the columns contains null values; however, analytic functions can handle nulls quite elegantly.

Identifying the top five products bought by the highest-volume customers. "What are the top five products bought by the top 15 percent of our customers?" is a classic demand-analysis question. The ranking functions CUME_DIST and RANK help develop this query in two steps. Then you can try a little what-if analysis with Oracle9i's hypothetical-ranking capability.

First, use the query shown in Listing 1 to determine the top 15 percent of the customers (that is, customers whose sales amount is greater than the amounts for 85 percent of the customers).

This query calculates the total sales amount for each customer, using the GROUP BY clause. Then apply the CUME_DIST function to find the percentile value for each customer (the percentage of customers whose sales amounts are less than or equal to the amount for the current customer). Since you can't use the analytic function in the WHERE clause, you need an outer query to constrain the results of the CUME_DIST function. The outer query takes the inner query's output and puts constraints on the percentile value to determine the top 15 percent of the customers.

Next, extend the first query and rank the top five products bought by an elite group of customers, using the query shown in Listing 2. The two innermost levels of the query are similar to the first query of this example. The list of the top 15 percent of the customers is supplied to the next outer-level query, which uses the RANK function to rank the products bought by these high-volume customers based on the sales amount. The outermost query filters out the five top-ranking products. The final ORDER BY clause makes the results appear in ascending order by product rank.

Extending the example, you can use Oracle9i's hypothetical rank functions, in which hypothetical arguments are supplied to ranking, to perform a "what if" analysis. Listing 3 shows what the rank of a new product would be if the top 15 percent of customers were to buy $100,000 worth of the product.

The query results in Listing 3 show that the hypothetical product would rank seventh in sales amount among the highest-volume customers. To fully appreciate this example, consider how you would achieve these results without using analytic functions. You would first need to compute the top 15 percent of the customers by using a cursor, and then determine the top five products and the rank of the hypothetical product using additional cursors or temporary tables. Or you might not use your database at all. In practice, most users compute such queries by using analytic tools that require more powerful client hardware, increase network traffic, and yield relatively poor performance.

Identifying a company's top three-month sales periods. Businesses often like to find out their best sales periods to date. Listing 4 contains a query to list all three-consecutive-month periods having total sales of US$10 million or more. The query reports the first and last months of the three-month period, plus the total sales amount for that period. This example illustrates the use of a moving time-based window, created using the windowing clause and the analytic functions FIRST_VALUE, LAST_VALUE, and SUM OVER.

Listing 4 also shows how the nested SELECT statements create a query producing the needed results. The innermost query calculates the total sales amount for every month. Then, the next-level query takes this result as its input. For every row in the input, the query uses windowing functions to create a three-month window (spanning the current month and the two months preceding it). For every window, the query uses the analytic functions FIRST_VALUE, LAST_VALUE, and SUM OVER to determine the first month, last month, and total sales for the three-month period. Finally, the outermost query filters out the three-month periods having total sales of US$10 million or more and orders the results chronologically.

Analytic functions allow you to avoid complex cursor logic while keeping the computations in the database.

Viewing purchasing patterns over time. Businesses often like to analyze customers' purchasing patterns over time. Typically such data is displayed in cross-tabular reports, which can be complex to generate. Listing 5 uses the CUBE (or ROLLUP) extension to GROUP BY, along with the NTILE function, to generate such data with a single query. Then, it generates more-targeted subtotals by using Oracle9i's concatenated-grouping feature.

Once again, the query uses nested SELECT statements to apply multiple analytic functions. The innermost query first calculates the total sales amount for each customer. Next, the NTILE function divides the customers into four bands (or quartiles) based on the total sales amount for each customer. Band 1 has the top 25 percent of customers by sales amount, Band 2 has the next 25 percent, and so on.

The outer query joins the results of the inner query with the sales table and computes the total sales amount generated by the customers in each band for each year. Then it applies the CUBE extension in the GROUP BY clause of the outer query to calculate subtotals of the sales amount for each year across all bands and for each band across all years. Finally, the CUBE function returns a grand total across all years and all bands, outputting nulls for rows containing subtotals and the grand total. It uses the GROUPING function, along with the DECODE function, to replace the nulls in the output with a meaningful row header for the totals, as the sample output in Listing 5 shows.

To focus on purchasing patterns and product sales across years as well as months, you can use Oracle9i's concatenated-grouping feature to concatenate two GROUPING SETS expressions in the following GROUP BY clause:

group by 
GROUPING SETS (to_char(trans_date, 'YYYY'),
to_char(trans_date, 'YYYY-MM')),
GROUPING SETS (band, product_id)

This query computes the following four groupings: (Year, band), (Year, product_id), (Month, band), and (Month, product_id).

Unlike the CUBE and the ROLLUP functions, which compute the subtotals at various levels, this query gives only the groupings you request, and the query requires only one table scan. To achieve the same result without the concatenated-grouping feature, you would have to write four SELECT ?GROUP BY statements and combine them with UNION ALL clauses, and that query would require four table scans.

Getting Down to Business

The analytic functions in Oracle8i and Oracle9i have helped SQL keep pace with the growing demand for real-time business analysis. With these functions, you can greatly reduce development time and improve performance for your data-analysis queries. And by combining analytic functions strategically, as discussed in this article, you can maximize their effectiveness.

Ganesh Variar (ganesh_variar@yahoo.com) is a consultant at Saama Technologies Inc. (www.saama.com), a Silicon Valley-based consulting company.

E-mail this page
Printer View Printer View
Oracle Is The Information Company About Oracle | Oracle RSS Feeds | Careers | Contact Us | Site Maps | Legal Notices | Terms of Use | Privacy