Legal | Privacy
Implementing Relational Business Analysis with SQL
 
 

Implementing Relational Business Analysis with SQL

Module Objectives

Purpose

In this module, you will learn how to use Oracle9i for relational business alanlysis.

Objectives

After completing this module, you should be able to:

Explain Oracle’s powerful aggregation capabilities
Use new and improved functionality for business intelligence uses
Use the new query subfactoring clause of Oracle9i

Prerequisites

Before starting this lesson, you should have:

Completed the Preinstallation module

Completed the Install Oracle9i Database module

Completed the Postinstallation module

Completed the Review the Sample Schema module
Completed the Setup Data Warehousing lesson
Downloaded rba.zip and unzipped it into your working directory

Reference Material

The following is a list of useful reference materials if you want additional information about the topics in this module:

Documentation: Data Warehousing Guide


Overview

The SQL language, while extremely capable in many areas, has provided little support for business intelligence processing. Many aggregation tasks and business intelligence calculations such as subtotals, moving averages, rankings and lead/lag comparisons have been inefficient and inconvenient. To handle these limitations, application developers and DBAs have often needed to program outside of standard SQL or use SQL that was slow and cumbersome. This situation has hurt developer productivity and database performance.

The recent releases of Oracle have addressed this problem by significantly enhancing SQL for business intelligence processing:

Oracle8i Release 1 added support for the CUBE and ROLLUP extensions to the SELECT statement’s GROUP BY clause. These extensions enable more efficient and convenient aggregations, a key part of data warehousing and business intelligence processing.

Oracle8i Release 2 introduced a powerful new set of SQL analytic functions to address essential business intelligence calculations. The analytic functions provide enhanced performance and higher developer productivity for many calculations. In addition, some of the functions have been incorporated into the international ANSI SQL standard.

Oracle9i adds several new families of analytic functions, plus further extensions to the GROUP BY clause such as Concatenated Groupings and GROUPING SETS. The new GROUP BY extensions add flexibility to aggregate processing and are integrated with Materialized Views for enhanced performance. The query subfactoring clause is another new feature enhancing query performance.

Now you will walk through this functionality in the following sections.

This module can only give an introduction into Oracle’s powerful and rich functionality for relational business analysis. Please consult the ‘Oracle Data Warehousing Guide’ for any further details.

Generic Aggregation Capabilities Cube and Rollup

Aggregation is a fundamental part of data warehousing. To improve aggregation performance in your warehouse,
Oracle provides the following extensions to the GROUP BY clause:

Loading data using external tables

To show you how external tables can be created and used, perform the following steps:

CUBE and ROLLUP Extensions to the GROUP BY Clause

The Three GROUPING Functions

GROUPING SETS Expression

The CUBE, ROLLUP, and GROUPING SETS extensions to SQL make querying and reporting easier and faster.ROLLUP calculates aggregations such as SUM, COUNT, MAX, MIN, and AVG at increasing levels of aggregation, from the most detailed up to a grand total. CUBE is an extension similar to ROLLUP, enabling a single statement to calculate all possible combinations of aggregations. CUBE can generate the information needed in cross-tabulation reports with a single query.

CUBE, ROLLUP, and the GROUPING SETS extension let you specify exactly the groupings of interest in the GROUP BY clause. This allows efficient analysis across multiple dimensions without performing a CUBE operation. Computing a full cube creates a heavy processing load, so replacing cubes with grouping sets can significantly increase performance. CUBE, ROLLUP, and grouping sets produce a single result set that is equivalent to a UNION ALL of differently grouped rows.

To enhance performance, CUBE, ROLLUP, and GROUPING SETS can be parallelized: multiple processes can simultaneously execute all of these statements. These capabilities make aggregate calculations more efficient,thereby enhancing database performance, and scalability.

The three GROUPING functions help you identify the group each row belongs to and enable sorting subtotal rows and filtering results.

Steps

The following steps will be performed to show you Oracle’s aggregation capabilities:

1.

Get a basic understanding of aggregation CUBE and ROLLUP

2.

Use the GROUPING SET expression from Oracle9i

BASIC UNDERSTANDING OF CUBE AND ROLLUP

Prior to the introduction of the aggregation functions CUBE and ROLLUP, there was no possibility to express an aggregation over different levels within one SQL statement without the usage of the set operation UNION ALL. Every different aggregation level needed its own SQL aggregation expression, operating on the exact same data set n times, once for each of the n different aggregation levels.

With the introduction of CUBE and ROLLUP in Oracle8i, Oracle provided a single SQL command for handling the aggregation over different levels within one single SQL statement, not only improving the runtime of this operation but also reducing the amount of internal operations necessary and reducing the workload on the system.

ROLLUP enables a SELECT statement to calculate multiple levels of subtotals across a specified group of dimensions. It also calculates a grand total. ROLLUP is a simple extension to the GROUP BY clause, so its syntax is extremely easy to use. The ROLLUP extension is highly efficient, adding minimal overhead to a query.

CUBE takes a specified set of grouping columns and creates subtotals for all of their possible combinations. In terms of multidimensional analysis, CUBE generates all the subtotals that could be calculated for a data cube with the specified dimensions.

We will demonstrate the advanced access path of those operations with a ROLLUP example.

ROLLUP - THE ‘OLD’ WAY

1.

From a SQL*Plus session logged on to the SH schema, run explain_old_rollup.sql, or copy the following SQL statement into your SQL*Plus session:

@explain_old_rollup.sql

Rem old way ROLLUP with UNION ALL
TRUNCATE TABLE plan_table;

EXPLAIN PLAN FOR
   SELECT   t.calendar_month_desc month,
            p.prod_category category,
            SUM(s.amount_sold) total_sales
   FROM     sales s, products p, times t, customers c
   WHERE    s.time_id = t.time_id
   AND      s.prod_id = p.prod_id
   AND      s.cust_id = c.cust_id
   AND      t.calendar_month_desc in ('2000-01', '2000-02')
   AND      c.cust_state_province in ('CA', 'NV')
   GROUP BY t.calendar_month_desc, p.prod_category
   UNION ALL
   SELECT   t.calendar_month_desc month,
            NULL category,
            SUM(s.amount_sold) total_sales
   FROM     sales s, products p, times t, customers c
   WHERE    s.time_id = t.time_id
   AND      s.prod_id = p.prod_id
   AND      s.cust_id = c.cust_id
   AND      t.calendar_month_desc in ('2000-01', '2000-02')
   AND      c.cust_state_province in ('CA', 'NV')
   GROUP BY t.calendar_month_desc
   UNION ALL
   SELECT NULL month,
          NULL category,
          SUM(s.amount_sold) total_sales
   FROM   sales s, products p, times t, customers c
   WHERE  s.time_id = t.time_id
   AND    s.prod_id = p.prod_id
   AND    s.cust_id = c.cust_id
   AND    t.calendar_month_desc in ('2000-01', '2000-02')
   AND    c.cust_state_province in ('CA', 'NV');

set linesize 132
set pagesize 999
select * from table(dbms_xplan.display);

The following plan shows, how Oracle executes this operation. The most important fact is that Oracle has to access the large fact table sales THREE times, once per UNION ALL branch, and that the subsequent result sets finally merged.

Selecting additional aggregation levels would lead to additional SQL statements in the UNION ALL operation, causing additional access of the large fact table SALES.

To execute the query, you can run old_rollup.sql.

ROLLUP - THE NEW WAY

2.

The same business query, expressed with Oracle’s ROLLUP functionality leads to a different execution plan:

@explain_new_rollup.sql
TRUNCATE TABLE plan_table;

EXPLAIN PLAN FOR
   SELECT   t.calendar_month_desc month,
            p.prod_category category,
            SUM(s.amount_sold) total_sales
   FROM     sales s, products p, times t, customers c
   WHERE    s.time_id = t.time_id
   AND      s.prod_id = p.prod_id
   AND      s.cust_id = c.cust_id
   AND      t.calendar_month_desc in ('2000-01', '2000-02')
   AND      c.cust_state_province in ('CA', 'NV')
   GROUP BY ROLLUP(t.calendar_month_desc, p.prod_category);

set linesize 132
set pagesize 999
select * from table(dbms_xplan.display);

Unlike the plan shown before, the new ROLLUP functionality scans the large fact table SALES only once and computes the different aggregations as part of the SORT-GROUP BY. This leads to tremendous performance improvements, especially when the data volume becomes larger and the requested number of aggregation levels increases.

To execute the query with the new ROLLUP functionality, you can run new_rollup.sql. Even with our small test data volume, you will experience a significant performance improvement.

ORACLE 9I’S NEW GROUPING SETS EXPRESSION

You can selectively specify the set of groups that you want to create using a GROUPING SETS expression within a GROUP BY clause. This allows precise specification across multiple dimensions without computing the whole CUBE.

The following example precisely specifies 3 grouping sets. Without the use of grouping sets, the query would need to be done as either:

- a UNION of 3 single group queries, or
- a group by query using CUBE, which would need 
  to be filtered to remove unwanted rows

Either of those approaches would run slower and use more resources.

1.

From a SQL*Plus session logged on to the SH schema, run run_gs1.sql, or copy the following SQL statement into your SQL*Plus session:

@run_gs1.sql

SELECT   channel_desc, calendar_month_desc, country_id,
         TO_CHAR(SUM(amount_sold), '9,999,999,999') SALES$
FROM     sales, customers, times, channels
WHERE    sales.time_id=times.time_id 
AND      sales.cust_id=customers.cust_id 
AND      sales.channel_id=channels.channel_id 
AND      channels.channel_desc IN ('Direct Sales','Internet') 
AND      times.calendar_month_desc IN ('2000-09', '2000-10')
AND      country_id IN ('UK', 'US')
GROUP BY GROUPING SETS((channel_desc,calendar_month_desc,country_id),
         (channel_desc, country_id),(calendar_month_desc, country_id))
;

Try to rewrite the above shown SQL statement as UNION ALL and compare the execution plans. You can compare your result with run_old_gs1.sql. The plans are shown with explain_gs1.sql, or explain_old_gs1.sql respectively.

GROUPING SETS WITH COMPOSITE COLUMNS

Composite columns are columns grouped together with parentheses. In CUBE and ROLLUP statements, composite columns are treated as a single unit. This reduces the number of groups generated by the query, which improves performance when we do not need all the groups that would otherwise be generated. It also reduces the number of rows that need to be filtered from the results.

In the example below, the composite column of (calendar_month_desc, country_id) in ROLLUP prevents the query from returning the grouping: (channel_desc, calendar_month_desc).

The only groupings returned are:

  • (channel_desc, calendar_month_desc, country_id)
  • channel_desc)
  • ( )
1.

From a SQL*Plus session logged on to the SH schema, run run_gs2.sql, or copy the following SQL statement into your SQL*Plus session:

@run_gs2.sql

SELECT   channel_desc, calendar_month_desc, country_id,
         TO_CHAR(SUM(amount_sold), '9,999,999,999') SALES$
FROM     sales, customers, times, channels
WHERE    sales.time_id=times.time_id 
AND      sales.cust_id=customers.cust_id 
AND      sales.channel_id=channels.channel_id 
AND      channels.channel_desc IN ('Direct Sales','Internet') 
AND      times.calendar_month_desc IN ('2000-09','2000-10')
AND      country_id IN ('UK', 'US')
GROUP BY ROLLUP(channel_desc, (calendar_month_desc, country_id));

Compare the output with the one from the preceding SQL statement, and you’ll see the suppression of the grouping (channel_desc, calendar_month_desc).

CONCATENATED GROUPING SETS

Grouping sets are further enhanced with the concept of 'concatenated grouping sets'. Concatenated grouping sets offer a concise way to generate large and complex combinations of groupings. Groupings specified with concatenated grouping sets yield the cross-product of groupings from each grouping set. The cross-product operation enables even a small number of concatenated groupings to generate a large number of final groups. The syntax for concatenated grouping sets is very easy: just list the grouping elements separated by commas.

The following example creates a cross product using the four columns specified in the two GROUPING SETS. Therefore it creates the following 4 groups:

  • (country_id, calendar_year)
  • (country_id, calendar_quarter_desc)
  • (cust_state_province, calendar_year)
  • (cust_state_province, calendar_quarter_desc)
1.

From a SQL*Plus session logged on to the SH schema, run run_gs3.sql, or copy the following SQL statement into your SQL*Plus session:

@run_gs3.sql

SELECT   country_id, cust_state_province,
         calendar_year, calendar_quarter_desc,
         TO_CHAR(SUM(amount_sold), '9,999,999,999') SALES$
FROM     sales, customers, times, channels
WHERE    sales.time_id=times.time_id 
AND      sales.cust_id=customers.cust_id 
AND      sales.channel_id= channels.channel_id 
AND      channels.channel_desc IN ('Direct Sales', 'Internet') 
AND      times.calendar_month_desc IN ('2000-09', '2000-10')
AND      country_id IN ('UK', 'US')
GROUP BY GROUPING SETS (country_id, cust_state_province),
         GROUPING SETS (calendar_year, calendar_quarter_desc);

CONCATENATED GROUPING SETS - HIERARCHIC CUBE EXAMPLE

The following example illustrates a powerful technique for generating all the groupings needed to create a hierarchical data cube, such as those used for OLAP queries. The SQL generates the ROLLUP of aggregations for each dimension (time, geography, product) of a fact table, and it creates the cross product of groups by combining the ROLLUP groups. This techniques enables a small, easy-to-read SQL statement to generate a very large number of useful groups.

The query below generates the following number of ROLLUP groups:

  • 4 for the time dimension
  • 6 for the geography dimension
  • 4 for the product dimension

When these groups are multiplied for the cross product, we see that the query will create 4*6*4 = 96 groups. Oracle9i has been optimized to process hierarchical cubes specified with concatenated grouping sets efficiently.

1.

From a SQL*Plus session logged on to the SH schema, run run_gs4.sql, or copy the following SQL statement into your SQL*Plus session:

@run_gs4.sql

SELECT   calendar_year, calendar_quarter_desc,
         calendar_month_desc, country_region, 
         country_subregion, countries.country_id,
         cust_state_province, cust_city,
         prod_cat_desc, prod_subcat_desc, prod_name,
         TO_CHAR(SUM(amount_sold), '9,999,999,999') SALES$
FROM     sales, customers, times, channels, countries, products
WHERE    sales.time_id=times.time_id 
AND      sales.cust_id=customers.cust_id 
AND      sales.channel_id= channels.channel_id 
AND      sales.prod_id=products.prod_id 
AND      customers.country_id=countries.country_id 
AND      channels.channel_desc IN ('Direct Sales', 'Internet') 
AND      times.calendar_month_desc IN ('2000-09', '2000-10') 
AND      prod_name IN ('Vintage Linen Shirt', 'Merrel Bangle Moc') 
AND      countries.country_id IN ('UK', 'US')
GROUP BY ROLLUP(calendar_year, calendar_quarter_desc, 
         calendar_month_desc), ROLLUP(country_region, 
         country_subregion, countries.country_id,
         cust_state_province, cust_city), 
         ROLLUP(prod_cat_desc, prod_subcat_desc, prod_name);

Note that this will return a large number of rows. You can interrupt the output with CTRL-C.

THE USAGE OF GROUPING_ID AND GROUP_ID

GROUPING_ID and GROUP_ID are 2 new functions in Oracle9i which help distinguish among groups returned by a query. Since a single query may return a large number of groups (consider the hierarchical cube query above) it is important to be able to distinguish among the groups efficiently.

GROUPING_ID creates a bit vector value that distinguishes among the groups specified in its arguments. If you specify 2 columns as arguments, GROUPING_ID will return a value of 0,1,2,or 3, as shown in the table below.

COL1
COL2
GROUPING_ID (COL1,COL2)
Detail
Detail
0
Detail
Aggregated
1
Aggregated
Detail
2
Aggregated
Aggregated
3

GROUP_ID shows you if a group is returned multiple times by a query, allowing you to filter out the extra occurrences. The query below returns 2 sets of the group (country_id). In typical situations, we would wish to filter out the redundant set of the (country_id) group. This would be easy with the GROUP_ID function, because it will give the second (country_id) group a value of 1.

1.

From a SQL*Plus session logged on to the SH schema, run run_gs5.sql, or copy the following SQL statement into your SQL*Plus session:

@run_gs5.sql

SELECT   country_id, cust_state_province, SUM(amount_sold),
         GROUPING_ID(country_id, cust_state_province) 
         GROUPING_ID, GROUP_ID()
FROM     sales, customers, times
WHERE    sales.time_id=times.time_id 
AND      sales.cust_id=customers.cust_id 
AND      times.time_id= '30-OCT-00'
AND      country_id IN ('FR', 'ES')
GROUP BY GROUPING SETS(country_id, 
            ROLLUP(country_id, cust_state_province));

USE NEW AND IMPROVED FUNCTIONALITY FOR BUSINESS INTELLIGENCE QUERIES

Oracle's analytic functions allow the RDBMS to directly resolve many OLAP queries closest to the data where the best performance can be achieved. Without these functions, many important calculations would require complex and inefficient SQL statements. The functions are particularly valuable for avoiding inefficient selfjoins and views. Below we present the analytic functions added in Oracle9i.

For details on the functions added in Oracle 8i release 2, please see the ‘Oracle Data Warehousing Guide’.

INVERSE PERCENTILE

The PERCENTILE_DISC function finds the stored value which is closest to the specified percentile. The PERCENTILE_CONT function uses linear interpolation to calculate a value matching the specification. Inverse percentile functions are very useful for finding cutoff points and target levels. For instance, you can use them to answer questions such as "What is the sales level achieved by my top 15% of sales people?"

1.

This example finds the median value (50th percentile), both as stored data and as calculated number. From a SQL*Plus session logged on to the SH schema, run run_an1.sql, or copy the following SQL statement into your SQL*Plus session:

@run_an1.sql
SELECT PERCENTILE_DISC(0.5) WITHIN GROUP 
       (ORDER BY cust_credit_limit) AS perc_disc,
       PERCENTILE_CONT(0.5) WITHIN GROUP
       (ORDER BY cust_credit_limit) AS perc_cont
FROM   customers 
WHERE  cust_city='Marshal';

WIDTH_BUCKET

Histograms are a powerful and popular analytic tool. To create histograms, we must first categorize our values into numeric ranges called buckets. Once the rows in our data set have been assigned to buckets, it is easy to generate histograms by counting the number of rows for each bucket.

The WIDTH_BUCKET function performs the first step in this process. It lets you specify "n" equi-width buckets to be used, and the top and bottom of the numeric range. Since some numbers may be outside the specified range, there is an underflow bucket numbered 0 and an overflow bucket numbered n+1.

The example below shows customers divided into 4 buckets on the basis of their credit limits. It also shows how the ordering of the buckets can be determined by placing the low boundary first (WIDTH_BUCKET_UP) or last (WIDTH_BUCKET_DOWN).

1.

From a SQL*Plus session logged on to the SH schema, run run_an4.sql, or copy the following SQL statement into your SQL*Plus session:

@run_an4.sql

SELECT cust_id, cust_credit_limit,
       WIDTH_BUCKET(cust_credit_limit,0,20000,4) 
                                       AS WIDTH_BUCKET_UP,
       WIDTH_BUCKET(cust_credit_limit,20000, 0, 4) 
                                        AS WIDTH_BUCKET_DOWN
FROM   customers WHERE cust_city = 'Marshal';

HYPOTHETICAL RANK AND DISTRIBUTION FUNCTIONS

Many business planning tasks benefit from knowing how a proposed item would rank compared to existing items. Oracle9i has added syntax for its ranking and distribution functions that permits this kind of query.

This example shows the RANK, PERCENT_RANK, and CUME_DIST that a hypothetical sweater with a list price of $50 would have compared to other sweaters within each of the sweater subcategories.

1.

From a SQL*Plus session logged on to the SH schema, run run_an2.sql, or copy the following SQL statement into your SQL*Plus session:

@run_an2.sql

SELECT   prod_subcategory,
         RANK(50) WITHIN GROUP 
         (ORDER BY prod_list_price DESC) as HRANK,
         TO_CHAR(PERCENT_RANK(50) WITHIN GROUP
         (ORDER BY prod_list_price),'9.999') AS HPERC_RANK,
         TO_CHAR(CUME_DIST (50) WITHIN GROUP
         (ORDER BY prod_list_price),'9.999') AS HCUME_DIST
FROM     products
WHERE    prod_subcategory LIKE 'Sweater%'
GROUP BY prod_subcategory;

FIRST/LAST

The FIRST and LAST functions allow you to order groups by specified columns and then return the first or last value of another column. If a tie value is encountered, then a tiebreaker function will be used to resolve the tie.

To achieve the same result without this function is much less efficient.

In the example below, the data is grouped by channel_desc and ordered within each group by $amount_sold. The column titled QUANT_FOR_LEAST_$_SALES returns the unit quantity sold for the row that had the least $ amount. This is the "FIRST" row in the ordered group. If a tie value is found, the column will show the MIN value.

The column titled QUANT_FOR_MOST_$_SALES returns the unit quantity sold in the row that had the most $ amount. This is the "LAST" row in the ordered group. If a tie value is found, the column will show the MAX value.

1.

From a SQL*Plus session logged on to the SH schema, run run_an3.sql, or copy the following SQL statement into your SQL*Plus session:

@run_an3.sql

SELECT   ch.channel_desc,
         MIN((quantity_sold)) KEEP
         (DENSE_RANK FIRST ORDER BY (s.amount_sold) )
         as QUANT_FOR_LEAST_$_SALES,
         MAX((quantity_sold)) KEEP
         (DENSE_RANK LAST ORDER BY (s.amount_sold) )
         as QUANT_FOR_MOST_$_SALES
FROM     sales s , channels ch
WHERE    s.time_id=to_DATE('10-OCT-2000') 
AND      s.channel_id=ch.channel_id
GROUP BY ch.channel_desc;

USE NEW QUERY SUBFACTORING CLAUSE IN ORACLE9i

The query subfactoring clause allows a query block that is used multiple times in a query to be calculated just once and then reused. This is extremely valuable for query blocks that require a large amount of processing.

The example below defines channel_summary at the start and then uses it twice in the body of the query.

1.

Prior to Oracle9i, you had to code the recurring query block for every occurrence. There’s no optimization for the recurring query block.

From a SQL*Plus session logged on to the SH schema, run run_qs_old.sql, or copy the following SQL statement into your SQL*Plus session:

@run_qs_old.sql

SELECT   channels.channel_desc, SUM(amount_sold) channel_total
FROM     sales, channels
WHERE    sales.channel_id = channels.channel_id
GROUP BY channels.channel_desc
HAVING   SUM(amount_sold) >
         ( SELECT SUM(amount_sold) * 1/3
           FROM   sales, channels
           WHERE  sales.channel_id = channels.channel_id);

This is an example output of the execution plan of this query (explain_qs_old.sql). The large fact table sales is accessed twice as part of the query; once for the inner SELECT in the HAVING clause, and a second time for getting the SELECT part of the query.

2.

From a SQL*Plus session logged on to the SH schema, run run_qs_new.sql, or copy the following SQL statement into your SQL*Plus session:

@run_qs_new.sql

WITH channel_summary AS
   ( SELECT   channels.channel_desc, SUM(amount_sold) AS channel_total
     FROM     sales, channels
     WHERE    sales.channel_id = channels.channel_id
     GROUP BY channels.channel_desc
   )
   SELECT channel_desc, channel_total
   FROM   channel_summary
   WHERE  channel_total >
          ( SELECT SUM(channel_total) * 1/3 
            FROM channel_summary)
;

This is an example output of the execution plan of this query (explain_qs_new.sql). The fact table sales is touched only once. Oracle first executes an internal recursive part to build the result set for the query block described as channel_summary in the named WITH part of the query as temporary table for this specific query. You can see the actual insert statement in the OTHER column of explain plan.

This temporary table, representing the query block which is used multiple times, is then used in the statement, shown in the lower part of the plan, ending with the FILTER operation.


Module Summary

In this module, you should have learned how to:

Explain Oracle’s powerful aggregation capabilities
Use new and improved functionality for business intelligence uses
Use the new query subfactoring clause of Oracle9i

Copyright © 2002 Oracle Corporation. All Rights Reserved.


Close Window

E-mail this page
Printer View Printer View
Oracle Is The Information Company About Oracle | Oracle RSS Feeds | Careers | Contact Us | Site Maps | Legal Notices | Terms of Use | Privacy