Legal | Privacy
Analyzing Business Information
 
 

Analyzing Business Information

Module Objectives

Purpose

This module focuses on how to use the Oracle9i analytic functions.

Objectives

After completing this module, you should be able to:

Use inverse percentile functions

Use hypothetical rank and distribution functions

Use the FIRST and LAST aggregate functions
Use the WIDTH_BUCKET function
Use grouping sets

Prerequisites

Before starting this module, you should have completed the following:

Preinstallation Tasks
Installing the Oracle9i Database
Postinstallation Tasks
Downloaded analyze.zip and unzipped it into your working directory

Reference Material

The following is useful reference material if you want additional information about the topics in this module:

Documentation: Oracle9i Data Warehousing Guide

 

Overview

Oracle9i adds several new types of analytic functions. The analytic functions enhance both database performance and developer productivity. They are valuable for all types of processing, from interactive decision support to batch report jobs. Corporate developers and independent software vendors alike can take advantage of the features.

Here are the key benefits provided by the new functions:

Improved query speed

Improved developer productivity

Minimal required training
Standardized syntax

 

Using Inverse Percentile Functions

Two new Oracle9i functions, PERCENTILE_CONT and PERCENTILE_DISC, compute inverse percentiles. You can specify the percentile value you want, and the functions return the row from your data set which is closest to that percentile (PERCENTILE_DISC) or the calculated value closest to that percentile (PERCENTILE_CONT). These functions require a sort specification and a percentile parameter value between 0 and 1.

The functions can be used either as aggregate functions or as reporting aggregate functions. When used as aggregate functions, they return a single value per ordered set. When used as reporting aggregate functions, they return the data from each row. The functions use a new WITHIN GROUP clause to specify the data ordering.

Sample Query Using the PERCENTILE_DISC Function

In the Sales History (SH) schema, the cost of goods is tracked through the channels of distribution for catalog, direct sales, internet, partner, and telephone sales. To find the discrete value closest to the 50th percentile of costs per channel of distribution for the month of November, 1999, use the PERCENTILE_DISC function. To find the median value of the cost data, specify that the data should be sorted by cost, and then specify a percentile value of 0.5.

To run this query, perform the following steps:

1.

From a SQL*Plus session connected to the SH schema, enter @query_with_percentile_disc.sql to run the following script:

 

2.

The result from the query shows the discrete values closest to the 50th percentile (that is, the median values) for costs per channel of distribution in November, 1999. This information can be useful for cost projections in future Novembers. The average is also shown in this example, to highlight the difference between an average and a median.

 

Using Hypothetical Rank and Distribution Functions

In certain kinds of analysis, such as financial planning, you may want to know how a data value would rank if it were added to your data set. For instance, you want to introduce a new piece of women's clothing and you want to know what rank the planned price would have compared to your other prices.

The hypothetical rank and distribution functions support this type of what-if analysis: they return the rank or percentile value which a row would be assigned if it were inserted into a set of other rows. There are the following hypothetical functions: RANK, DENSE_RANK, PERCENT_RANK, and CUME_DIST. Like the inverse percentile functions, the hypothetical rank and distributions functions use a WITHIN GROUP clause containing an ORDER BY specification.

Sample Query Using the RANK and DENSE_RANK Functions

In the Sales History (SH) schema, you want to find the rank for hypothetical a piece of women's clothing with a price of $50.

To run this query, perform the following steps:

1.

From a SQL*Plus session connected to the SH schema, enter @query_with_rank.sql to run the following script:

 

2. The result from the query shows the rank of a dress with a list price of $50 within each of the subcategories of women's clothing. The large differences between the RANK and DR columns is due to the fact that many dresses have the same list price, therefore women's clothing calculated with the DENSE_RANK function has very few rank values compared to the regular RANK function.

 

Sample Query Using the CUME_DIST Function

In this example, you want to find out what percentile a $100 product would have in each category.

To run this query, perform the following:

1.

From a SQL*Plus session connected to the SH schema, enter @query_with_cume_dist.sql to run the following script:

 

2. The result from the query shows the fraction of products in the subcategory that have higher list prices. For instance, 24% of the products in the Shoes - Women subcategory have a list price higher than $100.

 

 

Using FIRST and LAST Aggregate Functions

The FIRST and LAST aggregate functions allow you to specify an order within aggregated groups and then return the first or last value in the group for a column other than the sorted column. Although an equivalent query can be created using joins or subqueries, the SQL syntax is cumbersome in that case and performance can be inefficient. The FIRST and LAST functions do the same work with a simpler SQL syntax and better performance.

Sample Query Using the FIRST and LAST Aggregate Functions

In the Sales History (SH) schema, you want to run a query that determines, by product, the dollar amount of the sale which sold the lowest number of items, and the dollar amount of the sale which sold the highest number of items.

To run this query, perform the following steps:

1.

From a SQL*Plus session connected to the SH schema, enter @query_with_first_last.sql to run the following script:

 

2. The result of the query shows, within each channel, the quantity of items sold in the transactions that involved the highest and lowest dollar amounts on October 10, 2000.

 


Using Grouping Sets

A grouping set is set of aggregate groups specified in a GROUP BY clause. As powerful as the CUBE and ROLLUP extensions are, there are additional abilities that would enhance them significantly. It would be very helpful to be able to specify all of the following in a single GROUP BY clause:

Grouping sets to perform multiple independent groupings

Composite columns to skip some unneeded aggregation levels

Concatenated groupings to concisely specify the groupings of interest by generating combinations automatically

Without the enhancements in Oracle9i, you must use multiple queries and combine them with the UNION ALL clause in order to achieve these tasks. Using multiple queries is inefficient, however, because it requires multiple scans of the same data. The extensions to the GROUP BY clause in Oracle9i allow the optimizer to choose better plans, thus enabling the SQL execution engine to execute the query very efficiently. Users can analyze data in one dimension without completely rolling it up, analyze across multiple dimensions without computing the whole CUBE, and specify multiple arbitrary groupings to meet any need. These features are extremely useful in data warehousing, which frequently involves extensive aggregate calculations.

 

Queries Using the GROUPING SETS Function

In the example shown, the data is grouped into three grouping sets:

channel_desc, calendar_month_desc, country_id

channel_desc, country_id

calendar_month_desc, country_id

Note: For simplicity, only data for the first two days of December, 1999, and for the products 10, 20, and 45 are selected.

To run this query, perform the following steps:

1.

From a SQL*Plus session connected to the SH schema, enter query_with_grouping_sets.sql to run the following script:

 

2.

The GROUPING SET statement shown previously identifies the exact sets wanted:

 


Queries Using Composite Columns

A composite column is a collection of columns that is treated as a unit during the computation of groupings. You specify the columns in parenthesis. Composite columns allow you to skip unwanted levels of rollup.

To run this query, perform the following steps:

1.

From a SQL*Plus session connected to the SH schema, enter query_with_composite_column.sql to run the following script:

 

2.

Because of the (channel_id, prod_id) composite column, you never see values for the grouping of time_id and channel_id.

 

Using the WIDTH_BUCKET Function

The WIDTH_BUCKET function divides a numeric range into a specified number of subranges called "buckets." For any given expression, the WIDTH_BUCKET function returns that expression's bucket number.

You can also generate equiwidth histograms with this function. Equiwidth histograms divide data sets into buckets of equal size. You provide the input expression, the minimum boundary value, the maximum boundary value, and the number of buckets. However, if you specify n buckets, you actually get n + 2 buckets: n regular ones and two artificial ones to catch values outside the boundary range: bucket 0 holds the values less than the minimum boundary value and bucket n + 1 holds the values greater than the maximum boundary value.

Sample Query Using the WIDTH_BUCKET Function

In this example, you want to analyze the product list prices between $10 and $90, and determine where in that range the Navy Blue Trousers and Jersey Trousers Kids are. To run such a query, perform the following steps:

1.

From a SQL*Plus session connected to the SH schema, enter @query_with_width_bucket.sql to run the following script:

Products with a list price less than $10 will be placed in bucket 0; products that have a list price greater than $90 will be placed in bucket 8. The other products will be placed in buckets 1 through 7, depending on their price.

 

2. The result of the query shows that Navy Blue Trousers are in bucket 2, and Jersey Trousers Kids are in bucket 0 because their price is less than $10.

 


Queries Using Concatenated Groupings

Concatenated groupings offer a concise way to generate useful combinations of groupings. Groupings specified with concatenated groupings yield the cross-product of groupings from each grouping set. The cross-product operation enables even a small number of concatenated groupings to generate a large number of final groups. The concatenated groupings are specified simply by listing multiple grouping sets, cubes, or rollups, and separating them with commas.

To run this query, perform the following steps:

1.

From a SQL*Plus session connected to the SH schema, enter query_with_concatenated_groupings.sql to run the following script:

 

2.

The results are four grouping generated as the cross product of the two grouping sets:

country_id, calendar_year
country_id, calendar_quarter_desc
cust_state_province, calendar_year
cust_state_province, calendar_quarter_year

 

For information on using concatenated groupings to generate hierarchical cubes for online analytic processing, please see the Data Warehouse Guide.

Module Summary

In this module, you should have learned how to:

Use inverse percentile functions

Use hypothetical rank and distribution functions

Use the FIRST and LAST aggregate functions
Use the WIDTH_BUCKET function
Use grouping sets

Copyright © 2002 Oracle Corporation. All Rights Reserved.

Close Window

E-mail this page
Printer View Printer View
Oracle Is The Information Company About Oracle | Oracle RSS Feeds | Careers | Contact Us | Site Maps | Legal Notices | Terms of Use | Privacy