Using the SQL MODEL Clause to Define Inter-Row Calculations

Purpose

In this tutorial you learn how to use the Oracle Database 10g SQL MODEL clause to perform inter-row calculations.

Time to Complete

Approximately 30 minutes

Topics

This tutorial covers the following topics:

	Overview
	Scenario
	Prerequisites
	Setting Up the Sample Data
	Reviewing the Example Syntax
	Using Positional and Symbolic Cell References
	Using Multi-Cell References on the Right Side of a Rule
	Using the CV() Function and the ANY Wildcard
	Coding FOR Loops: A Concise Way to Specify New Cells
	Understanding the Order of Evaluation of Rules
	Handling NULL Measures and Missing Cells
	Creating Reference Models
	Creating Iterative Models
	Using Ordered Rules
	Summary

Overview

With the SQL MODEL clause, you can define a multidimensional array on query results and then apply rules on the array to calculate new values. The rules can be sophisticated interdependent calculations. Integrating advanced calculations into the database significantly improves performance, scalability, and manageability compared to external solutions. Rather than copying data into separate applications or PC spreadsheets, users can keep their data within the Oracle environment.

The MODEL clause defines a multidimensional array by mapping the columns of a query into three groups: partitioning, dimension, and measure columns. These elements perform the following tasks:

	Partitions define logical blocks of the result set in a way similar to the partitions of the analytical functions (described in the chapter titled "SQL for Analysis in Data Warehouses" in the Data Warehousing Guide). MODEL rules are applied to the cells of each partition.
	Dimensions identify each measure cell within a partition. These columns identify characteristics such as date, region, and product name.
	Measures are analogous to the measures of a fact table in a star schema. They typically contain numeric values such as sales units or cost. Each cell is accessed within its partition by specifying its full combination of dimensions.

To create rules on these multidimensional arrays, you define computation rules expressed in terms of the dimension values. The rules are flexible and concise, and can use wildcards and FOR loops for maximum expressiveness. Calculations built with the MODEL clause improve on traditional spreadsheet calculations by integrating analyses into the database, improving readability with symbolic referencing, and providing scalability and much better manageability.

The example below gives a conceptual overview of the model feature using a hypothetical sales table. The table has columns for country, product, year, and sales amount. The example has three parts. The first part shows how the table's columns are divided into partitioning, dimension, and measure columns. The second part shows two hypothetical rules that forecast 2002 sales for prod1 and prod2 as the sum of each product's sales in the two previous years. The third part shows the output of a query that applies the rules to such a table with hypothetical data. The rows for 2000 and 2001 are data retrieved from the database, whereas the rows for 2002 are calculated from the rules. Note that the rules are applied within each partition.

Columns mapped to Partition, Dimension, and Measure

COUNTRY	PRODUCT	YEAR	SALES
Partition	Dimension	Dimension	Measure

Rules:

sales('prod1', 2002) = sales('prod1', 2000) + sales('prod1', 2001)
sales('prod2', 2002) = sales('prod2', 2000) + sales('prod2', 2001)

Output of the MODEL clause:

COUNTRY	PRODUCT	YEAR	SALES
Partition	Dimension	Dimension	Measure
A	prod1	2000	10
A	prod1	2001	15
A	prod2	2000	12
A	prod2	2001	16
B	prod1	2000	21
B	prod1	2001	23
B	prod2	2000	28
B	prod2	2001	29
A	prod1	2002	25
A	prod2	2002	28
B	prod1	2002	44
B	prod2	2002	57
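
The two hypothetical rules above use parentheses for cell references; in actual MODEL syntax, cells are referenced with square brackets. As a minimal sketch, the query below rebuilds the hypothetical table inline from DUAL and applies both rules within each country partition. The inline view alias sales_tab and its column names are assumptions made for illustration and are not part of the tutorial scripts. With the data entered as shown, the query should return the eight stored rows together with the four calculated rows for 2002.

```sql
-- Hypothetical data from the overview table, built inline so the query is self-contained.
SELECT country, product, year, sales
FROM (
  SELECT 'A' country, 'prod1' product, 2000 year, 10 sales FROM dual UNION ALL
  SELECT 'A', 'prod1', 2001, 15 FROM dual UNION ALL
  SELECT 'A', 'prod2', 2000, 12 FROM dual UNION ALL
  SELECT 'A', 'prod2', 2001, 16 FROM dual UNION ALL
  SELECT 'B', 'prod1', 2000, 21 FROM dual UNION ALL
  SELECT 'B', 'prod1', 2001, 23 FROM dual UNION ALL
  SELECT 'B', 'prod2', 2000, 28 FROM dual UNION ALL
  SELECT 'B', 'prod2', 2001, 29 FROM dual
) sales_tab
MODEL
  PARTITION BY (country)        -- rules run separately for country A and country B
  DIMENSION BY (product, year)  -- these two columns address a cell
  MEASURES (sales)              -- the value stored in each cell
  RULES (
    sales['prod1', 2002] = sales['prod1', 2000] + sales['prod1', 2001],
    sales['prod2', 2002] = sales['prod2', 2000] + sales['prod2', 2001]
  )
ORDER BY country, product, year;
```

Because RETURN UPDATED ROWS is not specified here, the stored rows are returned along with the calculated ones.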

Note that the MODEL clause does not update existing data in tables, nor does it insert new data into tables—to change values in a table, the model results must be supplied to an INSERT, UPDATE, or MERGE statement.
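
As a sketch of how model results could be made persistent, the statements below feed a MODEL query to an INSERT. The target table SALES_FORECAST, its column sizes, and the rule value are assumptions made for illustration; the query reads from the SALES_VIEW view that is created later in this tutorial.

```sql
-- Hypothetical target table; the name and column sizes are assumptions for illustration only.
CREATE TABLE sales_forecast (
  country VARCHAR2(40),
  prod    VARCHAR2(50),
  year    NUMBER,
  sales   NUMBER
);
-- The model itself changes nothing in the database.
-- Only the surrounding INSERT stores the calculated row.
INSERT INTO sales_forecast (country, prod, year, sales)
SELECT country, prod, year, sales
FROM sales_view
WHERE country = 'Italy'
MODEL RETURN UPDATED ROWS
  PARTITION BY (country)
  DIMENSION BY (prod, year)
  MEASURES (sale sales)
  RULES (
    sales['Bounce', 2005] = 20
  );
```

If you try this, remember that cleanup.sql does not know about this extra table, so it would have to be dropped separately.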

Prerequisites

Before starting this tutorial, you should:

1.	Perform the Installing Oracle Database 10g on Windows tutorial.
2.	Download and unzip model_clause.zip into your working directory (for example, c:\wkdir).

Setting Up the Sample Data

1.	Start a SQL*Plus session. Select Start > Programs > Oracle-OraDB10g_home > Application Development > SQL Plus. (Note: This tutorial assumes you have a c:\wkdir folder. If you do not, create one and unzip the contents of model_clause.zip into it. The script commands in this tutorial specify this path.)
2.	Log in as the SH user. Enter SH as the User Name and SH as the Password. Then click OK.
3.	First, make sure that you have a clean environment. Run the cleanup.sql script from your SQL*Plus session:

	@c:\wkdir\cleanup.sql

	The cleanup.sql script contains the following:

	DROP VIEW sales_view;
	DROP TABLE dollar_conv;
	DROP TABLE growth_rate;
	DROP TABLE ledger;
4.	Now you can create the SALES_VIEW view. From your SQL*Plus session, execute the following script:

	@c:\wkdir\sample_data.sql

	The sample_data.sql script contains the following:

	CREATE VIEW sales_view AS
	SELECT country_name country, prod_name prod, calendar_year year,
	       SUM(amount_sold) sale, COUNT(amount_sold) cnt
	FROM sales, times, customers, countries, products
	WHERE sales.time_id = times.time_id
	  AND sales.prod_id = products.prod_id
	  AND sales.cust_id = customers.cust_id
	  AND customers.country_id = countries.country_id
	GROUP BY country_name, prod_name, calendar_year
	/
5.	Verify that the view is created correctly and that 3219 rows exist. From your SQL*Plus session, execute the following script:

	@c:\wkdir\sel_sv.sql

	The sel_sv.sql script contains the following:

	SELECT COUNT(*) FROM sales_view;
6.	To maximize performance, your system should already have a materialized view built on the data that is used by the above view. The materialized view is created during the installation of the SH schema data. Oracle's summary management system will automatically rewrite any query using the above view so that it takes advantage of the materialized view.

Syntax Guidelines

	Note that the RETURN UPDATED ROWS clause following the MODEL keyword limits the results to just those rows that were created or updated in this query. Using this clause is a convenient way to limit result sets to just the newly calculated values. You will use the RETURN UPDATED ROWS clause throughout the examples.
	The RULES keyword, shown in the examples at the start of the rules, is optional, but recommended for easier reading.
	Many of the examples shown do not require ORDER BY on the COUNTRY column. It is included in the specification in case you want to modify the examples and add multiple countries.
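
To see what RETURN UPDATED ROWS filters out, it can help to run a rule once with RETURN ALL ROWS, which is the default when no return clause is given. The query below is a sketch along the lines of the tutorial scripts; the extra filter on the product Bounce is an assumption added only to keep the output short.

```sql
-- Same kind of rule as in the tutorial scripts, but returning every row of the partition.
-- RETURN ALL ROWS is the default, so the two keywords could also be omitted.
SELECT SUBSTR(country,1,20) country, SUBSTR(prod,1,15) prod, year, sales
FROM sales_view
WHERE country = 'Italy' AND prod = 'Bounce'
MODEL RETURN ALL ROWS
  PARTITION BY (country)
  DIMENSION BY (prod, year)
  MEASURES (sale sales)
  RULES (
    sales['Bounce', 2005] = 20
  )
ORDER BY country, prod, year;
```

The result should contain the existing Bounce rows for Italy plus the new 2005 row; replacing ALL with UPDATED leaves only the 2005 row.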

Using Positional and Symbolic Cell References

1.	You want to view the SALES value for the product Bounce in the year 2000, in Italy, and set it to 10. To do so, use a "positional cell reference." The value for the cell reference is matched to the appropriate dimension based on its position in the expression. The DIMENSION BY clause of the model determines the position assigned to each dimension. In this case, the first position is product (PROD) and the second position is YEAR. From your SQL*Plus session, execute the following script:

	@c:\wkdir\pos_cell1.sql

	The pos_cell1.sql script contains the following:

	COLUMN country FORMAT a20
	COLUMN prod FORMAT a20
	SELECT SUBSTR(country,1,20) country, SUBSTR(prod,1,15) prod, year, sales
	FROM sales_view
	WHERE country='Italy'
	MODEL RETURN UPDATED ROWS
	  PARTITION BY (country)
	  DIMENSION BY (prod, year)
	  MEASURES (sale sales)
	  RULES (
	    sales['Bounce', 2000] = 10 )
	ORDER BY country, prod, year
	/
2.	You want to create a forecast value of SALES for the product Bounce in the year 2005, in Italy, and set it to 20. Use a rule in the SELECT statement that sets the year value to 2005 and thus creates a new cell in the array. From your SQL*Plus session, execute the following script:

	@c:\wkdir\pos_cell2.sql

	The pos_cell2.sql script contains the following:

	SELECT SUBSTR(country,1,20) country, SUBSTR(prod,1,15) prod, year, sales
	FROM sales_view
	WHERE country='Italy'
	MODEL RETURN UPDATED ROWS
	  PARTITION BY (country)
	  DIMENSION BY (prod, year)
	  MEASURES (sale sales)
	  RULES (
	    sales['Bounce', 2005] = 20 )
	ORDER BY country, prod, year
	/

	Note: If you want to create new cells, such as values for future years, you must use positional references or FOR loops (discussed later in this tutorial). That is, a positional reference permits both updates and inserts into the array. This behavior is called UPSERT, and it is comparable to what the Oracle SQL MERGE statement does for rows in a table.
3.	You want to update the SALES for the product Bounce in all years after 1999 where the values are recorded for Italy and set them to 10. To do so, use a "symbolic cell reference." The value for the cell reference is matched to the appropriate dimension using Boolean conditions. You can use all the normal operators such as <, >, IN, and BETWEEN. In this case the query looks for a product value equal to Bounce and any year value greater than 1999. This shows how a single rule can access multiple cells. From your SQL*Plus session, execute the following script:

	@c:\wkdir\sym_cell1.sql

	The sym_cell1.sql script contains the following:

	SELECT SUBSTR(country,1,20) country, SUBSTR(prod,1,15) prod, year, sales
	FROM sales_view
	WHERE country='Italy'
	MODEL RETURN UPDATED ROWS
	  PARTITION BY (country)
	  DIMENSION BY (prod, year)
	  MEASURES (sale sales)
	  RULES (
	    sales[prod='Bounce', year>1999] = 10 )
	ORDER BY country, prod, year
	/

	Note: Symbolic references are very powerful, but they are used solely for updating existing cells: they cannot create new cells such as sales projections in future years.
4.	You want a single query to update the sales for several products in several years for multiple countries, and you also want it to insert new cells. Placing several rules into one query makes processing more efficient because it reduces the number of times the data must be accessed. It also allows for more concise SQL, which supports higher developer productivity. From your SQL*Plus session, execute the following script:

	@c:\wkdir\pos_sym.sql

	The pos_sym.sql script contains the following:

	SELECT SUBSTR(country,1,20) country, SUBSTR(prod,1,15) prod, year, sales
	FROM sales_view
	WHERE country IN ('Italy','Japan')
	MODEL RETURN UPDATED ROWS
	  PARTITION BY (country)
	  DIMENSION BY (prod, year)
	  MEASURES (sale sales)
	  RULES (
	    sales['Bounce', 2002] = sales['Bounce', year = 2001] ,
	    --positional notation: can insert new cell
	    sales['Y Box', year>2000] = sales['Y Box', 1999],
	    --symbolic notation: can update existing cell
	    sales['2_Products', 2005] = sales['Bounce', 2001] + sales['Y Box', 2000] )
	    --positional notation: permits insert of new cells
	    --for new product
	ORDER BY country, prod, year
	/

	The example data has no values beyond the year 2001, so any rule involving the year 2002 or later requires insertion of a new cell. The same applies to any new product name defined here. In the third rule, 2_Products is defined as a product with sales in 2005 that equal the sum of Bounce in 2001 and Y Box in 2000. The first rule, for Bounce in 2002, inserts new cells because its left side uses positional notation. The second rule, for Y Box, uses symbolic notation, but because there are already values for Y Box in the year 2001, it updates those values. The third rule, for 2_Products in 2005, is positional, so it can insert new cells, and you will see them in the output.
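
The difference between the two reference styles is easiest to see when the target cell does not exist yet. The following two queries are a sketch built on the scripts above; they differ only in how the left side of the rule addresses the cell for 2005, a year for which the example data has no rows.

```sql
-- Symbolic reference to a year that has no row: nothing matches, so nothing is updated
-- and no cell is created. With RETURN UPDATED ROWS this query is expected to return no rows.
SELECT SUBSTR(country,1,20) country, SUBSTR(prod,1,15) prod, year, sales
FROM sales_view
WHERE country = 'Italy'
MODEL RETURN UPDATED ROWS
  PARTITION BY (country)
  DIMENSION BY (prod, year)
  MEASURES (sale sales)
  RULES (
    sales[prod = 'Bounce', year = 2005] = 20
  )
ORDER BY country, prod, year;
-- Positional reference to the same cell: the cell is created, so one row is returned.
SELECT SUBSTR(country,1,20) country, SUBSTR(prod,1,15) prod, year, sales
FROM sales_view
WHERE country = 'Italy'
MODEL RETURN UPDATED ROWS
  PARTITION BY (country)
  DIMENSION BY (prod, year)
  MEASURES (sale sales)
  RULES (
    sales['Bounce', 2005] = 20
  )
ORDER BY country, prod, year;
```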

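Handling NULL Measures and Missing Cells

When IGNORE NAV is specified, null measure values and missing cells are treated as the following default values:

	0 for numeric data
	Empty string for character/string data
	01-JAN-2001 for date type data
	NULL for all other data types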
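
As a sketch of how this affects a calculation, the query below adds a cell that exists (Bounce in 2001) to a cell that does not (Bounce in 2002, since the example data ends in 2001). The rule and the target year 2005 are assumptions chosen for illustration. With IGNORE NAV the missing cell counts as 0, so the new cell should equal the 2001 sales; if IGNORE NAV is replaced with KEEP NAV, which is the default, the missing cell is NULL and the sum is NULL as well.

```sql
-- IGNORE NAV: the missing 2002 cell is treated as 0 in the addition.
-- Change IGNORE NAV to KEEP NAV to see the result become NULL instead.
SELECT SUBSTR(country,1,20) country, SUBSTR(prod,1,15) prod, year, sales
FROM sales_view
WHERE country = 'Italy'
MODEL RETURN UPDATED ROWS
  PARTITION BY (country)
  DIMENSION BY (prod, year)
  MEASURES (sale sales) IGNORE NAV
  RULES (
    sales['Bounce', 2005] = sales['Bounce', 2001] + sales['Bounce', 2002]
  )
ORDER BY country, prod, year;
```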

Creating Reference Models

1.	Convert projected sales figures of different countries, each in their own currency, into US currency and show both figures. You need to create a table with conversion ratios of local currencies to the US dollar. From your SQL*Plus session, execute the following script:

	@c:\wkdir\cre_dc.sql

	The cre_dc.sql script contains the following:

	CREATE TABLE dollar_conv(country VARCHAR2(30), exchange_rate NUMBER)
	/
2.	Insert two rows into the DOLLAR_CONV table. From your SQL*Plus session, execute the following script:

	@c:\wkdir\ins_dc.sql

	The ins_dc.sql script contains the following:

	INSERT INTO dollar_conv VALUES('Canada', 0.75)
	/
	INSERT INTO dollar_conv VALUES('Brazil', 0.14)
	/
3.	Base the sales on the 2001 figures and project market growth by 2005 to be 22% in Canada and 34% in Brazil. To convert the projected sales of Canada and Brazil for year 2005 to US dollars, you can use a reference model. From your SQL*Plus session, execute the following script:

	@c:\wkdir\rm.sql

	The rm.sql script contains the following:

	SELECT SUBSTR(country,1,20) country, year, localsales, dollarsales
	FROM sales_view
	WHERE country IN ('Canada', 'Brazil')
	GROUP BY country, year
	MODEL RETURN UPDATED ROWS
	  REFERENCE conv_refmodel ON (
	    SELECT country, exchange_rate AS er FROM dollar_conv)
	    DIMENSION BY (country) MEASURES (er) IGNORE NAV
	  MAIN main_model
	    DIMENSION BY (country, year)
	    MEASURES (SUM(sale) sales, 0 localsales, 0 dollarsales) IGNORE NAV
	    RULES (
	      /* assuming that sales in Canada grow by 22% */
	      localsales['Canada', 2005] = sales[cv(country), 2001] * 1.22,
	      dollarsales['Canada', 2005] = sales[cv(country), 2001] * 1.22 * conv_refmodel.er['Canada'],
	      /* assuming that economy in Brazil grows by 34% */
	      localsales['Brazil', 2005] = sales[cv(country), 2001] * 1.34,
	      dollarsales['Brazil', 2005] = sales['Brazil', 2001] * 1.34 * er['Brazil'] )
	/

	Note the following:

	A one-dimensional reference model named CONV_REFMODEL is created on rows from the DOLLAR_CONV table, and its measure EXCHANGE_RATE, aliased as ER, is referenced in the rules of the main model.
	The main model has the optional keyword MAIN at the start of its specification, giving it the alias MAIN_MODEL. The MAIN keyword makes it easier to note the start of the main model specification.
	MAIN_MODEL has two dimensions, COUNTRY and YEAR, whereas the reference model CONV_REFMODEL has one dimension, COUNTRY.
	You can use different styles of accessing the ER measure of the reference model: for Canada it is explicit, using the model_name.measure_name notation CONV_REFMODEL.ER, whereas for Brazil it is a simple measure_name reference, ER. The former notation must be used to resolve any ambiguities in column names across main and reference models.
	The placeholder value of 0 is used when specifying the new measures LOCALSALES and DOLLARSALES. Other numbers would also work as placeholder values.
	Growth rates in this example are hard-coded in the rules: the growth rate for Canada is 22% and that of Brazil is 34%. Your rules would be much more flexible if they could work with growth values looked up from a separate table of growth rates. Such a table could cover many years and countries.
4.	Use both exchange rate and growth rate reference models to find the projected sales in local currency and US dollars for 2002. Create a table that stores the percentage growth by country and year. From your SQL*Plus session, execute the following script:

	@c:\wkdir\cre_gr.sql

	The cre_gr.sql script contains the following:

	CREATE TABLE growth_rate(country VARCHAR2(30), year NUMBER, growth_rate NUMBER)
	/
5.	Insert rows into the GROWTH_RATE table. From your SQL*Plus session, execute the following script:

	@c:\wkdir\ins_gr.sql

	The ins_gr.sql script contains the following:

	INSERT INTO growth_rate VALUES('Brazil', 2002, 2.5)
	/
	INSERT INTO growth_rate VALUES('Brazil', 2003, 5)
	/
	INSERT INTO growth_rate VALUES('Canada', 2002, 3)
	/
	INSERT INTO growth_rate VALUES('Canada', 2003, 2.5)
	/
6.	Write a query that calculates sales for Brazil and Canada, applying the 2002 growth figures and converting the values to dollars. Use both reference models in your query. From your SQL*Plus session, execute the following script:

	@c:\wkdir\rm2.sql

	The rm2.sql script contains the following:

	SELECT SUBSTR(country,1,20) country, year, localsales, dollarsales
	FROM sales_view
	WHERE country IN ('Canada','Brazil')
	GROUP BY country, year
	MODEL RETURN UPDATED ROWS
	  REFERENCE conv_refmodel ON (
	    SELECT country, exchange_rate FROM dollar_conv)
	    DIMENSION BY (country c) MEASURES (exchange_rate er) IGNORE NAV
	  REFERENCE growth_refmodel ON (
	    SELECT country, year, growth_rate FROM growth_rate)
	    DIMENSION BY (country c, year y) MEASURES (growth_rate gr) IGNORE NAV
	  MAIN main_model
	    DIMENSION BY (country, year)
	    MEASURES (SUM(sale) sales, 0 localsales, 0 dollarsales) IGNORE NAV
	    RULES (
	      localsales[FOR country IN ('Brazil', 'Canada'), 2002] =
	        sales[cv(country), 2001] * (100 + gr[cv(country), cv(year)])/100 ,
	      dollarsales[FOR country IN ('Brazil', 'Canada'), 2002] =
	        sales[cv(country), 2001] * (100 + gr[cv(country), cv(year)])/100 * er[cv(country)] )
	/

	Note the following:

	This query shows the capability of the MODEL clause in dealing with objects of different dimensionality. The reference model CONV_REFMODEL has one dimension, whereas the reference model GROWTH_REFMODEL and the main model have two dimensions.
	Dimensions in the single-cell references on the reference models are specified using the CV() function, thus relating the cells in the main model to the reference models. This specification, in effect, performs a relational join between the main and reference models.
	By using the FOR construct, each rule can work with multiple countries, reducing the amount of coding. If you added the FOR construct to the YEAR dimension on the left side of the rules and CV(year) expressions to the right side, you could generalize the rule to multiple years.
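
The generalization to multiple years mentioned in the last note could look like the following sketch, which assumes the DOLLAR_CONV and GROWTH_RATE tables created in the previous steps. The year list (2002, 2003) matches the rows inserted into GROWTH_RATE. Note one simplifying assumption: each year's growth rate is applied to the 2001 base figure, so the 2003 value is not compounded on top of the 2002 value.

```sql
-- Variation of rm2.sql with a FOR construct on the YEAR dimension as well.
-- cv(year) on the right side picks up whichever year the left side is currently addressing.
SELECT SUBSTR(country,1,20) country, year, localsales, dollarsales
FROM sales_view
WHERE country IN ('Canada','Brazil')
GROUP BY country, year
MODEL RETURN UPDATED ROWS
  REFERENCE conv_refmodel ON (
    SELECT country, exchange_rate FROM dollar_conv)
    DIMENSION BY (country c) MEASURES (exchange_rate er) IGNORE NAV
  REFERENCE growth_refmodel ON (
    SELECT country, year, growth_rate FROM growth_rate)
    DIMENSION BY (country c, year y) MEASURES (growth_rate gr) IGNORE NAV
  MAIN main_model
    DIMENSION BY (country, year)
    MEASURES (SUM(sale) sales, 0 localsales, 0 dollarsales) IGNORE NAV
    RULES (
      localsales[FOR country IN ('Brazil', 'Canada'), FOR year IN (2002, 2003)] =
        sales[cv(country), 2001] * (100 + gr[cv(country), cv(year)])/100,
      dollarsales[FOR country IN ('Brazil', 'Canada'), FOR year IN (2002, 2003)] =
        sales[cv(country), 2001] * (100 + gr[cv(country), cv(year)])/100 * er[cv(country)]
    )
ORDER BY country, year;
```

With two countries and two years, the query should return four rows, one per country and year combination.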

Creating Iterative Models

1.	You want to do financial planning for a person who earns a salary of $100,000 and has a capital gain of $15,000. His net income will be calculated as salary minus interest payments minus taxes. He pays tax-deductible interest on a loan. He also pays taxes at two rates: 38% on the salary income after interest expense is deducted, and 28% on capital gains. This person would like his interest expense to represent exactly 30% of his net income. How can you calculate the taxes, interest expense, and net income that will result? All values of this scenario are stored in a table called LEDGER. The table holds the labels for a financial item in one column and the value of the item in another. From your SQL*Plus session, execute the following script:

	@c:\wkdir\cre_led.sql

	The cre_led.sql script contains the following:

	CREATE TABLE ledger (account VARCHAR2(20), balance NUMBER(10,2) )
	/
2.	Insert rows into the LEDGER table. From your SQL*Plus session, execute the following script:

	@c:\wkdir\ins_led.sql

	The ins_led.sql script contains the following:

	INSERT INTO ledger VALUES ('Salary', 100000)
	/
	INSERT INTO ledger VALUES ('Capital_gains', 15000)
	/
	INSERT INTO ledger VALUES ('Net', 0)
	/
	INSERT INTO ledger VALUES ('Tax', 0)
	/
	INSERT INTO ledger VALUES ('Interest', 0)
	/
3.	To perform the calculations, use the ITERATE option to have the calculations repeated as many times as desired. The first pass will insert the values stored in the LEDGER table into the right side of the rules and create a new set of values for NET, TAX, and INTEREST. The second pass will calculate a new set of values for NET, TAX, and INTEREST using the TAX and INTEREST values calculated in the previous pass. This cycle will be repeated a total of 100 times. From your SQL*Plus session, execute the following script:

	@c:\wkdir\it1.sql

	The it1.sql script contains the following:

	SELECT b, account
	FROM ledger
	MODEL IGNORE NAV
	  DIMENSION BY (account)
	  MEASURES (balance b)
	  RULES ITERATE (100) (
	    b['Net'] = b['Salary'] - b['Interest'] - b['Tax'],
	    b['Tax'] = (b['Salary'] - b['Interest']) * 0.38 + b['Capital_gains'] * 0.28,
	    b['Interest'] = b['Net'] * 0.30 )
	/
4.	Write a query to avoid unnecessary processing time in the previous example. Monitor the results after each loop is complete. If the value of a result has stopped changing by a significant amount, then you can stop the cycles at that point. From your SQL*Plus session, execute the following script:

	@c:\wkdir\it2.sql

	The it2.sql script contains the following:

	SELECT b, account
	FROM ledger
	MODEL IGNORE NAV
	  DIMENSION BY (account)
	  MEASURES (balance b)
	  RULES ITERATE (100) UNTIL ( ABS( (PREVIOUS(b['Net']) - b['Net']) ) < 0.01 ) (
	    b['Net'] = b['Salary'] - b['Interest'] - b['Tax'],
	    b['Tax'] = (b['Salary'] - b['Interest']) * 0.38 + b['Capital_gains'] * 0.28,
	    b['Interest'] = b['Net'] * 0.30,
	    b['Iteration Count'] = ITERATION_NUMBER + 1
	    -- the '+1' is needed because the ITERATION_NUMBER starts at 0
	  )
	/

	Note that:

	The ABS() function is used as part of the UNTIL clause. This ensures that the difference between the previous and current value can be either positive or negative as long as its magnitude is smaller than 0.01.
	With the rule b['Iteration Count'] = ITERATION_NUMBER + 1, a new row called Iteration Count is defined. It is assigned the value of the ITERATION_NUMBER variable plus 1, thus tracking the number of loops performed.
	In this example you see that only 26 loops were needed to get the example close to a steady state. By stopping here, an extra 74 iterations were avoided.
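
The iterated result can be cross-checked against the steady state of the three rules, which for this small example can be solved by hand. Substituting Interest = 0.30 * Net and Tax = 0.38 * (Salary - Interest) + 0.28 * Capital_gains into Net = Salary - Interest - Tax gives 1.186 * Net = 0.62 * Salary - 0.28 * Capital_gains. The query below is a sketch of that check against the LEDGER table; it is not part of the tutorial scripts, and the column aliases are assumptions made for illustration.

```sql
-- Closed-form steady state of the three ledger rules, for comparison with the iterated result.
-- net      = (0.62 * salary - 0.28 * capital_gains) / 1.186
-- interest = 0.30 * net
-- tax      = 0.38 * (salary - interest) + 0.28 * capital_gains
SELECT ROUND(net, 2) net,
       ROUND(0.30 * net, 2) interest,
       ROUND(0.38 * (salary - 0.30 * net) + 0.28 * capital_gains, 2) tax
FROM (
  SELECT salary, capital_gains,
         (0.62 * salary - 0.28 * capital_gains) / 1.186 net
  FROM (
    -- Pivot the two input rows of LEDGER into columns.
    SELECT MAX(DECODE(account, 'Salary', balance)) salary,
           MAX(DECODE(account, 'Capital_gains', balance)) capital_gains
    FROM ledger
  )
);
```

For a salary of 100,000 and capital gains of 15,000 this works out to a net income of roughly 48,735, interest of roughly 14,621, and tax of roughly 36,644. The iterative model should end up close to these values, within the tolerance set in the UNTIL clause.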

	Support for analytic SQL functions inside the Model clause
	Greater flexibility on the number of rules a model can contain
	An additional form of Upsert

In this tutorial, you learned how to:

	Use positional and symbolic cell references
	Use the CV() function and the ANY wildcard
	Use FOR loops
	Use reference and iterative models
