Using Data Transformation Operators with Source and Target Operators

Purpose

This tutorial shows you how to create mappings using the different data transformation operators with source and target operators.

Time to Complete

Approximately 1 hour

Topics

This tutorial covers the following topics:

	Overview
	Prerequisites
	Using Join, Sequence, and Transformation Operators in a Mapping
	Loading Data into a Dimension From an External Table
	Using the Pivot and the Unpivot Operator
	Examining Different Operators in Mappings
	Summary

Viewing Screenshots

Place the cursor over this icon to load and view all the screenshots for this tutorial. (Caution: This action loads all screenshots simultaneously, so response time may be slow depending on your Internet connection.)

Note: Alternatively, you can place the cursor over each individual icon in the following steps to load and view only the screenshot associated with that step.

Overview

Mappings are a visual representation of operations that extract data from sources, transform it, and load it into targets. Mappings define the flow of the data and the operations performed on the data.

The design elements in a mapping are the operators. You use operators to represent sources and targets in the data flow. You also use operators to define how to transform the data from source to target.

This tutorial focuses on some of the transformation operators that you can use in combination with source and target operators.

Note: The focus of this tutorial is only on the design of the mappings and does not show how to deploy the mappings and execute them. You work only at the design metadata level.

Back to Topic List

Prerequisites

Before starting this tutorial, you should:

1.	Have completed the tutorial titled, Handling Flat File and COBOL Copybook Sources in Mappings.
2.	Have logged into the Design Center as "etl_owner/etl_owner"

Back to Topic List

Using Join, Sequence, and Transformation Operators in a Mapping

The mapping that you create in this topic uses a join operator to consolidate data from two staging tables, Stg_Countries_Rec and Stg_Regions_Rec(created in the "Handling Flat File and COBOL Copybook Sources in Mappings" tutorial). The target table is the Geography table in the enterprise model warehouse module. You add a primary key column, Country_WH_ID, and connect a sequence generator to make those column serve as a surrogate key. You also add a sysdate transformation to populate the Load_Date column.

Note: Source tables and files typically have alphanumeric record identifiers. One of the steps in building a warehouse is to replace these identifiers with simple, numeric sequences called surrogate warehouse keys. This is useful as it simplifies table joins for query and analysis. In this mapping, you add a primary key column, Country_WH_ID, and connect a sequence generator to make those column serve as a surrogate key.

Back to Topic List

Expand ETL_PROJECT > Databases > Oracle > TRGT_ENTERPRISE_MODEL module.

Right-click Mappings and select New Mapping. Enter Map_Geography as the name of the mapping. Click OK.

In the Projects Navigator panel, expand TRGT_STAGING > Tables. Drag the STG_COUNTRIES_REC table to the mapping canvas.

Similarly, drag STG_REGIONS_REC table to the mapping canvas..

Drag a Joiner operator from the Component Palette to the canvas, to the right of the table operators.

Note: The Joiner operator joins multiple row sets from different sources with different cardinalities and produces a single output row set.

Map all the columns from Stg_Countries_Rec to the Joiner in one step. Click INOUTGRP1 of Stg_Countries_Rec and drag the mapping line to INGRP1 of the Joiner.

In a similar way, map all the columns from Stg_Regions_Rec to INGRP2 of the Joiner operator.

By default the join operator has 2 input groups, you can add more if required.

In the mapping canvas, click the Joiner operator to select it. Note the Property Inspector panel shows the properties of a Joiner operator

Note: If you don't see the Property Inspector panel, in the View menu, select Property Inspector to get it..

Note the Join Condition property. Click the ellipses, highlighted in the screenshot.

The Expression Builder dialog box opens. In the Inputs tab, expand INGRP1 and double-click SUBREGION_NUMBER. This attribute appears in the Join Condition window on the right side of the Expression Builder dialog box.

Click the Equal (=) button under the Join Condition window.

Expand INGRP2 and double-click SUBREGION_NUMBER. The join condition is now complete. Click OK to close the dialog box.

In the Projects Navigator panel, expand TRGT_ENTERPRISE_MODEL > Tables. Drag the GEOGRAPHY table to the mapping canvas.

Map the attributes from Joiner’s OUTGRP1 to GEOGRAPHY table attributes.

Note: For easier mapping, collapse the Joiner’s INGRP1 and INGRP2 by clicking the minus signs (–) at the left of the INGRP1 and INGRP2 labels. You can also minimize the two source table operators, STG_REGIONS_REC and STG_COUNTRIES_REC.

To map from the JOINER output group to the GEOGRAPHY input group, you can also use the fast automapper which lets you copy attributes, match by name or position.

Map the following attributes from OUTGRP1 of the Joiner to the attributes in the INOUTGRP1 of GEOGRAPHY table:

	Joiner OUTGRP1	GEOGRAPHY INOUTGRP1
	COUNTRY_CODE2	COUNTRY_ISO_CODE_2
	COUNTRY_CODE3	COUNTRY_ISO_CODE_3
	COUNTRY_NAME	COUNTRY_NAME
	CURRENCY_CODE	CURRENCY_CODE
	CURRENCY_NAME	CURRENCY_NAME
	REGION_NAME	REGION_NAME
	SUBREGION_NAME	SUB_REGION_NAME

Note: To create space on the canvas, minimize the Joiner operator by clicking on the operator header.

In the Projects Navigator panel, expand TRGT_ENTERPRISE_MODEL > Sequences. Drag SEQ_GEOGRAPHY to the mapping canvas.

Note: The sequence operator generates sequential numbers that increment for each row.

From the SEQ_GEOGRAPHY, map NEXTVAL attribute to COUNTRY_WH_ID attribute in the GEOGRAPHY table.

In the Mapping Editor canvas, click GEOGRAPHY table operator header to select it. In the Property Inspector panel, expand General.

Change the Loading Type property's default value of INSERT to UPDATE/INSERT.

Note: Warehouse Builder supports INSERT/UPDATE and UPDATE/INSERT load types that generates the MERGE statement in PL/SQL. Merging enhances load performance by performing inserts and updates in single DML statements.

Note: If the Property Inspector is not open or visible, go to View menu and click Property Inspector.

In the Property Inspector panel, expand Conditional Loading. Set the property Match by constraint to No Constraint. Click OK.

Note: The value of No Constraints informs OWB not to use primary key or unique key information when matching. The user can specify arbitrary, non-key matching criteria by setting various properties on the table operator’s attribute properties.

10.

To update data in the GEOGRAPHY table, which uses a sequence generator for the surrogate key, you must change the default attribute properties for the surrogate key and natural key attributes of the GEOGRAPHY table.

Note: Because sequence generators are used with primary keys to create surrogate warehouse keys, you may have questions such as: How do you update a table with a surrogate key, if you do not know what value that surrogate key will have? By what columns do you match? How do you prevent the update from overwriting the primary key with a new sequence number?

The approach is to use the natural key for matching, and not the surrogate key. You accomplish this by setting the "Match Column when Updating Row" attribute property to Yes for the natural key, and No for the surrogate key.

For each attribute in the GEOGRAPHY table, click the attribute to show the related properties in its Property Inspector panel. Expand Loading Properties. Change the attribute properties using the values in the following table:

Property	Country_WH_ ID (surrogate key)	Country_Iso_Code_2 (natural key)	All other attributes
Load Column when Inserting Row	Yes	Yes	Yes
Load Column when Updating Row	No	No	Yes
Match Column when Deleting Row	Yes	Yes	No
Match Column when Updating Row	No	Yes	No
Update: Operation	=	=	=

Note: If you had chosen the GEOGRAPHY_UK unique key as the match by constraint, then you would not have had to specify the match column when updating row details.

11.

From the Component Palette, drag the Transformation operator to the mapping canvas.

Note: The Transformation operator transforms the attribute value data of rows within a row set using a PL/SQL function or procedure.

In the Add Transformation Operator dialog box, accept the default option, “Select from existing repository objects and bind.”

Expand Public Transformations > Oracle > Pre-Defined > Date. Select SYSDATE () return DATE. Click OK.

Map the VALUE attribute of the SYSDATE transformation to the LOAD_DATE attribute of the GEOGRAPHY mapping table.

The mapping is now complete. Click Save All () to save your mapping. Click Yes in the Save Confirmation dialog box.

Next you see some important properties that you can use to further customize your mapping.

12.

You can enable DML error logging feature. In the mapping canvas, select the GEOGRAPHY table operator.

In the Property Inspector panel, expand Error Table.

Note: Error logging enables the processing of DML statements to continue despite errors being encountered during the statement execution. The details of the error such as the error code and the associated error message are stored in an error table. After the DML operation completes, you can use the error table to correct rows with errors and subsequently process. DML error logging is supported for SQL statements such as INSERT, UPDATE, MERGE, and multi-table insert.

It is useful in long-running, bulk DML statements - for example processing 1 million records and 10 fail, with DML error logging all good records can be committed and the 10 error rows recorded in an error table. To use DML error logging, the target schema must be created on an Oracle Database 10g R2 or later versions.

To enable DML error logging, specify the name of the error table using the DML Error table name property, such as GEOGRAPHY_ERR. Note that there is a separate Error table name property that logs the invalid records for when data rules are enabled.

Note: The "Error table name" property is used only for "logical errors", which include the orphan management errors, and data rule violation errors. The DML error table format and the logical error table format are not the same, so they cannot actually share the same table.

The logical table format has several extra columns, whose names begin with "ERR$$$". The DML error table has longer VARCHAR columns to accommodate error data that was too large to fit into the target column (not necessary for logical errors).

You can also set the Truncate error table property to Yes to enable housekeeping.

13.

The execution of mappings that contain data objects for which DML error logging is enabled fails if the number of errors generated exceeds the specified maximum number of errors for the mapping.

The default set for this value is 50. You can modify this value by setting the Maximum number of errors configuration property of the mapping.

In the Projects Navigator panel, right-click Map_Geography and select Configure.

In the Configuration of MAP_GEOGRAPHY panel, expand Runtime parameters. You can change the Maximum number of errors property to a suitable value.

You can also set the Default audit level property to one of the four options shown in the screenshot. By default, it is set to ERROR DETAILS.

Click Save All () to save your mapping. Click Yes in the Save Confirmation dialog box.

Close the Configuration of MAP_GEOGRAPHY and the MAP_GEOGRAPHY panels.

Back to Topic List

Loading Data into a Dimension From an External Table

In this topic you create a mapping that uses an external table (pointing to a sampled flat file) as a source to load a dimension in the target data mart. The CHANNELS dimension derives its data from the CHANNELS_EXT external table. The CHANNELS_EXT external table points to the File_Marketing_Channels.csv flat file.

An external table is a read-only table that is associated with a flat file. It represents data from non-relational sources in a relational table format as if you are accessing an Oracle table. By using external tables, you can eliminate the need for flat-file staging tables.

Back to Topic List

1.
Expand TRGT_DATA_MARTS. Right-click Mappings and select New Mapping. Enter Map_Channels as the name of the mapping. Click OK.

2.
In the Projects Navigator panel, expand TRGT_DATA_MARTS > External Tables. Drag Channels_ext to the mapping canvas.

3.
Expand TRGT_DATA_MARTS > Dimensions. Drag Channels to the mapping canvas.

Expand the CHANNELS dimension to view all its attributes.

4.
Map the attributes of the external table with the attributes of the dimension as shown in the table below:

CHANNELS_EXT External Table CHANNELS Dimension

CHANNEL_ID

CHANNEL Level: SOURCE_ID

CHANNEL_DES

CHANNEL Level: CHANNEL_DESC

CLASS_ID

CLASS Level: SOURCE_ID

CHANNEL_CLASS

CLASS Level: NAME

EFF_FROM_DATE

CHANNEL Level: EFF_FROM_DATE

EFF_TO_DATE

CHANNEL Level: EFF_TO_DATE

5.
From the Component Palette, drag a Constant operator on the canvas.

Double-click the Constant operator. In the Constant Editor, click Output Attributes.

The Constant Editor opens. Define an output attribute for the constant operator as shown below:

Attribute: OUTPUT1
Data type: VARCHAR

Length: 2

In the Expression field, enter 'A' (to indicate active). Click OK.

6.
Map the OUTPUT1 attribute of the Constant operator to the STATUS attribute of the CHANNELS dimension as shown in the screenshot below:

7.
One design benefit of using an external table is that you can enable parallel access to the file through the external table.

Parallel read capabilities optimize performance in high volume systems and enterprise-level systems. To configure an external table for parallel access, in the Projects Navigator panel, expand TRGT_DATA_MARTS > External Tables. Right-click Channels_ext and select Configure.

In Configuration of CHANNELS_EXT panel, expand Parallel.

By default Parallel Access Mode property is set to false and Parallel Access Drivers(the degree of parallelism) is set to 1 . The ORACLE_LOADER access driver attempts to divide large data files into chunks that can be processed separately. The degree of parallelism indicates the number of access drivers that can be started to process the data files

Click Save All () to save your mapping. Click Yes in the Save Confirmation dialog box. Close the MAP_CHANNELS panel and the Configuration of Channels_ext panel.

Back to Topic List

Using the Pivot and the Unpivot Operator

The Pivot operator enables you to transform a single row of attributes into multiple rows. The pivot mapping example transforms a table that has a row for each year with the quarterly sales in a table with a row for each quarter.

If your data is organized as follows:

Year	Q1_Sales	Q2_Sales	Q3_Sales	Q4_Sales
2005	10000	15000	14000	25000
2006	12000	16000	15000	35000
2007	16000	19000	15000	34000

You can perform pivot operation to transform the above data set to the following with a row for each quarter:

Year Quarter Sales
2005 Q1 10000

2005 Q2 15000

2005 Q3 14000

2005 Q4 25000

2006 Q1 12000

2006 Q2 16000

and so on....

The reverse of this scenario is the unpivot operator.

In this topic, you examine simple mappings that use the pivot and the unpivot operators.

	Examine a Mapping that Performs a Pivot
	Examine a Mapping that Performs an Unpivot

Back to Topic List

Examine a Mapping that Performs a Pivot:

1.	Expand TRGT_ENTERPRISE_MODEL > Mappings. Right-click PIVOT_SALES and select Open. Click Auto Layout () to view the complete mapping. Note: As you can see in the mapping the source table, SALES_BY_QTR, has columns that store for each year the quarterly sales. The pivot operator allows you to define the input columns, the output columns and how the data is pivoted. The target table is SALES_TGT that stores a row for each quarter.
2.	Examine the internals of the Pivot Operator in the mapping. Double-click the PIVOT operator in the mapping. The PIVOT Editor opens. Click Input Attributes. Note that YEAR is selected as the key column as this remains the same for each pivoted row.
3.	Click Output Attributes or click Next. Note that QUARTER is selected as the Row locator column. This is the pivot column.
4.	Click Pivot transform or click Next. Here you define how the transformation should happen. You define how the row is transformed from a row with columns to a number of rows; you enter a row for each case you may have (so you have a row for Q1, a row for Q2, Q3 and Q4). Click OK. Close the PIVOT_SALES mapping panel. Using the pivot operator, you can define a pivot transformation in a simple step by step manner.

Back to Topic

Examine a Mapping that Performs a Unpivot:

1.	Expand TRGT_ENTERPRISE_MODEL > Mappings. Right-click UNPIVOT_SALES and select Open. Click Auto Layout() to view the complete mapping. Note: In Unpivot operation, you select the key column, then define the row locator (or the unpivot column), and then define unpivot transformation details.
2.	Examine the internals of the Unpivot Operator in the mapping. Double-click the UNPIVOT operator in the mapping. The UNPIVOT Editor opens. Click Input Attributes. Note that YEAR is selected as the key column.
3.	Click Row Locator or click Next. Here, you select the row locator column and then define the distinct row locator values for each match row.
4.	Now click Output Attributes or click Next. Here, you define the output attributes for the unpivot operator.
5.	Click Unpivot transform or click Next. Here you define how the transformation should happen. You define how the column data is taken from the matching row. Click OK. Close the UNPIVOT_SALES mapping panel.

Back to Topic

Back to Topic List

Examining Different Operators in Mappings

In this topic, you examine completed mappings. The focus is to introduce you to the different operators and their properties that you must know to design any kind of ETL job.

1.	Expand TRGT_DATA_MARTS > Mappings. Right-click Map_Dim_D2_Customers and select Open. This mapping uses many different operators. Note: On the toolbar, click Fit in Window () to view the complete mapping. You can minimize all the operators to view the mapping as shown in the screenshot below. Also, click Auto Layout to get a cleaner layout of the operators on the canvas. Note: The CUSTOMERS table retrieves data from multiple sources. This mapping joins the People and Business_Parties tables to determine a person's role (employee, vendor, customer, manager). The Address tables are joined to handle multiple addresses (billing, mailing, shipping, and so on). Note: All the four source tables could have been joined by using a single joiner and filter for simplicity.
2.	Examine the CUSTOMER_FLTR operator. Double-click CUSTOMER_FLTR operator to expand it or you may just click it to highlight it. In the Property Inspector panel, check the Filter Condition property. Note: The filter operator conditionally filters out rows from a row set. To view the whole filter condition click on the ellipses. Note: The filter condition extracts data only for records where the role is set as customer. Click OK.
3.	Examine the YOB_EXPR expression operator. Double-click YOB_EXPR operator. If required, click to expand it. Double-click on it again to start the Expression editor. Note: The expression operator enables you to write SQL expressions that define non-procedural algorithms for one output parameter of the operator. The expression text can contain combinations of input parameter names, variable names, and library functions. In the Expression Editor, click Output Attributes. To view the complete expression, click the ellipses. Click OK in the Expression Builder. Click OK to close the Expression Editor. Close the Map_Dim_D2_Customers mapping panel. This completes the review of this mapping. You learned how you can use the filter and the expression operators in a mapping.
4.	Now, let us examine another mapping for some more operators that can be used to define different ETL operations. Expand TRGT_DATA_MARTS > Mappings. Right-click MAP_SALES and select Open. Examine the complex mapping in parts. Note: Click Fit in Window() to view the complete mapping.
5.	Click Auto Layout () to get the enlarged view of the mapping. Click Zoom Out () twice to get the desired view. Note that there are three sources, ORDER_ITEMS table, ORDERS_VIEW_STG view, and an external table WEEKLY_SALES_EXT. There is a Join operator that is joining the two sources (the table and the view). In between there are two expression operators. You have seen how these operators work in the previous topics. If you want to get into the detail, check each operator and its properties.
6.	The external table source is mapped to a pivot operator with a few columns getting processed through an expression operator. You have already seen how to use an external table and a pivot operator. So lets move on with the mapping.
7.	One interesting operator added in this mapping is the SET Operation operator. Click on it to select it. In the Property Inspector panel, the Set operation is set to UNIONALL. Note: The SET operation operator performs union, union all, intersect, and minus operations in a mapping.
8.	Scroll horizontally to get to the AGG operator. This is an Aggregator operator. The Aggregator operator performs data aggregations, such as SUM and AVG, and provides an output row set with aggregated data. Click the AGG operator to select it. Examine the Aggregator properties in the Property Inspector panel. The most important properties are the Group By clause and the Having Clause. Note: You can use the CUBE and ROLLUP clauses in the Group By clause of the aggregation operator. You can also use GROUPING in the HAVING clause. To view the Group By clause for this mapping, click on the ellipses. In the Expression Builder, view the Group By clause. Click OK.
9.	Double-click AGG operator to open the Aggregator editor. Click Output Attributes. Note the TOTAL_REVENUE column, the expression indicates that it is sum of the sales. If you want to add a new output attribute, just enter its name in the Attribute column and then define the aggregator expression. For example, you want to calculate average sales. Define an output attribute, AVG_SALES. Click on the ellipses besides the Expression column to open the Expression Builder. From the Function drop down list, select AVG function. Choose between ALL or DISTINCT. From the drop down list, select the attribute on which the aggregation should be performed. Select SALES. Click Use Above Values to get the expression in the text area. Click OK. Click OK again to close the Aggregator Editor. The output attribute is now defined. This attribute must then be mapped to a corresponding output attribute in the target. Click Save All () to save your mapping. Click Yes in the Save Confirmation dialog box. Close the MAP_SALES mapping panel. You learned about the various different operators and how they can be used to solve business needs. You can implement any ETL operations using these operators.

Back to Topic List

Summary

In this lesson, you learned how to:

	Use the join, sequence, and the transformation operator in a mapping
	Load data into a dimension using an external table
	Use the pivot and the unpivot operator
	Use different operators such as filter, expression, set and aggregator operators in a mapping

Back to Topic List

Year	Quarter	Sales
2005	Q1	10000
2005	Q2	15000
2005	Q3	14000
2005	Q4	25000
2006	Q1	12000
2006	Q2	16000
		and so on....