Extracting, Transforming and Loading Data
Extracting, Transforming and Loading Data
This tutorial shows you how to define simple ETL mapping and also
shows how to work with user-defined database objects in Warehouse Builder. In
this tutorial, you also define a simple process flow.
Note: The screenshots in this tutorial may appear slightly
different from yours depending on the order in which you perform the other tutorials
in this OBE Series. You can ignore the irrelevant items you see on your screens.
Approximately 1 hour
Note: This tutorial and its setup scripts support OWB 11g Release 1, only. Earlier versions of this Oracle By Example tutorial are available for OWB 10g Releases 1 and 2.
This tutorial covers the following topics:
Place the cursor over this icon to load and view all the screenshots for this tutorial.
(Caution: This action loads all screenshots simultaneously, so response
time may be slow depending on your Internet connection.)
Note: Alternatively, you can place the cursor
over each individual icon in the following steps to load and view only the screenshot
associated with that step.
ETL stands for Extract, Transform and Load. ETL involves the
movement and transformation of data from your sources to your targets. In the
tutorial "Setting up the Oracle Warehouse Builder Project", you learned
how to define Warehouse Builder modules that connect to your source and target
schemas. Now, you learn how to create the data movement and transformation logic.
This tutorial begins with creating a simple mapping, using
the mapping editor. Warehouse Builder facilitates creation of large and complex
mappings, increasing productivity and reducing the number of errors.
Back to Topic List
Before starting this tutorial, you should:
| 1. |
Complete the Setting Up the Oracle Warehouse Builder Project tutorial.
Note: For this tutorial, you must have completed
the Installation and Configuration of Oracle Workflow as mentioned in
Prerequisite topic of the tutorial "Setting up the Oracle Warehouse
Builder Project".
Also, you must have completed the "Importing
Metadata in OE module" subtopic in tutorial "Setting up
the Oracle Warehouse Builder Project"
|
| 2. |
Log in to SQL Plus and run the grant_priv.sql script.
The grant_priv.sql script grants
select privileges to a few OE and HR tables that you use as source
tables in your mappings later in this tutorial.
Log in to SQL*Plus as sys and run the grant_priv.sql
script that you downloaded into the c:\owbdemo_files folder.
To run the script, enter the following command at the
SQL> prompt:
@c:\owbdemo_files\grant_priv.sql;
|
Back to Topic List
In this topic, you create a basic mapping to understand how
ETL design works in Warehouse Builder. You use the OE sample schema as the source
from which you load the data into DP_TGT target.
In this topic, you perform the following:
Back to Topic List
Creating a Simple ETL Mapping
To create a simple ETL Mapping, perform the following steps:
|
1.
|
Assuming that you have logged in as owb
user in the Design Center, in the Project Explorer panel, expand HANDSON
> Databases > Oracle > DP_TGT. Right-click Mappings
and select New.
The Create Mapping window appears.
|
| 2. |
In the Create Mapping window, enter
SIMPLE_MAP as name of the mapping. Click OK.
The Mapping Editor launches the SIMPLE_MAP mapping.
|
| 3. |
You can add existing sources and targets to a mapping through the Explorer
panel, which is on the top left of the Mapping Editor.
In the Explorer panel, ensure that Available objects tab is selected.
Expand Oracle > OE > Tables. Drag ORDERS to the canvas.
Also, drag ORDER_ITEMS on to the canvas.
|
| 4. |
From the Palette, drag the JOINER operator to the canvas. Place
the operator to the right of the two tables. Click ( )
icon of each operator headers to maximize the operators.
|
| 5. |
Connection lines graphically represent how the data flows from a source,
through operators, and to a target.
Drag a connection line from ORDERS.INOUTGRP1 into JOINER.INGRP1.
|
| 6. |
Drag a connection line from ORDER_ITEMS.INOUTGRP1 into JOINER.INGRP2.
Observe that the attributes from the source tables get added as input
attributes in the JOINER operator.
|
| 7. |
In the Mapping Editor Canvas, click the JOINER operator header
to highlight it.
In the Joiner Properties panel, examine the Join Condition property.
Click Edit ( )
to view the full join condition. In the Expression Builder window, the
join condition is as follows: INGRP2.ORDER_ID = INGRP1.ORDER_ID.
Warehouse Builder frames the join condition based on the constraints between
the two tables, ORDERS and ORDER_ITEMS. Click OK.
|
| 8. |
From the Palette, drag a Table Operator to the canvas. Place the
operator to the right of the JOINER operator.
The Add Table Operator window displays.
|
| 9. |
In the Add Table Operator window, select "Create unbound operator
with no attributes". In the New operator name field, enter CONS_ORDERS.
Click OK.
|
| 10. |
Drag a connection line from JOINER.OUTGRP1 to CONS_ORDERS.INOUTGRP1.
Maximize the CONS_ORDERS operator and ensure that the operator
is highlighted.
|
| 11. |
In the Table Operator Properties panel, for the Loading Type property,
select TRUNCATE/INSERT. If you want to insert or update, you can
simply select INSERT/UPDATE and Warehouse Builder will generate code with
a MERGE statement
|
| 12. |
The CONS_ORDERS table still does not exist in the repository. Right-click
the CONS_ORDERS table operator header and select Create and
Bind from the menu.
In the Create and Bind window, examine the details and click OK.
From the Mapping menu, select Close to exit the Mapping
Editor.
|
| 13. |
In DP_TGT, expand Tables. Double-click CONS_ORDERS.
Examine the table details in the Data Object Editor. From Diagram
menu, select Close Window to exit the Data Object Editor.
Note that you derived this table directly from the mapping editor and
did not create the table and its columns via typing. This adds to your
productivity and saves time.
From the Design menu, select Save All. In Warehouse Builder
warning dialog box , click Yes to save your work.
|
Back to Topic
Deploying the Mapping and Related Objects
Deployment is the process of creating your target system from
the logical design or model. When you deploy ETL mappings, the deploy procedure
generates a PL/SQL script. You need to explicitly execute the scripts to perform
ETL. Before you deploy a mapping, you should have deployed the objects the mapping
refers to.
To deploy SIMPLE_MAP mapping, perform the following steps:
| 1. |
To deploy, you must launch the Control Center Manager. From the Tools
menu, select Control Center Manager.
|
| 2. |
Click DP_TGT_LOCATION if not already highlighted. In the Object
Details panel, observe the two objects you defined, SIMPLE_MAP and CONS_ORDERS,
design status is set to New and Deploy Status set to Not Deployed.
|
| 3. |
Click Default Actions to change Deploy Action from None to Create.
As a result of performing other tutorials, you may have many other objects
under DP_TGT_LOCATION. In order to select the two objects, SIMPLE_MAP
and CONS_ORDERS, expand DP_TGT_LOCATION > DP_TGT.
Expand Tables. Expand Mappings. Hold ctrl and select
the two objects, SIMPLE_MAP and CONS_ORDERS, and then click
Default Actions as shown in the screenshot below.
|
| 4. |
From the File menu, select Deploy > To Control Center.
You can monitor the progress of deployment in the Control Center Jobs
panel on the lower right of the Control Center window.
In the Control Center Jobs panel, the deployment status changes from
"generate" to "run" to "completed successfully."
The mappings are deployed successfully and are ready for execution.
To view the generated code for the mapping, double-click the HANDSON
job in the Control Center Jobs panel.
Note: If you performed deployment in any of the other tutorials
in this series, you may see multiple jobs in this panel. If you see multiple
jobs in the window, check the Finished column for the date and
time and select the latest job.
|
| 5. |
In the Job Details: Handson dialog box, click the Script
tab.
Note that Warehouse Builder generates a .pls (PL/SQL) code for the deployed
mapping. To view the generated code for SIMPLE_MAP, select SIMPLE_MAP
and click View Code
Scroll down in the code window to examine the generated code. From
Code menu, select Close. Then in the Job Details window,
from File menu, click Close to exit.
|
Back to Topic
Executing the Mapping
You learned that when you deploy a mapping Warehouse Builder
generates a .pls script. You need to execute that deployed code to actually
perform the task the mapping was designed to do."
The SIMPLE_MAP mapping was designed to select rows from OE.ORDERS
and OE.ORDER_ITEMS that would go through a join condition using the JOINER operator
and then load the resultant data into CONS_ORDERS table.
In order to execute the mapping, perform the following steps:
| 1. |
Assuming you did not exit the Control Center, expand DP_TGT_LOCATION
> DP_TGT > Mappings and click SIMPLE_MAP.
|
| 2. |
From the File menu, select Start.
When you click Start, the Control Center Jobs panel switches to the Execution
tab. You can monitor the execution in this panel.
To view the execution details, double-click SIMPLE_MAP in the
Job column of the Control Center Jobs panel.
|
| 3. |
In the Job Details: SIMPLE_MAP dialog box, click the Execution Results
tab and monitor Row Activity.
You can get a count of how many rows were inserted in the target table,
CONS_ORDERS. From the File menu of the Job Details: SIMPLE_MAP
dialog box, select Close. From the File menu of the Control
Center Manager, select Close to exit.
|
Back to Topic
Viewing the Loaded Data in
CONS_ORDERS table
To view the data that was loaded in CONS_ORDERS table as a
result of executing the mapping, perform the following steps:
| 1. |
In Project Explorer panel, expand Tables under DP_TGT. Right-click
CONS_ORDERS and select Data from the menu.
|
| 2. |
The Relational Data Viewer shows the data in the CONS_ORDERS table. Scroll
up/down or left/right to examine the data. From Object menu, select
Close Window to exit the Data Viewer.
|
Back to Topic
Back to Topic List
In this topic you transform and load data that originally resided
in User-Defined Database objects.
In this topic, you perform the following:
Back to Topic List
Creating a MAP_CUSTOMERS Mapping
that uses a User-Defined Database Object
Now you define a mapping that uses a user-defined object
from an OE.CUSTOMERS table as a source and loads the transformed data into a
target table, SCALAR_CUST.
To create the mapping, perform the following steps:
| 1. |
In Project Explorer panel, expand DP_TGT. Right-click Mappings
and select New.
The Create Mapping window appears.
|
| 2. |
In the Create Mapping window, enter MAP_CUSTOMERS
as name of the mapping. Click OK.
The Mapping Editor launches the MAP_CUSTOMERS
mapping.
|
| 3. |
In the Explorer panel, ensure that Available objects tab is selected.
Expand Oracle > OE > Tables. Drag CUSTOMERS table
to the canvas. Maximize the CUSTOMERS operator by clicking maximize
( ).
|
| 4. |
Double-click CUST_ADDRESS attribute in the CUSTOMERS operator.
In the Table Editor: Customers window, note the data type of the CUST_ADDRESS
attribute. The data type is OE.CUST_ADDRESS_TYP, a user-defined
data type. Click OK.
|
| 5. |
To interpret the CUST_ADDRESS_TYP in CUSTOMERS you use the Expand Operator.
From the Palette, drag the Expand Object Operator on
the canvas. The Add Expand Object window displays.
|
| 6. |
In the Add Expand Object window, select CUST_ADDRESS_TYP, as
shown in the screenshot. Click OK.
Maximize the CUST_ADDRESS_TYP operator by clicking maximize ( ).
|
| 7. |
Drag a connection line from CUSTOMERS.CUST_ADDRESS to INGRP1.CUST_ADDRESS_TYPE.
|
| 8. |
Note that the type has COUNTRY_ID. To find the corresponding country
name, you define a lookup.
From the Palette, drag the Key Lookup Operator on the canvas.
Place the operator to the right of the Expand Object operator. The Key
Lookup wizard launches. Click Next on the Welcome page.
|
| 9. |
In the Name page, accept the default name, KEY_LOOKUP. Click
Next.
|
| 10. |
In the Groups page, accept the default groups. Click Next.
|
| 11. |
In the Input Connections page, expand CUST_ADDRESS_TYP > OUTGRP1
and select COUNTRY_ID.
Click > to move COUNTRY_ID from Available Attributes list
to Mapped Attributes. Click Next.
|
| 12. |
In the Lookup page, click the drop down list to select the object,
which has the lookup result. Expand OE and select COUNTRIES.
|
| 13. |
Click in the blank Lookup table Key field and select COUNTRY_C_ID_PK.
|
| 14. |
Click in the blank Input Attribute field. Note COUNTRY_ID gets
added under COUNTRY_C_ID_PK.
Select COUNTRY_ID under Input Attribute field next to the COUNTRY_ID
lookup table key. Click Next.
|
| 15. |
In the No-match Rows page, enter 'UNKNOWN' (including single
quotes) in the Default Value field for COUNTRY_NAME Lookup Table
Column. Click Next. In the Summary page, examine the details
and click Finish.
|
| 16. |
If you are not able to see all the objects similar to the following
screenshot, then in the Mapping Editor, click Auto-Layout on the toolbar
to bring the mapping to its default size. Also, you can click Fit in
Window to adjust the mapping in your window.
The Mapping looks as follows:
|
| 17. |
Now you create the target table to load the transformed data into.
From the Palette, drag the Table Operator on the canvas.
|
| 18. |
In the Add Table Operator window, select Create unbound operator
with no attributes. Enter SCALAR_CUST as the new operator
name. Click OK. Place the operator to the right of the Key Lookup
operator.
|
| 19. |
Drag a connection line from KEY_LOOKUP.COUNTRY_NAME to SCALAR_CUST.INOUTGRP1.
|
| 20. |
From CUST_ADDRESS_TYP, select all the attributes except COUNTRY_ID.
Drag a connection line to SCALAR_CUST.INOUTGRP1, as shown in
the screenshot.
In Connect Operators window, accept the default option, Copy source
attributes to target group and match. Click Go. Click OK.
|
| 21. |
From CUSTOMERS, hold shift and select CUST_FIRST_NAME and CUST_LAST_NAME
and drag a connection line into SCALAR_CUST>INOUTGRP1.
In Connect Operators window, accept the default option, Copy source
attributes to target group and match. Click Go. Click OK.
|
| 22. |
Minimize all the operators except SCALAR_CUST.
Click SCALAR_CUST operator header. In the Table Operator properties
panel, for Loading Type property, select TRUNCATE/INSERT.
|
| 23. |
Note that the SCALAR_CUST table is still not existing in the repository.
Right-click SCALAR_CUST header and select Create and Bind
from the menu.
In the Create and Bind window, examine the details and click OK.
The table SCALAR_CUST gets added under DP_TGT> Tables.
This completes the mapping. From the Mapping menu, select Close
to exit the Mapping Editor.
|
Back to Topic
Generating Code for the MAP_CUSTOMERS
Mapping
To generate the code for this mapping, perform the following
steps:
| 1. |
Expand DP_TGT > Mappings. Right-click MAP_CUSTOMERS
and select Generate.
The Generation Results window displays.
|
| 2. |
Click the Script tab. Select MAP_CUSTOMERS and click View
Code.
|
| 3. |
Examine the generated code for the mapping. Notice the NVL in the code
to ensure the 'UNKNOWN' country name, the ANSI SQL code of the left outer
join and the splitting out of the type in SQL.
Note: This code is ready for deployment. You may also right-click
the MAP_CUSTOMERS mapping and click Deploy to deploy the
mapping. Rather you will generate, deploy and execute the mapping from
the Control Center Manager in the forthcoming topics in this tutorial.
In the MAP_CUSTOMERS window, from Code Menu, select Close
to exit. From the File menu, select Close to exit the Generation
Results window.
From the Design menu, select Save All. In Warehouse Builder
warning dialog box , click Yes to save the changes.
|
Back to Topic
Deploying the MAP_CUSTOMERS
Mapping
To deploy the mapping, perform the following steps:
| 1. |
From the Tools menu, select Control Center Manager.
|
| 2. |
In the Control Center, expand DP_TGT_LOCATION > DP_TGT. Expand
Tables. Expand Mappings.
|
| 3. |
Hold Ctrl and select SCALAR_CUST and MAP_CUSTOMERS,
as shown in the screenshot. Note that both the objects Design Status is
set to New and Deploy Status is set to Not Deployed.
|
| 4. |
Click Default Actions to set the Deploy Action to Create.
|
| 5. |
From the File menu, select Deploy > To Control Center.
In the Object Details panel, the Deploy Status is set to Success.
|
Back to Topic
Executing the MAP_CUSTOMERS
Mapping
To execute the MAP_CUSTOMERS mapping, perform the following
steps:
| 1. |
Assuming you did not exit the Control Center, expand DP_TGT_LOCATION
> DP_TGT > Mappings and click MAP_CUSTOMERS. From the
File menu, select Start.
|
| 2. |
When you click Start, the Control Center Jobs panel switches to the Execution
tab. You can monitor the execution in this panel.
To view the execution details, double-click MAP_CUSTOMERS in the
Job column of the Control Center Jobs panel.
|
| 3. |
In the Job Details: MAP_CUSTOMERS dialog box, click the Execution
Results tab and monitor Row Activity.
You can get a count of how many rows were inserted in the target table,
SCALAR_CUST.
From the File menu of the Job Details: MAP_CUSTOMERS dialog box,
select Close. From the File menu of the Control Center Manager,
select Close to exit.
|
Back to Topic
Viewing the Loaded
Data in SCALAR_CUST table
To view the data that was loaded after executing the
MAP_CUSTOMERS mapping, perform the following steps:
| 1. |
In Project Explorer panel, expand Tables under DP_TGT. Right-click
SCALAR_CUST and select Data from the menu.
|
| 2. |
The Relational Data Viewer shows the data in the SCALAR_CUST table. Scroll
up/down or left/right to examine the data. From Object menu, select
Close Window to exit the Data Viewer.
|
Back to Topic
In Warehouse Builder, a process flow is used to manage the
execution of mappings and related activities, such as uploading a file to the
warehouse machine using FTP or notifying the administrator in case of an error
in the load process. Using process flows, you can define the sequence in which
the mappings will run and also define actions based on the outcome of the execution
for example sending a notification email if the mapping execution failed. Warehouse
Builder also allows you to schedule the process flows.
The Process Flow Editor includes a variety of activities that
you add and then connect with transitions to design a flow. An Activity represents
units of work in a process flow. These units of work can involve components
internal or external to Warehouse Builder. Transitions indicate the sequence
and conditions in which to launch activities.
In this topic, you perform the following tasks:
Back to Topic List
Creating
the Process Flow Module and Package
Before you can start designing the process flow, you need
to create a process flow module that includes a process flow package. Process
Flow Modules include Process Flow Packages. The Process Flow Package is a grouping
mechanism that determines which process flows you can interrelate. At run time,
you can launch one process flow that launches other process flows that exist
in the same process flow package.
In this topic, you create a Process Flow Module named PFLOWS,
a Process Flow Package named L_MAPS. The Process Flow Module acts as
a container by which you can validate, generate, and deploy a group of process
flows.
To create the process flow module and package, perform
the following steps:
| 1. |
In the Project Explorer, expand HANDSON >
Process Flows. Right-click Process Flow Modules and select
New from the menu. The Create Module Wizard is launched. Click
Next on the Welcome page.
|
| 2. |
In the Name and Description page, enter
PFLOWS as the name of the process flow module. Click Next.
|
| 3. |
Now you create an Oracle Workflow
location for the process flow module, PFLOWS.
Note: You define an Oracle Workflow location
to specify where you want to deploy your process flows. The Oracle Workflow
location points to a workflow schema (OWF_MGR) that runs in the target
database. Hence, you should have Oracle Workflow installed and configured
beforehand.
On the Connection Information page, observe that Warehouse
Builder creates a location with a default name, PFLOWS_LOCATION1. Click
Edit... to specify the details of the Workflow location.
In the Edit Oracle Workflow Location window, change
the name of the location to PFLOWS_LOCATION (remove the 1
from the default name).
This tutorial uses owf_mgr as the schema name
and owf_mgr as the password. If your Oracle Workflow schema has
a different name and password, substitute accordingly. Ensure the Workflow
User is unlocked.
Provide the following information to create a fully
qualified location:
 |
Name: PFLOWS_LOCATION
(remove the 1 from the default name) |
 |
Password: owf_mgr |
 |
Host Name: localhost |
 |
Port Number: 1521 |
 |
Service Name: orcl |
 |
Schema: owf_mgr |
 |
Version: 2.6.4 |
Click Test Connection. If the connection is successful,
click OK. If the connection is not successful, check the connection
details and try again. Click OK. Click Next. On the Summary
page, review the details and click Finish
|
| 4. |
In the Create Process Flow Package
window, enter L_MAPS as the name and click OK.

|
| 5. |
The Create Process Flow window is launched. Enter
LOAD_PF as the name of the process flow.

Click OK.
Warehouse Builder launches the Process Editor and displays
the process flow with a Start activity and an End_Success activity.
In the next topic, you now learn how to design the process
flow with activities and transitions.
|
Back to Topic
Designing
the Process Flow
To design the process flow, perform the following steps:
| 1. |
In the Process Editor, from the Palette panel, drag the Fork activity
to the canvas. You can use the Fork activity to launch multiple, concurrent
activities after the completion of an activity. You can also drag the
Fork activity from the Selected Objects tab in the Explorer panel.
|
| 2. |
In the Process Editor, in the Explorer panel, click the Available
Objects tab. Expand DP_TGT and drag SIMPLE_MAP mapping
to the canvas. Place the operator to the right of the FORK activity.
|
| 3. |
In the Explorer panel, ensure that the Available Objects tab
is selected. From DP_TGT, drag the MAP_CUSTOMERS mapping to the canvas.
|
| 4. |
From the Palette panel, drag an AND activity to the
canvas, to the right of the SIMPLE_MAP and MAP_CUSTOMERS mapping activity.
|
| 5. |
Position your cursor on the Start activity. Ensure the cursor
changes to ( ). Drag
a line (these lines from here on will be referred to as transitions) from
Start activity to the Fork activity.
To drag a transition, the mouse pointer must change to ( ).
-
Drag a transition from START1 activity to the FORK
activity.
-
Drag a transition from FORK activity to the mapping activity
SIMPLE_MAP.
-
Drag a transition from FORK activity to the mapping activity
MAP_CUSTOMERS.
-
Drag a transition from mapping activity SIMPLE_MAP to the
AND1 activity.
-
Drag a transition from mapping activity MAP_CUSTOMERS to the
AND1 activity.
-
Drag a transition from AND1 activity to the mapping activity
END_SUCCESS.
On the Process Editor toolbar, click AutoLayout ( )
.
After having linked all the activities with transitions, the process
flow should be as shown in the screenshot.
|
| 6. |
This completes the design of this process flow.
Generate the LOAD_PF process flow by selecting Generate from the
Process Flow menu. The code generated from a process flow definition
is in industry standard XML Process Definition Language (XPDL).
From Process Flow menu, select Close to exit the process
editor.
In the Design Center, click Save All ( )
on the toolbar. In Warehouse Builder warning dialog box , click Yes
to save the changes.
|
Back to Topic
Deploying
and Executing the Process Flow
To deploy and execute the process flow, perform the following
steps:
| 1. |
From the Tools menu, select Control Center Manager. In
the Control Center, expand PFLOWS_LOCATION > PFLOWS. Select
PFLOWS.
In the Object Details panel, the process flow package L_MAPS is
selected for deployment. Design Status is set to New. Click Default
Actions to set Deploy Action to Create.
Note: You deploy the process flow package, and not the
process flow. However, you execute the process flow.
|
| 2. |
From the File menu, select Deploy > To Control Center.
The process flow package, L_MAPS, is deployed successfully.
|
| 3. |
Expand L_MAPS if not already expanded and click LOAD_PF
to select it.
Select Start from the File menu or click the Start
( ) button on the toolbar.
|
| 4. |
To view the execution details, double-click LOAD_PF in the Job
column of the Control Center Jobs panel.
In the LOAD_PF dialog box, click the Execution Results tab and
monitor Row Activity. You can get a count of how many rows were inserted.
From the File menu of the Job Details: LOAD_PF dialog box, select
Close. From the File menu of the Control Center Manager,
select Close to exit.
The LOAD_PF process flow (a very simple example) runs the two mappings
parallel to one another, then waits until both are done, and continues.
Recall since you had set the Loading Type property for the target tables
as TRUNCATE/INSERT, on execution of this process flow, Warehouse Builder
truncated the rows and re-loaded the data into the two target tables,
CONS_ORDERS and SCALAR_CUST.
You can achieve much more complex flows, where you loop around, set parameters
and wait for results. You can also decide to traverse certain branches
based on conditions and errors, and create a complete error handling scenario
with notifications and emails.
|
Back to Topic
Back to Topic List
In this tutorial, you learned how to:
 |
Create, Deploy and Execute a Simple ETL Mapping
|
 |
Create, Deploy and Execute a Mapping that uses a User-Defined Database
Object
|
 |
Create, Deploy and Execute a Simple Process Flow
|
Back to Topic List
Place the cursor over this icon
to hide all screenshots.
|