Extracting, Transforming and Loading Data

This tutorial shows you how to define simple ETL mapping and also shows how to work with user-defined database objects in Warehouse Builder. In this tutorial, you also define a simple process flow.

Note: The screenshots in this tutorial may appear slightly different from yours depending on the order in which you perform the other tutorials in this OBE Series. You can ignore the irrelevant items you see on your screens.

Approximately 1 hour

Topics

This tutorial covers the following topics:

Place the cursor over this icon to load and view all the screenshots for this tutorial. (Caution: This action loads all screenshots simultaneously, so response time may be slow depending on your Internet connection.)

Note: Alternatively, you can place the cursor over each individual icon in the following steps to load and view only the screenshot associated with that step.

Overview

ETL stands for Extract, Transform and Load. ETL involves the movement and transformation of data from your sources to your targets. In the tutorial "Setting up the Oracle Warehouse Builder Project", you learned how to define Warehouse Builder modules that connect to your source and target schemas. Now, you learn how to create the data movement and transformation logic.

This tutorial begins with creating a simple mapping, using the mapping editor. Warehouse Builder facilitates creation of large and complex mappings, increasing productivity and reducing the number of errors.

Back to Topic List

Before starting this tutorial, you should:

1.

Complete the Setting Up the Oracle Warehouse Builder Project tutorial.

Note: For this tutorial, you must have completed the Installation and Configuration of Oracle Workflow as mentioned in Prerequisite topic of the tutorial "Setting up the Oracle Warehouse Builder Project".

Also, you must have completed the "Importing Metadata in OE module" subtopic in tutorial "Setting up the Oracle Warehouse Builder Project"

 

2.

Log in to SQL Plus and run the grant_priv.sql script.

The grant_priv.sql script grants select privileges to a few OE and HR tables that you use as source tables in your mappings later in this tutorial.

Log in to SQL*Plus as sys and run the grant_priv.sql script that you downloaded into the c:\owbdemo_files folder.

To run the script, enter the following command at the SQL> prompt:

@c:\owbdemo_files\grant_priv.sql;

 

Back to Topic List

Defining a Simple ETL Mapping

In this topic, you create a basic mapping to understand how ETL design works in Warehouse Builder. You use the OE sample schema as the source from which you load the data into DP_TGT target.

In this topic, you perform the following:

Creating a Simple ETL Mapping
Deploying the Mapping and Related Objects
Executing the Mapping
Viewing the Loaded Data in CONS_ORDERS table

Back to Topic List

Creating a Simple ETL Mapping

To create a simple ETL Mapping, perform the following steps:

1.

Assuming that you have logged in as owb user in the Design Center, in the Project Explorer panel, expand HANDSON > Databases > Oracle > DP_TGT. Right-click Mappings and select New.

The Create Mapping window appears.

2.

In the Create Mapping window, enter SIMPLE_MAP as name of the mapping. Click OK.

The Mapping Editor launches the SIMPLE_MAP mapping.

 

3.

You can add existing sources and targets to a mapping through the Explorer panel, which is on the top left of the Mapping Editor.

In the Explorer panel, ensure that Available objects tab is selected. Expand Oracle > OE > Tables. Drag ORDERS to the canvas.

Also, drag ORDER_ITEMS on to the canvas.

 

4.

From the Palette, drag the JOINER operator to the canvas. Place the operator to the right of the two tables. Click () icon of each operator headers to maximize the operators.

 

5.

Connection lines graphically represent how the data flows from a source, through operators, and to a target.

Drag a connection line from ORDERS.INOUTGRP1 into JOINER.INGRP1.

 

6.

Drag a connection line from ORDER_ITEMS.INOUTGRP1 into JOINER.INGRP2.

Observe that the attributes from the source tables get added as input attributes in the JOINER operator.

 

7.

In the Mapping Editor Canvas, click the JOINER operator header to highlight it.

In the Joiner Properties panel, examine the Join Condition property.

Click Edit () to view the full join condition. In the Expression Builder window, the join condition is as follows: INGRP2.ORDER_ID = INGRP1.ORDER_ID. Warehouse Builder frames the join condition based on the constraints between the two tables, ORDERS and ORDER_ITEMS. Click OK.

 

8.

From the Palette, drag a Table Operator to the canvas. Place the operator to the right of the JOINER operator.

The Add Table Operator window displays.

 

9.

In the Add Table Operator window, select "Create unbound operator with no attributes". In the New operator name field, enter CONS_ORDERS. Click OK.

 

10.

Drag a connection line from JOINER.OUTGRP1 to CONS_ORDERS.INOUTGRP1. Maximize the CONS_ORDERS operator and ensure that the operator is highlighted.

 

11.

In the Table Operator Properties panel, for the Loading Type property, select TRUNCATE/INSERT. If you want to insert or update, you can simply select INSERT/UPDATE and Warehouse Builder will generate code with a MERGE statement

 

12.

The CONS_ORDERS table still does not exist in the repository. Right-click the CONS_ORDERS table operator header and select Create and Bind from the menu.

In the Create and Bind window, examine the details and click OK. From the Mapping menu, select Close to exit the Mapping Editor.

 

13.

In DP_TGT, expand Tables. Double-click CONS_ORDERS.

Examine the table details in the Data Object Editor. From Diagram menu, select Close Window to exit the Data Object Editor.

Note that you derived this table directly from the mapping editor and did not create the table and its columns via typing. This adds to your productivity and saves time.

From the Design menu, select Save All. In Warehouse Builder warning dialog box , click Yes to save your work.

 

Back to Topic

Deploying the Mapping and Related Objects

Deployment is the process of creating your target system from the logical design or model. When you deploy ETL mappings, the deploy procedure generates a PL/SQL script. You need to explicitly execute the scripts to perform ETL. Before you deploy a mapping, you should have deployed the objects the mapping refers to.

To deploy SIMPLE_MAP mapping, perform the following steps:

1.

To deploy, you must launch the Control Center Manager. From the Tools menu, select Control Center Manager.

 

2.

Click DP_TGT_LOCATION if not already highlighted. In the Object Details panel, observe the two objects you defined, SIMPLE_MAP and CONS_ORDERS, design status is set to New and Deploy Status set to Not Deployed.

 

3.

Click Default Actions to change Deploy Action from None to Create.

As a result of performing other tutorials, you may have many other objects under DP_TGT_LOCATION. In order to select the two objects, SIMPLE_MAP and CONS_ORDERS, expand DP_TGT_LOCATION > DP_TGT.

Expand Tables. Expand Mappings. Hold ctrl and select the two objects, SIMPLE_MAP and CONS_ORDERS, and then click Default Actions as shown in the screenshot below.

 

4.

From the File menu, select Deploy > To Control Center.

You can monitor the progress of deployment in the Control Center Jobs panel on the lower right of the Control Center window.

In the Control Center Jobs panel, the deployment status changes from "generate" to "run" to "completed successfully." The mappings are deployed successfully and are ready for execution.

To view the generated code for the mapping, double-click the HANDSON job in the Control Center Jobs panel.

Note: If you performed deployment in any of the other tutorials in this series, you may see multiple jobs in this panel. If you see multiple jobs in the window, check the Finished column for the date and time and select the latest job.

 

5. In the Job Details: Handson dialog box, click the Script tab.

Note that Warehouse Builder generates a .pls (PL/SQL) code for the deployed mapping. To view the generated code for SIMPLE_MAP, select SIMPLE_MAP and click View Code

Scroll down in the code window to examine the generated code. From Code menu, select Close. Then in the Job Details window, from File menu, click Close to exit.

 

Back to Topic

Executing the Mapping

You learned that when you deploy a mapping Warehouse Builder generates a .pls script. You need to execute that deployed code to actually perform the task the mapping was designed to do."

The SIMPLE_MAP mapping was designed to select rows from OE.ORDERS and OE.ORDER_ITEMS that would go through a join condition using the JOINER operator and then load the resultant data into CONS_ORDERS table.

In order to execute the mapping, perform the following steps:

1.

Assuming you did not exit the Control Center, expand DP_TGT_LOCATION > DP_TGT > Mappings and click SIMPLE_MAP.

 

2.

From the File menu, select Start.

When you click Start, the Control Center Jobs panel switches to the Execution tab. You can monitor the execution in this panel.

To view the execution details, double-click SIMPLE_MAP in the Job column of the Control Center Jobs panel.

 

3.

In the Job Details: SIMPLE_MAP dialog box, click the Execution Results tab and monitor Row Activity.

You can get a count of how many rows were inserted in the target table, CONS_ORDERS. From the File menu of the Job Details: SIMPLE_MAP dialog box, select Close. From the File menu of the Control Center Manager, select Close to exit.


Back to Topic

Viewing the Loaded Data in CONS_ORDERS table

To view the data that was loaded in CONS_ORDERS table as a result of executing the mapping, perform the following steps:

1.

In Project Explorer panel, expand Tables under DP_TGT. Right-click CONS_ORDERS and select Data from the menu.

 

2.

The Relational Data Viewer shows the data in the CONS_ORDERS table. Scroll up/down or left/right to examine the data. From Object menu, select Close Window to exit the Data Viewer.

 

Back to Topic

Back to Topic List

Using User-Defined Database Objects in an ETL Mapping

In this topic you transform and load data that originally resided in User-Defined Database objects.

In this topic, you perform the following:

Creating a MAP_CUSTOMERS Mapping that uses a User-Defined Database Object
Generating Code for the MAP_CUSTOMERS Mapping
Deploying the MAP_CUSTOMERS Mapping
Executing the MAP_CUSTOMERS Mapping
Viewing the Loaded Data in SCALAR_CUST table

Back to Topic List

Creating a MAP_CUSTOMERS Mapping that uses a User-Defined Database Object

Now you define a mapping that uses a user-defined object from an OE.CUSTOMERS table as a source and loads the transformed data into a target table, SCALAR_CUST.

To create the mapping, perform the following steps:

1.

In Project Explorer panel, expand DP_TGT. Right-click Mappings and select New.

The Create Mapping window appears.

 

2. In the Create Mapping window, enter MAP_CUSTOMERS as name of the mapping. Click OK.

The Mapping Editor launches the MAP_CUSTOMERS mapping.

 

3.

In the Explorer panel, ensure that Available objects tab is selected. Expand Oracle > OE > Tables. Drag CUSTOMERS table to the canvas. Maximize the CUSTOMERS operator by clicking maximize ().

 

4.

Double-click CUST_ADDRESS attribute in the CUSTOMERS operator.

In the Table Editor: Customers window, note the data type of the CUST_ADDRESS attribute. The data type is OE.CUST_ADDRESS_TYP, a user-defined data type. Click OK.

 

5.

To interpret the CUST_ADDRESS_TYP in CUSTOMERS you use the Expand Operator.

From the Palette, drag the Expand Object Operator on the canvas. The Add Expand Object window displays.

 

6.

In the Add Expand Object window, select CUST_ADDRESS_TYP, as shown in the screenshot. Click OK.

Maximize the CUST_ADDRESS_TYP operator by clicking maximize ().

 

7.

Drag a connection line from CUSTOMERS.CUST_ADDRESS to INGRP1.CUST_ADDRESS_TYPE.

 

8.

Note that the type has COUNTRY_ID. To find the corresponding country name, you define a lookup.

From the Palette, drag the Key Lookup Operator on the canvas. Place the operator to the right of the Expand Object operator. The Key Lookup wizard launches. Click Next on the Welcome page.

 

9.

In the Name page, accept the default name, KEY_LOOKUP. Click Next.

 

10.

In the Groups page, accept the default groups. Click Next.

 

11.

In the Input Connections page, expand CUST_ADDRESS_TYP > OUTGRP1 and select COUNTRY_ID.

Click > to move COUNTRY_ID from Available Attributes list to Mapped Attributes. Click Next.

 

12.

In the Lookup page, click the drop down list to select the object, which has the lookup result. Expand OE and select COUNTRIES.

 

13.

Click in the blank Lookup table Key field and select COUNTRY_C_ID_PK.

 

14.

Click in the blank Input Attribute field. Note COUNTRY_ID gets added under COUNTRY_C_ID_PK.

Select COUNTRY_ID under Input Attribute field next to the COUNTRY_ID lookup table key. Click Next.

 

15.

In the No-match Rows page, enter 'UNKNOWN' (including single quotes) in the Default Value field for COUNTRY_NAME Lookup Table Column. Click Next. In the Summary page, examine the details and click Finish.

 

16.

If you are not able to see all the objects similar to the following screenshot, then in the Mapping Editor, click Auto-Layout on the toolbar to bring the mapping to its default size. Also, you can click Fit in Window to adjust the mapping in your window.

The Mapping looks as follows:

 

17.

Now you create the target table to load the transformed data into.

From the Palette, drag the Table Operator on the canvas.

 

18.

In the Add Table Operator window, select Create unbound operator with no attributes. Enter SCALAR_CUST as the new operator name. Click OK. Place the operator to the right of the Key Lookup operator.

 

19.

Drag a connection line from KEY_LOOKUP.COUNTRY_NAME to SCALAR_CUST.INOUTGRP1.

 

20.

From CUST_ADDRESS_TYP, select all the attributes except COUNTRY_ID. Drag a connection line to SCALAR_CUST.INOUTGRP1, as shown in the screenshot.

In Connect Operators window, accept the default option, Copy source attributes to target group and match. Click Go. Click OK.

 

21.

From CUSTOMERS, hold shift and select CUST_FIRST_NAME and CUST_LAST_NAME and drag a connection line into SCALAR_CUST>INOUTGRP1.

In Connect Operators window, accept the default option, Copy source attributes to target group and match. Click Go. Click OK.

 

22.

Minimize all the operators except SCALAR_CUST.

Click SCALAR_CUST operator header. In the Table Operator properties panel, for Loading Type property, select TRUNCATE/INSERT.

 

23.

Note that the SCALAR_CUST table is still not existing in the repository.

Right-click SCALAR_CUST header and select Create and Bind from the menu.

In the Create and Bind window, examine the details and click OK. The table SCALAR_CUST gets added under DP_TGT> Tables.

This completes the mapping. From the Mapping menu, select Close to exit the Mapping Editor.

 

Back to Topic

Generating Code for the MAP_CUSTOMERS Mapping

To generate the code for this mapping, perform the following steps:

1.

Expand DP_TGT > Mappings. Right-click MAP_CUSTOMERS and select Generate.

The Generation Results window displays.

 

2.

Click the Script tab. Select MAP_CUSTOMERS and click View Code.

 

3.

Examine the generated code for the mapping. Notice the NVL in the code to ensure the 'UNKNOWN' country name, the ANSI SQL code of the left outer join and the splitting out of the type in SQL.

Note: This code is ready for deployment. You may also right-click the MAP_CUSTOMERS mapping and click Deploy to deploy the mapping. Rather you will generate, deploy and execute the mapping from the Control Center Manager in the forthcoming topics in this tutorial.

In the MAP_CUSTOMERS window, from Code Menu, select Close to exit. From the File menu, select Close to exit the Generation Results window.

From the Design menu, select Save All. In Warehouse Builder warning dialog box , click Yes to save the changes.

 

Back to Topic

Deploying the MAP_CUSTOMERS Mapping

To deploy the mapping, perform the following steps:

1.

From the Tools menu, select Control Center Manager.

 

2.

In the Control Center, expand DP_TGT_LOCATION > DP_TGT. Expand Tables. Expand Mappings.

 

3.

Hold Ctrl and select SCALAR_CUST and MAP_CUSTOMERS, as shown in the screenshot. Note that both the objects Design Status is set to New and Deploy Status is set to Not Deployed.

 

4.

Click Default Actions to set the Deploy Action to Create.

 

5.

From the File menu, select Deploy > To Control Center.

In the Object Details panel, the Deploy Status is set to Success.

 

Back to Topic

Executing the MAP_CUSTOMERS Mapping

To execute the MAP_CUSTOMERS mapping, perform the following steps:

1.

Assuming you did not exit the Control Center, expand DP_TGT_LOCATION > DP_TGT > Mappings and click MAP_CUSTOMERS. From the File menu, select Start.

 

2.

When you click Start, the Control Center Jobs panel switches to the Execution tab. You can monitor the execution in this panel.

To view the execution details, double-click MAP_CUSTOMERS in the Job column of the Control Center Jobs panel.

 

3.

In the Job Details: MAP_CUSTOMERS dialog box, click the Execution Results tab and monitor Row Activity.

You can get a count of how many rows were inserted in the target table, SCALAR_CUST.

From the File menu of the Job Details: MAP_CUSTOMERS dialog box, select Close. From the File menu of the Control Center Manager, select Close to exit.

 

Back to Topic

Viewing the Loaded Data in SCALAR_CUST table

To view the data that was loaded after executing the MAP_CUSTOMERS mapping, perform the following steps:

1.

In Project Explorer panel, expand Tables under DP_TGT. Right-click SCALAR_CUST and select Data from the menu.

 

2.

The Relational Data Viewer shows the data in the SCALAR_CUST table. Scroll up/down or left/right to examine the data. From Object menu, select Close Window to exit the Data Viewer.

 

Back to Topic

Back to Topic List

Defining a Simple Process Flow

In Warehouse Builder, a process flow is used to manage the execution of mappings and related activities, such as uploading a file to the warehouse machine using FTP or notifying the administrator in case of an error in the load process. Using process flows, you can define the sequence in which the mappings will run and also define actions based on the outcome of the execution for example sending a notification email if the mapping execution failed. Warehouse Builder also allows you to schedule the process flows.

The Process Flow Editor includes a variety of activities that you add and then connect with transitions to design a flow. An Activity represents units of work in a process flow. These units of work can involve components internal or external to Warehouse Builder. Transitions indicate the sequence and conditions in which to launch activities.

In this topic, you perform the following tasks:

Creating the Process Flow Module and Package
Designing the Process Flow
Deploying and Executing the Process Flow

Back to Topic List

Creating the Process Flow Module and Package

Before you can start designing the process flow, you need to create a process flow module that includes a process flow package. Process Flow Modules include Process Flow Packages. The Process Flow Package is a grouping mechanism that determines which process flows you can interrelate. At run time, you can launch one process flow that launches other process flows that exist in the same process flow package.

In this topic, you create a Process Flow Module named PFLOWS, a Process Flow Package named L_MAPS. The Process Flow Module acts as a container by which you can validate, generate, and deploy a group of process flows.

To create the process flow module and package, perform the following steps:

1.

In the Project Explorer, expand HANDSON > Process Flows. Right-click Process Flow Modules and select New from the menu. The Create Module Wizard is launched. Click Next on the Welcome page.

 

2. In the Name and Description page, enter PFLOWS as the name of the process flow module. Click Next.
3. Now you create an Oracle Workflow location for the process flow module, PFLOWS.

Note: You define an Oracle Workflow location to specify where you want to deploy your process flows. The Oracle Workflow location points to a workflow schema (OWF_MGR) that runs in the target database. Hence, you should have Oracle Workflow installed and configured beforehand.

On the Connection Information page, observe that Warehouse Builder creates a location with a default name, PFLOWS_LOCATION1. Click Edit... to specify the details of the Workflow location.

In the Edit Oracle Workflow Location window, change the name of the location to PFLOWS_LOCATION (remove the 1 from the default name).

This tutorial uses owf_mgr as the schema name and owf_mgr as the password. If your Oracle Workflow schema has a different name and password, substitute accordingly. Ensure the Workflow User is unlocked.

Provide the following information to create a fully qualified location:

Name: PFLOWS_LOCATION (remove the 1 from the default name)
Password: owf_mgr
Host Name: localhost
Port Number: 1521
Service Name: orcl
Schema: owf_mgr
Version: 2.6.4

Click Test Connection. If the connection is successful, click OK. If the connection is not successful, check the connection details and try again. Click OK. Click Next. On the Summary page, review the details and click Finish

4.

In the Create Process Flow Package window, enter L_MAPS as the name and click OK.

 

5. The Create Process Flow window is launched. Enter LOAD_PF as the name of the process flow.

Click OK.

Warehouse Builder launches the Process Editor and displays the process flow with a Start activity and an End_Success activity.

In the next topic, you now learn how to design the process flow with activities and transitions.

 

Back to Topic

Designing the Process Flow

To design the process flow, perform the following steps:

1.

In the Process Editor, from the Palette panel, drag the Fork activity to the canvas. You can use the Fork activity to launch multiple, concurrent activities after the completion of an activity. You can also drag the Fork activity from the Selected Objects tab in the Explorer panel.

 

2.

In the Process Editor, in the Explorer panel, click the Available Objects tab. Expand DP_TGT and drag SIMPLE_MAP mapping to the canvas. Place the operator to the right of the FORK activity.

 

3. In the Explorer panel, ensure that the Available Objects tab is selected. From DP_TGT, drag the MAP_CUSTOMERS mapping to the canvas.

 

4. From the Palette panel, drag an AND activity to the canvas, to the right of the SIMPLE_MAP and MAP_CUSTOMERS mapping activity.

 

5. Position your cursor on the Start activity. Ensure the cursor changes to ( ). Drag a line (these lines from here on will be referred to as transitions) from Start activity to the Fork activity.

To drag a transition, the mouse pointer must change to ().

  • Drag a transition from START1 activity to the FORK activity.

  • Drag a transition from FORK activity to the mapping activity SIMPLE_MAP.

  • Drag a transition from FORK activity to the mapping activity MAP_CUSTOMERS.

  • Drag a transition from mapping activity SIMPLE_MAP to the AND1 activity.

  • Drag a transition from mapping activity MAP_CUSTOMERS to the AND1 activity.

  • Drag a transition from AND1 activity to the mapping activity END_SUCCESS.

On the Process Editor toolbar, click AutoLayout () .

After having linked all the activities with transitions, the process flow should be as shown in the screenshot.

 

6.

This completes the design of this process flow.

Generate the LOAD_PF process flow by selecting Generate from the Process Flow menu. The code generated from a process flow definition is in industry standard XML Process Definition Language (XPDL).

From Process Flow menu, select Close to exit the process editor.

In the Design Center, click Save All () on the toolbar. In Warehouse Builder warning dialog box , click Yes to save the changes.

 

Back to Topic

Deploying and Executing the Process Flow

To deploy and execute the process flow, perform the following steps:

1.

From the Tools menu, select Control Center Manager. In the Control Center, expand PFLOWS_LOCATION > PFLOWS. Select PFLOWS.

In the Object Details panel, the process flow package L_MAPS is selected for deployment. Design Status is set to New. Click Default Actions to set Deploy Action to Create.

Note: You deploy the process flow package, and not the process flow. However, you execute the process flow.

 

2.

From the File menu, select Deploy > To Control Center.

The process flow package, L_MAPS, is deployed successfully.

 

3.

Expand L_MAPS if not already expanded and click LOAD_PF to select it.

Select Start from the File menu or click the Start ( ) button on the toolbar.

 

4. To view the execution details, double-click LOAD_PF in the Job column of the Control Center Jobs panel.

In the LOAD_PF dialog box, click the Execution Results tab and monitor Row Activity. You can get a count of how many rows were inserted.

From the File menu of the Job Details: LOAD_PF dialog box, select Close. From the File menu of the Control Center Manager, select Close to exit.

The LOAD_PF process flow (a very simple example) runs the two mappings parallel to one another, then waits until both are done, and continues. Recall since you had set the Loading Type property for the target tables as TRUNCATE/INSERT, on execution of this process flow, Warehouse Builder truncated the rows and re-loaded the data into the two target tables, CONS_ORDERS and SCALAR_CUST.

You can achieve much more complex flows, where you loop around, set parameters and wait for results. You can also decide to traverse certain branches based on conditions and errors, and create a complete error handling scenario with notifications and emails.

 

Back to Topic

Back to Topic List

In this tutorial, you learned how to:

Create, Deploy and Execute a Simple ETL Mapping

Create, Deploy and Execute a Mapping that uses a User-Defined Database Object

Create, Deploy and Execute a Simple Process Flow

Back to Topic List

Place the cursor over this icon to hide all screenshots.