OEMM: Configuring Data Lineage Analysis


Options



Before You Begin

Purpose

This tutorial covers data lineage analysis, or more simply, Show me where the data is coming from.

Time to Complete

Approximately 20 minutes.

Introduction

This Oracle By Example (OBE) is fourth in a series of seven OBEs. The recommendation is to do them in order. The series is:

  1. Installing OEMM, on a Windows PC.
  2. Harvesting Metadata from Various Sources: DB, ODI, OBIEE, and others.
  3. Managing the Configuration and Metadata Stitching with OEMM, including backup and restore.
  4. Configuring Data Lineage Analysis, graphically and exporting to Excel (you are currently viewing this OBE).
  5. Using Impact Analysis, the "opposite" of data lineage.
  6. Creating a Business Glossary in OEMM, using the enterprise edition.
  7. Administering OEMM, creating users, roles, and scripts.

Scenario

You have a report that was generated using BI, and you want to see where all the data came from (its lineage). You want to see this information graphically as well as to export it to a Microsoft Excel (spreadsheet) format.

Software and Hardware Requirements

The following is a list of software and hardware requirements used in this Oracle By Example (OBE):

  • Software
    • Oracle Database 11.2.0.4.0 or higher
    • Oracle Business Intelligence Enterprise Edition (OBIEE) was installed at one time, but no longer is installed.
    • Oracle Enterprise Metadata Manager (OEMM) 12.1.3.0.2 or higher. The OEMM Install and Setup Guide is available online and in the product itself.
    • Oracle Data Integrator (ODI) 12.1.3 or higher. At least one copy of ODI for Windows must be on the same machine as OEMM. (ODI is not a requirement for OEMM, but is a requirement for parts of this OBE.)
    • Alternatively, the VM contains Database, ODI, and OEMM sample repositories (but not OEMM software) pre-installed with Oracle Enterprise Linux (OEL) 6.4. The VM (approx 7 GB) requires Oracle VirtualBox to run. The VM is available in either of two places (both are the same VM):
  • Hardware
    • At least 2 GB RAM for each target (database or ODI)
    • At least 5 GB RAM for the VM (if you are using that)

Prerequisites

Before starting this tutorial, you should:

  • Have an account to download Oracle software.
  • Have access to or have installed the sample lab artifacts
  • Have attended ODI 12c training, [D82167GC10] Oracle Data Integrator 12c: Integration and Administration, or equivalent experience. (ODI is not a requirement for OEMM, but is a requirement for parts of this OBE.)

Typographic Conventions

These are the conventions used throughout this practice to indicate who types what.

  • Filename or /path/subdirectory/. Often you can navigate to a destination without having to type it.
  • Command or value that you type, such as gedit .bashrc. Be mindful of periods at the end of sentences that are not part of the command.
  • Click this such as or .
  • Some variable or <variable> as something you replace. Often prefixed with "my" such as <myName> or <myHost>.
  • A Menu Item or dialog Field Prompt or Tab name. For example, "click the Designer tab and select Models > New Model".
  • Generally a red rectangle in the graphics is not part of the original image, but added to draw your attention to an item.

1. View the Source (a BI Report)

Browse the BI model that is harvested from an OBIEE server.

1.1 Exploring the BI Report

  1. Open Repository > Oracle Tutorial > BI Reports and notice the Reports (Folder) and BI Repository (RPD Content).

    BI Reports tab
    Description of this image

    Trivia: RPD is not an acronym, it does not stand for anything! It is just a metadata file extension.

  2. Open BI Repository to see the logical/physical/presentation layers. Explore the BI RPD nodes.

    RPD Layers
    Description of this image

    You can browse the content in OEMM without needing to install or use the BI Admin tool.

  3. Now open Reports > User Folders > My Folders > Products Report and view it graphically by clicking Data and Semantic Flow overview (folder with arrows). It shows the report and its link to the presentation layer.

    Data Flow Overview
    Description of this image

    You may need to collapse some sections by clicking collapse-left (circle-<).

This is the report that will be analyzed for data lineage. Leave the tabs open for the next topic.

This shows the final state of the report, but what you are interested in is, "How did the data get there?"

2. Generate Data Lineage

In order to display where the data came from, perform the following steps:

2.1 Basic Analysis

  1. After browsing the report, you want to see where the data is coming from. Collapse (circle-<) left and right (circle->) side panes to have a better view of the report. You can click Fit Contents (four-way arrows) to automatically zoom and adjust the graph.

    Data Flow Overview
    Description of this image

    You can always expand the collapsed panels by clicking those circle-< left and right circle-> icons again.

  2. Right-click table(1) in the layout...

    table(1) fly-out menu
    Description of this image

    ...and select Trace Lineage > Trace Lineage (Advanced).

    Use Trace Lineage (Advanced) because it allows various options. "Trace Lineage" and "Trace Impact" are simplified actions and do not allow various options. It is best to always use the "Trace Lineage (Advanced)" action for lineage, impact and semantic analysis since it allows complete flexibility.

  3. In Configuration field, click Search (magnifying glass).

    Advanced Lineage Options
    Description of this image

    Do not click Trace yet.

  4. Select the configuration you created in the last OBE: Sales Reporting Application.

    Select Configuration
    Description of this image

    Click .

  5. Click .

    Advanced Lineage Options
    Description of this image
  6. OEMM finds the source model/table/columns for this report.

    Data Lineage for table(1)
    Description of this image

    Leave the report open after you have looked at it for the next topic.

This displays a simple lineage.

2.2 Intermediate Analysis

Currently you can only see the final objects that make up the lineage. To see the intermediate objects, perform the following tasks:

  1. In the same Data Lineage for table(1) report from the last topic, click End Objects Only pull-down, and select All Objects.

    End Objects Only
    Description of this image
  2. Quite a few more objects are now shown.

    All Objects
    Description of this image

    Leave the report open after you have looked at it for the next topic.

You have expanded the scope of the analysis to include all objects.

2.3 Export the Analysis (to Excel)

  1. In the same Data Lineage for table(1) report from the last topic, click Download Details.

    Download to CSV
    Description of this image

    Use whatever dialog your web browser uses to save the .csv file to a destination folder of your choice.

  2. Open the comma-separated variable (.csv) file using a spreadsheet viewer such as Microsoft Excel or OpenOffice.

    csv spreadsheet
    Description of this image

    Remember that as soon as you download it, there is a chance it will start to get out of sync! You can close the spreadsheet.

    The columns Package/Classifier/Features are generic terms for object hierarchy in any model. For a database model, they correspond to Schema/Table/Columns. Leave the data lineage report open after you have looked at it for the next topic.

You exported the information to a .csv file which can externally be turned into an .xls spreadsheet which can be printed to a .pdf document.

You have completed basic data lineage analysis.

3. Interactive Lineage Graphs

Somewhere in between, show me only the last thing and, show me all the objects, is the ability to dynamically move around and selectively drill down just a few of the objects and return while browsing the configuration. To do this, perform the following steps:

3.1 Lineage Graphs

  1. In the same Data Lineage for table(1) report from the last topic, click Show Lineage Graph.

    Show Lineage Graph
    Description of this image

    It does not matter at this point what, if anything, is selected in the report.

  2. Drill down into the detail by clicking expand [+] and collapse [-] in the upper-right corners of the boxes.

    Drill-down
    Description of this image

    This is what it looks like before you drill down (initially collapsed to only top-level showing).

  3. This is the same diagram expanded down a few levels.

    3 levels down
    Description of this image

    You can drill deeper if you wish. Leave this graph open for the next topic.

This is the static version of the lineage graphs.

3.2 Tracing Data Flow

  1. Expand plus all of the objects down to their lowest level. You can do this manually, or by selecting the object and then on the toolbar is an icon that expands the entire object all at once. Then rezoom to Fit (four-way arrows).

    While expanding all nodes using plus, you will be asked whether you want to expand at "Feature level" or at "Classifier level." In OEMM, model hierarchies are abstracted as Classifier/Feature. Usually the Feature level is the lowest level in any model. In a Database model, the Classifier level maps to Table and the Feature level maps to column.

    Before expanding

    Click To Feature level to get the expanded view below, then rezoom to fit.

    After expanding to Feature level
    Description of this image
  2. Select SAMP_REVENU (right in the middle of the page), and on the toolbar click Highlight. This highlights the path from start to finish going through this object.

    Highlighted path
    Description of this image

    Click on any empty space in the diagram to unhighlight the path. Try different objects.

  3. Select an individual object, such as SAMP_REVENU, and examine the Properties panel (which may need to be re-expanded).

    Object Properties
    Description of this image

    In this case it is showing a view, and the SELECT statement that generated the view. Click on any empty space in the diagram to unhighlight the object.

  4. The thinner lines (shown here) are expression links. Select an individual link, such as from REVENUE to REVENUE, and the details are shown in the Operations box. It shows the operation SUM(REVENUE) taking place between source and target for this link. If there are more than one transformation taking place then they will all be displayed with different contexts. Here Context refers to an individual mechanism where the operation takes place. Double clicking on the context will take you the in model browser where the operation takes place. In this case, double clicking on the context REVENUE will open the DB model "DB - Dimensional DW" and point to the view/column where the operation is defined.

    Link Type
    Description of this image
  5. Select another individual link, such as from ORDER_DATE to PER_NAME_YEAR, and it displays the Link Type. The thicker lines (shown here) are ETL processes. In this case, double clicking on Mapping LD_TIME_DIMS_87 will open the "Load to Dimensional DW" model and point to the ODI mapping where this operation is taking place.

    Link Type
    Description of this image

    You can get the same information in the hovering pop-up (transient), or in the Properties panel (permanent). Leave the graph open for the next Topic.

All of the objects are interactive with varying amounts of detail available and shown.

3.3 Drilling Down into ETL Processes

  1. Using the highlighted link from the previous topic, from ORDER_DATE to PER_NAME_YEAR link, right-click on it to drill into the ETL mapping.

    Trace ETL Details
    Description of this image

    Select Trace ETL Details.

  2. Tracing data lineage into ETL only exposes the data flow relevant to earlier lineage graph. You can see then in the mapping PER_NAME_YEAR, dates are calculated by summing the order date fields from the source.

    ETL Details
    Description of this image
  3. If you want to browse the full mapping, right-click to Show in Metadata Browser (LD_TIME_DIMS in this case).

    Open ETL
    Description of this image

    Do not bother doing it, but you could if you wanted to.

This illustrates drilling down into the ETL processes.

You have explored the interactive lineage graphs.

Want to Learn More

You had a report that was generated using BI, and you wanted to see where all the data came from (its lineage). You wanted to see this information graphically as well as to export it to a Microsoft Excel (spreadsheet) format.

In this tutorial, you learned to:

  • View a source report such as BI Reports
  • Generate data lineage in static and comma-separated formats (suitable for exporting)
  • Explore interactive lineage graphs
  • Drill down to various levels exploring the details available for each kind of object and link

Resources

Following is a list of conceptual or procedural Help topics relevant to the topic of this tutorial:

  • Product marketing information
  • Product documentation in PDF format.
  • Virtual Machine demo environment on OTN
  • Internal (employee) resources:
    • Virtual Machine demo environment (same as above) VM
    • Tutorial scripts
  • MetaIntegration, the owner of the OEM version of this product.
  • To learn more about OEMM, refer to additional OBEs in the Oracle Learning Library <can link to a specific set of OBEs on the subject or product(s)>

Credits

  • Lead Curriculum Developer: Jayant Mahto
  • Other Contributors: Steve Friedberg