External Tables
   

Oracle9i introduces a family of new features to support more scalable and efficient ETL (Extraction, Transformation, Load) processing for data warehouses and business intelligence systems.

One of the most exciting ETL additions is the External Table feature. The External Table feature allows for flat files, which reside outside the database, to be accessed just like relational tables within the database: the flat-file data can be queried and joined to other tables using standard SQL. Data access can be serial or parallel for maximum scalability. From a user’s point of view, the main difference between an external table and a regular table is that the external table is read-only.

To understand the value of external tables, consider that a data warehouse always contains data from external data sources. For example, a data warehouse may contain data extracted from an OLTP system or data purchased from a third-party data provider. When adding new data into a data warehouse from an external data sources, the data has to be transformed. Transformation tasks may include complex filtering, validation of new data against information already in the warehouse, and insertion of data at both detail and aggregate levels.

Prior to Oracle9i, performing complex transformations within Oracle required that the external data be transformed using one of two possible strategies:

  • Load the data into Oracle and store it in a staging table. When the transformations were finished the table could be deleted. This approach devoted disk space, administrative resources and processing time to data that was already available on disk.
  • Transform the external data in flat files outside the database. Since some transformations require data from the data warehouse, this process requires not only flat-file manipulation but also database access. When the transformations were completed, the transformed data is loaded into the data warehouse. This approach also requires redundant data storage and processing time.

Oracle9i's external tables avoid the problems inherent in the approaches described above: external tables provide a whole new model for loading and transforming external data. The data is no longer transformed outside the database, nor must it be stored in a staging table. Instead, the external data is presented to the database as a virtual table, enabling full data processing inside the database engine. This seamless integration of external data access with the transformation phase is generally referred to as "pipelining." With pipelining, there is no interruption of the data stream, so every ETL process can show increased performance.

Example
External tables are defined to Oracle using a CREATE TABLE statement. The example below defines an external table called "products_ext" which is based on two flat files outside Oracle called 'new_prod1.txt' and 'new_prod2.txt'.

CREATE TABLE products_ext
(prod_id NUMBER, prod_name VARCHAR2(50), prod_desc VARCHAR2(4000),
  prod_category VARCHAR2(50), prod_category_desc VARCHAR2(4000),
  list_price NUMBER(6,2), min_price NUMBER(6,2),
  last_updated DATE)
ORGANIZATION EXTERNAL
(
  TYPE oracle_loader
  DEFAULT DIRECTORY stage_dir
  ACCESS PARAMETERS
  ( RECORDS DELIMITED BY NEWLINE
     BADFILE bad_dir:'bad_products_ext'
     LOGFILE log_dir:'log_products_ext'
     FIELDS TERMINATED BY ','
     MISSING FIELD VALUES ARE NULL
     (prod_id, prod_name, prod_desc, prod_category,
      prod_category_desc, price, price_delta,
      last_updated char date_format date mask "dd-mon-yyyy")
   )
LOCATION ('new_prod1.txt','new_prod2.txt')
)
PARALLEL 5
REJECT LIMIT UNLIMITED;

The access of this external table is defined as PARALLEL 5. Access to external files is transparently parallelized, independent of the number of files (intra-file parallelism). The parallelism available through external tables increases performance dramatically compared to serial processing on a file-by-file basis.

The external table defined above can now be used directly like a regular table using SQL. For example, the external data can be used to synchronize existing products information in the database with the changes of the last 24 hours.

MERGE INTO products d
USING (SELECT * FROM products_ext
      WHERE last_updated > SYSDATE - 1) s
ON (d.prod_id=s.prod_id)
WHEN MATCHED THEN
      UPDATE
      SET d.prod_list_price = s.list_price,
      d.prod_min_price = s.min_price
WHEN NOT MATCHED THEN
      INSERT (prod_id, prod_name, prod_desc, prod_category,
              prod_category_desc, prod_list_price, prod_min_price)
      VALUES (s.prod_id, s.prod_name, s.prod_desc, s.prod_category,
              s.prod_category_desc, s.list_price, s.min_price);

This statement processes the rows which have been changed in the past 24 hours and uses these rows directly in a MERGE command. This not only reduces the amount of data to be processed, it also avoids any intermediate staging in the database by combining the access and transform operation. Since any SQL query can be executed against an external table, the external table data can be transformed in much more complex ways as the data is loaded.

External tables provide a whole new way to access and process external data directly from within the database. They simplify data access and transformations by treating external files as standard tables. Through pipelining and parallelism, external tables significantly increase overall ETL performance. External tables are just one example of the new ETL features of Oracle9i. The full set of new ETL functionality creates a powerful framework for handling all ETL tasks within the Oracle database.

 More Info
Oracle9i for E-Business: Business Intelligence - Technical White Paper
Oracle9i database daily feature: MERGE statement
ETL processing within Oracle9i- Technical White Paper
Oracle9i Data Warehousing Guide

Oracle9i Daily Features
Archives

   
E-mail this page
Printer View Printer View
Oracle Is The Information Company About Oracle | Oracle RSS Feeds | Careers | Contact Us | Site Maps | Legal Notices | Terms of Use | Privacy