As Published In
Oracle Magazine
March/April 2005

TECHNOLOGY: Talking Tuning


Speed and Simplify Data Movement

By Kimberly Floss

Use Oracle Database 10g Data Pump for fast, flexible export and import.

With databases approaching the double-digit terabyte range increasingly common these days, Oracle DBAs can spend a fair amount of time shuttling vast amounts of data as part of their daily operations. If we're not trying to keep a complete physical backup current for disaster recovery purposes, we're busy feeding developers their weekly or daily diet of refreshed production data to use in development and test environments or populating a read-only data warehouse at some satellite location. As DBAs, we've developed numerous scripts over the years to facilitate all these processes and we've tuned our scripts to perform as well as they can. But Oracle Export (and Oracle Import, especially) still takes too long.

Fortunately, Oracle Data Pump (Oracle Database 10g's new infrastructure for exporting and importing database data and metadata) speeds up the movement of all of this data. This new server-side infrastructure is exposed through a new PL/SQL package, DBMS_DATAPUMP, which you use in one of three basic ways: through two new command-line client tools (expdp and impdp), through Oracle Enterprise Manager, or through your own procedure code, with calls to the application programming interface (API).

At first glance, the command-line clients expdp and impdp may remind you of the exp and imp utilities, but any similarity is superficial: These are both simply new client tools for working with the completely new Oracle server processes and related infrastructure. Let's start by taking a quick look at some of the high points of that infrastructure, before turning to a couple of usage scenarios.
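
For instance, a simple schema-mode export with the new client might look like the following sketch. The dpump_dir directory object, path, and job name here are hypothetical; unlike exp, the expdp client writes its dumpfile on the server, to a directory object that a DBA must create (and grant access on) beforehand:

SQL> CREATE DIRECTORY dpump_dir AS '/u01/app/oracle/dpump';
SQL> GRANT READ, WRITE ON DIRECTORY dpump_dir TO kimberly;

$ expdp kimberly/password SCHEMAS=kimberly DIRECTORY=dpump_dir DUMPFILE=kimberly.dmp LOGFILE=kimberly.log JOB_NAME=nightly_exp

The JOB_NAME parameter is optional but handy: it names the job (and, as discussed below, its master table), making the job easy to attach to later.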

Enabling Restart and Remote Access

The Oracle Data Pump Export and Import facility leverages key Oracle technologies, including Advanced Queuing (AQ), the metadata API (DBMS_METADATA), and direct path loads and unloads (see Figure 1). Here are a few of the key components of the Oracle Data Pump facility:

 

Figure 1: Oracle Data Pump architecture

 

  • A master control process that manages each export or import job you define. Every job gets its own master control process, instantiated when you start the job. Among its first tasks, the master control process spawns several other important facilities, including worker processes, and creates and populates a master table. 

  • The worker processes unload or load metadata and data, as required by the user-specified job parameters. Worker processes use the Metadata API for all metadata loading and unloading. The number of worker processes the master control process spawns depends on the degree of parallelism you've selected (parallelism is available in Oracle Database 10g Enterprise Edition but not in Oracle Database 10g Standard Edition). 

  • A master table maintains job state, description, restart, dumpfile, and other information as a job is under way—the master control process and all worker processes feed this data into the master table as operations occur.

The master table's name matches the job name you pass as a parameter when you start the job, or a system-generated name if you don't supply one.

As part of job completion, the master table's contents are written to the dumpfile. A pointer to the master table data thus sits at the head of the dumpfile and is used to reconstitute the data structures upon import. The Oracle Data Pump infrastructure deletes the master table upon successful completion of the job.

If there's a system failure of some kind (a disk fills to capacity during writing, for example), or if you decide to stop the job so that it doesn't interfere with a critical production job, the table (and, more important, all the detailed job status information it contains) remains, enabling you to resume the job later (see Listing 1 for the master table structure).

Code Listing 1: Master table structure 

SQL> desc kimberly.dev_refresh_job
 Name                                      Null?     Type
 ----------------------------------------  --------  --------------
 PROCESS_ORDER                                       NUMBER
 DUPLICATE                                           NUMBER
 DUMP_FILEID                                         NUMBER
 DUMP_POSITION                                       NUMBER
 DUMP_LENGTH                                         NUMBER
 DUMP_ALLOCATION                                     NUMBER
 COMPLETED_ROWS                                      NUMBER
 ERROR_COUNT                                         NUMBER
 ELAPSED_TIME                                        NUMBER
 OBJECT_TYPE_PATH                                    VARCHAR2(200)
 OBJECT_PATH_SEQNO                                   NUMBER
 OBJECT_TYPE                                         VARCHAR2(30)
 IN_PROGRESS                                         CHAR(1)
 OBJECT_NAME                                         VARCHAR2(500)
 OBJECT_LONG_NAME                                    VARCHAR2(4000)
 OBJECT_SCHEMA                                       VARCHAR2(30)
 ORIGINAL_OBJECT_SCHEMA                              VARCHAR2(30)
 PARTITION_NAME                                      VARCHAR2(30)
 SUBPARTITION_NAME                                   VARCHAR2(30)
 FLAGS                                               NUMBER
 PROPERTY                                            NUMBER
 COMPLETION_TIME                                     DATE
 OBJECT_TABLESPACE                                   VARCHAR2(30)
 SIZE_ESTIMATE                                       NUMBER
 OBJECT_ROW                                          NUMBER
 PROCESSING_STATE                                    CHAR(1)
 PROCESSING_STATUS                                   CHAR(1)
 BASE_PROCESS_ORDER                                  NUMBER
 BASE_OBJECT_TYPE                                    VARCHAR2(30)
 BASE_OBJECT_NAME                                    VARCHAR2(30)
 BASE_OBJECT_SCHEMA                                  VARCHAR2(30)
 ANCESTOR_PROCESS_ORDER                              NUMBER
 DOMAIN_PROCESS_ORDER                                NUMBER
 PARALLELIZATION                                     NUMBER
 UNLOAD_METHOD                                       NUMBER
 GRANULES                                            NUMBER
 SCN                                                 NUMBER
 GRANTOR                                             VARCHAR2(30)
 XML_CLOB                                            CLOB
 NAME                                                VARCHAR2(30)
 VALUE_T                                             VARCHAR2(4000)
 VALUE_N                                             NUMBER
 IS_DEFAULT                                          NUMBER
 FILE_TYPE                                           NUMBER
 USER_DIRECTORY                                      VARCHAR2(4000)
 USER_FILE_NAME                                      VARCHAR2(4000)
 FILE_NAME                                           VARCHAR2(4000)
 EXTEND_SIZE                                         NUMBER
 FILE_MAX_SIZE                                       NUMBER
 PROCESS_NAME                                        VARCHAR2(30)
 LAST_UPDATE                                         DATE
 WORK_ITEM                                           VARCHAR2(30)
 OBJECT_NUMBER                                       NUMBER
 COMPLETED_BYTES                                     NUMBER
 TOTAL_BYTES                                         NUMBER
 METADATA_IO                                         NUMBER
 DATA_IO                                             NUMBER
 CUMULATIVE_TIME                                     NUMBER
 PACKET_NUMBER                                       NUMBER
 OLD_VALUE                                           VARCHAR2(4000)
 SEED                                                NUMBER
 LAST_FILE                                           NUMBER
 USER_NAME                                           VARCHAR2(30)
 OPERATION                                           VARCHAR2(30)
 JOB_MODE                                            VARCHAR2(30)
 CONTROL_QUEUE                                       VARCHAR2(30)
 STATUS_QUEUE                                        VARCHAR2(30)
 REMOTE_LINK                                         VARCHAR2(4000)
 VERSION                                             NUMBER
 DB_VERSION                                          VARCHAR2(30)
 TIMEZONE                                            VARCHAR2(64)
 STATE                                               VARCHAR2(30)
 PHASE                                               NUMBER
 GUID                                                RAW(16)
 START_TIME                                          DATE
 BLOCK_SIZE                                          NUMBER
 METADATA_BUFFER_SIZE                                NUMBER
 DATA_BUFFER_SIZE                                    NUMBER
 DEGREE                                              NUMBER
 PLATFORM                                            VARCHAR2(100)
 ABORT_STEP                                          NUMBER
 INSTANCE                                            VARCHAR2(60)


The ability to restart an Oracle Data Pump Export or Import job—and to do so remotely—is just one of the many significant capabilities of Oracle Data Pump that the legacy Oracle Export and Oracle Import don't provide. Other capabilities include support for parallelism during the import and export processes and the ability to change the degree of parallelism while a job is running.
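
For example, you can reattach to a stopped job from any client session and resume it; the session below is a sketch (the nightly_exp job name is carried over from the hypothetical export shown earlier), with the interactive PARALLEL command raising the degree of parallelism on the fly:

$ expdp kimberly/password ATTACH=nightly_exp

Export> PARALLEL=4
Export> START_JOB
Export> CONTINUE_CLIENT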

Fine-Grained Object Selection

Oracle Data Pump provides more functionality than simply bulk export and import of data, however. It lets you define, to a very granular degree, the schemas, tablespaces, and other objects you want to export or import, using the INCLUDE, EXCLUDE, and QUERY parameters at various stages of the process to filter or transform the data.

Using the original Oracle Export and Oracle Import utilities, DBAs have had to set up convoluted schemes to confine a specific export process to specific database objects. For example, you might set access policies on tables so that a batch export process can obtain just the tables it needs. But using policies in this way is not the type of application for which Oracle fine-grained access control (also known as row-level security) was intended.

Workarounds such as this aren't needed with Oracle Data Pump: its fine-grained object selection feature gives you very granular selection without any ongoing performance impact on the source system, because the filtering is applied only when the export or import job runs.
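
As a sketch of what this looks like in practice, the parameter file below (the schema, table names, and filter condition are all hypothetical) confines an export to two tables and to the last week's worth of rows in one of them; a parameter file also sidesteps the operating-system quoting headaches that INCLUDE and QUERY values cause on the command line:

$ cat subset.par
SCHEMAS=batch_jobs
DIRECTORY=dpump_dir
DUMPFILE=batch_subset.dmp
INCLUDE=TABLE:"IN ('JOBS', 'JOB_RUNS')"
QUERY=batch_jobs.job_runs:"WHERE run_date > SYSDATE - 7"

$ expdp kimberly/password PARFILE=subset.par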

Export and Import Over the Network

Typically, Oracle Data Pump Export results in a binary-format dumpfile that is then loaded (using Oracle Data Pump Import) into another database. In addition to this file-based approach, however, Oracle Data Pump provides a network-based approach that loads or unloads data directly between two databases.

Most DBAs are probably familiar with using the legacy Oracle Export and Import utilities over a network, with named pipes as the mechanism for passing the data. One problem with that approach is that named pipes aren't supported on every platform.

Instead of using named pipes, Oracle Data Pump uses database links for exporting and importing data between source and target over the network. And, as with the named pipe mechanism, the network_link feature entirely bypasses dumpfile creation, transmitting data from one database to another across the network.

This feature should be of keen interest to any DBA spending lots of time doing production-to-test refreshes for development or extracting data from a read-only data warehouse. (For a read-only source database, you must use the network_link mechanism and initiate the job on the target database, because Oracle Data Pump can't create the necessary master table on the read-only database.)

Because network bandwidth is the limiting factor, the network_link capability is best used in conjunction with the filtering capability, as a means of siphoning off a subset of data (rather than an entire multigigabyte or -terabyte database) from one database to another.
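
From the command line, such a siphoning job might look like the following sketch (the rm_techlab link and the table and schema names are hypothetical, and the link must exist in the importing user's schema); note that no dumpfile is involved, and the master table lives on the importing side:

$ impdp kfloss/password NETWORK_LINK=rm_techlab TABLES=batch_jobs.jobs,batch_jobs.job_runs REMAP_SCHEMA=batch_jobs:kfloss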

Here's how to set up an Oracle Data Pump Import, using a network-based approach (see Figure 2):

 

Figure 2: Oracle Data Pump uses database links. To export tables from a read-only database, it creates a master table on the read-write database.

 

  • Set up a remote database link from a satellite database (database 2) to the master database (database 1); a sample CREATE DATABASE LINK statement appears after Listing 2.

  • Create the import as a PL/SQL script to run weekly on the satellite database, importing selected tables from the source system and filtering and transforming them as needed (see Listing 2).

Code Listing 2: Sample script using DBMS_DATAPUMP API for import 

DECLARE
   ind            NUMBER;        -- loop index number
   jobhandle      NUMBER;        -- Data Pump job handle
   percent_done   NUMBER;        -- percentage of job complete
   job_state      VARCHAR2(30);  -- keeps track of job state
   le             ku$_LogEntry;  -- work-in-progress and error messages
   js             ku$_JobStatus; -- job status from get_status
   jd             ku$_JobDesc;   -- job description from get_status
   sts            ku$_Status;    -- status object returned by get_status

BEGIN

-- Create a (user-named) Data Pump job to do a table-mode import
-- over the RM_TECHLAB network link.
jobhandle := DBMS_DATAPUMP.OPEN('IMPORT', 'TABLE', 'RM_TECHLAB', 'DEV_REFRESH');

-- Set parallelism to 1; a network import/export always uses 1 anyway,
-- so it may not be necessary to set this explicitly.
DBMS_DATAPUMP.SET_PARALLEL(jobhandle, 1);

-- Remap schema objects from the source schema BATCH_JOBS to the target
-- schema KFLOSS.
DBMS_DATAPUMP.METADATA_REMAP(jobhandle, 'REMAP_SCHEMA', 'BATCH_JOBS', 'KFLOSS');
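
-- (Not part of the original script.) To confine the import to specific
-- source tables, a metadata filter could be added at this point; the
-- table names below are hypothetical:
-- DBMS_DATAPUMP.METADATA_FILTER(jobhandle, 'NAME_EXPR',
--                               'IN (''JOBS'', ''JOB_RUNS'')');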

-- If a table already exists in the destination schema, don't overwrite it
-- (this is the default behavior).
DBMS_DATAPUMP.SET_PARAMETER(jobhandle, 'TABLE_EXISTS_ACTION', 'SKIP');

-- Use statistics (rather than blocks) to estimate the job size.
DBMS_DATAPUMP.SET_PARAMETER(jobhandle, 'ESTIMATE', 'STATISTICS');

-- Start the job. An exception is raised if something is not set up properly.
DBMS_DATAPUMP.START_JOB(jobhandle);

-- The import job should now be running. This loop monitors the job until
-- it completes, displaying progress information and any problems.

percent_done := 0;
job_state := 'UNDEFINED';

while (job_state != 'COMPLETED') and (job_state != 'STOPPED') loop
   DBMS_DATAPUMP.GET_STATUS(jobhandle,
                            DBMS_DATAPUMP.ku$_status_job_error +
                            DBMS_DATAPUMP.ku$_status_job_status +
                            DBMS_DATAPUMP.ku$_status_wip, -1, job_state, sts);
   js := sts.job_status;

   -- As the percentage complete changes in this loop, display the new value.
   if js.percent_done != percent_done
   then
      DBMS_OUTPUT.PUT_LINE('*** Job percent done = ' ||
                           to_char(js.percent_done));
      percent_done := js.percent_done;
   end if;

   -- Display any work-in-progress or error messages received for the job.
   if (bitand(sts.mask, dbms_datapump.ku$_status_wip) != 0)
   then
      le := sts.wip;
   else
      if (bitand(sts.mask, dbms_datapump.ku$_status_job_error) != 0)
      then
         le := sts.error;
      else
         le := null;
      end if;
   end if;

   if le is not null
   then
      ind := le.FIRST;
      while ind is not null loop
         DBMS_OUTPUT.PUT_LINE(le(ind).LogText);
         ind := le.NEXT(ind);
      end loop;
   end if;
end loop;

-- When the job finishes, display the final status before detaching.
DBMS_OUTPUT.PUT_LINE('Job has completed');
DBMS_OUTPUT.PUT_LINE('Final job state = ' || job_state);
DBMS_DATAPUMP.DETACH(jobhandle);
END;
/


When you run the script on the satellite system, it connects to the source system and brings the data over directly to the target. The master table is created in the schema of the user on the initiating system, not in the source database. This approach works well for read-only databases and also as a means of refreshing development databases with extracts from a production system.
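
Step 1 above assumes a working database link; a minimal sketch of creating one to match the RM_TECHLAB name that Listing 2 passes to DBMS_DATAPUMP.OPEN follows (the source credentials and the proddb TNS alias are hypothetical):

SQL> CREATE DATABASE LINK rm_techlab
  2  CONNECT TO batch_jobs IDENTIFIED BY password
  3  USING 'proddb';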

Data Pump Is Fast and Flexible

Oracle Data Pump imports run anywhere from 15 to 40 times as fast as the traditional Oracle Import utility, even with parallelism set to 1, in part because Oracle Data Pump uses direct path access whenever possible.

Speed improvements are always welcome, but so are capabilities such as restarting a job and getting the status of export and import jobs from anywhere. And the rich API, together with DBMS_METADATA, lets DBAs develop comprehensive automated utilities for a variety of data-movement scenarios.
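
Because that job state lives in the database itself, checking on running jobs from any session is a simple query; here's a quick sketch against the DBA_DATAPUMP_JOBS view (USER_DATAPUMP_JOBS shows just your own jobs):

SQL> SELECT owner_name, job_name, operation, job_mode, state
  2  FROM dba_datapump_jobs;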


Kimberly Floss (kimberly_floss@ioug.org), president of the International Oracle Users Group, specializes in Oracle tuning techniques. She is the author of Oracle SQL Tuning and CBO Internals (Rampant TechPress).

