TECHNOLOGY: Talking Tuning
Speed and Simplify Data Movement
By Kimberly Floss
Use Oracle Database 10g Data Pump for fast, flexible export and import.
With databases approaching the double-digit terabyte range increasingly common these days, Oracle DBAs can spend a fair amount of time shuttling vast amounts of data as part of their daily operations. If we're not trying to keep a complete physical backup current for disaster recovery purposes, we're busy feeding developers their weekly or daily diet of refreshed production data to use in development and test environments or populating a read-only data warehouse at some satellite location. As DBAs, we've developed numerous scripts over the years to facilitate all these processes and we've tuned our scripts to perform as well as they can. But Oracle Export (and Oracle Import, especially) still takes too long.
Fortunately, Oracle Data Pump—Oracle Database 10g's new infrastructure for exporting and importing database data and metadata—speeds up movement of all of this data. This new server-side infrastructure is exposed through a new PL/SQL package, DBMS_DATAPUMP , which you use in one of three basic ways: through two new command-line client tools (expdp, impdp); through Oracle Enterprise Manager; or through your own procedure code, with calls to the application programming interface (API).
At first glance, the command-line clients expdp and impdp may remind you of the exp and imp utilities, but any similarity is superficial: These are both simply new client tools for working with the completely new Oracle server processes and related infrastructure. Let's start by taking a quick look at some of the high points of that infrastructure, before turning to a couple of usage scenarios.
Enabling Restart and Remote Access
The Oracle Data Pump Export and Import facility leverages key Oracle technology, including Advanced Queuing (AQ), the metadata API ( DBMS_METADATA ), and direct path, for example (see Figure 1). Here are a few of the key components of the Oracle Data Pump facility:
The master table matches the job name you pass as a parameter (if you gave it a name when you started the job) or uses a system-generated name.
As part of job completion, the master table's contents are written to the dumpfile. A pointer to the master file data is thus at the head of the dumpfile and is used to reconstitute the data structures upon import. Oracle Data Pump infrastructure deletes the master table upon successful completion of the Data Pump job.
If there's a system failure of some kind—a disk fills to capacity during writing, for example—or if you decide to stop the job so that it doesn't interfere with a critical production job, then the table (and, more important, all the job status information detail the table contains) remains, enabling you to resume the job later (see Listing 1 for the master table structure).
Code Listing 1: Master table structure
SQL> desc kimberly.dev_refresh_job Name Null? Type -------------------------------------- ------- ---------- PROCESS_ORDER NUMBER DUPLICATE NUMBER DUMP_FILEID NUMBER DUMP_POSITION NUMBER DUMP_LENGTH NUMBER DUMP_ALLOCATION NUMBER COMPLETED_ROWS NUMBER ERROR_COUNT NUMBER ELAPSED_TIME NUMBER OBJECT_TYPE_PATH VARCHAR2(200) OBJECT_PATH_SEQNO NUMBER OBJECT_TYPE VARCHAR2(30) IN_PROGRESS CHAR(1) OBJECT_NAME VARCHAR2(500) OBJECT_LONG_NAME VARCHAR2(4000) OBJECT_SCHEMA VARCHAR2(30) ORIGINAL_OBJECT_SCHEMA VARCHAR2(30) PARTITION_NAME VARCHAR2(30) SUBPARTITION_NAME VARCHAR2(30) FLAGS NUMBER PROPERTY NUMBER COMPLETION_TIME DATE OBJECT_TABLESPACE VARCHAR2(30) SIZE_ESTIMATE NUMBER OBJECT_ROW NUMBER PROCESSING_STATE CHAR(1) PROCESSING_STATUS CHAR(1) BASE_PROCESS_ORDER NUMBER BASE_OBJECT_TYPE VARCHAR2(30) BASE_OBJECT_NAME VARCHAR2(30) BASE_OBJECT_SCHEMA VARCHAR2(30) ANCESTOR_PROCESS_ORDER NUMBER DOMAIN_PROCESS_ORDER NUMBER PARALLELIZATION NUMBER UNLOAD_METHOD NUMBER GRANULES NUMBER SCN NUMBER GRANTOR VARCHAR2(30) XML_CLOB CLOB NAME VARCHAR2(30) VALUE_T VARCHAR2(4000) VALUE_N NUMBER IS_DEFAULT NUMBER FILE_TYPE NUMBER USER_DIRECTORY VARCHAR2(4000) USER_FILE_NAME VARCHAR2(4000) FILE_NAME VARCHAR2(4000) EXTEND_SIZE NUMBER FILE_MAX_SIZE NUMBER PROCESS_NAME VARCHAR2(30) LAST_UPDATE DATE WORK_ITEM VARCHAR2(30) OBJECT_NUMBER NUMBER COMPLETED_BYTES NUMBER TOTAL_BYTES NUMBER METADATA_IO NUMBER DATA_IO NUMBER CUMULATIVE_TIME NUMBER PACKET_NUMBER NUMBER OLD_VALUE VARCHAR2(4000) SEED NUMBER LAST_FILE NUMBER USER_NAME VARCHAR2(30) OPERATION VARCHAR2(30) JOB_MODE VARCHAR2(30) CONTROL_QUEUE VARCHAR2(30) STATUS_QUEUE VARCHAR2(30) REMOTE_LINK VARCHAR2(4000) VERSION NUMBER DB_VERSION VARCHAR2(30) TIMEZONE VARCHAR2(64) STATE VARCHAR2(30) PHASE NUMBER GUID RAW(16) START_TIME DATE BLOCK_SIZE NUMBER METADATA_BUFFER_SIZE NUMBER DATA_BUFFER_SIZE NUMBER DEGREE NUMBER PLATFORM VARCHAR2(100) ABORT_STEP NUMBER INSTANCE VARCHAR2(60)
The ability to restart an Oracle Data Pump Export or Import job—and to do so remotely—is just one of the many significant capabilities of Oracle Data Pump that the legacy Oracle Export and Oracle Import don't provide. Other capabilities include support for parallelism during the import and export processes and the ability to change the degree of parallelism while a job is running.
Fine-Grained Object Selection
Oracle Data Pump provides more functionality than simply bulk export and import of data, however. It lets you define to a very granular degree the schemas, tablespaces, and other objects you want to export or import by using the INCLUDE, EXCLUDE , and QUERY parameters at various stages of the process to filter or transform the data.
Using the original Oracle Export and Oracle Import utilities, DBAs have to set up convoluted schemes to confine a specific export process to specific database objects. For example, you might set access policies (using role-based access control) on tables differently to allow a batch export process to obtain just the tables it needs. But using policies in this way is not the type of application for which the Oracle fine-grained access control (also known as role-based security) was intended.
Workarounds such as this aren't needed with Oracle Data Pump, which provides a fine-grained object selection feature for very granular selection without any performance impact on the source system, because the selection happens only at export or import time, depending on how you've set it up.
Export and Import Over the Network
Typically, Oracle Data Pump Export results in a binary format dumpfile that is then loaded (using Oracle Data Pump Import) into another database. However, in addition to this file-based Oracle Data Pump Export and Import, Oracle Data Pump provides a network-based approach for direct loading to or unloading from one database to another.
Most DBAs are probably familiar with using the legacy Oracle Export and Import utilities over a network, using named pipes as the mechanism through which to pass the data. One of the problems with that approach can be that named pipes aren't supported on every platform.
Instead of using named pipes, Oracle Data Pump uses database links for exporting and importing data between source and target over the network. And, as with the named pipe mechanism, the network_link feature entirely bypasses dumpfile creation, transmitting data from one database to another across the network.
This feature should be of keen interest to any DBAs spending lots of time doing production-to-test refreshes for development or extracting data from a read-only data warehouse. (For a read-only database, you must use Oracle Data Pump Export's network_link mechanism, initiating the process on the target database, because Oracle Data Pump won't be able to create the necessary master table on the read-only database.)
Because network bandwidth is the limiting factor, the network_link capability is best used in conjunction with the filtering capability, as a means of siphoning off a subset of data (rather than an entire multigigabyte or -terabyte database) from one database to another.
Here's how to set up an Oracle Data Pump Import, using a network-based approach (see Figure 2):
Code Listing 2: Sample script using DBMS_DATAPUMP API for import
DECLARE ind NUMBER; — Loop index number jobhandle NUMBER; — Data Pump job handle percent_done NUMBER; — Percentage of job complete job_state VARCHAR2(30); — Keeps track of job state le ku$_LogEntry; — work-in-progress and error messages js ku$_JobStatus; — Job status from get_status jd ku$_JobDesc; — Job description from get_status sts ku$_Status; — Status object returned by get_status BEGIN — Create a (user-named) Data Pump job to do a "table-level" import — using a network link jobhandle := DBMS_DATAPUMP.OPEN('IMPORT','TABLE', 'RM_TECHLAB', 'DEV_REFRESH'); — Set parallelism to 1— network import/export always uses 1 anyway, may not — be necessary to explicitly set... DBMS_DATAPUMP.SET_PARALLEL(jobhandle, 1); — Remap schema objects from source named BATCH_JOBS to target KFLOSS. DBMS_DATAPUMP.METADATA_REMAP(jobhandle,'REMAP_SCHEMA','BATCH_JOBS','KFLOSS'); — If a table already exists in the destination schema, don't override (this — is default behavior.) DBMS_DATAPUMP.SET_PARAMETER(jobhandle,'TABLE_EXISTS_ACTION','SKIP'); — Use statistics (rather than blocks) to estimate time. DBMS_DATAPUMP.SET_PARAMETER(jobhandle,'ESTIMATE','STATISTICS'); — Start the job. An exception is returned if something is not set up properly. DBMS_DATAPUMP.START_JOB(jobhandle); — The import job should now be running. This loop monitors the job until — it completes, displaying progress information if there are problems percent_done := 0; job_state := 'UNDEFINED'; while (job_state != 'COMPLETED') and (job_state != 'STOPPED') loop DBMS_DATAPUMP.get_status(jobhandle, DBMS_DATAPUMP.ku$_status_job_error + DBMS_DATAPUMP.ku$_status_job_status + DBMS_DATAPUMP.ku$_status_wip,-1,job_state,sts); js := sts.job_status; — As the percentage-complete changes in this loop, the new value displays. if js.percent_done != percent_done then DBMS_OUTPUT.PUT_LINE('*** Job percent done = ' || to_char(js.percent_done)); percent_done := js.percent_done; end if; — Displays any work-in-progress or error messages received for the job. if (bitand(sts.mask,dbms_datapump.ku$_status_wip) != 0) then le := sts.wip; else if (bitand(sts.mask,dbms_datapump.ku$_status_job_error) != 0) then le := sts.error; else le := null; end if; end if; if le is not null then ind := le.FIRST; while ind is not null loop DBMS_OUTPUT.PUT_LINE(le(ind).LogText); ind := le.NEXT(ind); end loop; end if; end loop; — When the job finishes, display status before detaching from job. DBMS_OUTPUT.PUT_LINE('Job has completed'); DBMS_OUTPUT.PUT_LINE('Final job state = ' || job_state); DBMS_DATAPUMP.DETACH(jobhandle); END; /
When you run the script on the satellite system, it connects to the source system and brings the data over directly to the target. The master table is created in the schema of the user on the initiating system and not in the database source. This approach works well for read-only databases and also as a means of refreshing development databases with extracts from a production system.
Data Pump Is Fast and Flexible
Oracle Data Pump imports are anywhere from 15 to 40 times as fast as with traditional import, even with parallelism set to 1, in part because of Oracle Data Pump's use of direct path (when possible).
Speed improvements are always welcome, but so are improvements such as the ability to restart a job and get the status of export and import jobs from anywhere. And the rich API and its use of DBMS_METADATA lets DBAs develop comprehensive automated utilities for a variety of data movement scenarios.
Kimberly Floss (firstname.lastname@example.org). president of the International Oracle Users Group, specializes in Oracle tuning techniques. She is the author of Oracle SQL Tuning and CBO Internals (Rampant TechPress).