|
TECHNOLOGY: Talking Tuning
Speed and Simplify Data Movement
By Kimberly Floss
Use Oracle Database 10g Data Pump for fast, flexible export and import.
With databases approaching the double-digit terabyte range increasingly common these days, Oracle DBAs can spend a fair amount of time shuttling vast amounts of data as part of their daily operations. If we're not trying to keep a complete physical backup current for disaster recovery purposes, we're busy feeding developers their weekly or daily diet of refreshed production data to use in development and test environments or populating a read-only data warehouse at some satellite location. As DBAs, we've developed numerous scripts over the years to facilitate all these processes and we've tuned our scripts to perform as well as they can. But Oracle Export (and Oracle Import, especially) still takes too long.
Fortunately, Oracle Data PumpOracle Database 10g's new infrastructure for exporting and importing database data and metadataspeeds up movement of all of this data. This new server-side infrastructure is exposed through a new PL/SQL package, DBMS_DATAPUMP, which you use in one of three basic ways: through two new command-line client tools (expdp, impdp); through Oracle Enterprise Manager; or through your own procedure code, with calls to the application programming interface (API).
At first glance, the command-line clients expdp and impdp may remind you of the exp and imp utilities, but any similarity is superficial: These are both simply new client tools for working with the completely new Oracle server processes and related infrastructure. Let's start by taking a quick look at some of the high points of that infrastructure, before turning to a couple of usage scenarios.
Enabling Restart and Remote Access
The Oracle Data Pump Export and Import facility leverages key Oracle technology, including Advanced Queuing (AQ), the metadata API (DBMS_METADATA), and direct path, for example (see Figure 1). Here are a few of the key components of the Oracle Data Pump facility:
|
| Figure 1: Oracle Data Pump architecture
|
- A master control process that manages the export and import process as jobs you define. Each job has its own master control process that gets instantiated when you start the job. One of the first things a master control process does is to spawn several other important facilities, including worker processes, and populate a master table.
- The worker processes unload or load metadata and data, as required by the user-specified job parameters. Worker processes use the Metadata API for all metadata loading and unloading. The number of worker processes the master control process spawns depends on the degree of parallelism you've selected (for Oracle Database 10g Enterprise Editionparallelism isn't provided in Oracle Database 10g Standard Edition).
- A master table maintains job state, description, restart, dumpfile, and other information as a job is under waythe master control process and all worker processes feed this data into the master table as operations occur.
The master table matches the job name you pass as a parameter (if you
gave it a name when you started the job) or uses a system-generated name.
As part of job completion, the master table's contents are written to the dumpfile. A pointer to the master file data is thus at the head of the dumpfile and is used to reconstitute the data structures upon import. Oracle Data Pump infrastructure deletes the master table upon successful completion of
the Data Pump job.
If there's a system failure of some kinda disk fills to capacity during writing, for exampleor if you decide to stop the job so that it doesn't interfere with a critical production job, then the table (and, more important, all the job status information detail the table contains) remains, enabling you to resume the job later (see Listing 1 for the master table structure).
Code Listing 1: Master table structure
SQL> desc kimberly.dev_refresh_job
Name Null? Type
-------------------------------------- ------- --------------------------
PROCESS_ORDER NUMBER
DUPLICATE NUMBER
DUMP_FILEID NUMBER
DUMP_POSITION NUMBER
DUMP_LENGTH NUMBER
DUMP_ALLOCATION NUMBER
COMPLETED_ROWS NUMBER
ERROR_COUNT NUMBER
ELAPSED_TIME NUMBER
OBJECT_TYPE_PATH VARCHAR2(200)
OBJECT_PATH_SEQNO NUMBER
OBJECT_TYPE VARCHAR2(30)
IN_PROGRESS CHAR(1)
OBJECT_NAME VARCHAR2(500)
OBJECT_LONG_NAME VARCHAR2(4000)
OBJECT_SCHEMA VARCHAR2(30)
ORIGINAL_OBJECT_SCHEMA VARCHAR2(30)
PARTITION_NAME VARCHAR2(30)
SUBPARTITION_NAME VARCHAR2(30)
FLAGS NUMBER
PROPERTY NUMBER
COMPLETION_TIME DATE
OBJECT_TABLESPACE VARCHAR2(30)
SIZE_ESTIMATE NUMBER
OBJECT_ROW NUMBER
PROCESSING_STATE CHAR(1)
PROCESSING_STATUS CHAR(1)
BASE_PROCESS_ORDER NUMBER
BASE_OBJECT_TYPE VARCHAR2(30)
BASE_OBJECT_NAME VARCHAR2(30)
BASE_OBJECT_SCHEMA VARCHAR2(30)
ANCESTOR_PROCESS_ORDER NUMBER
DOMAIN_PROCESS_ORDER NUMBER
PARALLELIZATION NUMBER
UNLOAD_METHOD NUMBER
GRANULES NUMBER
SCN NUMBER
GRANTOR VARCHAR2(30)
XML_CLOB CLOB
NAME VARCHAR2(30)
VALUE_T VARCHAR2(4000)
VALUE_N NUMBER
IS_DEFAULT NUMBER
FILE_TYPE NUMBER
USER_DIRECTORY VARCHAR2(4000)
USER_FILE_NAME VARCHAR2(4000)
FILE_NAME VARCHAR2(4000)
EXTEND_SIZE NUMBER
FILE_MAX_SIZE NUMBER
PROCESS_NAME VARCHAR2(30)
LAST_UPDATE DATE
WORK_ITEM VARCHAR2(30)
OBJECT_NUMBER NUMBER
COMPLETED_BYTES NUMBER
TOTAL_BYTES NUMBER
METADATA_IO NUMBER
DATA_IO NUMBER
CUMULATIVE_TIME NUMBER
PACKET_NUMBER NUMBER
OLD_VALUE VARCHAR2(4000)
SEED NUMBER
LAST_FILE NUMBER
USER_NAME VARCHAR2(30)
OPERATION VARCHAR2(30)
JOB_MODE VARCHAR2(30)
CONTROL_QUEUE VARCHAR2(30)
STATUS_QUEUE VARCHAR2(30)
REMOTE_LINK VARCHAR2(4000)
VERSION NUMBER
DB_VERSION VARCHAR2(30)
TIMEZONE VARCHAR2(64)
STATE VARCHAR2(30)
PHASE NUMBER
GUID RAW(16)
START_TIME DATE
BLOCK_SIZE NUMBER
METADATA_BUFFER_SIZE NUMBER
DATA_BUFFER_SIZE NUMBER
DEGREE NUMBER
PLATFORM VARCHAR2(100)
ABORT_STEP NUMBER
INSTANCE VARCHAR2(60)
The ability to restart an Oracle Data Pump Export or Import joband to do so remotelyis just one of the many significant capabilities of Oracle Data Pump that the legacy Oracle Export and Oracle Import don't provide. Other capabilities include support for parallelism during the import and export processes and the ability to change the degree of parallelism while a job is running.
Fine-Grained Object Selection
Oracle Data Pump provides more functionality than simply bulk export and import of data, however. It lets you define to a very granular degree the schemas, tablespaces, and other objects you want to export or import by using the INCLUDE, EXCLUDE, and QUERY parameters at various stages of the process to filter or transform the data.
Using the original Oracle Export and Oracle Import utilities, DBAs have to set up convoluted schemes to confine a specific export process to specific database objects. For example, you might set access policies (using role-based access control) on tables differently to allow a batch export process to obtain just the tables it needs. But using policies in this way is not the type of application for which the Oracle fine-grained access control (also known as role-based security) was intended.
Workarounds such as this aren't needed with Oracle Data Pump, which provides a fine-grained object selection feature for very granular selection without any performance impact on the source system, because the selection happens only at export or import time, depending on how you've set it up.
Export and Import Over the Network
Typically, Oracle Data Pump Export results in a binary format dumpfile that is then loaded (using Oracle Data Pump Import) into another database. However, in addition to this file-based Oracle Data Pump Export and Import, Oracle Data Pump provides a network-based approach for direct loading to or unloading from one database to another.
Most DBAs are probably familiar with using the legacy Oracle Export and Import utilities over a network, using named pipes as the mechanism through which to pass the data. One of the problems with that approach can be that named pipes aren't supported on every platform.
Instead of using named pipes, Oracle Data Pump uses database links for exporting and importing data between source and target over the network. And, as with the named pipe mechanism, the network_link feature entirely bypasses dumpfile creation, transmitting data from one database to another across the network.
This feature should be of keen interest to any DBAs spending lots of time doing production-to-test refreshes for development or extracting data from a read-only data warehouse. (For a read-only database, you must use Oracle Data Pump Export's network_link mechanism, initiating the process on the target database, because Oracle Data Pump won't be able to create the necessary master table on the read-only database.)
Because network bandwidth is the limiting factor, the network_link capability is best used in conjunction with the filtering capability, as a means of siphoning off a subset of data (rather than an entire multigigabyte or -terabyte database) from one database to another.
Here's how to set up an Oracle Data Pump Import, using a network-based approach (see Figure 2):
|
| Figure 2: Oracle Data Pump uses database links. To export tables from a read-only database, it creates a master table on the read-write database.
|
- Set up a remote link from a satellite database (database 2) to the master
database (database 1).
- Create the import as a SQL script to run weekly from the satellite database to import selected tables from the source system, filtering and transforming as needed (see Listing 2).
Code Listing 2: Sample script using DBMS_DATAPUMP API for import
DECLARE
ind NUMBER; Loop index number
jobhandle NUMBER; Data Pump job handle
percent_done NUMBER; Percentage of job complete
job_state VARCHAR2(30); Keeps track of job state
le ku$_LogEntry; work-in-progress and error messages
js ku$_JobStatus; Job status from get_status
jd ku$_JobDesc; Job description from get_status
sts ku$_Status; Status object returned by get_status
BEGIN
Create a (user-named) Data Pump job to do a "table-level" import
using a network link
jobhandle := DBMS_DATAPUMP.OPEN('IMPORT','TABLE', 'RM_TECHLAB', 'DEV_REFRESH');
Set parallelism to 1 network import/export always uses 1 anyway, may not
be necessary to explicitly set...
DBMS_DATAPUMP.SET_PARALLEL(jobhandle, 1);
Remap schema objects from source named BATCH_JOBS to target KFLOSS.
DBMS_DATAPUMP.METADATA_REMAP(jobhandle,'REMAP_SCHEMA','BATCH_JOBS','KFLOSS');
If a table already exists in the destination schema, don't override (this
is default behavior.)
DBMS_DATAPUMP.SET_PARAMETER(jobhandle,'TABLE_EXISTS_ACTION','SKIP');
Use statistics (rather than blocks) to estimate time.
DBMS_DATAPUMP.SET_PARAMETER(jobhandle,'ESTIMATE','STATISTICS');
Start the job. An exception is returned if something is not set up properly.
DBMS_DATAPUMP.START_JOB(jobhandle);
The import job should now be running. This loop monitors the job until
it completes, displaying progress information if there are problems
percent_done := 0;
job_state := 'UNDEFINED';
while (job_state != 'COMPLETED') and (job_state != 'STOPPED') loop
DBMS_DATAPUMP.get_status(jobhandle,
DBMS_DATAPUMP.ku$_status_job_error +
DBMS_DATAPUMP.ku$_status_job_status +
DBMS_DATAPUMP.ku$_status_wip,-1,job_state,sts);
js := sts.job_status;
As the percentage-complete changes in this loop, the new value displays.
if js.percent_done != percent_done
then
DBMS_OUTPUT.PUT_LINE('*** Job percent done = ' || to_char(js.percent_done));
percent_done := js.percent_done;
end if;
Displays any work-in-progress or error messages received for the job.
if (bitand(sts.mask,dbms_datapump.ku$_status_wip) != 0)
then
le := sts.wip;
else
if (bitand(sts.mask,dbms_datapump.ku$_status_job_error) != 0)
then
le := sts.error;
else
le := null;
end if;
end if;
if le is not null
then
ind := le.FIRST;
while ind is not null loop
DBMS_OUTPUT.PUT_LINE(le(ind).LogText);
ind := le.NEXT(ind);
end loop;
end if;
end loop;
When the job finishes, display status before detaching from job.
DBMS_OUTPUT.PUT_LINE('Job has completed');
DBMS_OUTPUT.PUT_LINE('Final job state = ' || job_state);
DBMS_DATAPUMP.DETACH(jobhandle);
END;
/
When you run the script on the satellite system, it connects to the source system and brings the data over directly to the target. The master table is created in the schema of the user on the initiating system and not in the database source. This approach works well for read-only databases and also as a means of refreshing development databases with extracts from a production system.
Data Pump Is Fast and Flexible
Oracle Data Pump imports are anywhere from 15 to 40 times as fast as with traditional import, even with parallelism set to 1, in part because of Oracle Data Pump's use of
direct path (when possible).
Speed improvements are always welcome, but so are improvements such as the ability to restart a job and get the status of export and import jobs from anywhere. And the rich API and its use of DBMS_METADATA lets DBAs develop comprehensive automated utilities for
a variety of data movement scenarios.
Kimberly Floss (kimberly_floss@ioug.org). president of the International Oracle Users Group, specializes in Oracle tuning techniques. She is the author of Oracle SQL Tuning and CBO Internals (Rampant TechPress).
|