Export/Import on Steroids: Oracle Data Pump
Data movement gets a big lift with Oracle Database 10g utilities
Until now, the export/import toolset has been the utility of choice for transferring data across multiple platforms with minimal effort, despite common complaints about its lack of speed. Import merely reads each record from the export dump file and inserts it into the target table using the usual INSERT INTO command, so it's no surprise that import can be a slow process.
Enter Oracle Data Pump, the newer and faster sibling of the export/import toolkit in Oracle Database 10g, designed to speed up the process many times over.
Data Pump reflects a complete overhaul of the export/import process. Instead of using the usual SQL commands, it provides proprietary APIs to load and unload data significantly faster. In my tests, I have seen performance increases of 10 to 15 times over export in direct mode, and import performance increases of 5 times. In addition, unlike the export utility, Data Pump can extract only specific types of objects, such as procedures.
Data Pump Export
The new utility is known as expdp to differentiate it from exp, the original export. In this example, we will use Data Pump to export a large table, CASES, about 3GB in size. Data Pump uses file manipulation on the server side to create and read files; hence, directories are used as locations. In this case, we are going to use the filesystem /u02/dpdata1 to hold the dump files.
create directory dpdata1 as '/u02/dpdata1';
grant read, write on directory dpdata1 to ananda;

Next, we will export the data:
expdp ananda/abc123 tables=CASES directory=DPDATA1 dumpfile=expCASES.dmp job_name=CASES_EXPORT

Let's analyze various parts of this command. The userid/password combination, tables, and dumpfile parameters are self-explanatory. Unlike the original export, the file is created on the server (not the client). The location is specified by the directory parameter value DPDATA1, which points to /u02/dpdata1 as created earlier. The process also creates a log file, again on the server, in the location specified by the directory parameter. By default, this process uses a directory named DPUMP_DIR, which you can create and use instead of DPDATA1.
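If you prefer to rely on the default directory name, the setup mirrors the earlier directory creation. This is a minimal sketch; the path /u02/dpdefault is a hypothetical choice, not an Oracle default:

```sql
-- Create the default Data Pump directory object (path is illustrative).
create directory dpump_dir as '/u02/dpdefault';
grant read, write on directory dpump_dir to ananda;
```

With this in place, the directory parameter can be omitted from the expdp command line.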
Note the parameter job_name above, a special one not found in the original export. All Data Pump work is done through jobs. Data Pump jobs, unlike DBMS jobs, are merely server processes that process the data on behalf of the main process. The main process, known as the master control process, coordinates this effort via Advanced Queuing; it does so through a special table created at runtime, known as the master table. In our example, if you check the schema of the user ANANDA while expdp is running, you will notice the existence of a table CASES_EXPORT, corresponding to the parameter job_name. This table is dropped when expdp finishes.
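You can confirm the master table's existence from another session while the job runs. A minimal sketch, using the job_name from our example:

```sql
-- Run from a second session while expdp is active;
-- the master table is named after the job_name (CASES_EXPORT here).
select table_name
from   dba_tables
where  owner = 'ANANDA'
and    table_name = 'CASES_EXPORT';
```

Once the export completes successfully, the same query returns no rows.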
While Data Pump Export (DPE) is running, press Control-C; it will stop the display of the messages on the screen, but not the export process itself. Instead, it will display the DPE prompt as shown below. The process is now said to be in "interactive" mode:
This approach allows several commands to be entered on that DPE job. To find a summary, use the STATUS command at the prompt:
Export> status
Job: CASES_EXPORT
  Operation: EXPORT
  Mode: TABLE
  State: EXECUTING
  Degree: 1
  Job Error Count: 0
  Dump file: /u02/dpdata1/expCASES.dmp
    bytes written = 2048

Worker 1 Status:
  State: EXECUTING
  Object Schema: DWOWNER
  Object Name: CASES
  Object Type: TABLE_EXPORT/TBL_TABLE_DATA/TABLE/TABLE_DATA
  Completed Objects: 1
  Total Objects: 1
  Completed Rows: 4687818
Remember, this is merely the status display. The export is working in the background. To continue to see the messages on the screen, use the command CONTINUE_CLIENT from the Export> prompt.
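A short sketch of such an interactive session, combining the two commands just described (no output is shown here, as it depends on the job):

```
Export> status
Export> continue_client
```

After CONTINUE_CLIENT, the session leaves interactive mode and resumes streaming the job's log messages to the screen.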
You can accelerate jobs significantly by using more than one thread for the export, through the PARALLEL parameter. Each thread creates a separate dumpfile, so the parameter dumpfile should have as many entries as the degree of parallelism. Instead of entering each one explicitly, you can specify wildcard characters in the filename, such as:
expdp ananda/abc123 tables=CASES directory=DPDATA1 dumpfile=expCASES_%U.dmp parallel=4 job_name=Cases_Export

Note how the dumpfile parameter has a wildcard %U, which indicates the files will be created as needed and the format will be expCASES_nn.dmp, where nn starts at 01 and goes up as needed. In parallel mode, the status screen will show four worker processes. (In default mode, only one process will be visible.) All worker processes extract data simultaneously and show their progress on the status screen.

It's important to separate the I/O channels for access to the database files and the dumpfile directory filesystems. Otherwise, the overhead associated with maintaining the Data Pump jobs may outweigh the benefits of parallel threads and hence degrade performance. Parallelism will be in effect only if the number of tables is higher than the parallel value and the tables are big.

Database Monitoring

You can get more information on the running Data Pump jobs from database views, too. The main view for monitoring the jobs is DBA_DATAPUMP_JOBS, which tells you how many worker processes (column DEGREE) are working on the job. The other important view is DBA_DATAPUMP_SESSIONS, which, when joined with the previous view and V$SESSION, gives the SID of the session of the main foreground process.
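For the first of these views, a minimal sketch of a monitoring query might look like this (the columns named are those documented for DBA_DATAPUMP_JOBS; what you see depends on the jobs running):

```sql
-- List active Data Pump jobs and their degree of parallelism.
select owner_name, job_name, operation, state, degree
from   dba_datapump_jobs;
```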
select sid, serial#
from v$session s, dba_datapump_sessions d
where s.saddr = d.saddr;

This instruction shows the session of the foreground process. More useful information is obtained from the alert log. When the process starts up, the MCP and the worker processes are shown in the alert log as follows:
kupprdp: master process DM00 started with pid=23, OS id=20530
         to execute - SYS.KUPM$MCP.MAIN('CASES_EXPORT', 'ANANDA');

It shows the PID of the sessions started for the Data Pump operation. You can find the actual SIDs using this query:
select sid, program
from v$session
where paddr in (select addr from v$process where pid in (23,24,25));

The PROGRAM column will show the process DM (for the master process) or DW (for the worker processes), corresponding to the names in the alert log file. If a parallel query is used by a worker process, say the one with SID 23, you can see it in the view V$PX_SESSION, which will show all the parallel query sessions running from the worker process represented by SID 23:
select sid from v$px_session where qcsid = 23;

Additional useful information can be obtained from the view V$SESSION_LONGOPS to predict the time it will take to complete the job.
select sid, serial#, sofar, totalwork
from v$session_longops
where opname = 'CASES_EXPORT'
and sofar != totalwork;

The column totalwork shows the total amount of work, of which the sofar amount has been done so far; you can use these two figures to estimate how much longer the job will take.

Data Pump Import

Data import performance is where Data Pump really stands out, however. To import the data exported earlier, we will use
impdp ananda/abc123 directory=dpdata1 dumpfile=expCASES.dmp job_name=cases_import

The default behavior of the import process is to create the table and all associated objects, and to produce an error when the table exists. Should you want to append the data to the existing table, you could use TABLE_EXISTS_ACTION=APPEND in the above command line. As with Data Pump Export, pressing Control-C on the process brings up the interactive mode of Data Pump Import (DPI); again, the prompt is Import>.

Operating on Specific Objects

Ever had a need to export only certain procedures from one user to be recreated in a different database or user? Unlike the traditional export utility, Data Pump allows you to export only a particular type of object. For instance, the following command lets you export only procedures, and nothing else: no tables, views, or even functions:
expdp ananda/iclaim directory=DPDATA1 dumpfile=expprocs.dmp include=PROCEDURE

To export only a few specific objects, say, function FUNC1 and procedure PROC1, you could use
expdp ananda/iclaim directory=DPDATA1 dumpfile=expprocs.dmp include=PROCEDURE:\"=\'PROC1\'\",FUNCTION:\"=\'FUNC1\'\"

This dumpfile serves as a backup of the sources. You can even use it to create DDL scripts to be used later. A special parameter called SQLFILE allows the creation of the DDL script file.
impdp ananda/iclaim directory=DPDATA1 dumpfile=expprocs.dmp sqlfile=procs.sql

This instruction creates a file named procs.sql in the directory specified by DPDATA1, containing the scripts of the objects inside the export dumpfile. This approach helps you create the sources quickly in another schema.

The INCLUDE parameter allows you to define objects to be included in or excluded from the dumpfile. You can use the clause INCLUDE=TABLE:"LIKE 'TAB%'" to export only those tables whose names start with TAB. Similarly, you could use the construct INCLUDE=TABLE:"NOT LIKE 'TAB%'" to exclude all tables starting with TAB. Alternatively, you can use the EXCLUDE parameter to exclude specific objects.

Data Pump can also be used to transport tablespaces using external tables; it's sufficiently powerful to redefine parallelism on the fly, attach more tables to an existing process, and so on (these features are beyond the scope of this article; see Oracle Database Utilities 10g Release 1 10.1 for more details). The following command generates the list of all available parameters in the Data Pump export utility:
expdp help=y

Similarly, impdp help=y will show all the parameters in DPI.

While Data Pump jobs are running, you can pause them by issuing STOP_JOB at the DPE or DPI prompt and then restart them with START_JOB. This functionality comes in handy when you run out of space and want to make corrections before continuing. For more information, read Part 1 of the Oracle Database Utilities 10g Release 1 10.1 guide.
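A sketch of this pause-and-resume workflow, using the job name from our earlier export (the ATTACH parameter reconnects a client to a running or stopped job; prompts shown are illustrative):

```
$ expdp ananda/abc123 attach=CASES_EXPORT
Export> stop_job

-- free up space, then reattach and resume --

$ expdp ananda/abc123 attach=CASES_EXPORT
Export> start_job
```

Because the job's state is kept in the master table, START_JOB picks up where the job left off rather than starting over.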