As Published In
Oracle Magazine
July/August 2009

TECHNOLOGY: Data Integration


A Refreshing Move

By Arup Nanda Oracle ACE Director

Encrypt, compress, mask, and deliver information with Oracle Data Pump.

John the DBA at Acme Bank had some special visitors: a few development team leads and the head of IT security. The development teams regularly refresh certain tables from other databases, such as data warehouse tables refreshed from online transaction processing systems and various development database tables refreshed from their production counterparts. The refresh process involves exporting the data by using Oracle Data Pump to generate a dumpfile on the source server; FTPing the dumpfile to the target server; and, finally, importing the dumpfile to the target database. Sometimes a single dumpfile generated at the production database is used to refresh other production and development databases.

Because of certain new security requirements, such as those mandated by the Payment Card Industry (PCI), the security team objected to moving dumpfiles containing sensitive and personally identifiable information (PII), such as Social Security numbers and phone numbers, without encryption. The members of the development team had explored the possibility of using OS-level encryption utilities, but that would have resulted in a two-step process: first creating the dumpfile and then encrypting it. The two-step process also required additional storage, which they did not have, and that was another reason they were asking John the DBA for help.

With the growth of data in the production databases, the generated dumpfiles had also been growing and were starting to put some strain on storage and bandwidth. The development team had considered using OS-level utilities to compress the dumpfiles, but that option also involved a two-step process: first creating the dumpfile and then compressing it. This two-step process also required additional intermediate storage, which development did not have, and the idea of new, multistep processes and intermediate storage for both encrypted and compressed dumpfiles was not making the members of the development team very happy.

In addition, the security team mandated that the PII data in the development database be replaced with some random information or a generic character such as X to make it unidentifiable—a process known as masking . The development team proposed to issue a massive update on the development database after the data was refreshed from an encrypted dumpfile, but the DBAs objected to the idea, because the massive update would have caused severe performance issues on the development database. The idea wasn’t palatable to the head of security either—she wanted the PII data masked in the dumpfile itself before shipping the dumpfile to the development server, so that the development DBAs would not be able to see the PII data.

The development team leads were scratching their heads over the different requirements, and they asked John for help. He assured them that the Oracle Data Pump tool in Oracle Database 11g had all the features to address their different needs—encryption of dumpfiles, compression, and masking of PII data—in a single-step process.

Encryption

Oracle Database 10g Release 2 introduced Oracle Transparent Data Encryption, which enabled columns to be encrypted without any additional programming. The encryption is controlled through a wallet. (For information on setting up wallets for Oracle Transparent Data Encryption, see “Transparent Data Encryption” in the September/October 2005 issue of Oracle Magazine.)

John pointed out that the same encryption mechanism is available in Oracle Database 11g for encrypting the Oracle Data Pump dumpfiles. He demonstrated the dumpfile encryption, first making sure that the wallet was open and then executing the following command to export the CUSTOMERS table (found in the SH sample schema supplied with Oracle Database) and create the customers_norm.dmp file without encryption:

 

 
# expdp sh/sh dumpfile=customers_norm.dmp directory=tmp_dir tables=customers


Then he used the following command to export the CUSTOMERS table and create the customers_enc.dmp file with encryption:

 

 
# expdp sh/sh dumpfile=customers_enc.dmp directory=tmp_dir tables=customers encryption=all


To demonstrate that the customers_enc.dmp dumpfile was encrypted, John first searched for a specific customer name—Beatrice—in the unencrypted dumpfile:

 

# grep Beatrice customers_norm.dmp


The command returned

 

Binary file customers_norm.dmp matches 


which meant that the name Beatrice was found in the file. Then John searched for the same name in the encrypted dumpfile:

 

# grep Beatrice customers_enc.dmp


The command returned no output, confirming that the name Beatrice did not exist in the file. The name Beatrice did exist in the database table, however, so it was included in customers_norm.dmp (unencrypted). The value was encrypted in customers_enc.dmp when the ENCRYPTION parameter was used during export, so searching for the cleartext value yielded no result.

The ENCRYPTION parameter in the expdp command enabled encryption, John explained. The parameter value he supplied was ALL, but someone asked what other values were there. The ENCRYPTION parameter, he explained, accepts the following values:

 

  • NONE: No encryption will be performed.
  • ENCRYPTED_COLUMNS_ONLY: If some columns are encrypted by Oracle Transparent Data Encryption in the database, the values of those columns only are encrypted in the dumpfile.
  • DATA_ONLY: All data (not just data under Oracle Transparent Data Encryption) is encrypted, but metadata is not.
  • METADATA_ONLY: The metadata is encrypted, but the actual data is not.
  • ALL: Both the data and the metadata are encrypted.


Import

The import process automatically determines that the dumpfile is encrypted and uses the wallet to get the encryption keys and decrypt the dumpfile during the import. If the import is done into a different database, the wallets in both the source and target database must be identical.

John used the following command to demonstrate the import of the file:

 

# impdp sh/sh dumpfile=customers_enc directory=tmp_dir


Someone on John’s team remembered that the encryption feature was also available for Oracle Data Pump in Oracle Database 10g. She asked John what the difference was between Oracle Data Pump in Oracle Database 10g and in Oracle Database 11g. John explained that Oracle Data Pump in the earlier version allowed encryption of columns under Oracle Transparent Data Encryption only . Therefore, if Oracle Transparent Data Encryption wasn’t used at all, none of the dumpfile contents would be encrypted. In Oracle Database 11g, he explained, Oracle Data Pump can use dumpfile encryption even when none of the columns are under Oracle Transparent Data Encryption. Moreover, Oracle Data Pump in Oracle Database 11g can encrypt the whole dumpfile, not just a few Oracle Transparent Data Encryption-encrypted columns.

Inside the database, database security measures such as Oracle Transparent Data Encryption may encrypt critical data, but once the data leaves the database in a dumpfile, it loses that database security protection. John’s demonstration proved that complete data encryption was possible, making the security team very happy. The encryption happening in the same step as the creation of the dumpfile and not requiring that Oracle Transparent Data Encryption be enabled on individual columns in the database made the development team very happy.

Compression

Next John addressed the need to compress the generated dumpfile. Setting the COMPRESSION parameter in the exp command to ALL compresses both the data and the metadata.

John executed the following command to export the CUSTOMERS table with compression:

 

# expdp sh/sh dumpfile=customers_comp directory=tmp_dir tables=customers compression=all


The command produced the customers_comp.dmp file, and John compared the size of this compressed dumpfile with the uncompressed customers_norm.dmp file produced earlier. To expand the comparison, he compressed the customers_norm.dmp file with the regular gzip command:

 

# gzip -9 customers_norm.dmp


The command created a compressed file named customers_norm.dmp.gz. John then compared the sizes of the normal and different compressed files:

 

Type of File Size
customers_norm.dmp 31.58MB
customers_comp.dmp 2.98MB
customers_norm.dmp.gz 3.67MB




From John’s demonstration, everyone understood that Oracle Data Pump compression reduced the original file to almost the same size as the gzip compression (or even smaller, as shown in John’s test). What’s more, the compression occurred when the dumpfile was generated, eliminating a two-step process and consequently saving valuable storage. While importing, John used the same impdp command he’d used earlier, without any special parameters. The Oracle Data Pump utility automatically recognized the dumpfile as compressed, uncompressed it inline, and imported it.

John pointed out that, by default, Oracle Data Pump compresses metadata while exporting. If that compression is not needed, the COMPRESSION=NONE setting disables metadata compression during export.

Masking

Finally, John addressed the issue of PII data, such as phone numbers, stored in the database going to the dumpfile for use in the development database. The security department wanted the values randomized instead of sent as is. To demonstrate the data pump randomizing solution, John first created a package, datapump_masking_pkg, that produced different patterns from the input data, as shown in Listing 1. This simple package includes several functions, each of which accepts one parameter, transforms it as directed by the corresponding code, and returns the transformed value.

Code Listing 1: Package for masking data

 

create or replace package datapump_masking_pkg
as
   function random_phone_same_area (p_in varchar2) return varchar2;
   function random_area_same_phone (p_in varchar2) return varchar2;
   function random_all (p_in varchar2) return varchar2;
   function mask_all (p_in varchar2) return varchar2;
end;
/
create or replace package body datapump_masking_pkg
as
   function random_phone_same_area (p_in varchar2) return varchar2 is
   begin
      return 
         substr(p_in,1,4)||round(dbms_random.value (100,999))||'-'||
         lpad(round(dbms_random.value (1,9999)),4,'0');
   end;
   function random_area_same_phone (p_in varchar2) return varchar2 is
   begin
      return round(dbms_random.value (100,999))||substr(p_in,4,12);
   end;
   function random_all (p_in varchar2) return varchar2 is
   begin
      return round(dbms_random.value (100,999))||'-'||
         round(dbms_random.value (100,999))||'-'||
         lpad(round(dbms_random.value (1,9999)),4,'0');
   end;
   function mask_all (p_in varchar2)
   return varchar2 is
   begin
      return 'XXX-XXX-XXXX';
   end;
end;
/


In the datapump_masking_pkg package, John included several functions that use the DBMS_RANDOM package to produce different patterns. The first function—random_phone_same_area—takes a phone number and keeps the area code intact but randomizes the 7-digit phone number. For instance, 123-456-7890 becomes 123 followed by a random string of 7 digits, for example, 895-5429. The random_area_same_phone function does just the opposite—it keeps the 7-digit number but randomizes the area code. The random_all function randomizes all 10 digits, so it produces a number that resembles a real phone number but has nothing in common with the original value. Finally, the mask_all function does not use DBMS_RANDOM and is very different—it replaces all digits with X, so, for example, 123-456-7890 becomes XXX-XXX-XXXX.

In this case, the security team wanted to randomize the area code and keep the same phone number. John addressed this by using the REMAP_DATA parameter in the expdp command to call the datapump_masking_pkg package. The REMAP_DATA parameter value has the format <TableName>.<ColumnName>: <PackageName>.<FunctionName> . John called the random_area_same_phone function in the package:

 

# expdp sh/sh dumpfile=customers_remap directory=tmp_dir tables=customers
remap_data=customers.cust_main_phone_number:datapump_masking_pkg.random_area_same_phone


The dumpfile generated by this command does not have the actual data from the cust_main_phone_number column but, instead, the values supplied by the random_area_same_phone function. Only that column will be modified (or masked) in the dumpfile. All other columns will be exported as is.

After importing the customers_remap.dmp dumpfile to the target (in this case, development) database, John examined the value in one record:

 

SQL> select cust_main_phone_number
  2    from customers
  3    where cust_id = 1
  4    /
 
CUST_MAIN_PHONE_NUMBER
----------------------------------------------------------------
256-379-8954


In the source (production) database, John checked the value for the same record:


CUST_MAIN_PHONE_NUMBER
----------------------------------------------------------------
127-379-8954

Next Steps


READ more about
Oracle Data Pump
Oracle Database Utilities
Oracle Data Pump Quick Start

Oracle Transparent Data Encryption
oracle.com/technetwork/2005/05-sep/o55security.html
oracle.com/technetwork/issue-archive/2009/09-jan/o19tte.html

Note that the area code changed from 127 in the source database to a random number, 256, in the target database. The rest of the phone number remained the same, in accordance with the security requirement. Had the requirement been something else, such as to randomize the phone number, John could have used one of the other functions in the package.

What if we need to mask additional columns, someone asked. John explained that the solution was to use another REMAP_DATA parameter in the expdp command. For example, to mask an additional column, named customer_phone_2, with a different function—random_phone_same_area—John executed the following:

 

# expdp sh/sh dumpfile=customers_norm.dmp directory=tmp_dir tables=customers
remap_data=customers.cust_main_phone_number:datapump_masking_pkg.random_area_same_phone
remap_data=customers.customer_number_2:datapump_masking_pkg.random_phone_same_area


What if we already have a dumpfile with the original (unmasked) values we want to import into a development database but don’t want to perform a massive “masking” update after the import, someone mused. The head of security first reminded everyone that they couldn’t have unmasked PII information available in dumpfiles headed for the development database. John acknowledged the security requirement and offered a hypothetical answer: even if a dumpfile is initially exported without the REMAP_DATA parameter, that parameter can be applied during import, as shown in this command:

 

# impdp sh/sh dumpfile=customers_norm directory=tmp_dir tables=customers
remap_data=customers.cust_main_phone_number:datapump_masking_pkg.mask_all


Here the import transformed (masked) the data before inserting it, eliminating the need for a postimport update. All the columns were imported as is, except the cust_main_phone_number column, which was transformed while being imported.

Someone then asked if all of these Oracle Data Pump tasks—encryption, compression, and masking—can be used together in one dumpfile export. Of course, assured John. To demonstrate, he used the following command:

 

# expdp sh/sh dumpfile=customers_norm directory=tmp_dir tables=customers
remap_data=customers.cust_main_phone_number:datapump_masking_pkg.mask_all
encryption=all
compression=all


This command produced an encrypted and compressed dumpfile and masked the cust_main_phone_number column (in the dumpfile), all in one step.

Conclusion

John restated the original requirements—encryption, compression, and masking—and explained how the new features of Oracle Data Pump in Oracle Database 11g could be used to address them, as shown in Table 1. The advantage of Oracle Data Pump encryption and compression features over operating-system-level encryption and compression tools is twofold. First, it performs the actions inline—that is, it decompresses and/or decrypts (as appropriate) while importing from the compressed or encrypted dumpfile. Second, while importing parts of the dumpfile, such as a single table from a full database export, the decompression or decryption occurs on that portion alone, not on the entire dumpfile.

 

Requirement Solution
Dumpfiles need to be encrypted and decrypted inline, securely, and without human intervention. Set encryption=all during export; no special settings are required during import.
Dumpfiles need to be compressed and decompressed inline, with no separate step required at compression or decompression. Set compression=all during export; import automatically decompresses dumpfiles inline.

 Table 1: Analysis of dumpfile requirements and solutions



Over the next few days, Acme was able to encrypt dumpfiles automatically, without using Oracle Transparent Data Encryption in the database on the PII columns; substantially compress the dumpfiles; and eliminate the presence of PII data in the dumpfiles themselves. The security team was happy that mandated requirements had been met, and the development team was happy, because the process for moving data was faster, required fewer steps, and consumed less storage and I/O.

 

Encryption by Password


Oracle Data Pump also includes an ENCRYPTION_PASSWORD parameter, which enables users to specify a password while exporting. During import, the same password must be used to decrypt the dumpfile. To enable this feature, you have to set another parameter—ENCRYPTION_MODE—to either PASSWORD or DUAL. Here is an example of a data pump export using the ENCRYPTION_PASSWORD and ENCRYPTION_MODE parameters:

 

# expdp sh/sh dumpfile=customers_enc.dmp directory=tmp_dir 
tables=customers encryption=all encryption_mode=password 
encryption_password=TopSecret


The same parameter/value combination— encryption_password=TopSecret —needs to be used during import of this customers_enc.dmp file. Although this option does not require that Oracle Transparent Data Encryption be set up in the database and consequently eliminates the need to copy the wallet, it does require that passwords be provided either by command line, in a script, or manually—all options that are considered less secure. With the wallet, the need to enter passwords is eliminated, providing a truly secure environment.



 


Arup Nanda (arup@proligence.com) has been an Oracle DBA for more than 14 years, handling all aspects of database administration, from performance tuning to security and disaster recovery. He was Oracle Magazine’s DBA of the Year in 2003.

Send us your comments