TECHNOLOGY: Data Integration
A Refreshing MoveBy Arup Nanda
Encrypt, compress, mask, and deliver information with Oracle Data Pump.
John the DBA at Acme Bank had some special visitors: a few development team leads and the head of IT security. The development teams regularly refresh certain tables from other databases, such as data warehouse tables refreshed from online transaction processing systems and various development database tables refreshed from their production counterparts. The refresh process involves exporting the data by using Oracle Data Pump to generate a dumpfile on the source server; FTPing the dumpfile to the target server; and, finally, importing the dumpfile to the target database. Sometimes a single dumpfile generated at the production database is used to refresh other production and development databases.
Because of certain new security requirements, such as those mandated by the Payment Card Industry (PCI), the security team objected to moving dumpfiles containing sensitive and personally identifiable information (PII), such as Social Security numbers and phone numbers, without encryption. The members of the development team had explored the possibility of using OS-level encryption utilities, but that would have resulted in a two-step process: first creating the dumpfile and then encrypting it. The two-step process also required additional storage, which they did not have, and that was another reason they were asking John the DBA for help.
With the growth of data in the production databases, the generated dumpfiles had also been growing and were starting to put some strain on storage and bandwidth. The development team had considered using OS-level utilities to compress the dumpfiles, but that option also involved a two-step process: first creating the dumpfile and then compressing it. This two-step process also required additional intermediate storage, which development did not have, and the idea of new, multistep processes and intermediate storage for both encrypted and compressed dumpfiles was not making the members of the development team very happy.
In addition, the security team mandated that the PII data in the development database be replaced with some random information or a generic character such as X to make it unidentifiable—a process known as masking . The development team proposed to issue a massive update on the development database after the data was refreshed from an encrypted dumpfile, but the DBAs objected to the idea, because the massive update would have caused severe performance issues on the development database. The idea wasn’t palatable to the head of security either—she wanted the PII data masked in the dumpfile itself before shipping the dumpfile to the development server, so that the development DBAs would not be able to see the PII data.
The development team leads were scratching their heads over the different requirements, and they asked John for help. He assured them that the Oracle Data Pump tool in Oracle Database 11g had all the features to address their different needs—encryption of dumpfiles, compression, and masking of PII data—in a single-step process.
Oracle Database 10g Release 2 introduced Oracle Transparent Data Encryption, which enabled columns to be encrypted without any additional programming. The encryption is controlled through a wallet. (For information on setting up wallets for Oracle Transparent Data Encryption, see “Transparent Data Encryption” in the September/October 2005 issue of Oracle Magazine.)
John pointed out that the same encryption mechanism is available in Oracle Database 11g for encrypting the Oracle Data Pump dumpfiles. He demonstrated the dumpfile encryption, first making sure that the wallet was open and then executing the following command to export the CUSTOMERS table (found in the SH sample schema supplied with Oracle Database) and create the customers_norm.dmp file without encryption:
# expdp sh/sh dumpfile=customers_norm.dmp directory=tmp_dir tables=customers
Then he used the following command to export the CUSTOMERS table and create the customers_enc.dmp file with encryption:
# expdp sh/sh dumpfile=customers_enc.dmp directory=tmp_dir tables=customers encryption=all
To demonstrate that the customers_enc.dmp dumpfile was encrypted, John first searched for a specific customer name—Beatrice—in the unencrypted dumpfile:
# grep Beatrice customers_norm.dmp
The command returned
Binary file customers_norm.dmp matches
which meant that the name Beatrice was found in the file. Then John searched for the same name in the encrypted dumpfile:
# grep Beatrice customers_enc.dmp
The command returned no output, confirming that the name Beatrice did not exist in the file. The name Beatrice did exist in the database table, however, so it was included in customers_norm.dmp (unencrypted). The value was encrypted in customers_enc.dmp when the ENCRYPTION parameter was used during export, so searching for the cleartext value yielded no result.
The ENCRYPTION parameter in the expdp command enabled encryption, John explained. The parameter value he supplied was ALL, but someone asked what other values were there. The ENCRYPTION parameter, he explained, accepts the following values:
The import process automatically determines that the dumpfile is encrypted and uses the wallet to get the encryption keys and decrypt the dumpfile during the import. If the import is done into a different database, the wallets in both the source and target database must be identical.
John used the following command to demonstrate the import of the file:
# impdp sh/sh dumpfile=customers_enc directory=tmp_dir
Someone on John’s team remembered that the encryption feature was also available for Oracle Data Pump in Oracle Database 10g. She asked John what the difference was between Oracle Data Pump in Oracle Database 10g and in Oracle Database 11g. John explained that Oracle Data Pump in the earlier version allowed encryption of columns under Oracle Transparent Data Encryption only . Therefore, if Oracle Transparent Data Encryption wasn’t used at all, none of the dumpfile contents would be encrypted. In Oracle Database 11g, he explained, Oracle Data Pump can use dumpfile encryption even when none of the columns are under Oracle Transparent Data Encryption. Moreover, Oracle Data Pump in Oracle Database 11g can encrypt the whole dumpfile, not just a few Oracle Transparent Data Encryption-encrypted columns.
Inside the database, database security measures such as Oracle Transparent Data Encryption may encrypt critical data, but once the data leaves the database in a dumpfile, it loses that database security protection. John’s demonstration proved that complete data encryption was possible, making the security team very happy. The encryption happening in the same step as the creation of the dumpfile and not requiring that Oracle Transparent Data Encryption be enabled on individual columns in the database made the development team very happy.
Next John addressed the need to compress the generated dumpfile. Setting the COMPRESSION parameter in the exp command to ALL compresses both the data and the metadata.
John executed the following command to export the CUSTOMERS table with compression:
# expdp sh/sh dumpfile=customers_comp directory=tmp_dir tables=customers compression=all
The command produced the customers_comp.dmp file, and John compared the size of this compressed dumpfile with the uncompressed customers_norm.dmp file produced earlier. To expand the comparison, he compressed the customers_norm.dmp file with the regular gzip command:
# gzip -9 customers_norm.dmp
The command created a compressed file named customers_norm.dmp.gz. John then compared the sizes of the normal and different compressed files:
From John’s demonstration, everyone understood that Oracle Data Pump compression reduced the original file to almost the same size as the gzip compression (or even smaller, as shown in John’s test). What’s more, the compression occurred when the dumpfile was generated, eliminating a two-step process and consequently saving valuable storage. While importing, John used the same impdp command he’d used earlier, without any special parameters. The Oracle Data Pump utility automatically recognized the dumpfile as compressed, uncompressed it inline, and imported it.
John pointed out that, by default, Oracle Data Pump compresses metadata while exporting. If that compression is not needed, the COMPRESSION=NONE setting disables metadata compression during export.
Finally, John addressed the issue of PII data, such as phone numbers, stored in the database going to the dumpfile for use in the development database. The security department wanted the values randomized instead of sent as is. To demonstrate the data pump randomizing solution, John first created a package, datapump_masking_pkg, that produced different patterns from the input data, as shown in Listing 1. This simple package includes several functions, each of which accepts one parameter, transforms it as directed by the corresponding code, and returns the transformed value.
Code Listing 1: Package for masking data
create or replace package datapump_masking_pkg as function random_phone_same_area (p_in varchar2) return varchar2; function random_area_same_phone (p_in varchar2) return varchar2; function random_all (p_in varchar2) return varchar2; function mask_all (p_in varchar2) return varchar2; end; / create or replace package body datapump_masking_pkg as function random_phone_same_area (p_in varchar2) return varchar2 is begin return substr(p_in,1,4)||round(dbms_random.value (100,999))||'-'|| lpad(round(dbms_random.value (1,9999)),4,'0'); end; function random_area_same_phone (p_in varchar2) return varchar2 is begin return round(dbms_random.value (100,999))||substr(p_in,4,12); end; function random_all (p_in varchar2) return varchar2 is begin return round(dbms_random.value (100,999))||'-'|| round(dbms_random.value (100,999))||'-'|| lpad(round(dbms_random.value (1,9999)),4,'0'); end; function mask_all (p_in varchar2) return varchar2 is begin return 'XXX-XXX-XXXX'; end; end; /
In the datapump_masking_pkg package, John included several functions that use the DBMS_RANDOM package to produce different patterns. The first function—random_phone_same_area—takes a phone number and keeps the area code intact but randomizes the 7-digit phone number. For instance, 123-456-7890 becomes 123 followed by a random string of 7 digits, for example, 895-5429. The random_area_same_phone function does just the opposite—it keeps the 7-digit number but randomizes the area code. The random_all function randomizes all 10 digits, so it produces a number that resembles a real phone number but has nothing in common with the original value. Finally, the mask_all function does not use DBMS_RANDOM and is very different—it replaces all digits with X, so, for example, 123-456-7890 becomes XXX-XXX-XXXX.
In this case, the security team wanted to randomize the area code and keep the same phone number. John addressed this by using the REMAP_DATA parameter in the expdp command to call the datapump_masking_pkg package. The REMAP_DATA parameter value has the format <TableName>.<ColumnName>: <PackageName>.<FunctionName> . John called the random_area_same_phone function in the package:
# expdp sh/sh dumpfile=customers_remap directory=tmp_dir tables=customers remap_data=customers.cust_main_phone_number:datapump_masking_pkg.random_area_same_phone
The dumpfile generated by this command does not have the actual data from the cust_main_phone_number column but, instead, the values supplied by the random_area_same_phone function. Only that column will be modified (or masked) in the dumpfile. All other columns will be exported as is.
After importing the customers_remap.dmp dumpfile to the target (in this case, development) database, John examined the value in one record:
SQL> select cust_main_phone_number 2 from customers 3 where cust_id = 1 4 / CUST_MAIN_PHONE_NUMBER ---------------------------------------------------------------- 256-379-8954
CUST_MAIN_PHONE_NUMBER ---------------------------------------------------------------- 127-379-8954
What if we need to mask additional columns, someone asked. John explained that the solution was to use another REMAP_DATA parameter in the expdp command. For example, to mask an additional column, named customer_phone_2, with a different function—random_phone_same_area—John executed the following:
# expdp sh/sh dumpfile=customers_norm.dmp directory=tmp_dir tables=customers
What if we already have a dumpfile with the original (unmasked) values we want to import into a development database but don’t want to perform a massive “masking” update after the import, someone mused. The head of security first reminded everyone that they couldn’t have unmasked PII information available in dumpfiles headed for the development database. John acknowledged the security requirement and offered a hypothetical answer: even if a dumpfile is initially exported without the REMAP_DATA parameter, that parameter can be applied during import, as shown in this command:
# impdp sh/sh dumpfile=customers_norm directory=tmp_dir tables=customers remap_data=customers.cust_main_phone_number:datapump_masking_pkg.mask_all
Here the import transformed (masked) the data before inserting it, eliminating the need for a postimport update. All the columns were imported as is, except the cust_main_phone_number column, which was transformed while being imported.
Someone then asked if all of these Oracle Data Pump tasks—encryption, compression, and masking—can be used together in one dumpfile export. Of course, assured John. To demonstrate, he used the following command:
# expdp sh/sh dumpfile=customers_norm directory=tmp_dir tables=customers remap_data=customers.cust_main_phone_number:datapump_masking_pkg.mask_all encryption=all compression=all
This command produced an encrypted and compressed dumpfile and masked the cust_main_phone_number column (in the dumpfile), all in one step.
John restated the original requirements—encryption, compression, and masking—and explained how the new features of Oracle Data Pump in Oracle Database 11g could be used to address them, as shown in Table 1. The advantage of Oracle Data Pump encryption and compression features over operating-system-level encryption and compression tools is twofold. First, it performs the actions inline—that is, it decompresses and/or decrypts (as appropriate) while importing from the compressed or encrypted dumpfile. Second, while importing parts of the dumpfile, such as a single table from a full database export, the decompression or decryption occurs on that portion alone, not on the entire dumpfile.
Table 1: Analysis of dumpfile requirements and solutions
Over the next few days, Acme was able to encrypt dumpfiles automatically, without using Oracle Transparent Data Encryption in the database on the PII columns; substantially compress the dumpfiles; and eliminate the presence of PII data in the dumpfiles themselves. The security team was happy that mandated requirements had been met, and the development team was happy, because the process for moving data was faster, required fewer steps, and consumed less storage and I/O.
Arup Nanda (email@example.com) has been an Oracle DBA for more than 14 years, handling all aspects of database administration, from performance tuning to security and disaster recovery. He was Oracle Magazine’s DBA of the Year in 2003.