This tutorial shows you how to identify sensitive data, select appropriate mask formats for the sensitive fields, apply the mask and then replace the sensitive fields in the production version.
Approximately 1 hour
This tutorial covers the following topics:
Place the cursor over this icon to load and view all the screenshots for this tutorial. (Caution: This action loads all screenshots simultaneously, so response time may be slow depending on your Internet connection.)
Note: Alternatively, you can place the cursor over an individual icon in the following steps to load and view only the screenshot associated with that step. You can hide an individual screenshot by clicking it.
The Data Masking Pack for Databases helps organizations share production data in compliance with privacy and confidentiality policies by replacing sensitive data with realistic but scrubbed data based on masking rules.
There are 2 primary use cases for the Data Masking Pack. First, DBAs want to take a copy of production for testing purposes and use the Data Masking Pack to replace all sensitive data with innocuous but realistic information and then make this database available to developers. Second, organizations want to share production data with 3rd parties while hiding sensitive or personally identifiable information.
In this tutorial, you review the HR schema in a staging environment to identify sensitive data and then select appropriate mask formats for the sensitive data. After creating the mask, you then proceed to apply the mask and replace the sensitive fields in the production copy with realistic but scrubbed data based on the masking rules specified earlier.
Before you perform this tutorial, you should:
1. | Install Oracle Database 10gR2 or Oracle Database 11gR1.
|
|
2. | Install the Sample Schema that comes with Oracle Database 10gR2 or Oracle Database 11gR1.
|
|
3. | Upgrade your Oracle Enterprise Manager 10gR2 installation to Oracle Enterprise Manager Grid Control 10gR4 using the 10.2.0.4 patchset.
|
|
4. | Download and unzip the datamask.zip file into your working directory.
|
|
5. | This tutorial assumes that you have created some mask formats in the EM Repository. To load the mask formats, open a SQL*Plus session from your working directory and executed the following script: @dm_rule_setup Note: you will be prompted for the EM gridcontrol user. |
|
6. | There are also some other change you need to make to your default HR schema to support the data masking scenarios in this tutorial. From your SQL*Plus session, execute the following script as your HR user. @dm_changes Note: If you want to refresh your HR user schema after this tutorial, you need to run the default demo schema creation script which is mksample.sql in the $ORACLE_HOME/demo/schema directory. |
The Data Masking Pack provides security administrators with the ability to maintain a centralized definition for common masking formats. This centralized definition ensures that DBAs apply the same masking rules across the entire enterprise regardless of the database where the sensitive data resides. Thus organizations and enterprises can ensure that the sensitive data is masked consistently and in compliance with the established standards for data sharing.
In this topic, you create a masking format for California Drivers license numbers. Their format starts with A, B, C or D and then follow with a random 7 digit number. Perform the following steps:
1. | Open a browser and enter the following URL to log in to Oracle Enterprise Manager 10g (change the <hostname> to localhost or your specific hostname). http://<hostname>:4889/em
|
2. | Login as the user you created when you installed Oracle Enterprise Manager and click Login.
|
3. | Click the Targets tab.
|
4. | Click the Databases tab.
|
5. | Select your database from the list or enter search criteria and click Go.
|
6. | Select your database from the list (if you entered search criteria in the previous step).
|
7. | Click the Administration tab.
|
8. | Scroll down the page and under Data Masking, select Definitions.
|
9. | Select the Format Library link under Format Library.
|
10. | The format library contains a centralized set of data masking format masks for commonly used sensitive data. For example, there are mask formats for phone numbers, social security numbers, credit card numbers, and fictitious first and last names. Security administrators can define these mask formats and maintain them in the format library to be used by the DBAs across the entire enterprise. Select the Bay Area Phone Number mask.
|
11. | Notice that the phone number format is created using three types of mask primitives. First, the array lists consist of a set of area codes, followed by a fixed string, followed by a 4 digit random number. The Data Masking Pack provides mask primitives for characters, numbers or digits and dates to allow the administrators to construct any type of mask formats quickly and easily. Click Return.
|
12. |
Now you will create a new masking format. From the Format Library page, click Create.
|
13. |
Enter a name, in this case, 01_CR_DR_LICENSE, and make sure Array List is selected from the drop-down list and click Go.
|
14. |
Enter the List of Values A,B,C,D and click OK.
|
15. |
The Array List type was created. Select Random Digits and click Go.
|
16. |
Enter Start Length of 7 and End Length of 7 and click OK.
|
17. |
The Random Digits type was created. Notice that the types were used to generate some samples of the defined format. Click OK.
|
18. | The Masking Definition for California Driving Licenses has been created. Click the Database breadcrumb to return to the Administration page.
|
You want to make sure your data is masked appropriately. This is because data in relational databases are related to each other through a primary-key-foreign key relationships. For instance, an EMPLOYEE_ID in an EMPLOYEE table is related to the SALES_REP_ID in a CUSTOMERS table. To identify the sensitive data for masking, perform the following steps:
1. | From the Administration tab for your database, under Schema: Database Objects, select Tables.
|
2. | If your preferred credentials are not set, you will receive the login window. Login as the SYSTEM user.
|
3. | Enter your HR user for Schema and enter EMPLOYEE for the Object Name and click Go.
|
4. | To view the data, select View Data from the drop-down list box and click Go.
|
5. | The data is shown. There are several sources of confidential data that you want to mask, such as
EMPLOYEE_ID, FIRST and LAST NAMES, SALARY. In the next topic you will define the masking rules for this data.
|
A data masking definition is the association of tables and columns in a set of schema with a masking format. The Data Masking definition contains a list of sensitive columns in the application tables, e.g. employee social security numbers, and its corresponding association with data masking formats, e.g. a fake social security number generator.
Applications, such as Peoplesoft, do not store the primary key- foreign key relationships in the database in order to be database independent; they are enforced in the application. In those cases, the Data Masking Pack provides administrators with the ability to register these relationships so that the columns in the related tables, e.g. EMPLOYEE_ID, MGR_ID, are masked identically using the same masking rules.
In this topic, you associate the vulnerable columns in a table to a masking format. Perform the following steps:
1. | So that you can compare the data after the data masking definition is applied, you want to leave this window open and open another tab or window in your browser. Right-click the Database breakcrumb and select Open Link in New Window (or a New Tab for browsers that support Tabs).
|
2. | From the Administration tab for your Database, under Data Masking, select Definitions.
|
3. | On the Masking Definitions page, click Mask.
|
4. | You need to select the columns you want to include in this mask definition. Enter a name and under columns click Add.
|
5. | There are several ways by which sensitive data can be identified. One recommended approach is to identify the sensitive data by tagging the associated sensitive columns with a keyword, such as MASK. Enter your HR user for the Schema and MASK% for the Comment Name and click Search.
|
6. | A list of the columns that contain MASK in the COMMENTS field are shown. Select the column for EMPLOYEE_ID and click Add.
|
7. | In this example, there is a custom table called MANAGERS that is a part of the HR application but its constraints including foreign key relationships are enforced in the application. The MANAGERS table uses EMPLOYEE_ID as its driver table but the relationship is not registered in the database as a constraint, therefore, you need to add a Dependent column on the EMPLOYEE_ID column. Click the + for the EMPLOYEE_ID column under Add for Dependent Columns.
|
8. | Enter your HR user for the Schema and MANAGERS for the Table Name and click Search.
|
9. | Select the MGR_ID column from the list and click Add.
|
10. | The dependent column was added. Now you can define the mask format of the EMPLOYEE_ID column. Click the Format icon.
|
11. | Select Random Numbers from the Add drop-down list and click Go.
|
12. | Enter 100000000 for Start Value and 999999999 for End Value and click OK.
|
13. | Click OK.
|
14. | You want to mask a few more columns. Under Columns, click Add.
|
15. | This time enter your HR user for the Schema and EMPLOYEES for the Table Name and click Search.
|
16. | Notice that this time all the columns in the EMPLOYEES table are listed, not just the columns that contain a COMMENT with MASK in its name. Select the following columns from the list and click Add. FIRST_NAME
|
17. | Select the Format icon for LAST_NAME.
|
18. | You can again use the format already defined in the Format Library. The Format Library allows administrators to maintain central definitions of the standard mask formats so that they can be used by the DBAs across the enterprise. Click Import from Library.
|
19. | Select Anglo American Last Name and click Import.
|
20. | Click OK.
|
21. | Select the Format icon for FIRST_NAME.
|
22. | There is already a format defined for this column. You can import it from the library. Click Import from Library.
|
23. | Select Anglo American First Name and click Import.
|
24. | Click OK.
|
25. | Select the Format icon for PHONE_NUMBER.
|
26. | Click Import from Library.
|
27. | Select Bay Area Phone Number and click Import.
|
28. | Click OK.
|
29. | Select the Format icon for SALARY.
|
30. | For this column you want to randomly shuffle the original column data. Select Shuffle from the Add drop-down list and then click OK. Note: You cannot combine a column using this format type with other format types, such as Random Numbers or Random Strings.
|
31. | You have defined all the columns that you want to mask data for. Click Next.
|
32. | A window appears indicating that a Masking Script is being generated.
|
33. | Review the warnings and the space check messages and click Next.
|
34. | Enter a job name and click Next.
|
35. | View the script in summary and full form by switching between the radio buttons. Note that you can save the script as well. When done, click Submit.
|
36. | The job was submitted successfully. Click View Job Details.
|
37. | The job completed successfully. Now you can compare the masked data against what was previously in the table. Click Targets and then select your database. Click the Administration tab and then click the Tables link under Database Objects.
|
38. | Enter HR for Schema and EMPLOYEES for Object Name and click Go. Compare the data against the other window you previously had open. Notice that the data for the columns you defined a mask for is now changed.
|
You can now export the definition to XML, create another one like it and apply it to another database or use the definition to proceed with the data masking process. Perform the following steps:
1. | Navigate back to the Administration page for your database. Under Data Masking, select Definitions.
|
2. | Select Masking Definition from the search drop-down list and enter Search Criteria that identifies your definition and click Go.
|
3. | Select your masking definition and click Export.
|
4. | Click Export.
|
5. | Save the file and view the contents.
|
In the previous topic, you exported a masking definition to XML. XML definitions are useful for two purposes:
Backup a mask definition to XML | |
Import a pre-defined application mask template into Enterprise Manager without having to manually create the application table-column to mask format mappings. |
In this topic, you import an external definition stored in XML which you could then use to proceed with the data masking process. Perform the following steps:
1. |
Open the HR_MASK.xml file in a text editor and change the following elements: DEFINITION_NAME: <name for your masking definition> (i.e. HR mask in XML)
|
2. | Switch to Enterprise Manager and from the Masking Definition page, click Import.
|
3. | Enter the location of the XML file and click Continue.
|
4. | Select the import masking definition and click View.
|
5. | The definition is displayed. You can view the format. Click the eyeglasses icon next to PHONE_NUMBER.
|
6. | The format for PHONE_NUMBER is displayed.
|
User-defined masks are useful when there are very specific requirements for the application or the business. Perform the following steps:
1. | You want to navigate to the EMPLOYEES table and view the data.
|
2. | Right-click on the Database breadcrumb and select either to open a new window or tab. This allows you to compare the data with the data after it is masked at the end of this topic.
|
3. | Enter your HR user for the Schema and EMAIL_MASK for the Object Name and click Go.
|
4. | The function you use for this topic was already loaded in the prerequisites scripts. Select the Function EMAIL_MASK and click View.
|
5. | This user defined function generates dummy email addresses based on first and last name. It is used to replace actual email addresses with fictitious email addresses that are used to verify that alerts generated by the application reach the user as designed. Click the Database breadcrumb.
|
6. | Under Data Masking, select Definitions.
|
7. | You want to create a new definition that uses the EMAIL_MASK user defined function. Click Mask.
|
8. | Enter a Mask Definition name and under Columns click Add.
|
9. | Enter your HR user for the Schema and EMPLOYEES for the Object Name and click Search.
|
10. | Select the EMAIL column checkbox and click Add.
|
11. | Click the Format icon for the EMAIL column.
|
12. | Select User Defined Function from the Add drop-down list and click Go.
|
13. | Enter <HR user>.EMAIL_MASK for the User Defined Function and click OK.
|
14. | Notice the generated mask format for the EMAIL. Click OK.
|
15. | The definition is complete, click Next.
|
16. | A window appears indicating that a Masking Script is being generated.
|
17. |
Review the warnings and the space check messages and click Next.
|
18. |
Enter a job name and click Next.
|
19. |
Click Submit.
|
20. |
The job was submitted successfully. Click View Job Details.
|
21. |
The job completed successfully. Now you can compare the masked data against what was previously in the table. Click Targets and then select your database. Click the Administration tab and then click the Tables link under Database Objects.
|
22. |
Enter HR for Schema and EMPLOYEES for Object Name and click Go. Compare the data against the other window you previously had open. Notice that the data for the email column has changed.
|
In this tutorial, you learned how to:
Manage the data masking format library | ||
Create data masking definitions for your application schema | ||
Export and import masking definitions | ||
Create and apply a user-defined mask |