Faster, Safer, and Smaller LOBsBy Jonathan Gennick
Oracle SecureFiles provides increased performance, better security, and reduced disk usage.
Many organizations today face the challenge of an explosion in the amount of unstructured, often file-based data that must be managed in conjunction with the more-structured data typically associated with relational databases. Banking applications store check images for cleared checks, healthcare applications store digital images from X-rays and CAT scans, geospatial systems store satellite imagery, enterprise resource planning systems store invoice images, and the list goes on.
Oracle Database 11g makes significant strides in addressing the problem of unstructured data, by introducing a completely new storage infrastructure for data that today is often left in file systems. This new infrastructure is called Oracle SecureFiles, and it is a complete, drop-in replacement of Oracle's previous large-object (LOB) infrastructure (now called BasicFiles). Every facet of the Oracle SecureFiles LOB architecture, from disk format to network protocol to redo and undo algorithms, has been rethought and reimplemented to
Suppose you're migrating digital assets from databases and file systems and developing a more-complete digital asset management system such as one a magazine publisher might use. Your system must manage article drafts and the workflow involved in writing, editing, revising, and publishing that content. This article describes how to set up and configure Oracle SecureFiles and how to migrate sample content for this digital asset management system.
Configuring Your Instance
After installing Oracle Database 11g and before using Oracle SecureFiles, check the setting of the new Oracle Database 11g db_securefile initialization parameter. The default setting is
db_securefile = permitted
This setting makes BasicFiles, not SecureFiles, the default storage approach for any new LOB columns you create while enabling you to explicitly specify SecureFiles when needed. (BasicFiles is also the LOB architecture in previous database releases.) Other settings give you various options for forcing the use of one or the other storage type. For example, you can set db_securefile = force to ensure that all LOBs are created as SecureFiles even when the creating user explicitly specifies otherwise. You should be aware of which db_securefile setting is in effect for your database.
Any tablespaces you plan to use with Oracle SecureFiles need to be configured for automatic segment space management (ASSM). The key here is to specify SEGMENT SPACE MANAGEMENT AUTO when creating your LOB tablespaces. The following creates the ARTICLE_LOBS tablespace for this article:
CREATE TABLESPACE article_lobs DATAFILE 'H:\APP\JONATHAN\ORADATA\ORCL\ARTICLE_LOBS.DBF' SIZE 5M EXTENT MANAGEMENT LOCAL SEGMENT SPACE MANAGEMENT AUTO;
If you are following along with this article's example, be sure to give yourself adequate quota on the ARTICLE_LOBS tablespace. You'll also need to adjust the datafile path to something appropriate for your system.
Setting Up for Encryption
Oracle SecureFiles provides optional transparent data encryption of LOBs. Transparent data encryption protects LOB data from unauthorized users who might somehow gain access to a datafile. This encryption is completely transparent to end users and applications.
If you plan to use encryption with Oracle SecureFiles, you'll need to create a transparent data encryption wallet to hold the encryption key. First, create a directory to contain the wallet. You can place that directory underneath $ORACLE_HOME. For example
Next, add an ENCRYPTION_WALLET_LOCATION setting to your sqlnet.ora file. You'll find sqlnet.ora in $ORACLE_HOME/network/admin. Following is the setting corresponding to the directory just created:
ENCRYPTION_WALLET_LOCATION= (SOURCE=(METHOD=FILE)(METHOD_DATA= (DIRECTORY= H:\app\Jonathan\product\ 11.1.0\db_1\wallet)))
Be certain to include the ENCRYPTION_WALLET_LOCATION setting in sqlnet.ora as one long line. And again, if you are following along with this article's example, you'll need to adjust the wallet directory to something appropriate for your own system.
Finally, set an encryption key:
ALTER SYSTEM SET ENCRYPTION KEY AUTHENTICATED BY "My secret key";
With this setting, your Oracle Database instance will generate an encryption key and place it in the wallet. The wallet is then protected by the password "My secret key."
Creating SecureFile LOBs
Creating an Oracle SecureFiles LOB can be as simple as specifying STORE AS SECUREFILE in the LOB storage clause of a CREATE TABLE or ALTER TABLE statement. (Be sure to assign your SecureFiles LOBs to appropriate, ASSM-enabled tablespaces.) Listing 1 creates the article_draft_redef table, with a single, SecureFiles LOB column named article_content. Data for that LOB is assigned to the ARTICLE_LOBS tablespace (created earlier in this article). Each row in the table contains a magazine article at a specific stage in the writing/editing/revising cycle.
Code Listing 1: Creating an Oracle SecureFiles LOB
CREATE TABLE article_draft_redef ( article_id NUMBER, article_stage VARCHAR(10), article_content BLOB, CONSTRAINT article_stage_check_b CHECK (article_stage IN ( '1st Draft', 'Edit Pass', '2nd Draft', 'Copyedit', 'Review', 'Final')), CONSTRAINT article_draft_pk_b PRIMARY KEY (article_id) ) LOB (article_content) STORE AS SECUREFILE article_content ( TABLESPACE article_lobs RETENTION MIN 3600 KEEP_DUPLICATES NOCOMPRESS DECRYPT CACHE READS);
Listing 1 shows some of the options available when creating a new LOB.RETENTION MIN 3600 ensures that at least one hour (60 minutes x 60 seconds/minute = 3,600) of undo data is retained for the LOB column. This ability to specify a minimum retention time is helpful for running your database in flashback mode. The default is RETENTION AUTO, which retains only enough undo to satisfy consistent read requests while a transaction is in progress. You can also specify a maximum retention in bytes or no retention at all. See the Oracle SQL Reference for details on the syntax for retention.
KEEP_DUPLICATES, NOCOMPRESS, and DECRYPT explicitly disable deduplication, compression, and encryption. I'll come back to these options later in the article. In real life, you would likely enable one or more of them from the beginning.
CACHE READS causes LOB data to be placed into the buffer cache during read operations but not during write operations. You gain improved read performance here at the price of potentially crowding other data out of the buffer cache. NOCACHE specifies the default behavior, which is never to bring a LOB's data into the buffer cache.
I've created the article_draft_redef table in Listing 1 to migrate a table that uses the older BasicFiles to the newer SecureFiles storage method. To follow along with the migration, download the o57securefiles.zip file, unzip that file, and follow the instructions in readme.txt to create the original table named article_draft that I will be migrating. The logical column structure of article_draft is exactly the same as that of article_draft_redef.
The article_draft table contains existing LOB data stored with BasicFiles. I'll assume that that data is in use by running applications and that I want to migrate that data from BasicFiles to SecureFiles. The article_draft_redef table, with its SecureFiles LOB column, represents the destination of the article_draft table data. (Bonus! There is even a real article for you to read in the example data.)
Planning a Migration
Because Oracle SecureFiles represents a completely new way of writing LOB data to the database, the only way to migrate LOB data from BasicFiles to SecureFiles is essentially to rewrite the data by recreating or redefining the tables containing BasicFiles columns. If you can afford to take your data offline, you can simply select data from your old table and insert it into the new one. Begin with a statement such as
INSERT INTO article_draft_redef SELECT article_id, article_stage, article_content FROM article_draft;
Follow this INSERT with a DROP and a RENAME to make the new table take the place of the old one:
DROP TABLE article_draft; RENAME article_draft_redef TO article_draft;
There's really no need to take your data offline, though. Instead, consider using the online table redefinition feature introduced in Oracle9i Database Release 1. Listing 2 shows a PL/SQL block that migrates the article_draft table to the structure indicated by the article_draft_redef table. Online table definition is very easy to do, and it makes the migration transparent to users and applications, because the table being migrated remains available the entire time.
Code Listing 2: Migrating by online redefinition
DECLARE error_counter PLS_INTEGER; BEGIN --Begin the redefinition process DBMS_REDEFINITION.START_REDEF_TABLE ( 'gennick', 'article_draft', 'article_draft_redef', 'article_id, article_stage, article_content'); --Finish the redefinition process DBMS_REDEFINITION.FINISH_REDEF_TABLE ( 'gennick', 'article_draft', 'article_draft_redef'); END;
If your LOB table happens to be partitioned, another approach to consider is partition exchange. You can specify LOB storage on a per- partition basis, giving you the option of migrating one partition at a time from BasicFiles to SecureFiles. Use the URL under "Next Steps" to go to Arup Nanda's article on partitioning features, "Partition Decisions," where you will find an example showing how partition exchange is done.
Be sure you have enough tablespace and disk space set aside to support whatever migration method you choose. For the online redefinition approach, you'll need enough space for two complete copies of your data to coexist. Partition exchange requires only enough disk space to hold two copies of whichever partition you are exchanging. Bear in mind that one downside of partition exchange is that the partition being swapped needs to be briefly taken offline. In the end, you'll have to weigh the different approaches and their trade-offs and choose the approach that works best in your own situation.
You can confirm the SecureFiles status of a given LOB column by checking that column's segment subtype. For example, execute the following to confirm that article_content is now a SecureFiles LOB
SELECT segment_subtype FROM user_segments WHERE segment_name='ARTICLE_CONTENT';
The result you want to see is SECUREFILE, which indicates SecureFiles storage. If you see ASSM, you're still set for BasicFiles storage.
Having migrated to Oracle SecureFiles, you can choose to enable deduplication, an Oracle SecureFiles feature whereby the database server stores only one copy of a given LOB in a given column within the same partition. Two or more users can independently store the same data in a LOB column—storing the same article draft twice, for example—and the database keeps track of that duplication, storing only one copy of the data. Deduplication is transparent. Users each perceive that they have their own copy of deduplicated data, even though that is not really the case.
Code Listing 3: Checking space used by a LOB
DECLARE seg_blocks NUMBER; seg_bytes NUMBER; used_blocks NUMBER; used_bytes NUMBER; expired_blocks NUMBER; expired_bytes NUMBER; unexpired_blocks NUMBER; unexpired_bytes NUMBER; BEGIN DBMS_SPACE.SPACE_USAGE ( 'GENNICK', 'ARTICLE_CONTENT', 'LOB' , seg_blocks, seg_bytes, used_blocks, used_bytes , expired_blocks, expired_bytes, unexpired_blocks, unexpired_bytes); DBMS_OUTPUT.PUT_LINE ('Bytes used = ' || to_char(used_bytes)); END;
In a real-life migration, it would likely make sense to deduplicate during, not after, the migration process. For example purposes, though, it's nice to be able to see that removal of duplicates is actually occurring. Run the code in Listing 3 to see how many bytes are in use by the LOB segment. (Be sure to change the first parameter of the procedure call to a valid schema name on your system.) On my system, the result is
Bytes used = 450560
To deduplicate the data in the article_content column in the article_draft table, issue the following ALTER TABLE statement:
ALTER TABLE article_draft MODIFY LOB(article_content) (DEDUPLICATE LOB);
Existing LOBs in the article_content column will be scanned, and any current duplicates will be detected and eliminated. Any new LOB values written to the column will be checked against existing values to prevent future duplication.
Now, run the code in Listing 3 again and you can see the effects of the deduplication. The number of bytes used by the segment should be reduced. On my system, I see the following, lower usage:
Bytes used = 376832
The number of bytes used in the LOB segment has dropped because document #6 is a duplicate of an article stored in a different row. Because of deduplication, that article is not stored twice. Instead, a pointer is stored to the LOB in that other row. When you read the contents of the LOB for document #6, that pointer is dereferenced for you in a completely transparent manner.
Duplicate detection is based on checksums computed with the SHA1 cryptographic hash algorithm. When you write a new LOB into a LOB segment, a checksum is computed over the first n bytes of that new LOB. If that checksum doesn't match that for any existing LOB in the segment, the new LOB is written to the database. Checksums are saved for each LOB, for future deduplication.
When the checksum for a newly incoming LOB matches an existing checksum, there is the strong possibility that the two LOBs are identical. The database instance will begin doing a byte-by-byte comparison of the incoming LOB data with the possible duplicate that is already stored. That possible duplicate is referred to as the primary LOB. The comparison process reads the primary LOB, but no new data is written, so long as the LOB data continues to match. If the LOBs compare as equal, only a pointer to the primary LOB is written. If the comparison fails, data from the primary LOB up to that point is used to construct the first part of the new LOB, and remaining incoming data for the new LOB is written to disk.
Compression encodes the data in a LOB to reduce the number of bytes required to store that LOB. Compression is transparent and reduces the amount of storage needed for your LOB data, giving you more benefit from each dollar your organization spends on disk storage systems. Compression requires additional CPU cycles, so be sure that compression is worthwhile before enabling it on a LOB column. For example, image data, such as PNG files and JPEG files, is often compressed as part of the encoding scheme used. Further compression of image files may be a waste of CPU cycles. But text-based data such as XML and word-processing documents tends to compress a great deal, making the trade-off of CPU time for more disk space very worthwhile.
To enable compression on the sample SecureFiles LOB column, issue the following ALTER TABLE statement:
ALTER TABLE article_draft MODIFY LOB(article_content) (COMPRESS HIGH);
This statement activates a high level of compression on the column data. You can specify MEDIUM to trade away some of that compression for reduced CPU cycles. There is currently no LOW compression option, but that keyword is reserved for future implementation.
If you rerun the code in Listing 3, you should see that even less space is now used by the LOB segment. I get the following result:
Bytes used = 90112
Remember that compression has multiple benefits. Disk space requirements are reduced. Less data needs to be transferred back and forth, leading to better performance. Redo generation is reduced. Less buffer cache data gets used, leaving more room for other objects to be in the cache.
Finally, we come to transparent data encryption. LOB data that is encrypted is protected even when datafiles or backup files fall into the wrong hands. Encrypted LOB data is protected even while being sent across the network.
ALTER TABLE article_draft MODIFY LOB(article_content) (ENCRYPT USING 'AES256');
Remember that transparent data encryption protects your data from malevolent users who might happen to come into possession of a database datafile or backup piece. You still must manage access to database objects within your database, though, so that only authorized database users are allowed access to query the LOB data you are protecting.
Reap the Benefits!Oracle SecureFiles offers three compelling features you can take advantage of immediately without making any changes to your LOB-using applications:
Preliminary testing also shows significant increases in performance through use of Oracle SecureFiles. Oracle's in-house testing shows a 200 to 900 percent increase in write performance, depending on LOB size and whether existing segment space is being reused. Early-adopter clients have reported 300 to 700 percent increases in performance of key application functions.
Because Oracle SecureFiles is completely backward-compatible with BasicFiles, even to the point of using the same datatype names, you can begin to take advantage of everything Oracle SecureFiles has to offer, almost from the moment you upgrade to Oracle Database 11g. You do need to rewrite your LOB data in the database, but you can do that while keeping the data online, and the change is completely transparent to both users and applications.
Oracle SecureFiles is a gift. Drop it in. Take immediate advantage of improved performance, save money on disk storage, and secure your data from those who would do you harm.
Jonathan Gennick (www.gennick.com) is an experienced Oracle professional and member of the Oak Table Network. He wrote the best-selling SQL Pocket Guide and the Oracle SQL*Plus Pocket Reference, both from O'Reilly Media.
Send us your comments