Oracle Database 11g:
The Top Features for DBAs and Developers
by Arup Nanda Oracle, ACE Director
SecureFiles: The New LOBs
Learn how to use the next generation of LOBs: SecureFiles, which offer the best of both worlds of external files and database LOBs to store unstructured data allowing encryption, compression, deduplication, and more.
Database Resident BLOBS or OS Files
What do you store in an Oracle database? Mostly it's data that you store in a relational format for easy mapping into some type of defined pattern and in a defined datatype: customer names, account balances, status codes, and so on. But it's also increasingly likely that you may need to store information in a non-stuctured or semi-structured form. Examples include pictures, word processing documents, spreadsheets, XML files, and so on. How are these types of data stored?
There are usually two approaches: The data is stored in the database as LOB fields (BLOB for binary and CLOB for character data), or in OS files with the references to the files stored in the database.
Each approach has advantages and challenges. OS files can be cached by the OS and journaled filesystems that expedite recovery after crashes. They also generally consume less space than the data in the database since they can be compressed.
There are also tools that can intelligently identify patterns in the files and remove duplication for a more efficient storage; however, they are external to the database so the properties of the database do not apply to them. These files are not backed up, fine grained security does not apply to them, and such files are not part of a transaction--so concepts so innate to the Oracle database like read consistency do not apply.
What if you could get the best of both worlds? In Oracle Database 11g, you have the answer with SecureFiles, a completely new infrastructure inside the database that gives you the best features of database-resident LOBs and OS files. Let's see how. (By the way, traditional LOBs are still available in the form of BasicFiles.)
Perhaps it's best to introduce the concept of SecureFiles through a simple example. Suppose you are developing a contract management system in which you want to put the copies of all the contracts into a table. The scanned documents are usually PDF files, not text. Some could be MS Word documents or even scanned pictures. This is a perfect use case for BLOBs because the column must be able to support binary data.
Traditionally, priot to Oracle Database 11g, you would have defined the table as follows:
create table contracts_basic ( contract_id number(12), contract_name varchar2(80), file_size number, orig_file blob ) tablespace users lob (orig_file) ( tablespace users enable storage in row chunk 4096 pctversion 20 nocache nologging ); \
The column ORIG_FILE is where the actual file in binary format is stored. The various parameters indicate that the LOB should not be cached and logged during operations, should be stored in line with the table row, and should have a chunk size of 4KB and stored in the tablespace USERS. As you didn't specify it explicitly, the LOB is stored in the conventional format (BasicFiles) under Oracle Database 11g.
If you want to store the LOB as a SecureFile, all you have to do is place a clause—
store as securefile—in the table creation, as shown below:
create table contracts_sec ( contract_id number(12), contract_name varchar2(80), file_size number, orig_file blob ) tablespace users lob (orig_file) store as securefile ( tablespace users enable storage in row chunk 4096 pctversion 20 nocache nologging ) /
To create SecureFile LOBs, you need to comply with two conditions, both of which are default (so you may already be compliant).
- The initialization parameter db_securefile should be set to permitted (the default). I will explain what this parameter does later.
- The tablespace where you are creating the securefile should be Automatic Segment Space Management (ASSM) enabled. In Oracle Database 11g, the default mode of tablespace creation is ASSM so it may already be so for the tablespace. If it's not, then you have to create the SecureFile on a new ASSM tablespace.
After the table is created, you can load data in the same way you do for a regular pre-11g LOB (BasicFile). Your applications do not need to change and you don't need to remember some special syntax.
Here is a small program that loads into this table.
declare l_size number; l_file_ptr bfile; l_blob blob; begin l_file_ptr := bfilename('SECFILE', 'contract.pdf'); dbms_lob.fileopen(l_file_ptr); l_size := dbms_lob.getlength(l_file_ptr); for ctr in 1 .. 100 loop insert into contracts_sec ( contract_id, contract_name, file_size, orig_file ) values ( ctr, 'Contract '||ctr, null, empty_blob() ) returning orig_file into l_blob; dbms_lob.loadfromfile(l_blob, l_file_ptr, l_size); end loop; commit; dbms_lob.close(l_file_ptr); end; /
This loads the file contract.pdf 100 times into 100 rows of the table. You would have already defined a directory object called SECFILE for the OS directory where the file contract.pdf is stored. Here is an example where the file contract.pdf is located in /opt/oracle.
SQL> create directory secfile as ’/opt/oracle’;
Once the LOB is stored as a SecureFile, you have a lot features available to you for optimal operation. Here are some of those very useful features.
Deduplication is likely to be the most popular feature in SecureFiles because it is the most widely sought after benefit of OS files in some high-end filesystems as opposed to database-resident blobs. Suppose a table has five records each with a BLOB. Three of the BLOBs are identical. If it were possible to store the BLOB only once and store only the reference to that copy on other two records, it would reduce the space consumption substantially. This is possible in OS files but would not have been possible in Oracle Database 10g LOBs. But with SecureFiles it's actually trivial via a property called deduplication. You can specify it during the table creation or modify it later as:
SQL> alter table contracts_sec 2 modify lob(orig_file) 3 (deduplicate) 4 / Table altered.
After the deduplication, the database calculates the hash values of the columns values in each row and compares them to the others. If the hash values match, the hash value is stored&m;dashnot the actual BLOB. When a new record is inserted its hash value is calculated, and if it matches to another value then the hash value is inserted; otherwise the real value is stored.
Now, let's see the space savings after the deduplication process. You can examine the space consumption in the LOB segment through the package DBMS_SPACE. Here is a program that displays the space consumption:
declare l_segment_name varchar2(30); l_segment_size_blocks number; l_segment_size_bytes number; l_used_blocks number; l_used_bytes number; l_expired_blocks number; l_expired_bytes number; l_unexpired_blocks number; l_unexpired_bytes number; begin select segment_name into l_segment_name from dba_lobs where table_name = 'CONTRACTS_SEC'; dbms_output.put_line('Segment Name=' || l_segment_name); dbms_space.space_usage( segment_owner => 'ARUP', segment_name => l_segment_name, segment_type => 'LOB', partition_name => NULL, segment_size_blocks => l_segment_size_blocks, segment_size_bytes => l_segment_size_bytes, used_blocks => l_used_blocks, used_bytes => l_used_bytes, expired_blocks => l_expired_blocks, expired_bytes => l_expired_bytes, unexpired_blocks => l_unexpired_blocks, unexpired_bytes => l_unexpired_bytes ); dbms_output.put_line('segment_size_blocks => '|| l_segment_size_blocks); dbms_output.put_line('segment_size_bytes => '|| l_segment_size_bytes); dbms_output.put_line('used_blocks => '|| l_used_blocks); dbms_output.put_line('used_bytes => '|| l_used_bytes); dbms_output.put_line('expired_blocks => '|| l_expired_blocks); dbms_output.put_line('expired_bytes => '|| l_expired_bytes); dbms_output.put_line('unexpired_blocks => '|| l_unexpired_blocks); dbms_output.put_line('unexpired_bytes => '|| l_unexpired_bytes); end; /
This script shows various space related statistics for the LOB. Before the deduplication process, here is the output:
Segment Name=SYS_LOB0000070763C00004$$ segment_size_blocks => 1072 segment_size_bytes => 8781824 used_blocks => 601 used_bytes => 4923392 expired_blocks => 448 expired_bytes => 3670016 unexpired_blocks => 0 unexpired_bytes => 0
Segment Name=SYS_LOB0000070763C00004$$ segment_size_blocks => 1456 segment_size_bytes => 11927552 used_blocks => 7 used_bytes => 57344 expired_blocks => 127 expired_bytes => 1040384 unexpired_blocks => 1296 unexpired_bytes => 10616832
Only one metric from the above output is enough to study: used_bytes, which show the exact bytes stored by the LOB column. Before deduplication, it used to consume 4,923,392 bytes or about 5MB but after deduplication it shrank to 57,344 bytes or about 57KB, almost 1 percent of the original value. It happened because the deduplication process found the rows repeated with the same value 100 times (remember, we put the same value in LOB column for all rows) and kept only one row and made the other ones as pointers.
You can also reverse the deduplication process:
SQL> alter table contracts_sec 2 modify lob(orig_file) 3 (keep_duplicates) 4 / Table altered.
After this, if you check the space again:
Segment Name=SYS_LOB0000070763C00004$$ segment_size_blocks => 1456 segment_size_bytes => 11927552 used_blocks => 601 used_bytes => 4923392 expired_blocks => 0 expired_bytes => 0 unexpired_blocks => 829 unexpired_bytes => 6791168
Note the USED_BYTES went up to the original value of about 5MB.
Another feature of SecureFiles is compression. You can compress the values stored in the LOBs using the following SQL:
SQL> alter table contracts_sec 2 modify lob(orig_file) 3 (compress high) 4 / Table altered.
Now if you run the space finding PL/SQL block:
Segment Name=SYS_LOB0000070763C00004$$ segment_size_blocks => 1456 segment_size_bytes => 11927552 used_blocks => 201 used_bytes => 1646592 expired_blocks => 0 expired_bytes => 0 unexpired_blocks => 1229 unexpired_bytes => 10067968
Note the used_bytes metric is now 1,646,592 or about 1.5 MB, down from 5MB.
Compression is not the same as deduplication. Compression happens inside a LOB column, inside a row—each LOB column is compressed independently. In deduplication, all the rows are examined and duplicate values in the columns are removed and replaced with pointers. If you have two very different rows, deduplication will not reduce the size; but compression may optimize the space inside the LOB value. You can compress as well as deduplicate the table.
Compression takes up CPU cycles so depending on how much data is compressible, it may not be worthy of compression. For instance, if you have a lot of JPEG pictures they are compressed already, so further compression will not save any space. On the other hand, if you have an XML document as a CLOB, then compression may produce substantial reduction. SecureFiles compression automatically detects if the data is compressible and only spends CPU cycles if compression yields gains.
Oracle Text indexes can be installed on compressed SecureFiles LOBs. This is a major benefit of storing unstructured data inside the Oracle database as opposed to compressed files in a filesystem.
Also note that the LOB compression is independent of table compression. If you compress the table CONTRACTS_SEC, the LOBs are not compressed. The LOB compression occurs only when you issue the above SQL.
In Oracle Database 11g Release 2, there is a third compression option in addition to HIGH and MEDIUM: LOW. As the name suggests, it compresses less but also consumes a lot less CPU and completes faster. This approach uses a block-based lossless compression similar to the fast Lempel–Ziv–Oberhumer (LZO) algorithm.
Let's see an example of a table with the SecureFiles LOW compression:
create table docs ( doc_id number, clearance varchar2(20), doc_body clob ) LOB(doc_body) store as securefile ( compress low ) /
If you omit the LOW clause, the default is MEDIUM. The LOW compression is not just for table creation; you can use it to alter an existing column as well.
Let's see an example with the same table and column. First, we modify the column to uncompressed:
SQL> alter table docs 2 modify LOB(doc_body) 3 ( 4 nocompress 5 ) 6 / Table altered.
Now, let’s modify the column for low compression:
SQL> alter table docs 2 modify LOB(doc_body) 3 ( 4 compress low 5 ) 6 / Table altered.
You can use Transparent Database Encryption for the SecureFiles as you would do for any column. Here is how you will encrypt the column orig_file LOB using AES 128-bit encryption.
alter table contracts_sec modify lob(orig_file) (encrypt using 'AES128') /
Before enabling encryption you have to set up encryption wallet. (A complete description of encryption wallets can be found in this Oracle Magazine article.) Here are the steps in summary:
- 1. Set the parameter in sqlnet.ora, if not set already to specify the location of the wallet:
ENCRYPTION_WALLET_LOCATION= (SOURCE= (METHOD=FILE) (METHOD_DATA= (DIRECTORY= /opt/oracle/orawall) ) )
- 2. Create the wallet:
alter system set encryption key authenticated by "mypass"
- 3. The above steps are needed only once. After the wallet is created and open, it stays open as long as the database is up (unless it is explicitly closed). If the database is restarted, you have to open the wallet with:
alter system set encryption wallet open identified by "mypass"
When a SecureFile LOB column is encrypted, the column values of all the rows of that table are encrypted. After the encryption, you can't use the Conventional Export or Import in the table; you have to use Data Pump.
You can check the view dba_encrypted_columns to see which columns have been encrypted and how.
SQL> select table_name, column_name, encryption_alg 2 from dba_encrypted_columns 3 / TABLE_NAME COLUMN_NAME ENCRYPTION_ALG ------------------------------ ------------------ ----------------------------- CONTRACTS_SEC ORIG_FILE AES 128 bits key
One of the advantages of storing unstructured data in OS files instead of database resident objects is the facility of caching. Files can be cached in the operating system's file buffers. A database resident object can also be cached in the database buffer cache. However in some cases the caching may actually be detrimental to performance. LOBs are usually very large (hence the term large objects) and if they come to the buffer cache, most other data blocks will need to be pushed out of the cache to make room for the incoming LOB. The LOB may never be used later yet its entry into the buffer cache causes necessary blocks to be flushed out. Thus in most cases you may want to disable caching for the LOBs.
In the example script for CONTRACTS_SEC you used the nocache clause to disable caching. To enable caching for the LOB, you can alter the table:
alter table contracts_sec modify lob(orig_file) (cache) /
This enables the LOB caching. Note that the caching refers to the LOB only. The rest of the table is placed into the buffer cache and follow the same logic as any other table regardless of the setting of the LOB caching on that table.
The benefits of caching are very application dependent. In an application manipulating thumbnail images, performance may be improved with caching. However, for larger documents or images, it is better to turn off caching. With securefiles, you have the control.
Logging clause determines how the data changes in the LOB are recorded in the redo log stream. The default is full logging, as in case of any other data, but since the data in LOBs are usually large, you may want to eliminate logging in some cases. The NOLOGING clause used in the example above does exactly that.
SecureFiles offer another value for this clause—
filesystem_like_logging —as shown below:
create table contracts_sec_fs ( contract_id number(12), contract_name varchar2(80), file_size number, orig_file blob ) tablespace users lob (orig_file) store as securefile ( tablespace users enable storage in row chunk 4096 pctversion 20 nocache filesystem_like_logging )
Note the line shown in bold, which makes the metadata of the LOB logged in the redo logs, not the entire LOB. This is similar to a filesystem. The file metadata is logged in the filesystem journals. Similarly this clause on the SecureFiles expedites recovery after a crash.
The data dictionary view DBA_LOBS show the properties of the LOBs in the database including SecureFiles. Here are the columns of the view:
|OWNER||The owner of the table|
|TABLE_NAME||The name of the table|
|COLUMN_NAME||The name of the column that is a LOB|
|SEGMENT_NAME||A LOB is stored as separate segment, named either by the user or by default as SYS_LOB…|
|TABLESPACE_NAME||The name of the tablespace|
|INDEX_NAME||A LOB has an index, whose name comes here|
|CHUNK||The chunksize of the LOB|
|PCTVERSION||Ignored in SecureFiles|
|RETENTION||If a SecureFile LOB is updated, the past image, like any other database block, is kept in undo segments; but unlike database blocks, you can specify how long the past images are kept (retention).|
|FREEPOOLS||Ignored for SecureFiles|
|CACHE||If the SecureFile LOB is cached in the buffer pool or not (Yes/No), explained in this article|
|LOGGING||If the changes to the SecureFile LOB should be logged (Yes/No), explained in this article|
|ENCRYPT||If the SecureFile LOB is encrypted (Yes/No); explained in this article|
|COMPRESSION||If the SecureFile LOB is compressed (Yes/No); explained in this article|
|DEDUPLICATION||If the Securefile LOB is Deduplicated (Yes/No); explained in this article|
|IN_ROW||Whether the LOB is stored inline with the table row|
|FORMAT||Whether a LOB is dependent on the byte order of the platform|
|PARTITIONED||If the LOB is on a partitioned table|
|SECUREFILE||If the LOB is a SECUREFILE (Yes or No) or a BASICFILE|
In case of partitioned tables, the LOB information is stored in the view DBA_LOB_PARTITIONS.
Migration of LOBs to SecureFiles
Now that you have seen how useful SecureFiles are, you may want to convert existing tables. The easiest way to do that is to create a new table, load data from the old one, and rename it. (Of course that requires that the tables be made unavailable during the operation.) The other approach is to use dbms_redefinition package to redefine the table online, without impacting availability.
Let's see how to do that with an example. Suppose you want to migrate the original table CONTRACTS_BASIC to store as SecureFiles. The following steps will accomplish that.
- 1. Make sure you have a primary key. If there isn't one, create it.
alter table contracts_basic add constraint pk_contacts primary key (contract_id) /
- 3. Build the new table.
create table contracts_new ( contract_id number(12), contract_name varchar2(80), file_size number, orig_file BLOB ) lob (orig_file) store as securefile (nocache nologging) /
- 5. Map the columns to the new table.
declare l_col_mapping varchar2(1000); begin l_col_mapping := 'contract_id contract_id , '|| 'contract_name contract_name , '|| 'file_size file_size, '|| 'orig_file orig_file'; dbms_redefinition.start_redef_table ('ARUP', 'CONTRACTS_BASIC', 'CONTRACTS_NEW', l_col_mapping); end; /
- 7. Start the Redefinition Process.
- 8. Finish the redefinition process.
begin dbms_redefinition.finish_redef_table ('ARUP', 'CONTRACTS_BASIC', 'CONTRACTS_NEW'); end; /
- 10. Confirm that the table was converted.
- 12. Drop the interim table CONTRACTS_NEW.
SQL> drop table contracts_new; Table dropped.
declare l_error_count pls_integer := 0; begin dbms_redefinition.copy_table_dependents ( 'ARUP', 'CONTRACTS_BASIC', 'CONTRACTS_NEW', 1, TRUE, TRUE, TRUE, FALSE, l_error_count ); dbms_output.put_line('Errors Occurred := ' || to_char(l_error_count)); end; /
This will copy all the rows from CONTRACTS_BASIC to CONTRACTS_NEW so it may take a long time, depending on the rows of the table.
select securefile from dba_lobs where table_name = 'CONTRACTS_BASIC' / SEC --- YES
The column shows YES, meaning the column was converted to SecureFiles.
You can try to enable parallel DML in the beginning to make the copy process faster. Here is how you enable parallel DML in the session:
alter session force parallel dml;
The initialization parameter db_securefile determines the use of SecureFiles in the database. Here are the various values of the parameter and their effects:
|PERMITTED||The default value. The value indicates that SecureFile LOBs can be created in the database.|
|ALWAYS||Now that you see how the SecureFiles are so useful, you may want to make sure all LOBs from then onward should only be SecureFiles instead of the default BasicFiles, even if the user does not specify securefile. This parameter value ensures all the LOBs are created as SecureFiles by default. Remember, SecureFiles require ASSM tablespaces (which is default in 11g anyway), so if you are trying to create the LOB in a non-ASSM tablespace, you will get an error.|
|NEVER||Well, it’s just the opposite of always. You don’t like SecureFiles for some reason and do not want to allow its creation in the database. This parameter value will make the LOB created as a BasicFile when though a SecureFile clause is mentioned. The user does not get an error when the SecureFile clause is used but the LOB is silently created as BasicFile.|
|IGNORE||The securefile clause, along with all storage clauses is ignored.|
SecureFiles are not merely the next generation of LOBs; they add much more value to them, especially the features that were formerly only in the domain of filesystems. SecureFiles can be encrypted for security, de-duplicated and compressed for more efficient storage, cached (or not) for faster access (or save the buffer pool space), and logged at several levels to reduce the mean time to recover after a crash. With the introduction of SecureFiles, you can store more unstructured documents in the database without incurring too much overhead or losing any critical functionality afforded by an OS filesystem.