Oracle Database 11g: by Arup Nanda
Learn how to use the next generation of LOBs: SecureFiles, which offer the best of both worlds of external files and database LOBs to store unstructured data allowing encryption, compression, deduplication, and more.
What do you store in an Oracle database? Mostly it's data that you store in a relational format for easy mapping into some type of defined pattern and in a defined datatype: customer names, account balances, status codes, and so on. But it's also increasingly likely that you may need to store information in a non-stuctured or semi-structured form. Examples include pictures, word processing documents, spreadsheets, XML files, and so on. How are these types of data stored?
There are usually two approaches: The data is stored in the database as LOB fields (BLOB for binary and CLOB for character data), or in OS files with the references to the files stored in the database.
Each approach has advantages and challenges. OS files can be cached by the OS and journaled filesystems that expedite recovery after crashes. They also generally consume less space than the data in the database since they can be compressed.
There are also tools that can intelligently identify patterns in the files and remove duplication for a more efficient storage; however, they are external to the database so the properties of the database do not apply to them. These files are not backed up, fine grained security does not apply to them, and such files are not part of a transaction--so concepts so innate to the Oracle database like read consistency do not apply.
What if you could get the best of both worlds? In Oracle Database 11g, you have the answer with SecureFiles, a completely new infrastructure inside the database that gives you the best features of database-resident LOBs and OS files. Let's see how. (By the way, traditional LOBs are still available in the form of BasicFiles.)
Perhaps it's best to introduce the concept of SecureFiles through a simple example. Suppose you are developing a contract management system in which you want to put the copies of all the contracts into a table. The scanned documents are usually PDF files, not text. Some could be MS Word documents or even scanned pictures. This is a perfect use case for BLOBs because the column must be able to support binary data.
Traditionally, priot to Oracle Database 11g, you would have defined the table as follows:
create table contracts_basic ( contract_id number(12), contract_name varchar2(80), file_size number, orig_file blob ) tablespace users lob (orig_file) ( tablespace users enable storage in row chunk 4096 pctversion 20 nocache nologging ); \
If you want to store the LOB as a SecureFile, all you have to do is place a clause— store as securefile—in the table creation, as shown below:
create table contracts_sec ( contract_id number(12), contract_name varchar2(80), file_size number, orig_file blob ) tablespace users lob (orig_file)To create SecureFile LOBs, you need to comply with two conditions, both of which are default (so you may already be compliant).
store as securefile ( tablespace users enable storage in row chunk 4096 pctversion 20 nocache nologging ) /
After the table is created, you can load data in the same way you do for a regular pre-11g LOB (BasicFile). Your applications do not need to change and you don't need to remember some special syntax.
Here is a small program that loads into this table.
declare l_size number; l_file_ptr bfile; l_blob blob; begin l_file_ptr := bfilename('SECFILE', 'contract.pdf'); dbms_lob.fileopen(l_file_ptr); l_size := dbms_lob.getlength(l_file_ptr); for ctr in 1 .. 100 loop insert into contracts_sec ( contract_id, contract_name, file_size, orig_file ) values ( ctr, 'Contract '||ctr, null, empty_blob() ) returning orig_file into l_blob; dbms_lob.loadfromfile(l_blob, l_file_ptr, l_size); end loop; commit; dbms_lob.close(l_file_ptr); end; /
SQL> create directory secfile as ’/opt/oracle’;
Deduplication is likely to be the most popular feature in SecureFiles because it is the most widely sought after benefit of OS files in some high-end filesystems as opposed to database-resident blobs. Suppose a table has five records each with a BLOB. Three of the BLOBs are identical. If it were possible to store the BLOB only once and store only the reference to that copy on other two records, it would reduce the space consumption substantially. This is possible in OS files but would not have been possible in Oracle Database 10g LOBs. But with SecureFiles it's actually trivial via a property called deduplication. You can specify it during the table creation or modify it later as:
SQL> alter table contracts_sec 2 modify lob(orig_file) 3 (deduplicate) 4 / Table altered.
Now, let's see the space savings after the deduplication process. You can examine the space consumption in the LOB segment through the package DBMS_SPACE. Here is a program that displays the space consumption:
declare l_segment_name varchar2(30); l_segment_size_blocks number; l_segment_size_bytes number; l_used_blocks number; l_used_bytes number; l_expired_blocks number; l_expired_bytes number; l_unexpired_blocks number; l_unexpired_bytes number; begin select segment_name into l_segment_name from dba_lobs where table_name = 'CONTRACTS_SEC'; dbms_output.put_line('Segment Name=' || l_segment_name); dbms_space.space_usage( segment_owner => 'ARUP', segment_name => l_segment_name, segment_type => 'LOB', partition_name => NULL, segment_size_blocks => l_segment_size_blocks, segment_size_bytes => l_segment_size_bytes, used_blocks => l_used_blocks, used_bytes => l_used_bytes, expired_blocks => l_expired_blocks, expired_bytes => l_expired_bytes, unexpired_blocks => l_unexpired_blocks, unexpired_bytes => l_unexpired_bytes ); dbms_output.put_line('segment_size_blocks => '|| l_segment_size_blocks); dbms_output.put_line('segment_size_bytes => '|| l_segment_size_bytes); dbms_output.put_line('used_blocks => '|| l_used_blocks); dbms_output.put_line('used_bytes => '|| l_used_bytes); dbms_output.put_line('expired_blocks => '|| l_expired_blocks); dbms_output.put_line('expired_bytes => '|| l_expired_bytes); dbms_output.put_line('unexpired_blocks => '|| l_unexpired_blocks); dbms_output.put_line('unexpired_bytes => '|| l_unexpired_bytes); end; /
Segment Name=SYS_LOB0000070763C00004$$ segment_size_blocks => 1072 segment_size_bytes => 8781824 used_blocks => 601 used_bytes => 4923392 expired_blocks => 448 expired_bytes => 3670016 unexpired_blocks => 0 unexpired_bytes => 0
Segment Name=SYS_LOB0000070763C00004$$ segment_size_blocks => 1456 segment_size_bytes => 11927552 used_blocks => 7 used_bytes => 57344 expired_blocks => 127 expired_bytes => 1040384 unexpired_blocks => 1296 unexpired_bytes => 10616832
You can also reverse the deduplication process:
SQL> alter table contracts_sec 2 modify lob(orig_file) 3 (keep_duplicates) 4 / Table altered.
Segment Name=SYS_LOB0000070763C00004$$ segment_size_blocks => 1456 segment_size_bytes => 11927552 used_blocks => 601 used_bytes => 4923392 expired_blocks => 0 expired_bytes => 0 unexpired_blocks => 829 unexpired_bytes => 6791168
Another feature of SecureFiles is compression. You can compress the values stored in the LOBs using the following SQL:
SQL> alter table contracts_sec 2 modify lob(orig_file) 3 (compress high) 4 / Table altered.
Segment Name=SYS_LOB0000070763C00004$$ segment_size_blocks => 1456 segment_size_bytes => 11927552 used_blocks => 201 used_bytes => 1646592 expired_blocks => 0 expired_bytes => 0 unexpired_blocks => 1229 unexpired_bytes => 10067968
Compression is not the same as deduplication. Compression happens inside a LOB column, inside a row—each LOB column is compressed independently. In deduplication, all the rows are examined and duplicate values in the columns are removed and replaced with pointers. If you have two very different rows, deduplication will not reduce the size; but compression may optimize the space inside the LOB value. You can compress as well as deduplicate the table.
Compression takes up CPU cycles so depending on how much data is compressible, it may not be worthy of compression. For instance, if you have a lot of JPEG pictures they are compressed already, so further compression will not save any space. On the other hand, if you have an XML document as a CLOB, then compression may produce substantial reduction. SecureFiles compression automatically detects if the data is compressible and only spends CPU cycles if compression yields gains.
Oracle Text indexes can be installed on compressed SecureFiles LOBs. This is a major benefit of storing unstructured data inside the Oracle database as opposed to compressed files in a filesystem.
Also note that the LOB compression is independent of table compression. If you compress the table CONTRACTS_SEC, the LOBs are not compressed. The LOB compression occurs only when you issue the above SQL.
In Oracle Database 11g Release 2, there is a third compression option in addition to HIGH and MEDIUM: LOW. As the name suggests, it compresses less but also consumes a lot less CPU and completes faster. This approach uses a block-based lossless compression similar to the fast Lempel–Ziv–Oberhumer (LZO) algorithm.
Let's see an example of a table with the SecureFiles LOW compression:
create table docs ( doc_id number, clearance varchar2(20), doc_body clob ) LOB(doc_body) store as securefile ( compress low ) /
Let's see an example with the same table and column. First, we modify the column to uncompressed:
SQL> alter table docs 2 modify LOB(doc_body) 3 ( 4 nocompress 5 ) 6 / Table altered.
SQL> alter table docs 2 modify LOB(doc_body) 3 ( 4 compress low 5 ) 6 / Table altered.
You can use Transparent Database Encryption for the SecureFiles as you would do for any column. Here is how you will encrypt the column orig_file LOB using AES 128-bit encryption.
alter table contracts_sec modify lob(orig_file) (encrypt using 'AES128') /
ENCRYPTION_WALLET_LOCATION= (SOURCE= (METHOD=FILE) (METHOD_DATA= (DIRECTORY= /opt/oracle/orawall) ) )The directory /opt/oracle/orawall should already exist; if not then you should create it.
alter system set encryption key authenticated by "mypass"This creates the wallet with the password mypass and opens it.
alter system set encryption wallet open identified by "mypass"
You can check the view dba_encrypted_columns to see which columns have been encrypted and how.
SQL> select table_name, column_name, encryption_alg 2 from dba_encrypted_columns 3 / TABLE_NAME COLUMN_NAME ENCRYPTION_ALG ------------------------------ ------------------ ----------------------------- CONTRACTS_SEC ORIG_FILE AES 128 bits key
One of the advantages of storing unstructured data in OS files instead of database resident objects is the facility of caching. Files can be cached in the operating system's file buffers. A database resident object can also be cached in the database buffer cache. However in some cases the caching may actually be detrimental to performance. LOBs are usually very large (hence the term large objects) and if they come to the buffer cache, most other data blocks will need to be pushed out of the cache to make room for the incoming LOB. The LOB may never be used later yet its entry into the buffer cache causes necessary blocks to be flushed out. Thus in most cases you may want to disable caching for the LOBs.
In the example script for CONTRACTS_SEC you used the nocache clause to disable caching. To enable caching for the LOB, you can alter the table:
alter table contracts_sec modify lob(orig_file) (cache) /
The benefits of caching are very application dependent. In an application manipulating thumbnail images, performance may be improved with caching. However, for larger documents or images, it is better to turn off caching. With securefiles, you have the control.
Logging clause determines how the data changes in the LOB are recorded in the redo log stream. The default is full logging, as in case of any other data, but since the data in LOBs are usually large, you may want to eliminate logging in some cases. The NOLOGING clause used in the example above does exactly that.
SecureFiles offer another value for this clause— filesystem_like_logging—as shown below:
create table contracts_sec_fs ( contract_id number(12), contract_name varchar2(80), file_size number, orig_file blob ) tablespace users lob (orig_file) store as securefile ( tablespace users enable storage in row chunk 4096 pctversion 20 nocacheNote the line shown in bold, which makes the metadata of the LOB logged in the redo logs, not the entire LOB. This is similar to a filesystem. The file metadata is logged in the filesystem journals. Similarly this clause on the SecureFiles expedites recovery after a crash.
The data dictionary view DBA_LOBS show the properties of the LOBs in the database including SecureFiles. Here are the columns of the view:
The owner of the table
The name of the table
The name of the column that is a LOB
A LOB is stored as separate segment, named either by the user or by default as SYS_LOB…
The name of the tablespace
A LOB has an index, whose name comes here
The chunksize of the LOB
Ignored in SecureFiles
If a SecureFile LOB is updated, the past image, like any other database block, is kept in undo segments; but unlike database blocks, you can specify how long the past images are kept (retention).
Ignored for SecureFiles
If the SecureFile LOB is cached in the buffer pool or not (Yes/No), explained in this article
If the changes to the SecureFile LOB should be logged (Yes/No), explained in this article
If the SecureFile LOB is encrypted (Yes/No); explained in this article
If the SecureFile LOB is compressed (Yes/No); explained in this article
If the Securefile LOB is Deduplicated (Yes/No); explained in this article
Whether the LOB is stored inline with the table row
Whether a LOB is dependent on the byte order of the platform
If the LOB is on a partitioned table
If the LOB is a SECUREFILE (Yes or No) or a BASICFILE
In case of partitioned tables, the LOB information is stored in the view DBA_LOB_PARTITIONS.
Now that you have seen how useful SecureFiles are, you may want to convert existing tables. The easiest way to do that is to create a new table, load data from the old one, and rename it. (Of course that requires that the tables be made unavailable during the operation.) The other approach is to use dbms_redefinition package to redefine the table online, without impacting availability.
Let's see how to do that with an example. Suppose you want to migrate the original table CONTRACTS_BASIC to store as SecureFiles. The following steps will accomplish that.
alter table contracts_basic add constraint pk_contacts primary key (contract_id) /
create table contracts_new ( contract_id number(12), contract_name varchar2(80), file_size number, orig_file BLOB ) lob (orig_file) store as securefile (nocache nologging) /
declare l_col_mapping varchar2(1000); begin l_col_mapping := 'contract_id contract_id , '|| 'contract_name contract_name , '|| 'file_size file_size, '|| 'orig_file orig_file'; dbms_redefinition.start_redef_table ('ARUP', 'CONTRACTS_BASIC', 'CONTRACTS_NEW', l_col_mapping); end; /
declare l_error_count pls_integer := 0; begin dbms_redefinition.copy_table_dependents ( 'ARUP', 'CONTRACTS_BASIC', 'CONTRACTS_NEW', 1, TRUE, TRUE, TRUE, FALSE, l_error_count ); dbms_output.put_line('Errors Occurred := ' || to_char(l_error_count)); end; /
begin dbms_redefinition.finish_redef_table ('ARUP', 'CONTRACTS_BASIC', 'CONTRACTS_NEW'); end; /
select securefile from dba_lobs where table_name = 'CONTRACTS_BASIC' / SEC --- YES
SQL> drop table contracts_new; Table dropped.
alter session force parallel dml;
The initialization parameter db_securefile determines the use of SecureFiles in the database. Here are the various values of the parameter and their effects:
The default value. The value indicates that SecureFile LOBs can be created in the database.
Now that you see how the SecureFiles are so useful, you may want to make sure all LOBs from then onward should only be SecureFiles instead of the default BasicFiles, even if the user does not specify securefile. This parameter value ensures all the LOBs are created as SecureFiles by default. Remember, SecureFiles require ASSM tablespaces (which is default in 11g anyway), so if you are trying to create the LOB in a non-ASSM tablespace, you will get an error.
Well, it’s just the opposite of always. You don’t like SecureFiles for some reason and do not want to allow its creation in the database. This parameter value will make the LOB created as a BasicFile when though a SecureFile clause is mentioned. The user does not get an error when the SecureFile clause is used but the LOB is silently created as BasicFile.
The securefile clause, along with all storage clauses is ignored.
SecureFiles are not merely the next generation of LOBs; they add much more value to them, especially the features that were formerly only in the domain of filesystems. SecureFiles can be encrypted for security, de-duplicated and compressed for more efficient storage, cached (or not) for faster access (or save the buffer pool space), and logged at several levels to reduce the mean time to recover after a crash. With the introduction of SecureFiles, you can store more unstructured documents in the database without incurring too much overhead or losing any critical functionality afforded by an OS filesystem.