11g logo

Oracle Database 11g:
The Top Features for DBAs and Developers

by Arup Nanda Oracle ACE Director

SecureFiles: The New LOBs

Learn how to use the next generation of LOBs: SecureFiles, which offer the best of both worlds of external files and database LOBs to store unstructured data allowing encryption, compression, deduplication, and more.

See Series TOC

Database Resident BLOBS or OS Files

What do you store in an Oracle database? Mostly it's data that you store in a relational format for easy mapping into some type of defined pattern and in a defined datatype: customer names, account balances, status codes, and so on. But it's also increasingly likely that you may need to store information in a non-stuctured or semi-structured form. Examples include pictures, word processing documents, spreadsheets, XML files, and so on. How are these types of data stored?

There are usually two approaches: The data is stored in the database as LOB fields (BLOB for binary and CLOB for character data), or in OS files with the references to the files stored in the database.

Each approach has advantages and challenges. OS files can be cached by the OS and journaled filesystems that expedite recovery after crashes. They also generally consume less space than the data in the database since they can be compressed.

There are also tools that can intelligently identify patterns in the files and remove duplication for a more efficient storage; however, they are external to the database so the properties of the database do not apply to them. These files are not backed up, fine grained security does not apply to them, and such files are not part of a transaction--so concepts so innate to the Oracle database like read consistency do not apply.

What if you could get the best of both worlds? In Oracle Database 11g, you have the answer with SecureFiles, a completely new infrastructure inside the database that gives you the best features of database-resident LOBs and OS files. Let's see how. (By the way, traditional LOBs are still available in the form of BasicFiles.)

Real-life Example

Perhaps it's best to introduce the concept of SecureFiles through a simple example. Suppose you are developing a contract management system in which you want to put the copies of all the contracts into a table. The scanned documents are usually PDF files, not text. Some could be MS Word documents or even scanned pictures. This is a perfect use case for BLOBs because the column must be able to support binary data.

Traditionally, priot to Oracle Database 11g, you would have defined the table as follows:

create table contracts_basic
(
        contract_id     number(12),
        contract_name   varchar2(80),
        file_size       number,
        orig_file       blob
)
tablespace users
lob (orig_file)
(
        tablespace users
        enable storage in row
        chunk 4096
        pctversion 20
        nocache
        nologging
);
\

The column ORIG_FILE is where the actual file in binary format is stored. The various parameters indicate that the LOB should not be cached and logged during operations, should be stored in line with the table row, and should have a chunk size of 4KB and stored in the tablespace USERS. As you didn't specify it explicitly, the LOB is stored in the conventional format (BasicFiles) under Oracle Database 11g.


If you want to store the LOB as a SecureFile, all you have to do is place a clause— store as securefile—in the table creation, as shown below:

create table contracts_sec 
(
        contract_id     number(12),
        contract_name   varchar2(80),
        file_size       number,
        orig_file       blob
)
tablespace users
lob (orig_file)
                               
store as securefile
( tablespace users enable storage in row chunk 4096 pctversion 20 nocache nologging ) /
To create SecureFile LOBs, you need to comply with two conditions, both of which are default (so you may already be compliant).
  • The initialization parameter db_securefile should be set to permitted (the default). I will explain what this parameter does later.
  • The tablespace where you are creating the securefile should be Automatic Segment Space Management (ASSM) enabled. In Oracle Database 11g, the default mode of tablespace creation is ASSM so it may already be so for the tablespace. If it's not, then you have to create the SecureFile on a new ASSM tablespace.

After the table is created, you can load data in the same way you do for a regular pre-11g LOB (BasicFile). Your applications do not need to change and you don't need to remember some special syntax.

Here is a small program that loads into this table.

declare
    l_size      number;  
    l_file_ptr  bfile;   
    l_blob      blob;    
begin
    l_file_ptr := bfilename('SECFILE', 'contract.pdf'); 
    dbms_lob.fileopen(l_file_ptr); 
    l_size := dbms_lob.getlength(l_file_ptr); 
    for ctr in 1 .. 100 loop 
        insert into contracts_sec 
        (        
            contract_id, 
            contract_name, 
            file_size,   
            orig_file    
        )        
        values   
        (        
            ctr,         
            'Contract '||ctr, 
            null,        
            empty_blob() 
        )        
        returning orig_file into l_blob;  
        dbms_lob.loadfromfile(l_blob, l_file_ptr, l_size); 
    end loop; 
    commit; 
    dbms_lob.close(l_file_ptr); 
end;
/

This loads the file contract.pdf 100 times into 100 rows of the table. You would have already defined a directory object called SECFILE for the OS directory where the file contract.pdf is stored. Here is an example where the file contract.pdf is located in /opt/oracle.

SQL> create directory secfile as ’/opt/oracle’;

Once the LOB is stored as a SecureFile, you have a lot features available to you for optimal operation. Here are some of those very useful features.


Deduplication

Deduplication is likely to be the most popular feature in SecureFiles because it is the most widely sought after benefit of OS files in some high-end filesystems as opposed to database-resident blobs. Suppose a table has five records each with a BLOB. Three of the BLOBs are identical. If it were possible to store the BLOB only once and store only the reference to that copy on other two records, it would reduce the space consumption substantially. This is possible in OS files but would not have been possible in Oracle Database 10g LOBs. But with SecureFiles it's actually trivial via a property called deduplication. You can specify it during the table creation or modify it later as:

SQL> alter table contracts_sec
  2  modify lob(orig_file)
  3  (deduplicate)
  4  /
 
Table altered.

After the deduplication, the database calculates the hash values of the columns values in each row and compares them to the others. If the hash values match, the hash value is stored&m;dashnot the actual BLOB. When a new record is inserted its hash value is calculated, and if it matches to another value then the hash value is inserted; otherwise the real value is stored.

Now, let's see the space savings after the deduplication process. You can examine the space consumption in the LOB segment through the package DBMS_SPACE. Here is a program that displays the space consumption:

declare  
    l_segment_name          varchar2(30); 
    l_segment_size_blocks   number; 
    l_segment_size_bytes    number; 
    l_used_blocks           number;  
    l_used_bytes            number;  
    l_expired_blocks        number;  
    l_expired_bytes         number;  
    l_unexpired_blocks      number;  
    l_unexpired_bytes       number;  

begin
    select segment_name 
    into l_segment_name 
    from dba_lobs 
    where table_name = 'CONTRACTS_SEC'; 
        dbms_output.put_line('Segment Name=' || l_segment_name);
 
    dbms_space.space_usage( 
        segment_owner           => 'ARUP',  
        segment_name            => l_segment_name, 
        segment_type            => 'LOB', 
        partition_name          => NULL, 
        segment_size_blocks     => l_segment_size_blocks, 
        segment_size_bytes      => l_segment_size_bytes, 
        used_blocks             => l_used_blocks, 
        used_bytes              => l_used_bytes, 
        expired_blocks          => l_expired_blocks, 
        expired_bytes           => l_expired_bytes, 
        unexpired_blocks        => l_unexpired_blocks, 
        unexpired_bytes         => l_unexpired_bytes 
    );   


    dbms_output.put_line('segment_size_blocks       => '||  l_segment_size_blocks);
    dbms_output.put_line('segment_size_bytes        => '||  l_segment_size_bytes);
    dbms_output.put_line('used_blocks               => '||  l_used_blocks);
    dbms_output.put_line('used_bytes                => '||  l_used_bytes);
    dbms_output.put_line('expired_blocks            => '||  l_expired_blocks);
    dbms_output.put_line('expired_bytes             => '||  l_expired_bytes);
    dbms_output.put_line('unexpired_blocks          => '||  l_unexpired_blocks);
    dbms_output.put_line('unexpired_bytes           => '||  l_unexpired_bytes);
end;
/

This script shows various space related statistics for the LOB. Before the deduplication process, here is the output:

Segment Name=SYS_LOB0000070763C00004$$
segment_size_blocks     => 1072
segment_size_bytes      => 8781824
used_blocks             => 601
used_bytes              => 4923392
expired_blocks          => 448
expired_bytes           => 3670016
unexpired_blocks        => 0
unexpired_bytes         => 0

After deduplication:

Segment Name=SYS_LOB0000070763C00004$$
segment_size_blocks     => 1456
segment_size_bytes      => 11927552
used_blocks             => 7
used_bytes              => 57344
expired_blocks          => 127
expired_bytes           => 1040384
unexpired_blocks        => 1296
unexpired_bytes         => 10616832

Only one metric from the above output is enough to study: used_bytes, which show the exact bytes stored by the LOB column. Before deduplication, it used to consume 4,923,392 bytes or about 5MB but after deduplication it shrank to 57,344 bytes or about 57KB, almost 1 percent of the original value. It happened because the deduplication process found the rows repeated with the same value 100 times (remember, we put the same value in LOB column for all rows) and kept only one row and made the other ones as pointers.


You can also reverse the deduplication process:

SQL> alter table contracts_sec
  2  modify lob(orig_file)
  3  (keep_duplicates)
  4  /
 
Table altered.

After this, if you check the space again:

Segment Name=SYS_LOB0000070763C00004$$
segment_size_blocks     => 1456
segment_size_bytes      => 11927552
used_blocks             => 601
used_bytes              => 4923392
expired_blocks          => 0
expired_bytes           => 0
unexpired_blocks        => 829
unexpired_bytes         => 6791168

Note the USED_BYTES went up to the original value of about 5MB.

Compression

Another feature of SecureFiles is compression. You can compress the values stored in the LOBs using the following SQL:

SQL> alter table contracts_sec
  2   modify lob(orig_file)
  3  (compress high)
  4  /
 
Table altered.

Now if you run the space finding PL/SQL block:

Segment Name=SYS_LOB0000070763C00004$$
segment_size_blocks     => 1456
segment_size_bytes      => 11927552
used_blocks             => 201
used_bytes              => 1646592
expired_blocks          => 0
expired_bytes           => 0
unexpired_blocks        => 1229
unexpired_bytes         => 10067968

Note the used_bytes metric is now 1,646,592 or about 1.5 MB, down from 5MB.


Compression is not the same as deduplication. Compression happens inside a LOB column, inside a row—each LOB column is compressed independently. In deduplication, all the rows are examined and duplicate values in the columns are removed and replaced with pointers. If you have two very different rows, deduplication will not reduce the size; but compression may optimize the space inside the LOB value. You can compress as well as deduplicate the table.

Compression takes up CPU cycles so depending on how much data is compressible, it may not be worthy of compression. For instance, if you have a lot of JPEG pictures they are compressed already, so further compression will not save any space. On the other hand, if you have an XML document as a CLOB, then compression may produce substantial reduction. SecureFiles compression automatically detects if the data is compressible and only spends CPU cycles if compression yields gains.

Oracle Text indexes can be installed on compressed SecureFiles LOBs. This is a major benefit of storing unstructured data inside the Oracle database as opposed to compressed files in a filesystem.

Also note that the LOB compression is independent of table compression. If you compress the table CONTRACTS_SEC, the LOBs are not compressed. The LOB compression occurs only when you issue the above SQL.

In Oracle Database 11g Release 2, there is a third compression option in addition to HIGH and MEDIUM: LOW. As the name suggests, it compresses less but also consumes a lot less CPU and completes faster. This approach uses a block-based lossless compression similar to the fast Lempel–Ziv–Oberhumer (LZO) algorithm.

Let's see an example of a table with the SecureFiles LOW compression:

create table docs
(
        doc_id  number,
        clearance       varchar2(20),
        doc_body        clob
)
LOB(doc_body) store as securefile
(
        compress low
)
/

If you omit the LOW clause, the default is MEDIUM. The LOW compression is not just for table creation; you can use it to alter an existing column as well.


Let's see an example with the same table and column. First, we modify the column to uncompressed:

SQL> alter table docs 
  2  modify LOB(doc_body)
  3  (
  4     nocompress
  5  )
  6  /

Table altered.

Now, let’s modify the column for low compression:

SQL> alter table docs 
  2  modify LOB(doc_body)
  3  (
  4     compress low
  5  )
  6  /

Table altered.


Encryption

You can use Transparent Database Encryption for the SecureFiles as you would do for any column. Here is how you will encrypt the column orig_file LOB using AES 128-bit encryption.

alter table contracts_sec
  modify lob(orig_file)
  (encrypt using 'AES128')
/

Before enabling encryption you have to set up encryption wallet. (A complete description of encryption wallets can be found in this Oracle Magazine article.) Here are the steps in summary:

  1. Set the parameter in sqlnet.ora, if not set already to specify the location of the wallet:
    ENCRYPTION_WALLET_LOCATION= 
       (SOURCE= 
           (METHOD=FILE) 
           (METHOD_DATA= 
              (DIRECTORY= /opt/oracle/orawall)
         )        
    )       
    
    The directory /opt/oracle/orawall should already exist; if not then you should create it.
  2. Create the wallet:
    alter system set encryption key authenticated by "mypass"
    
    This creates the wallet with the password mypass and opens it.
  3. The above steps are needed only once. After the wallet is created and open, it stays open as long as the database is up (unless it is explicitly closed). If the database is restarted, you have to open the wallet with:
    alter system set encryption wallet open identified by "mypass"
    
When a SecureFile LOB column is encrypted, the column values of all the rows of that table are encrypted. After the encryption, you can't use the Conventional Export or Import in the table; you have to use Data Pump.

You can check the view dba_encrypted_columns to see which columns have been encrypted and how.

SQL> select table_name, column_name, encryption_alg
  2  from dba_encrypted_columns
  3  /
 
TABLE_NAME                     COLUMN_NAME        ENCRYPTION_ALG
------------------------------                ------------------          -----------------------------
CONTRACTS_SEC                  ORIG_FILE          AES 128 bits key

Caching

One of the advantages of storing unstructured data in OS files instead of database resident objects is the facility of caching. Files can be cached in the operating system's file buffers. A database resident object can also be cached in the database buffer cache. However in some cases the caching may actually be detrimental to performance. LOBs are usually very large (hence the term large objects) and if they come to the buffer cache, most other data blocks will need to be pushed out of the cache to make room for the incoming LOB. The LOB may never be used later yet its entry into the buffer cache causes necessary blocks to be flushed out. Thus in most cases you may want to disable caching for the LOBs.

In the example script for CONTRACTS_SEC you used the nocache clause to disable caching. To enable caching for the LOB, you can alter the table:

alter table contracts_sec
modify lob(orig_file)
(cache)
/

This enables the LOB caching. Note that the caching refers to the LOB only. The rest of the table is placed into the buffer cache and follow the same logic as any other table regardless of the setting of the LOB caching on that table.


The benefits of caching are very application dependent. In an application manipulating thumbnail images, performance may be improved with caching. However, for larger documents or images, it is better to turn off caching. With securefiles, you have the control.

Logging

Logging clause determines how the data changes in the LOB are recorded in the redo log stream. The default is full logging, as in case of any other data, but since the data in LOBs are usually large, you may want to eliminate logging in some cases. The NOLOGING clause used in the example above does exactly that.

SecureFiles offer another value for this clause— filesystem_like_logging—as shown below:

create table contracts_sec_fs
(
        contract_id     number(12),
        contract_name   varchar2(80),
        file_size       number,
        orig_file       blob
)
tablespace users
lob (orig_file)
store as securefile
(
        tablespace users
        enable storage in row
        chunk 4096
        pctversion 20
        nocache
        
                              
filesystem_like_logging
)
Note the line shown in bold, which makes the metadata of the LOB logged in the redo logs, not the entire LOB. This is similar to a filesystem. The file metadata is logged in the filesystem journals. Similarly this clause on the SecureFiles expedites recovery after a crash.


Administration

The data dictionary view DBA_LOBS show the properties of the LOBs in the database including SecureFiles. Here are the columns of the view:

Column Name Description

OWNER

The owner of the table

TABLE_NAME

The name of the table

COLUMN_NAME

The name of the column that is a LOB

SEGMENT_NAME

A LOB is stored as separate segment, named either by the user or by default as SYS_LOB…

TABLESPACE_NAME

The name of the tablespace

INDEX_NAME

A LOB has an index, whose name comes here

CHUNK

The chunksize of the LOB

PCTVERSION

Ignored in SecureFiles

RETENTION

If a SecureFile LOB is updated, the past image, like any other database block, is kept in undo segments; but unlike database blocks, you can specify how long the past images are kept (retention).

FREEPOOLS

Ignored for SecureFiles

CACHE

If the SecureFile LOB is cached in the buffer pool or not (Yes/No), explained in this article

LOGGING

If the changes to the SecureFile  LOB should be logged (Yes/No), explained in this article

ENCRYPT

If the SecureFile LOB is encrypted (Yes/No); explained in this article

COMPRESSION

If the SecureFile LOB is compressed (Yes/No); explained in this article

DEDUPLICATION

If the Securefile LOB is Deduplicated (Yes/No); explained in this article

IN_ROW

Whether the LOB is stored inline with the table row

FORMAT

Whether a LOB is dependent on the byte order of the platform

PARTITIONED

If the LOB is on a partitioned table

SECUREFILE

If the LOB is a SECUREFILE (Yes or No) or a BASICFILE

In case of partitioned tables, the LOB information is stored in the view DBA_LOB_PARTITIONS.

Migration of LOBs to SecureFiles

Now that you have seen how useful SecureFiles are, you may want to convert existing tables. The easiest way to do that is to create a new table, load data from the old one, and rename it. (Of course that requires that the tables be made unavailable during the operation.) The other approach is to use dbms_redefinition package to redefine the table online, without impacting availability.

Let's see how to do that with an example. Suppose you want to migrate the original table CONTRACTS_BASIC to store as SecureFiles. The following steps will accomplish that.

  1. Make sure you have a primary key. If there isn't one, create it.
    
    alter table contracts_basic
    add constraint pk_contacts
    primary key (contract_id)
    /
    
  2.  
  3. Build the new table.
    
    create table contracts_new
    (
    contract_id     number(12),
    contract_name   varchar2(80),
    file_size       number,
    orig_file       BLOB
    )
    lob (orig_file)
    store as securefile
    (nocache nologging)
    /
    
  4.  
  5. Map the columns to the new table.
    
    declare
      l_col_mapping varchar2(1000);
    begin
      l_col_mapping :=
         'contract_id contract_id , '||
         'contract_name contract_name , '||
         'file_size file_size, '||
         'orig_file orig_file';
         dbms_redefinition.start_redef_table
           ('ARUP', 'CONTRACTS_BASIC', 'CONTRACTS_NEW', l_col_mapping);
    end;
    /
    
  6.  
  7. Start the Redefinition Process.
    
    declare
       l_error_count pls_integer := 0;
    begin
       dbms_redefinition.copy_table_dependents
       (
          'ARUP', 'CONTRACTS_BASIC', 'CONTRACTS_NEW',
          1, TRUE, TRUE, TRUE, FALSE, l_error_count
       );
       dbms_output.put_line('Errors Occurred := ' || 
           to_char(l_error_count));
    end;
    /
    

    This will copy all the rows from CONTRACTS_BASIC to CONTRACTS_NEW so it may take a long time, depending on the rows of the table.
  8. Finish the redefinition process.
    
    begin  
        dbms_redefinition.finish_redef_table
           ('ARUP', 'CONTRACTS_BASIC', 'CONTRACTS_NEW');
    end;
    /
    
  9.  
  10. Confirm that the table was converted.
    
    select securefile
    from dba_lobs
    where table_name = 'CONTRACTS_BASIC'
    /
    
    SEC
    ---
    YES
    

    The column shows YES, meaning the column was converted to SecureFiles.
  11.  
  12. Drop the interim table CONTRACTS_NEW.
    
    SQL> drop table contracts_new;
    
    Table dropped.
    

You can try to enable parallel DML in the beginning to make the copy process faster. Here is how you enable parallel DML in the session:

alter session force parallel dml;


Initialization Parameter

The initialization parameter db_securefile determines the use of SecureFiles in the database. Here are the various values of the parameter and their effects:

Value

Effect

PERMITTED

The default value. The value indicates that SecureFile LOBs can be created in the database.

ALWAYS

Now that you see how the SecureFiles are so useful, you may want to make sure all LOBs from then onward should only be SecureFiles instead of the default BasicFiles, even if the user does not specify securefile. This parameter value ensures all the LOBs are created as SecureFiles by default. Remember, SecureFiles require ASSM tablespaces (which is default in 11g anyway), so if you are trying to create the LOB in a non-ASSM tablespace, you will get an error.

NEVER

Well, it’s just the opposite of always. You don’t like SecureFiles for some reason and do not want to allow its creation in the database. This parameter value will make the LOB created as a BasicFile when though a SecureFile clause is mentioned. The user does not get an error when the SecureFile clause is used but the LOB is silently created as BasicFile.

IGNORE

The securefile clause, along with all storage clauses is ignored.

 

Conclusion

SecureFiles are not merely the next generation of LOBs; they add much more value to them, especially the features that were formerly only in the domain of filesystems. SecureFiles can be encrypted for security, de-duplicated and compressed for more efficient storage, cached (or not) for faster access (or save the buffer pool space), and logged at several levels to reduce the mean time to recover after a crash. With the introduction of SecureFiles, you can store more unstructured documents in the database without incurring too much overhead or losing any critical functionality afforded by an OS filesystem.

Back to Series TOC