As Published In
Oracle Magazine
July/August 2008

TECHNOLOGY: Advanced Compression


Compress to Impress

By Arup Nanda, Oracle ACE Director

Use Oracle Advanced Compression to conserve resources and improve performance.

John the DBA at Acme Bank is looking at charts and spreadsheets from the company's capacity planning group. Amid all the fancy-looking graphics and numbers, the message is clear: the bank's database is going to grow dramatically in the near future. There are many reasons—the natural growth of the business, moving existing applications from desktop systems to the enterprise domain, the constant development of new applications, legal requirements to retain data longer, and so on. The data volume is going to grow exponentially, and the processing capacity (CPU, network, and I/O bandwidth) will also grow, but not at the same pace. This poses major challenges for database operations:

1. Storage costs skyrocket. To contain the costs, management has suggested using cheaper (and slower) disks, which will negatively affect performance.

2. More data also means more backup space is required. To contain these costs, one option is to reduce the number of backups to disk, which is not ideal, because it will increase the restore time.

3. More backed-up data means the backup window will be much larger, affecting—and likely conflicting with—other backups.

4. The increase in data volume also means an increase in I/O through components such as host bus adapter (HBA) cards and network-attached storage (NAS) heads, negatively affecting performance.

5. More data in the database means that a larger system global area (SGA) is needed, and a larger SGA means more memory. If more memory is not available, a smaller proportion of the data will be cached, reducing performance.

To address these challenges, the DBAs at Acme have proposed the usual solutions: adding disk space, memory, network capacity, and I/O bandwidth. But, with the financial belt-tightening, these solutions are not going to get approved by management. Jill, the DBA manager, hopes John can find an alternative solution to sustain the current level of performance, if not improve it, while using the same amount of disk space, memory, and I/O—or even less.

Solution: Compression

Even though the problems are many, the root cause can be summed up easily: exponential growth in data volumes. More data requires more disk space, which, in turn, means higher costs for database and backup storage and more I/O. If there were a way to store the current data volume more efficiently on disk, all the other issues would disappear. The question is how to reduce the space consumption without deleting data.

Oracle Advanced Compression, a new option in Oracle Database 11g Enterprise Edition, is the answer to Acme's data growth problem.

Compression is not new in Oracle Database. Oracle8i Database introduced compression in indexes, by eliminating duplicates from index leaves. Oracle9i Database Release 2 added a new table compression feature that eliminated the storage of repeated values in tables. It supported bulk load operations, and it was very powerful in data warehouse environments. The Oracle Advanced Compression option in Oracle Database 11g Enterprise Edition has richer and far more powerful features, however, suitable even for transaction-intensive online transaction processing (OLTP) databases.

Compression Steps and Disk Space

The steps to enable compression are very simple. To demonstrate, John creates two tables for account holder information—one compressed (ACCOUNTS_COMP) and one "regular" (ACCOUNTS_REG). John creates the ACCOUNTS_COMP compressed table by using the script shown in Listing 1. Note the "compress for all operations" line in the listing, which creates the table as compressed for all data manipulation language (DML) operations—therefore enabling compression for OLTP tables. (The other compression option is "compress for direct_load operations," which enables compression only for direct path loads, such as INSERT /*+ APPEND */ or SQL*Loader with the direct path option.)
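
For contrast, a warehouse-style table meant only for bulk loads could use the other clause; a minimal sketch (the DW_ACCOUNTS_HIST name and columns are hypothetical, not part of Acme's schema):

```sql
-- Hypothetical history table compressed only for direct-path
-- operations (INSERT /*+ APPEND */, SQL*Loader direct path)
create table dw_accounts_hist (
        acc_no          number          not null,
        acc_type        varchar2(15),
        snapshot_dt     date
)
compress for direct_load operations
/
```

Rows inserted through conventional DML into such a table would remain uncompressed, which is why this variant suits data warehouses rather than OLTP systems.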

Code Listing 1: Creating the ACCOUNTS_COMP table 

create table accounts_comp (
                acc_no                  number                   not null,
                first_name              varchar2(30)             not null,
                last_name               varchar2(30)             not null,
                acc_type                varchar2(15),
                folio_id                number,
                sub_acc_type            varchar2(1),
                acc_open_dt             date,
                acc_mgr_id              number
)
compress for all operations
/


John creates the ACCOUNTS_REG table by using the script in Listing 1, except that he changes the table name to ACCOUNTS_REG and omits the "compress for all operations" clause. Then he uses the ins_acc.sql script to simulate an Acme application and insert records into both tables. 

ins_acc.sql: Script for loading sample data into the ACCOUNTS tables 

/* Instruction: This script loads data into either the
ACCOUNTS_REG or the ACCOUNTS_COMP table. At runtime, it
prompts for the value of the &reg_or_comp substitution
variable. Enter REG or COMP to load data into the regular
(ACCOUNTS_REG) or compressed (ACCOUNTS_COMP) table,
respectively. Run it once for each table. */

begin
   for l_acc_no in 1..1000000 loop
   insert into accounts_&reg_or_comp
   values
   (
      l_acc_no,
      -- First Name
      decode (
         floor(dbms_random.value(1,21)),
         1, 'Alan',
         2, 'Alan',
         3, 'Barbara',
         4, 'Barbara',
         5, 'Charles',
         6, 'David',
         7, 'Ellen',
         8, 'Ellen',
         9, 'Ellen',
         10, 'Frank',
         11, 'Frank',
         12, 'Frank',
         13, 'George',
         14, 'George',
         15, 'George',
         16, 'Hillary',
         17, 'Iris',
         18, 'Iris',
         19, 'Josh',
         20, 'Josh',
             'XXX'
      ),
      -- Last Name
      decode (         
         floor(dbms_random.value(1,5)),
            1,'Smith',            
            dbms_random.string ('A',dbms_random.value(4,30))
      ),  
      -- Account Type
      decode (         
         floor(dbms_random.value (1,5)),
              1,'S',2,'C',3,'M',4,'D','X'
      ),  
      -- Folio ID
      case            
           when dbms_random.value (1,100) < 51 then null
      else             
           l_acc_no + floor(dbms_random.value(1,100))
      end, 
      -- Sub Acc Type
      case            
           when dbms_random.value (1,100) < 76 then null
      else 
          decode (floor(dbms_random.value (1,6)),
             1,'S',2,'C',3,'C',4,'C',5,'C',null)
      end, 
      -- Acc Opening Date 
      sysdate - dbms_random.value(1,500),
      -- Account Manager ID
      decode (         
         floor(dbms_random.value (1,11)),            
            1,1,2,1,3,1,4,1,5,2,6,3,7,4,8,5,9,5,10,5,0
      )
   ); 
   end loop;
   commit;
end;
/


After the inserts, John checks the number of blocks on each table: 

select segment_name, blocks
from user_segments
where segment_name like 'ACCOUNTS_%';

SEGMENT_NAME      BLOCKS
-------------     ------
ACCOUNTS_COMP       4096
ACCOUNTS_REG       11264


The space savings in the ACCOUNTS_COMP table are clear. Both tables have the same number of identical records, but the ACCOUNTS_COMP compressed table has only 4,096 blocks, a space savings of about 64 percent compared to the 11,264 blocks in the ACCOUNTS_REG table. Of course, notes John, the space savings will vary, depending on the nature of the data.

John also tests whether the compression ratio holds for later manipulations of the table. He runs the ins_acc.sql script once again on both tables and rechecks the number of blocks: 

select segment_name, blocks
from user_segments
where segment_name like 'ACCOUNTS_%';

SEGMENT_NAME      BLOCKS
-------------     ------
ACCOUNTS_COMP       8192
ACCOUNTS_REG       22528


The compression ratio is still around 64 percent (1-(8192/22528)).
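
The same savings figure can be computed straight from the data dictionary; a sketch of a query that reports the percentage directly:

```sql
-- Percentage of blocks saved by ACCOUNTS_COMP versus ACCOUNTS_REG
select round(100 * (1 - c.blocks / r.blocks), 1) as pct_saved
from (select blocks from user_segments
      where segment_name = 'ACCOUNTS_REG') r,
     (select blocks from user_segments
      where segment_name = 'ACCOUNTS_COMP') c
/
```

With the block counts above, this reports roughly 63.6 percent.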

I/O and Memory

To demonstrate other savings from Oracle Advanced Compression, John creates a test query against the example tables; he also collects the statistics and the optimizer plan by issuing the SET AUTOT command, and notes the execution time. Before he runs each test query, he flushes the buffer cache so that the table will be read from disk rather than from the SGA.

First he issues the test query against the regular table, as shown in Listing 2; Listing 3 shows the output. Then he issues the test query in Listing 2 against the compressed table, by replacing ACCOUNTS_REG with ACCOUNTS_COMP in the FROM clause; Listing 4 shows that output.

Code Listing 2: Test query 

set serveroutput on size unlimited
alter system flush buffer_cache
/
col value noprint new_value start_cpu
select value
from v$sesstat s, v$statname n
where sid = (select sid from v$mystat where rownum < 2)
and s.statistic# = n.statistic#
and n.name in ('CPU used by this session')
/
col value noprint new_value start_reads
select value
from v$sesstat s, v$statname n
where sid = (select sid from v$mystat where rownum < 2)
and s.statistic# = n.statistic#
and n.name in ('session logical reads')
/
set autot on explain stat
set timing on
select acc_mgr_id, acc_type,
       avg(sysdate - acc_open_dt)
from accounts_reg
group by acc_mgr_id, acc_type
order by acc_mgr_id, acc_type
/
set autot off
select value - &start_cpu cpu_consumed
from v$sesstat s, v$statname n
where sid = (select sid from v$mystat where rownum < 2)
and s.statistic# = n.statistic#
and n.name in ('CPU used by this session')
/
select value - &start_reads logical_reads
from v$sesstat s, v$statname n
where sid = (select sid from v$mystat where rownum < 2)
and s.statistic# = n.statistic#
and n.name in ('session logical reads')
/
                            


Code Listing 3: Statistics for regular table 

...
Elapsed: 00:00:05.33
...

-------------------------------------------------------------------------------
| Id | Operation           | Name        | Rows  |Bytes |Cost (%CPU)| Time    |
-------------------------------------------------------------------------------
|  0 | SELECT STATEMENT    |             |    15 |  285 | 3095  (3)| 00:00:38 |
|  1 |  SORT GROUP BY      |             |    15 |  285 | 3095  (3)| 00:00:38 |
|  2 |   TABLE ACCESS FULL | ACCOUNTS_REG|  1000K|  18M | 3035  (1)| 00:00:37 |
-------------------------------------------------------------------------------
 
Statistics
--------------------------------
      ...
      11121  consistent gets
      11118  physical reads
      ...
 
CPU_CONSUMED
------------
         354
 
LOGICAL_READS
-------------
       11212


Code Listing 4: Statistics for compressed table 

...
Elapsed: 00:00:04.24
...
---------------------------------------------------------------------------------
| Id | Operation          | Name         | Rows  | Bytes| Cost (%CPU | Time     |
---------------------------------------------------------------------------------
|  0 | SELECT STATEMENT   |              |    15 |  285 |  1170  (7) | 00:00:15 |
|  1 |  SORT GROUP BY     |              |    15 |  285 |  1170  (7) | 00:00:15 |
|  2 |   TABLE ACCESS FULL| ACCOUNTS_COMP|  1000K|   18M|  1109  (2) | 00:00:14 |
---------------------------------------------------------------------------------
 
Statistics
--------------------------------
...
       4031  consistent gets
       4025  physical reads
...
 
CPU_CONSUMED
------------
         304
 
LOGICAL_READS
-------------
        4122


The execution time was 5.33 seconds for the regular table but 4.24 seconds for the compressed table—a 20 percent improvement. The number of logical reads explains the difference: 11,212 for the regular table, as opposed to 4,122 for the compressed table. The reduced I/O also yielded a small CPU savings (354 for the regular table versus 304 for the compressed table). Although this CPU savings is not substantial—in fact, the documentation states that in some cases CPU consumption might be higher—John observes that the savings in I/O are large enough to cut the response time. This performance gain will be more pronounced in large full-table scans, typical of Acme's reporting and batch-oriented applications.

Because each block in the compressed table now packs more rows, a smaller SGA can handle the same amount of data. Conversely, the existing memory will accommodate more blocks, reducing the probability that a trip to the disk will be necessary, which further improves I/O and subsequently enables better response time.

John looks at his tests and concludes that space savings are not the only advantage of compression. It also improves memory and I/O efficiencies, resulting in better query performance in many cases. For table scans, there can also be some CPU savings from reduced I/O.

Types of Compression

Oracle Advanced Compression offers more than one type of compression, which is important, because Acme needs to use compression for different types of data processing, data formats, and system usage. John looks at the different types of compression available and how they might address Acme's different needs.

OLTP table compression. With OLTP table compression, the compression occurs at the block level. Data is not compressed as it is inserted; rather, an internal algorithm is used to decide when a block has enough uncompressed data and is ready for compression. When a block is ready, the compression algorithm takes a value from a record and creates a reference to that value in a "symbol table" toward the header of the block. Then the algorithm tries to find an identical value in that block, and for each one it finds, it replaces that value with the symbol from the symbol table. This way the data inside the block is compressed anywhere it occurs—not just in specific columns or rows. Each block is independently compressed with its own symbol table, so the compression is usually faster than it is for non-Oracle algorithms, which normally use a global symbol table. This independent compression at the block level is adaptive to new or modified data—blocks can be compressed and recompressed to update the symbol tables as necessary. A global symbol table, however, is much more difficult to update without locking, meaning that those algorithms are not adaptive to changes in the data, which can reduce compression ratios over time.

With Oracle Advanced Compression, compression does not occur the moment data is inserted or changed. Rather, data comes in as is (uncompressed), and when a specific internally defined threshold is reached, the compression algorithm kicks in for all data in the block, in batch mode. Most transactions on a block therefore experience no compression-related CPU overhead; only the transaction that triggers compression experiences a compression-related wait, which is minimal. This batch algorithm keeps the CPU overhead for DML operations very low, thereby making Oracle Advanced Compression suitable for OLTP systems.
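
Because the compression attribute is a property of the table, an existing table can be switched over as well; a sketch, assuming the ACCOUNTS_REG table from earlier (note that the MOVE rebuild is an offline operation and invalidates the table's indexes, which must then be rebuilt):

```sql
-- New blocks of ACCOUNTS_REG become candidates for compression
alter table accounts_reg compress for all operations;

-- Rebuild the segment so that existing rows are compressed too
alter table accounts_reg move compress for all operations;
```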

Compression of unstructured data. Acme also stores a lot of unstructured data, such as XML documents, signatures (as GIF files), and statements (as PDF documents). Due to the several regulations governing the bank (Sarbanes-Oxley, the Gramm-Leach-Bliley Act, and so on), Acme must preserve this unstructured data in the database, which takes sizable chunks of storage.

Oracle Advanced Compression is not only for relational data. In Oracle Database 11g, other datatypes can also be compressed, sometimes with slight syntax variants. For instance, Oracle SecureFiles (the next-generation large objects [LOBs] introduced in Oracle Database 11g) has two ways to reduce storage: compression and deduplication. (For more information on Oracle SecureFiles compression, John bookmarks the "SecureFiles: The New LOBs" article on Oracle Technology Network [OTN].) He creates an Oracle SecureFiles test example in which the CONTRACTS_SEC table is altered to include a compressed LOB column, ORIG_FILE: 

alter table contracts_sec modify lob 
(orig_file) (compress high)


And if Oracle SecureFiles data repeats in the ORIG_FILE column of the CONTRACTS_SEC table, deduplication will remove the duplicates and store only the pointers to them, eliminating duplicate storage. Thus, if four copies of the same document are stored in the ORIG_FILE column, the file will be stored only once after deduplication, providing a 75 percent reduction in space used.

John confirms the syntax to enable deduplication in the ORIG_FILE column: 

alter table contracts_sec modify lob
(orig_file) (deduplicate)
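
Both SecureFiles space savers can also be declared up front, when the column is first created, instead of through ALTER TABLE; a sketch (the column list here is hypothetical):

```sql
create table contracts_sec (
        contract_id     number      not null,
        orig_file       blob
)
lob (orig_file) store as securefile (
        compress high
        deduplicate
)
/
```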


Backup compression. An Oracle Database instance can be backed up in two ways: by using Oracle Data Pump (the logical backup) and Oracle Recovery Manager (Oracle RMAN; the physical backup). Oracle Advanced Compression comes in handy in both cases. In Oracle RMAN, a new compression algorithm—ZLIB—compresses the data more efficiently (as shown in the "RMAN" article on OTN). In Oracle Data Pump, the dump file can be compressed.
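
On the Oracle RMAN side, the compressed format is requested by asking for compressed backup sets, either one backup at a time or as a configured default; a sketch of the standard commands:

```
RMAN> configure device type disk backup type to compressed backupset;
RMAN> backup as compressed backupset database;
```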

John tests the syntax to create a compressed dump file from the accounts_reg table, using the compression=all clause: 

expdp <User>/<Pass> directory=<DirName> tables=accounts_reg dumpfile=reg.dmp compression=all 


This creates a 41MB compressed dump file named reg.dmp. John runs the same expdp command without the compression clause, which produces a 90MB file. So, Oracle Advanced Compression compressed the dump file by more than 50 percent, quite a savings on any scale. When John imports the compressed dump file back into the table, the Oracle Data Pump import reads the compressed file and decompresses it inline, during the import.
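
The import side needs no special flag; the command can point at the compressed dump file directly, and Oracle Data Pump inflates it as it reads. A sketch, with the same placeholder credentials as the export (TABLE_EXISTS_ACTION=REPLACE is assumed here so that the already-populated test table is overwritten):

```
impdp <User>/<Pass> directory=<DirName> tables=accounts_reg dumpfile=reg.dmp table_exists_action=replace
```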

Network compression. John continues to explore the benefits of compression. Acme uses Oracle Data Guard as its disaster recovery solution. During a network outage or when a standby server is down for planned or unplanned maintenance, the standby database can fall behind the primary. Oracle Data Guard automatically ships the missing redo data for sync up—a process known as "gap resolution." In Oracle Database 11g, you can compress the redo stream for gap resolution with Oracle Advanced Compression by putting a special clause into the archive destination for the standby database. The archive destination is set in the initialization parameter file: 

log_archive_dest_2 = 'service=acmedr 
lgwr async 
valid_for=(online_logfiles,primary_role) 
db_unique_name=acmedr compression=enable'


John notes the compression=enable clause, which causes the redo stream to be compressed during gap resolution, making the synchronization process faster, which, in turn, reduces the risk to Acme's high-availability environment.

Summing It Up

A reduction in the space used in the database directly translates to reduced space requirements for backups, so Acme does not need to buy additional backup disks. Because fewer blocks are being backed up, the backup finishes faster and the backup window allocated now will also be sufficient for the future.

Acme also has a staging database as a copy of the production database. The storage requirements for the staging database as well as other database copies such as QA and development drop too. In short, with Oracle Advanced Compression, there is no need for any additional hardware—disks, I/O capacity, memory, tape, or network bandwidth—when Acme's database grows. John gathers the metrics collected and heads for Jill's office to give her the good news.

Conclusion

The last few years have seen enhancements that were mere imaginings earlier, such as 16-core CPUs and 720GB hard drives. The trend is clearly "up"; we will see more processing power and higher disk capacity in the future, but one thing that hasn't changed much—and isn't likely to change—is disk throughput.

This article showed how a forward-thinking John started looking at how to address exponential data growth and ended up addressing both that growth and the limitations of disk throughput by using Oracle Advanced Compression. He has effectively stored more data in less space, accomplishing more data transfer with the same throughput. Not only does compression save Acme storage space, but it also increases performance. 



Arup Nanda
(arup@proligence.com) has been an Oracle DBA for more than 14 years, handling all aspects of database administration. He was Oracle Magazine's DBA of the Year in 2003.
