As Published In
Oracle Magazine
January/February 2010

TECHNOLOGY: Compression

Compressing Columns

By Arup Nanda Oracle ACE Director

Compress more with Oracle Exadata Hybrid Columnar Compression.

Organizations are storing increasingly large amounts of data in their data warehouses to support their businesses, and many government regulations and industry mandates require this data to be archived rather than deleted. Unchecked, this exponential explosion in active and archived data will lead to skyrocketing storage costs and slowing query performance as a result of increased I/O.

Fortunately, Hybrid Columnar Compression—a key feature in Oracle Exadata Storage Server Software in Oracle Exadata V2—addresses both the data explosion and potential I/O bottlenecks. It minimizes storage requirements and significantly improves query performance, by changing how compressed data is organized. Instead of storing rows of the table together, Hybrid Columnar Compression stores the columns —where the data has similar characteristics —together. The new organization and the similar characteristics of the column data allow a much better compression ratio. This article explains how to enable and manage Hybrid Columnar Compression and examines the storage and performance impact.

Rows Versus Columns

Traditionally, database table rows have been stored in blocks, as shown in Figure 1. Typically, a row is fully contained in a block, with the columns of the row stored next to each other. However, when the row becomes too large to fit into a block, the row overflows into the next block—a phenomenon known as row chaining —but the organization of the columns being stored next to each other still remains the same.


figure 1
Figure 1: Row-centric data block structure

Oracle Database 11g Release 1 introduced online transaction processing (OLTP) compression, and that compression mechanism replaces a value in a row with a much smaller symbol, reducing the length of the row.

In real-world situations, however, data is more often repeated in columns, not rows. For example, here is the data for a simple (abbreviated) table: 

----------       ---------
Albert           Smith
Bernie           Smith
Charles          Smith
David            Smith
John             Smith

... and so on ...

The Smith value repeats many times, so a great deal of compression can be achieved by replacing the Smith value with a much smaller symbol. And because the same symbol can represent all of the repeated Smith values, fewer unique symbols will need to be stored, reducing the size of the compressed data significantly.

With Hybrid Columnar Compression, Oracle Exadata Storage Server in Oracle Exadata V2 creates a column vector for each column, compresses the column vectors, and stores the column vectors in data blocks. The collection of blocks is called a compression unit . The blocks in a compression unit contain all the columns for a set of rows, as shown in Figure 2. (In Hybrid Columnar Compression, a row typically spans several data blocks.)


figure 2
Figure 2: A compression unit in Hybrid Columnar Compression

Types of Hybrid Columnar Compression
Hybrid Columnar Compression comes in two basic flavors: warehouse compression and archive compression.

Warehouse compression. For warehouse compression, the compression algorithm has been optimized for query performance, specifically for scan-oriented queries used heavily in data warehouses. This approach is ideal for tables that will be queried frequently.

Here is how you create a table with warehouse compression: 


Archive compression. With archive compression, the compression algorithm has been optimized for maximum storage savings. This approach is ideal for tables that are infrequently accessed. (Note that for compressing or decompressing data, archive compression may consume a significant amount of CPU compared to warehouse compression.)

Here is how you create a table with archive compression: 


This is all that’s necessary for enabling Hybrid Columnar Compression—no application changes are necessary. Note that within each of the two Hybrid Columnar Compression flavors, you can also set a HIGH or LOW modifier to control the amount of compression.

Comparing compression performance. When a table compressed with Hybrid Columnar Compression is read, the CPU consumption may be higher than for an uncompressed table. However, because the number of blocks returned by a query against a compressed table is significantly lower, the logical reads and consistent gets are lower as well, often resulting in a reduction in both CPU consumption and I/O. So, the overall CPU consumption may actually be lower for queries against tables compressed with Hybrid Columnar Compression.

Now let’s look at the effects of different compression types and settings on performance with an example table. The following syntax creates our compressed tables: 

create table loc_nocomp
select * from locations
create table loc_compqrylow
compress for query low
select * from loc_nocomp
create table loc_compqryhigh
compress for query high
select * from loc_nocomp
create table loc_comparclow
compress for archive low
select * from loc_nocomp
create table loc_comparchigh
compress for archive high
select * from loc_nocomp

We use data from a table called LOCATIONS, available from a sample Oracle E-Business Suite database, and then we create one uncompressed table and four compressed tables from this source table, with different compression clauses. The LOC_COMPQRYLOW, LOC_COMPQRYHIGH, LOC_COMPARCLOW, and LOC_COMPARCHIGH tables are created with the query low, query high, archive low, and archive high compression settings, respectively.

After the tables have been created, we can check the space consumption of the tables by using this query: 

BYTES/1024/1024 MB

---------------    ------
LOC_NOCOMP         33020

From the output, we can calculate the compression ratio (the size of the compressed table as a percentage of the uncompressed one) for each type of compression.

Finally, let’s run a small test to identify the resource consumption of a query against the tables—the uncompressed as well as the four compressed tables. The code is shown in Listing 1. (Please note that the exact output may be different in your case.) Table 1 shows the summary of the performance metrics as well as the compression ratios in each case.

Code Listing 1: Performance test for accessing the compressed tables


set serveroutput on size unlimited
alter system flush buffer_cache
col value noprint new_value start_cpu
select value
from v$sesstat s, v$statname n
where sid = (select sid from v$mystat where rownum < 2)
and s.statistic# = n.statistic#
and in ('CPU used by this session')
col value noprint new_value start_reads
select value
from v$sesstat s, v$statname n
where sid = (select sid from v$mystat where rownum < 2)
and s.statistic# = n.statistic#
and in ('session logical reads')
set autot on explain stat
set timing on
select city, APPLICATION_ID,avg((sysdate-CREATION_DATE)) 
group by city, APPLICATION_ID 
order by city, APPLICATION_ID
-- Repeat the test by replacing the tablename above with 
-- the other compressed table names
set autot off
select value - &start_cpu cpu_consumed
from v$sesstat s, v$statname n
where sid = (select sid from v$mystat where rownum < 2)
and s.statistic# = n.statistic#
and in ('CPU used by this session')
select value - &start_reads logical_reads
from v$sesstat s, v$statname n
where sid = (select sid from v$mystat where rownum < 2)
and s.statistic# = n.statistic#
and in ('session logical reads')

Complete query results .


Compression Setting Table Size,
Compared to
Uncompressed Table
Elapsed Query
Time (min:sec)
Consistent Gets Physical Reads CPU Utilization Logical Reads
Uncompressed 100.00% 1:39 2,507,953 2,098,606 14,637 2,512,966
Query low 16.52% 0:18 551,861 350,109 14,936 552,684
Query high 10.15% 0:12 401,043 220,973 14,783 401,666
Archive low 8.26% 0:15 314,644 185,706 15,966 315,292
Archive high 6.34% 0:08 228,167 143,629 15,452 228,630

Table 1: Compression of LOCATIONS table and its effect on query performance

As you can see, with compression set at query low, the table consumes only 17 percent of the storage required by the original table, whereas the table consumes only 6 percent of the original storage when using archive high. As a result of compression, the query against the compressed tables required significantly fewer logical reads, with only modest increases in CPU consumption, and in the case of warehouse compression, this significantly reduces the elapsed time of execution. With the query high compression setting, for example, elapsed query time was 12 percent of the time required to run the same query against the uncompressed table. Because the number of blocks queried was reduced, the logical I/O was reduced significantly—ultimately reducing the elapsed time.

Checking for Compression Candidates
The more aggressive the compression setting, the greater the space savings, but how can you determine the space savings that would result from using a particular compression setting?

You can use the DBMS_COMPRESSION package to estimate the space savings. Listing 2 shows how to use this package to estimate the compression ratio for the LOCATIONS table compressed with query low. From the output, we can see that the compression ratio would be 7.71 times. You can re-execute the package for each of the other compression types by changing the comptype parameter. In this example, the value is DBMS_COMPRESSION.comp_for_query_low, which estimates the compression factor for the “for query low” clause during compression. By changing it to comp_for_query_high, comp_for_archive_low, or comp_for_archive_high, you can estimate the compression factor for the clauses “for query high,” “for archive low,” and “for archive high,” respectively. Listing 3 shows the output for all the Hybrid Columnar Compression options.

Code Listing 2: Estimating the compressed table size reduction



   l_blkcnt_cmp     BINARY_INTEGER;
   l_blkcnt_uncmp   BINARY_INTEGER;
   l_row_cmp         BINARY_INTEGER;
   l_row_uncmp      BINARY_INTEGER;
   l_cmp_ratio       NUMBER;
   l_comptype_str   VARCHAR2 (200);


   DBMS_COMPRESSION.get_compression_ratio (
      scratchtbsname   => 'USERS',
      ownname           => 'ARUP',
      tabname            => 'LOCATIONS',
      partname           => NULL,
      comptype           => DBMS_COMPRESSION.comp_for_query_low,
      blkcnt_cmp        => l_blkcnt_cmp,
      blkcnt_uncmp     => l_blkcnt_uncmp,
      row_cmp            => l_row_cmp,
      row_uncmp        => l_row_uncmp,
      cmp_ratio          => l_cmp_ratio,
      comptype_str      => l_comptype_str

   DBMS_OUTPUT.put_line ('l_blkcnt_cmp=' || l_blkcnt_cmp);
   DBMS_OUTPUT.put_line ('l_blkcnt_uncmp=' || l_blkcnt_uncmp);
   DBMS_OUTPUT.put_line ('l_row_cmp=' || l_row_cmp);
   DBMS_OUTPUT.put_line ('l_row_uncmp=' || l_row_uncmp);
   DBMS_OUTPUT.put_line ('l_cmp_ratio=' || l_cmp_ratio);
   DBMS_OUTPUT.put_line ('l_comptype_str=' || l_comptype_str);


l_comptype_str="Compress For Query Low"

Code Listing 3: Output of DBMS_COMPRESSION for all compression types


l_COMPTYPE_STR="Compress For Query Low"  

l_COMPTYPE_STR="Compress For Query High"  

l_COMPTYPE_STR="Compress For Archive Low"  

l_COMPTYPE_STR="Compress For Archive High"

Existing Tables

Tables under Hybrid Columnar Compression can have data loaded or modified with any type of SQL operation. Data that is direct-path-loaded will be compressed with the Hybrid Columnar Compression format, whereas data loaded or modified with conventional data manipulation language (DML) will be compressed with OLTP table compression. Because there are no DML restrictions and because Oracle Database automatically manages all compression operations, the compression is completely application transparent.

You can also alter an existing, noncompressed table to use Hybrid Columnar Compression. However, when you alter an existing noncompressed table to use Hybrid Columnar Compression, the existing table data will not be immediately compressed. Newly loaded rows, but not the existing data, will be compressed. To compact the existing data, use ALTER TABLE MOVE. (This same method can be used to migrate OLTP-compressed rows to Hybrid Columnar Compression.) Listing 4 shows how to use ALTER TABLE MOVE to compact a table called LOCATIONS.

Code Listing 4: Compressing existing tables


SQL> create table loc_normal nologging
  2    as select * from locations where rownum < 1000001;

-- Let's check for space consumption

SQL> select bytes, extents from dba_segments where segment_name = 'LOC_NORMAL';

--------------  --------
1,140,850,688   200

-- Now alter the table to add compression

SQL> alter table loc_normal compress for query low;

-- checking for space again

SQL> select bytes, extents from dba_segments where segment_name = 'LOC_NORMAL';

BYTES             EXTENTS
--------------    --------
1,140,850,688     200

-- The space consumption didn't change. Now, move the table to compress the data

SQL> alter table loc_normal move;

SQL> select bytes, extents from dba_segments where segment_name = 'LOC_NORMAL';

BYTES            EXTENTS
--------------   --------
167,772,160      91

-- The table is now compressed


If a table to be compressed is partitioned, you may want to apply different degrees of compression to each partition. For instance, you may want to keep the most recent partition uncompressed, the partition for last month compressed for query, and older partitions compressed for archive. You can easily accomplish that, as shown in Listing 5.

Code Listing 5: Separate compression mechanisms for different partitions


    acc_no  number           not null,
    txn_id  number            not null,
    txn_dt  date                not null,
    txn_amt number(15,2)  not null
 partition y09m06 VALUES LESS THAN(TO_DATE('2009-07-01', 'yyyy-mm-dd')) COMPRESS FOR archive high,
partition y09m07 VALUES LESS THAN(TO_DATE('2009-08-01', 'yyyy-mm-dd')) COMPRESS FOR archive low,
partition y09m08 VALUES LESS THAN(TO_DATE('2009-09-01', 'yyyy-mm-dd')) COMPRESS FOR query high,
partition y09m09 VALUES LESS THAN(TO_DATE('2009-10-01', 'yyyy-mm-dd')) COMPRESS FOR query low,
partition y09m10 VALUES LESS THAN(TO_DATE('2009-11-01', 'yyyy-mm-dd')) nocompress


You can check the compression settings in place for tables by using the DBA_TABLES view. Compression setting information for individual partitions and subpartitions is available from the DBA_TAB_PARTITIONS and DBA_TAB_SUBPARTITIONS views, respectively. The noteworthy columns are


  • COMPRESSION , which indicates the compression status of the table or partition. The values are ENABLED and DISABLED. For partitioned tables, NULL is shown as the value for tables; for subpartitioned tables, NONE is shown as the value for partitions. 
  • COMPRESS_FOR , which indicates the compression type. The values are BASIC, OLTP, QUERY LOW, QUERY HIGH, ARCHIVE LOW, and ARCHIVE HIGH.

You can also add and drop columns from hybrid columnar compressed tables. When adding a column, you have to define the column as NOT NULL if you are specifying a default value. When dropping a column, the database marks the column as unused immediately and drops it later. This avoids a potentially long decompression and recompression operation on the table. (Note that unlike OLTP table compression, which allows only tables with 255 or fewer columns, Hybrid Columnar Compression allows any number of columns.)

You can also check for the compression characteristics of a specific row in the table, by executing the GET_COMPRESSION_TYPE function in the DBMS_COMPRESSION package, shown in Listing 6. The output “8” means that the row was compressed with the clause “for query low.” Here are the possible outputs and their meanings:


1    Uncompressed
2    Compressed for OLTP
4    For query high
8    For query low
16  For archive high
32  For archive low

Code Listing 6: Checking for compression for a specific row


SQL> variable ret number

SQL> begin
   2   :ret := dbms_compression.get_compression_type (
   4   end;
   5   /

PL/SQL procedure successfully completed.

SQL> print ret



Next Steps

 LEARN more about
Hybrid Columnar Compression

When a session locks a row in an uncompressed table, it puts an entry in a construct known as the Interested Transaction List, inside the block header. When the table is under Hybrid Columnar Compression, there is no specific block for a given row, so where does Oracle Database put the locking information?

In a hybrid columnar compressed table, when a lock is placed on a row, Oracle Database locks all the rows in that compression unit, not just that row. Be aware of this behavior change and the impact it may have on your applications.


OLTP table compression, introduced in Oracle Database 11g Release 1, allows DML against compressed tables. To accommodate the demands of exponential data growth, Oracle Exadata V2 has introduced a new technology called Hybrid Columnar Compression that changes the fundamental way the data is structured inside a block. Instead of grouping rows, it groups like values in columns to make the compression ratio orders of magnitude higher. This enables databases to store more data for data warehousing and archival purposes without a corresponding size increase.


Arup Nanda Headshot

Arup Nanda
( has been an Oracle DBA for more than 14 years, handling all aspects of database administration, from performance tuning to security and disaster recovery. He was Oracle Magazine’s DBA of the Year in 2003.

Send us your comments