As Published In
Oracle Magazine
January/February 2010

TECHNOLOGY: Compression


Compressing Columns

By Arup Nanda Oracle ACE Director

Compress more with Oracle Exadata Hybrid Columnar Compression.

Organizations are storing increasingly large amounts of data in their data warehouses to support their businesses, and many government regulations and industry mandates require this data to be archived rather than deleted. Unchecked, this exponential explosion in active and archived data will lead to skyrocketing storage costs and slowing query performance as a result of increased I/O.

Fortunately, Hybrid Columnar Compression—a key feature in Oracle Exadata Storage Server Software in Oracle Exadata V2—addresses both the data explosion and potential I/O bottlenecks. It minimizes storage requirements and significantly improves query performance, by changing how compressed data is organized. Instead of storing rows of the table together, Hybrid Columnar Compression stores the columns —where the data has similar characteristics —together. The new organization and the similar characteristics of the column data allow a much better compression ratio. This article explains how to enable and manage Hybrid Columnar Compression and examines the storage and performance impact.

Rows Versus Columns

Traditionally, database table rows have been stored in blocks, as shown in Figure 1. Typically, a row is fully contained in a block, with the columns of the row stored next to each other. However, when the row becomes too large to fit into a block, the row overflows into the next block—a phenomenon known as row chaining —but the organization of the columns being stored next to each other still remains the same.

 

figure 1
Figure 1: Row-centric data block structure


Oracle Database 11g Release 1 introduced online transaction processing (OLTP) compression, and that compression mechanism replaces a value in a row with a much smaller symbol, reducing the length of the row.

In real-world situations, however, data is more often repeated in columns, not rows. For example, here is the data for a simple (abbreviated) table: 

FIRST_NAME       LAST_NAME
----------       ---------
Albert           Smith
Bernie           Smith
Charles          Smith
David            Smith
John             Smith

... and so on ...


The Smith value repeats many times, so a great deal of compression can be achieved by replacing the Smith value with a much smaller symbol. And because the same symbol can represent all of the repeated Smith values, fewer unique symbols will need to be stored, reducing the size of the compressed data significantly.

With Hybrid Columnar Compression, Oracle Exadata Storage Server in Oracle Exadata V2 creates a column vector for each column, compresses the column vectors, and stores the column vectors in data blocks. The collection of blocks is called a compression unit . The blocks in a compression unit contain all the columns for a set of rows, as shown in Figure 2. (In Hybrid Columnar Compression, a row typically spans several data blocks.)

 

figure 2
Figure 2: A compression unit in Hybrid Columnar Compression


Types of Hybrid Columnar Compression
Hybrid Columnar Compression comes in two basic flavors: warehouse compression and archive compression.

Warehouse compression. For warehouse compression, the compression algorithm has been optimized for query performance, specifically for scan-oriented queries used heavily in data warehouses. This approach is ideal for tables that will be queried frequently.

Here is how you create a table with warehouse compression: 

CREATE TABLE XXX
COMPRESS FOR QUERY
AS
SELECT * FROM YYY;


Archive compression. With archive compression, the compression algorithm has been optimized for maximum storage savings. This approach is ideal for tables that are infrequently accessed. (Note that for compressing or decompressing data, archive compression may consume a significant amount of CPU compared to warehouse compression.)

Here is how you create a table with archive compression: 

CREATE TABLE XXX
COMPRESS FOR ARCHIVE
AS
SELECT * FROM YYY;


This is all that’s necessary for enabling Hybrid Columnar Compression—no application changes are necessary. Note that within each of the two Hybrid Columnar Compression flavors, you can also set a HIGH or LOW modifier to control the amount of compression.

Comparing compression performance. When a table compressed with Hybrid Columnar Compression is read, the CPU consumption may be higher than for an uncompressed table. However, because the number of blocks returned by a query against a compressed table is significantly lower, the logical reads and consistent gets are lower as well, often resulting in a reduction in both CPU consumption and I/O. So, the overall CPU consumption may actually be lower for queries against tables compressed with Hybrid Columnar Compression.

Now let’s look at the effects of different compression types and settings on performance with an example table. The following syntax creates our compressed tables: 

create table loc_nocomp
nologging
as
select * from locations
/
create table loc_compqrylow
nologging
compress for query low
as
select * from loc_nocomp
/
create table loc_compqryhigh
nologging
compress for query high
as
select * from loc_nocomp
/
create table loc_comparclow
nologging
compress for archive low
as
select * from loc_nocomp
/
create table loc_comparchigh
nologging
compress for archive high
as
select * from loc_nocomp
/


We use data from a table called LOCATIONS, available from a sample Oracle E-Business Suite database, and then we create one uncompressed table and four compressed tables from this source table, with different compression clauses. The LOC_COMPQRYLOW, LOC_COMPQRYHIGH, LOC_COMPARCLOW, and LOC_COMPARCHIGH tables are created with the query low, query high, archive low, and archive high compression settings, respectively.

After the tables have been created, we can check the space consumption of the tables by using this query: 

 
SELECT SEGMENT_NAME, 
BYTES/1024/1024 MB
FROM USER_SEGMENTS
WHERE SEGMENT_NAME LIKE 'LOC%';

SEGMENT_NAME       MB
---------------    ------
LOC_NOCOMP         33020
LOC_COMPQRYLOW      5455
LOC_COMPQRYHIGH     3352
LOC_COMPARCLOW      2727
LOC_COMPARCHIGH     2093


From the output, we can calculate the compression ratio (the size of the compressed table as a percentage of the uncompressed one) for each type of compression.

Finally, let’s run a small test to identify the resource consumption of a query against the tables—the uncompressed as well as the four compressed tables. The code is shown in Listing 1. (Please note that the exact output may be different in your case.) Table 1 shows the summary of the performance metrics as well as the compression ratios in each case.

Code Listing 1: Performance test for accessing the compressed tables

 

set serveroutput on size unlimited
alter system flush buffer_cache
/
col value noprint new_value start_cpu
select value
from v$sesstat s, v$statname n
where sid = (select sid from v$mystat where rownum < 2)
and s.statistic# = n.statistic#
and n.name in ('CPU used by this session')
/
col value noprint new_value start_reads
select value
from v$sesstat s, v$statname n
where sid = (select sid from v$mystat where rownum < 2)
and s.statistic# = n.statistic#
and n.name in ('session logical reads')
/
set autot on explain stat
set timing on
select city, APPLICATION_ID,avg((sysdate-CREATION_DATE)) 
from LOC_COMQRYLOW 
group by city, APPLICATION_ID 
order by city, APPLICATION_ID
 /
--
-- Repeat the test by replacing the tablename above with 
-- the other compressed table names
--
set autot off
select value - &start_cpu cpu_consumed
from v$sesstat s, v$statname n
where sid = (select sid from v$mystat where rownum < 2)
and s.statistic# = n.statistic#
and n.name in ('CPU used by this session')
/
select value - &start_reads logical_reads
from v$sesstat s, v$statname n
where sid = (select sid from v$mystat where rownum < 2)
and s.statistic# = n.statistic#
and n.name in ('session logical reads')
/


Complete query results .

 

Compression Setting Table Size,
Compared to
Uncompressed Table
Elapsed Query
Time (min:sec)
Consistent Gets Physical Reads CPU Utilization Logical Reads
Uncompressed 100.00% 1:39 2,507,953 2,098,606 14,637 2,512,966
Query low 16.52% 0:18 551,861 350,109 14,936 552,684
Query high 10.15% 0:12 401,043 220,973 14,783 401,666
Archive low 8.26% 0:15 314,644 185,706 15,966 315,292
Archive high 6.34% 0:08 228,167 143,629 15,452 228,630


Table 1: Compression of LOCATIONS table and its effect on query performance


As you can see, with compression set at query low, the table consumes only 17 percent of the storage required by the original table, whereas the table consumes only 6 percent of the original storage when using archive high. As a result of compression, the query against the compressed tables required significantly fewer logical reads, with only modest increases in CPU consumption, and in the case of warehouse compression, this significantly reduces the elapsed time of execution. With the query high compression setting, for example, elapsed query time was 12 percent of the time required to run the same query against the uncompressed table. Because the number of blocks queried was reduced, the logical I/O was reduced significantly—ultimately reducing the elapsed time.

Checking for Compression Candidates
The more aggressive the compression setting, the greater the space savings, but how can you determine the space savings that would result from using a particular compression setting?

You can use the DBMS_COMPRESSION package to estimate the space savings. Listing 2 shows how to use this package to estimate the compression ratio for the LOCATIONS table compressed with query low. From the output, we can see that the compression ratio would be 7.71 times. You can re-execute the package for each of the other compression types by changing the comptype parameter. In this example, the value is DBMS_COMPRESSION.comp_for_query_low, which estimates the compression factor for the “for query low” clause during compression. By changing it to comp_for_query_high, comp_for_archive_low, or comp_for_archive_high, you can estimate the compression factor for the clauses “for query high,” “for archive low,” and “for archive high,” respectively. Listing 3 shows the output for all the Hybrid Columnar Compression options.

Code Listing 2: Estimating the compressed table size reduction

 

DECLARE

   l_blkcnt_cmp     BINARY_INTEGER;
   l_blkcnt_uncmp   BINARY_INTEGER;
   l_row_cmp         BINARY_INTEGER;
   l_row_uncmp      BINARY_INTEGER;
   l_cmp_ratio       NUMBER;
   l_comptype_str   VARCHAR2 (200);

BEGIN

   DBMS_COMPRESSION.get_compression_ratio (
      scratchtbsname   => 'USERS',
      ownname           => 'ARUP',
      tabname            => 'LOCATIONS',
      partname           => NULL,
      comptype           => DBMS_COMPRESSION.comp_for_query_low,
      blkcnt_cmp        => l_blkcnt_cmp,
      blkcnt_uncmp     => l_blkcnt_uncmp,
      row_cmp            => l_row_cmp,
      row_uncmp        => l_row_uncmp,
      cmp_ratio          => l_cmp_ratio,
      comptype_str      => l_comptype_str
   );

   DBMS_OUTPUT.put_line ('l_blkcnt_cmp=' || l_blkcnt_cmp);
   DBMS_OUTPUT.put_line ('l_blkcnt_uncmp=' || l_blkcnt_uncmp);
   DBMS_OUTPUT.put_line ('l_row_cmp=' || l_row_cmp);
   DBMS_OUTPUT.put_line ('l_row_uncmp=' || l_row_uncmp);
   DBMS_OUTPUT.put_line ('l_cmp_ratio=' || l_cmp_ratio);
   DBMS_OUTPUT.put_line ('l_comptype_str=' || l_comptype_str);

END;

l_blkcnt_cmp=184
l_blkcnt_uncmp=1419 
l_row_cmp=216 
l_row_uncmp=27    
l_cmp_ratio=7.71   
l_comptype_str="Compress For Query Low"


Code Listing 3: Output of DBMS_COMPRESSION for all compression types

 

l_BLKCNT_CMP=178   
l_BLKCNT_UNCMP=2147       
l_ROW_CMP=2501    
l_ROW_UNCMP=207   
l_CMP_RATIO=12.06
l_COMPTYPE_STR="Compress For Query Low"  

l_BLKCNT_CMP=103       
l_BLKCNT_UNCMP=2006   
l_ROW_CMP=4041   
l_ROW_UNCMP=207  
l_CMP_RATIO=19.47 
l_COMPTYPE_STR="Compress For Query High"  

l_BLKCNT_CMP=99   
l_BLKCNT_UNCMP=1992   
l_ROW_CMP=4169   
l_ROW_UNCMP=207  
l_CMP_RATIO=20.12 
l_COMPTYPE_STR="Compress For Archive Low"  

l_BLKCNT_CMP=66   
l_BLKCNT_UNCMP=1607   
l_ROW_CMP=5049   
l_ROW_UNCMP=207   
l_CMP_RATIO=24.34 
l_COMPTYPE_STR="Compress For Archive High"


Existing Tables

Tables under Hybrid Columnar Compression can have data loaded or modified with any type of SQL operation. Data that is direct-path-loaded will be compressed with the Hybrid Columnar Compression format, whereas data loaded or modified with conventional data manipulation language (DML) will be compressed with OLTP table compression. Because there are no DML restrictions and because Oracle Database automatically manages all compression operations, the compression is completely application transparent.

You can also alter an existing, noncompressed table to use Hybrid Columnar Compression. However, when you alter an existing noncompressed table to use Hybrid Columnar Compression, the existing table data will not be immediately compressed. Newly loaded rows, but not the existing data, will be compressed. To compact the existing data, use ALTER TABLE MOVE. (This same method can be used to migrate OLTP-compressed rows to Hybrid Columnar Compression.) Listing 4 shows how to use ALTER TABLE MOVE to compact a table called LOCATIONS.

Code Listing 4: Compressing existing tables

 

SQL> create table loc_normal nologging
  2    as select * from locations where rownum < 1000001;

-- Let's check for space consumption

SQL> select bytes, extents from dba_segments where segment_name = 'LOC_NORMAL';

BYTES           EXTENTS
--------------  --------
1,140,850,688   200

-- Now alter the table to add compression

SQL> alter table loc_normal compress for query low;

-- checking for space again

SQL> select bytes, extents from dba_segments where segment_name = 'LOC_NORMAL';

BYTES             EXTENTS
--------------    --------
1,140,850,688     200

-- The space consumption didn't change. Now, move the table to compress the data

SQL> alter table loc_normal move;

SQL> select bytes, extents from dba_segments where segment_name = 'LOC_NORMAL';

BYTES            EXTENTS
--------------   --------
167,772,160      91

-- The table is now compressed


Partitions

If a table to be compressed is partitioned, you may want to apply different degrees of compression to each partition. For instance, you may want to keep the most recent partition uncompressed, the partition for last month compressed for query, and older partitions compressed for archive. You can easily accomplish that, as shown in Listing 5.

Code Listing 5: Separate compression mechanisms for different partitions

 

CREATE TABLE trans (
    acc_no  number           not null,
    txn_id  number            not null,
    txn_dt  date                not null,
    txn_amt number(15,2)  not null
)
PARTITION BY RANGE (txn_dt)
(
 partition y09m06 VALUES LESS THAN(TO_DATE('2009-07-01', 'yyyy-mm-dd')) COMPRESS FOR archive high,
partition y09m07 VALUES LESS THAN(TO_DATE('2009-08-01', 'yyyy-mm-dd')) COMPRESS FOR archive low,
partition y09m08 VALUES LESS THAN(TO_DATE('2009-09-01', 'yyyy-mm-dd')) COMPRESS FOR query high,
partition y09m09 VALUES LESS THAN(TO_DATE('2009-10-01', 'yyyy-mm-dd')) COMPRESS FOR query low,
partition y09m10 VALUES LESS THAN(TO_DATE('2009-11-01', 'yyyy-mm-dd')) nocompress
)


Administration

You can check the compression settings in place for tables by using the DBA_TABLES view. Compression setting information for individual partitions and subpartitions is available from the DBA_TAB_PARTITIONS and DBA_TAB_SUBPARTITIONS views, respectively. The noteworthy columns are

 

  • COMPRESSION , which indicates the compression status of the table or partition. The values are ENABLED and DISABLED. For partitioned tables, NULL is shown as the value for tables; for subpartitioned tables, NONE is shown as the value for partitions. 
  • COMPRESS_FOR , which indicates the compression type. The values are BASIC, OLTP, QUERY LOW, QUERY HIGH, ARCHIVE LOW, and ARCHIVE HIGH.


You can also add and drop columns from hybrid columnar compressed tables. When adding a column, you have to define the column as NOT NULL if you are specifying a default value. When dropping a column, the database marks the column as unused immediately and drops it later. This avoids a potentially long decompression and recompression operation on the table. (Note that unlike OLTP table compression, which allows only tables with 255 or fewer columns, Hybrid Columnar Compression allows any number of columns.)

You can also check for the compression characteristics of a specific row in the table, by executing the GET_COMPRESSION_TYPE function in the DBMS_COMPRESSION package, shown in Listing 6. The output “8” means that the row was compressed with the clause “for query low.” Here are the possible outputs and their meanings:

 

1    Uncompressed
2    Compressed for OLTP
4    For query high
8    For query low
16  For archive high
32  For archive low


Code Listing 6: Checking for compression for a specific row

 

SQL> variable ret number

SQL> begin
   2   :ret := dbms_compression.get_compression_type (
   3      'ARUP', 'LOC_COMPQRYLOW', 'AAASOgAAEAAAAaLAAA');
   4   end;
   5   /

PL/SQL procedure successfully completed.

SQL> print ret

    RET
--------------
       8


Locking 

Next Steps


 LEARN more about
Hybrid Columnar Compression

When a session locks a row in an uncompressed table, it puts an entry in a construct known as the Interested Transaction List, inside the block header. When the table is under Hybrid Columnar Compression, there is no specific block for a given row, so where does Oracle Database put the locking information?

In a hybrid columnar compressed table, when a lock is placed on a row, Oracle Database locks all the rows in that compression unit, not just that row. Be aware of this behavior change and the impact it may have on your applications.

Conclusion

OLTP table compression, introduced in Oracle Database 11g Release 1, allows DML against compressed tables. To accommodate the demands of exponential data growth, Oracle Exadata V2 has introduced a new technology called Hybrid Columnar Compression that changes the fundamental way the data is structured inside a block. Instead of grouping rows, it groups like values in columns to make the compression ratio orders of magnitude higher. This enables databases to store more data for data warehousing and archival purposes without a corresponding size increase.

 


Arup Nanda Headshot


Arup Nanda
(arup@proligence.com) has been an Oracle DBA for more than 14 years, handling all aspects of database administration, from performance tuning to security and disaster recovery. He was Oracle Magazine’s DBA of the Year in 2003.





Send us your comments