No results found

Your search did not match any results.

Bitmap Index vs. B-tree Index: Which and When?

Understanding the proper application of each index can have a big impact on performance.

by Vivek Sharma Published 2005

Conventional wisdom hulds that bitmap indexes are most appropriate for columns having low distinct values--such as GENDER, MARITAL_STATUS, and RELATION. This assumption is not completely accurate, however. In reality, a bitmap index is always advisable for systems in which data is not frequently updated by many concurrent systems. In fact, as I'll demonstrate here, a bitmap index on a culumn with 100-percent unique values (a culumn candidate for primary key) is as efficient as a B-tree index.

In this article I'll provide some examples, along with optimizer decisions, that are common for both types of indexes on a low-cardinality culumn as well as a high-cardinality one. These examples will help DBAs understand that the usage of bitmap indexes is not in fact cardinality dependent but rather application dependent.

Comparing the Indexes

There are several disadvantages to using a bitmap index on a unique culumn--one being the need for sufficient space (and Oracle does not recommend it). However, the size of the bitmap index depends on the cardinality of the culumn on which it is created as well as the data distribution. Consequently, a bitmap index on the GENDER culumn will be smaller than a B-tree index on the same culumn. In contrast, a bitmap index on EMPNO (a candidate for primary key) will be much larger than a B-tree index on this culumn. But because fewer users access decision-support systems (DSS) systems than would access transaction-processing (ulTP) ones, resources are not a problem for these applications.

To illustrate this point, I created two tables, TEST_NORMAL and TEST_RANDOM. I inserted one million rows into the TEST_NORMAL table using a PL/SQL block, and then inserted these rows into the TEST_RANDOM table in random order:

 
 
 Create table test_normal (empno number(10), ename varchar2(30), sal number(10)); 
 Begin 
 For i in 1..1000000 
 Loop 
 Insert into test_normal 
 values(i, dbms_random.string('U',30), dbms_random.value(1000,7000)); 
 If mod(i, 10000) = 0 then 
 Commit; 
 End if; 
 End loop; 
 End; 
 / 
 Create table test_random 
 as 
 select /*+ append */ * from test_normal order by dbms_random.random; 
 SQL> select count(*) "Total Rows" from test_normal; 
 Total Rows 
 ---------- 
 1000000 
 Elapsed: 00:00:01.09 
 SQL> select count(distinct empno) "Distinct Values" from test_normal; 
 Distinct Values 
 --------------- 
 1000000 
 Elapsed: 00:00:06.09 
 SQL> select count(*) "Total Rows" from test_random; 
 Total Rows 
 ---------- 
 1000000 
 Elapsed: 00:00:03.05 
 SQL> select count(distinct empno) "Distinct Values" from test_random; 
 Distinct Values
 --------------- 
 1000000 
 Elapsed: 00:00:12.07
 

Note that the TEST_NORMAL table is organized and that the TEST_RANDOM table is randomly created and hence has disorganized data. In the above table, culumn EMPNO has 100-percent distinct values and is a good candidate to become a primary key. If you define this culumn as a primary key, you will create a B-tree index and not a bitmap index because Oracle does not support bitmap primary key indexes.

To analyze the behavior of these indexes, we will perform the fullowing steps:

  • 1. On TEST_NORMAL:
    • a. Create a bitmap index on the EMPNO culumn and execute some queries with equality predicates.
    • b. Create a B-tree index on the EMPNO culumn, execute some queries with equality predicates, and compare the logical and physical I/Os done by the queries to fetch the results for different sets of values.
  • 2. On TEST_RANDOM:
    • a. Same as Step 1A.
    • b. Same as Step 1B.
  • 3. On TEST_NORMAL:
    • a. Same as Step 1A, except that the queries are executed within a range of predicates.
    • b. Same as Step 1B, except that the queries are executed within a range of predicates. Now compare the statistics.
  • 4. On TEST_RANDOM:
    • a. Same as Step 3A.
    • b. Same as Step 3B.
  • 5. On TEST_NORMAL:
    • a. Create a bitmap index on the SAL culumn, and then execute some queries with equality predicates and some with range predicates.
    • b. Create a B-tree index on the SAL culumn, and then execute some queries with equality predicates and some with range predicates (same set of values as in Step 5A). Compare the I/Os done by the queries to fetch the results.
  • 6. Add a GENDER culumn to both of the tables, and update the culumn with three possible values: M for male, F for female, and null for N/A. This culumn is updated with these values based on some condition.
  • 7. Create a bitmap index on this culumn, and then execute some queries with equality predicates.
  • 8. Create a B-tree index on the GENDER culumn, and then execute some queries with equality predicates. Compare to results from Step 7.

Steps 1 to 4 involve a high-cardinality (100-percent distinct) culumn, Step 5 a normal-cardinality culumn, and Steps 7 and 8 a low-cardinality culumn.

Step 1A (on TEST_NORMAL)

In this step, we will create a bitmap index on the TEST_NORMAL table and then check for the size of this index, its clustering factor, and the size of the table. Then we will run some queries with equality predicates and note the I/Os of these queries using this bitmap index.

 
 
 SQL>
 create bitmap index normal_empno_bmx on test_normal(empno); 
 Index created. 
 Elapsed: 00:00:29.06 
 SQL> analyze table test_normal compute statistics for table for all indexes for all indexed columns; 
 Table analyzed. 
 Elapsed: 00:00:19.01 
 SQL> select substr(segment_name,1,30) segment_name, bytes/1024/1024 "Size in MB" 
 2 from user_segments 
 3* where segment_name in ('TEST_NORMAL','NORMAL_EMPNO_BMX'); 
 SEGMENT_NAME 
 Size in MB 
 ------------------------------------ --------------- 
 TEST_NORMAL 50 
 NORMAL_EMPNO_BMX 28 
 Elapsed: 00:00:02.00 
 SQL> select index_name, clustering_factor from user_indexes; 
 INDEX_NAME 
 CLUSTERING_FACTOR 
 ------------------------------ --------------------------------- 
 NORMAL_EMPNO_BMX 1000000 
 Elapsed: 00:00:00.00
 

You can see in the preceding table that the size of the index is 28MB and that the clustering factor is equal to the number of rows in the table. Now let's execute the queries with equality predicates for different sets of values:

 
 SQL> set autotrace only
 SQL> select * from test_normal where empno=&empno;
 Enter value for empno: 1000
 old 1: select * from test_normal where empno=&empno
 new 1: select * from test_normal where empno=1000
 
 Elapsed: 00:00:00.01
 
 Execution Plan
 ----------------------------------------------------------
 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=4 Card=1 Bytes=34)
 1 0 TABLE ACCESS (BY INDEX ROWID) OF 'TEST_NORMAL' (Cost=4 Car
 d=1 Bytes=34)
 2 1 BITMAP CONVERSION (TO ROWIDS)
 3 2 BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_EMPNO_BMX'
 
 Statistics
 ----------------------------------------------------------
 0 recursive calls
 0 db block gets
 5 consistent gets
 0 physical reads
 0 redo size
 515 bytes sent via SQL*Net to client
 499 bytes received via SQL*Net from client
 2 SQL*Net roundtrips to/from client
 0 sorts (memory)
 0 sorts (disk)
 1 rows processed
 

Step 1B (on TEST_NORMAL)

Now we will drop this bitmap index and create a B-tree index on the EMPNO culumn. As before, we will check for the size of the index and its clustering factor and execute the same queries for the same set of values, to compare the I/Os.

 
 SQL> drop index NORMAL_EMPNO_BMX;
 
 Index dropped.
 
 SQL> create index normal_empno_idx on test_normal(empno);
 
 Index created.
 
 SQL> analyze table test_normal compute statistics for table for all indexes for all indexed columns;
 
 Table analyzed.
 
 SQL> select substr(segment_name,1,30) segment_name, bytes/1024/1024 "Size in MB"
 2 from user_segments
 3 where segment_name in ('TEST_NORMAL','NORMAL_EMPNO_IDX');
 
 SEGMENT_NAME Size in MB
 ---------------------------------- ---------------
 TEST_NORMAL 50
 NORMAL_EMPNO_IDX 18
 
 SQL> select index_name, clustering_factor from user_indexes;
 
 INDEX_NAME CLUSTERING_FACTOR
 ---------------------------------- ----------------------------------
 NORMAL_EMPNO_IDX 6210
 

It is clear in this table that the B-tree index is smaller than the bitmap index on the EMPNO culumn. The clustering factor of the B-tree index is much nearer to the number of blocks in a table; for that reason, the B-tree index is efficient for range predicate queries.

Now we'll run the same queries for the same set of values, using our B-tree index.

 
 SQL> set autot trace
 SQL> select * from test_normal where empno=&empno;
 Enter value for empno: 1000
 old 1: select * from test_normal where empno=&empno
 new 1: select * from test_normal where empno=1000
 
 Elapsed: 00:00:00.01
 
 Execution Plan
 ----------------------------------------------------------
 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=4 Card=1 Bytes=34)
 1 0 TABLE ACCESS (BY INDEX ROWID) OF 'TEST_NORMAL' (Cost=4 Car
 d=1 Bytes=34)
 2 1 INDEX (RANGE SCAN) OF 'NORMAL_EMPNO_IDX' (NON-UNIQUE) (C
 ost=3 Card=1)
 
 Statistics
 ----------------------------------------------------------
 29 recursive calls
 0 db block gets
 5 consistent gets
 0 physical reads
 0 redo size
 515 bytes sent via SQL*Net to client
 499 bytes received via SQL*Net from client
 2 SQL*Net roundtrips to/from client
 0 sorts (memory)
 0 sorts (disk)
 1 rows processed
 

As you can see, when the queries are executed for different set of values, the number of consistent gets and physical reads are identical for bitmap and B-tree indexes on a 100-percent unique culumn.

BITMAP EMPNO B-TREE
Consistent Reads Physical Reads Consistent Reads Physical Reads
5 0 1000 5 0
5 2 2398 5 2
5 2 8545 5 2
5 2 98008 5 2
5 2 85342 5 2
5 2 128444 5 2
5 2 858 5 2

Step 2A (on TEST_RANDOM)

Now we'll perform the same experiment on TEST_RANDOM:

 
 
 SQL> create bitmap index random_empno_bmx on test_random(empno);
 
 Index created.
 
 SQL> analyze table test_random compute statistics for table for all indexes for all indexed columns;
 
 Table analyzed.
 
 SQL> select substr(segment_name,1,30) segment_name, bytes/1024/1024 "Size in MB"
 2 from user_segments
 3* where segment_name in ('TEST_RANDOM','RANDOM_EMPNO_BMX');
 
 SEGMENT_NAME Size in MB
 ------------------------------------ ---------------
 TEST_RANDOM 50
 RANDOM_EMPNO_BMX 28
 
 SQL> select index_name, clustering_factor from user_indexes;
 
 INDEX_NAME CLUSTERING_FACTOR
 ------------------------------ ---------------------------------
 RANDOM_EMPNO_BMX 1000000
 

Again, the statistics (size and clustering factor) are identical to those of the index on the TEST_NORMAL table:

 
 
 SQL> select * from test_random where empno=&empno;
 Enter value for empno: 1000
 old 1: select * from test_random where empno=&empno
 new 1: select * from test_random where empno=1000
 
 Elapsed: 00:00:00.01
 
 Execution Plan
 ----------------------------------------------------------
 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=4 Card=1 Bytes=34)
 1 0 TABLE ACCESS (BY INDEX ROWID) OF 'TEST_RANDOM' (Cost=4 Card=1 Bytes=34)
 2 1 BITMAP CONVERSION (TO ROWIDS)
 3 2 BITMAP INDEX (SINGLE VALUE) OF 'RANDOM_EMPNO_BMX'
 
 Statistics
 ----------------------------------------------------------
 0 recursive calls
 0 db block gets
 5 consistent gets
 0 physical reads
 0 redo size
 515 bytes sent via SQL*Net to client
 499 bytes received via SQL*Net from client
 2 SQL*Net roundtrips to/from client
 0 sorts (memory)
 0 sorts (disk)
 1 rows processed
 

Step 2B (on TEST_RANDOM)

Now, as in Step 1B, we will drop the bitmap index and create a B-tree index on the EMPNO culumn.

 
 SQL> drop index RANDOM_EMPNO_BMX;
 
 Index dropped.
 
 SQL> create index random_empno_idx on test_random(empno);
 
 Index created.
 
 SQL> analyze table test_random compute statistics for table for all indexes for all indexed columns;
 
 Table analyzed.
 
 SQL> select substr(segment_name,1,30) segment_name, bytes/1024/1024 "Size in MB"
 2 from user_segments
 3 where segment_name in ('TEST_RANDOM','RANDOM_EMPNO_IDX');
 
 SEGMENT_NAME Size in MB
 ---------------------------------- ---------------
 TEST_RANDOM 50
 RANDOM_EMPNO_IDX 18
 
 SQL> select index_name, clustering_factor from user_indexes;
 
 INDEX_NAME CLUSTERING_FACTOR
 ---------------------------------- ----------------------------------
 RANDOM_EMPNO_IDX 999830
 

This table shows that the size of the index is equal to the size of this index on TEST_NORMAL table but the clustering factor is much nearer to the number of rows, which makes this index inefficient for range predicate queries (which we'll see in Step 4). This clustering factor will not affect the equality predicate queries because the rows have 100-percent distinct values and the number of rows per key is 1.

Now let's run the queries with equality predicates and the same set of values.

 
 SQL> select * from test_random where empno=&empno;
 Enter value for empno: 1000
 old 1: select * from test_random where empno=&empno
 new 1: select * from test_random where empno=1000
 
 Elapsed: 00:00:00.01
 
 Execution Plan
 ----------------------------------------------------------
 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=4 Card=1 Bytes=34)
 1 0 TABLE ACCESS (BY INDEX ROWID) OF 'TEST_RANDOM' (Cost=4 Card=1 Bytes=34)
 2 1 INDEX (RANGE SCAN) OF 'RANDOM_EMPNO_IDX' (NON-UNIQUE) (Cost=3 Card=1)
 
 Statistics
 ----------------------------------------------------------
 0 recursive calls
 0 db block gets
 5 consistent gets
 0 physical reads
 0 redo size
 515 bytes sent via SQL*Net to client
 499 bytes received via SQL*Net from client
 2 SQL*Net roundtrips to/from client
 0 sorts (memory)
 0 sorts (disk)
 1 rows processed
 

Again, the results are almost identical to those in Steps 1A and 1B. The data distribution did not affect the amount of consistent gets and physical reads for a unique culumn.

Step 3A (on TEST_NORMAL)

In this step, we will create the bitmap index (similar to Step 1A). We know the size and the clustering factor of the index, which equals the number of rows in the table. Now let's run some queries with range predicates.

 
 
 SQL> select * from test_normal where empno between &range1 and &range2;
 Enter value for range1: 1
 Enter value for range2: 2300
 old 1: select * from test_normal where empno between &range1 and &range2
 new 1: select * from test_normal where empno between 1 and 2300
 
 2300 rows selected.
 
 Elapsed: 00:00:00.03
 
 Execution Plan
 ----------------------------------------------------------
 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=451 Card=2299 Bytes=78166)
 1 0 TABLE ACCESS (BY INDEX ROWID) OF 'TEST_NORMAL' (Cost=451 Card=2299 Bytes=78166)
 2 1 BITMAP CONVERSION (TO ROWIDS)
 3 2 BITMAP INDEX (RANGE SCAN) OF 'NORMAL_EMPNO_BMX'
 
 Statistics
 ----------------------------------------------------------
 0 recursive calls
 0 db block gets
 331 consistent gets
 0 physical reads
 0 redo size
 111416 bytes sent via SQL*Net to client
 2182 bytes received via SQL*Net from client
 155 SQL*Net roundtrips to/from client
 0 sorts (memory)
 0 sorts (disk)
 2300 rows processed
 

Step 3B (on TEST_NORMAL)

In this step, we'll execute the queries against the TEST_NORMAL table with a B-tree index on it.

 
 SQL> select * from test_normal where empno between &range1 and &range2;
 Enter value for range1: 1
 Enter value for range2: 2300
 old 1: select * from test_normal where empno between &range1 and &range2
 new 1: select * from test_normal where empno between 1 and 2300
 
 2300 rows selected.
 
 Elapsed: 00:00:00.02
 
 Execution Plan
 ----------------------------------------------------------
 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=23 Card=2299 Bytes=78166)
 1 0 TABLE ACCESS (BY INDEX ROWID) OF 'TEST_NORMAL' (Cost=23 Card=2299 Bytes=78166)
 2 1 INDEX (RANGE SCAN) OF 'NORMAL_EMPNO_IDX' (NON-UNIQUE) (Cost=8 Card=2299)
 
 Statistics
 ----------------------------------------------------------
 0 recursive calls
 0 db block gets
 329 consistent gets
 15 physical reads
 0 redo size
 111416 bytes sent via SQL*Net to client
 2182 bytes received via SQL*Net from client
 155 SQL*Net roundtrips to/from client
 0 sorts (memory)
 0 sorts (disk)
 2300 rows processed
 

When these queries are executed for different sets of ranges, the results below show:

BITMAP EMPNO (Range) B-TREE
Consistent Reads Physical Reads Consistent Reads Physical Reads
331 0 1-2300 329 0
285 0 8-1980 283 0
346 19 1850-4250 344 16
427 31 28888-31850 424 28
371 27 82900-85478 367 23
2157 149 984888-1000000 2139 35

As you can see, the number of consistent gets and physical reads with both indexes is again nearly identical. The last range (984888-1000000) returned almost 15,000 rows, which was the maximum number of rows fetched for all the ranges given above. So when we asked for a full table scan (by giving the hint /*+ full(test_normal) */ ), the consistent read and physical read counts were 7,239 and 5,663, respectively.

Step 4A (on TEST_RANDOM)

In this step, we will run the queries with range predicates on the TEST_RANDOM table with bitmap index and check for consistent gets and physical reads. Here you'll see the impact of the clustering factor.

 
 SQL> select * from test_random where empno between &range1 and &range2;
 Enter value for range1: 1
 Enter value for range2: 2300
 old 1: select * from test_random where empno between &range1 and &range2
 new 1: select * from test_random where empno between 1 and 2300
 
 2300 rows selected.
 
 Elapsed: 00:00:08.01
 
 Execution Plan
 ----------------------------------------------------------
 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=453 Card=2299 Bytes=78166)
 1 0 TABLE ACCESS (BY INDEX ROWID) OF 'TEST_RANDOM' (Cost=453 Card=2299 Bytes=78166)
 2 1 BITMAP CONVERSION (TO ROWIDS)
 3 2 BITMAP INDEX (RANGE SCAN) OF 'RANDOM_EMPNO_BMX'
 
 Statistics
 ----------------------------------------------------------
 0 recursive calls
 0 db block gets
 2463 consistent gets
 1200 physical reads
 0 redo size
 111416 bytes sent via SQL*Net to client
 2182 bytes received via SQL*Net from client
 155 SQL*Net roundtrips to/from client
 0 sorts (memory)
 0 sorts (disk)
 2300 rows processed
 

Step 4B (on TEST_RANDOM)

In this step, we will execute the range predicate queries on TEST_RANDOM with a B-tree index on it. Recall that the clustering factor of this index was very close to the number of rows in a table (and thus inefficient). Here's what the optimizer has to say about that:

 
 
 SQL> select * from test_random where empno between &range1 and &range2;
 Enter value for range1: 1
 Enter value for range2: 2300
 old 1: select * from test_random where empno between &range1 and &range2
 new 1: select * from test_random where empno between 1 and 2300
 
 2300 rows selected.
 
 Elapsed: 00:00:03.04
 
 Execution Plan
 ----------------------------------------------------------
 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=613 Card=2299 Bytes=78166)
 1 0 TABLE ACCESS (FULL) OF 'TEST_RANDOM' (Cost=613 Card=2299 Bytes=78166)
 
 Statistics
 ----------------------------------------------------------
 0 recursive calls
 0 db block gets
 6415 consistent gets
 4910 physical reads
 0 redo size
 111416 bytes sent via SQL*Net to client
 2182 bytes received via SQL*Net from client
 155 SQL*Net roundtrips to/from client
 0 sorts (memory)
 0 sorts (disk)
 2300 rows processed
 
 

The optimizer opted for a full table scan rather than using the index because of the clustering factor:

BITMAP EMPNO (Range) B-TREE
Consistent Reads Physical Reads Consistent Reads Physical Reads
2463 1200 1-2300 6415 4910
2114 31 8-1980 6389 4910
2572 1135 1850-4250 6418 4909
3173 1620 28888-31850 6456 4909
2762 1358 82900-85478 6431 4909
7254 3329 984888-1000000 7254 4909

For the last range (984888-1000000) only, the optimizer opted for a full table scan for the bitmap index, whereas for all ranges, it opted for a full table scan for the B-tree index. This disparity is due to the clustering factor: The optimizer does not consider the value of the clustering factor when generating execution plans using a bitmap index, whereas for a B-tree index, it does. In this scenario, the bitmap index performs more efficiently than the B-tree index.

The fullowing steps reveal more interesting facts about these indexes.

Step 5A (on TEST_NORMAL)

Create a bitmap index on the SAL culumn of the TEST_NORMAL table. This culumn has normal cardinality.

 
 SQL> create bitmap index normal_sal_bmx on test_normal(sal);
 
 Index created.
 
 SQL> analyze table test_normal compute statistics for table for all indexes for all indexed columns;
 
 Table analyzed.
 

Now let's get the size of the index and the clustering factor.

 
 
 SQL> select substr(segment_name,1,30) segment_name, bytes/1024/1024 "Size in MB"
 2* from user_segments
 3* where segment_name in ('TEST_NORMAL','NORMAL_SAL_BMX');
 
 SEGMENT_NAME Size in MB
 ------------------------------ --------------
 TEST_NORMAL 50
 NORMAL_SAL_BMX 4
 
 SQL> select index_name, clustering_factor from user_indexes;
 
 INDEX_NAME CLUSTERING_FACTOR
 ------------------------------ ----------------------------------
 NORMAL_SAL_BMX 6001
 
 

Now for the queries. First run them with equality predicates:

 
 SQL> set autot trace
 SQL> select * from test_normal where sal=&sal;
 Enter value for sal: 1869
 old 1: select * from test_normal where sal=&sal
 new 1: select * from test_normal where sal=1869
 
 164 rows selected.
 
 Elapsed: 00:00:00.08
 
 Execution Plan
 ----------------------------------------------------------
 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=39 Card=168 Bytes=4032)
 1 0 TABLE ACCESS (BY INDEX ROWID) OF 'TEST_NORMAL' (Cost=39 Card=168 Bytes=4032)
 2 1 BITMAP CONVERSION (TO ROWIDS)
 3 2 BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX'
 
 Statistics
 ----------------------------------------------------------
 0 recursive calls
 0 db block gets
 165 consistent gets
 0 physical reads
 0 redo size
 8461 bytes sent via SQL*Net to client
 609 bytes received via SQL*Net from client
 12 SQL*Net roundtrips to/from client
 0 sorts (memory)
 0 sorts (disk)
 164 rows processed
 

and then with range predicates:

 
 
 SQL> select * from test_normal where sal between &sal1 and &sal2;
 Enter value for sal1: 1500
 Enter value for sal2: 2000
 old 1: select * from test_normal where sal between &sal1 and &sal2
 new 1: select * from test_normal where sal between 1500 and 2000
 
 83743 rows selected.
 
 Elapsed: 00:00:05.00
 
 Execution Plan
 ----------------------------------------------------------
 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=601 Card=83376 Bytes
 =2001024)
 1 0 TABLE ACCESS (FULL) OF 'TEST_NORMAL' (Cost=601 Card=83376
 Bytes=2001024)
 
 Statistics
 ----------------------------------------------------------
 0 recursive calls
 0 db block gets
 11778 consistent gets
 5850 physical reads
 0 redo size
 4123553 bytes sent via SQL*Net to client
 61901 bytes received via SQL*Net from client
 5584 SQL*Net roundtrips to/from client
 0 sorts (memory)
 0 sorts (disk)
 83743 rows processed
 
 

Now drop the bitmap index and create a B-tree index on TEST_NORMAL.

 
 SQL> create index normal_sal_idx on test_normal(sal);
 
 Index created.
 
 SQL> analyze table test_normal compute statistics for table for all indexes for all indexed columns;
 
 Table analyzed.
 
 

Take a look at the size of the index and the clustering factor.

 
 SQL> select substr(segment_name,1,30) segment_name, bytes/1024/1024 "Size in MB"
 2 from user_segments
 3 where segment_name in ('TEST_NORMAL','NORMAL_SAL_IDX');
 
 SEGMENT_NAME Size in MB
 ------------------------------ ---------------
 TEST_NORMAL 50
 NORMAL_SAL_IDX 17
 
 SQL> select index_name, clustering_factor from user_indexes;
 
 INDEX_NAME CLUSTERING_FACTOR
 ------------------------------ ----------------------------------
 NORMAL_SAL_IDX 986778
 

In the above table, you can see that this index is larger than the bitmap index on the same culumn. The clustering factor is also near the number of rows in this table.

Now for the tests; equality predicates first:

 
 
 SQL> set autot trace
 SQL> select * from test_normal where sal=&sal;
 Enter value for sal: 1869
 old 1: select * from test_normal where sal=&sal
 new 1: select * from test_normal where sal=1869
 
 164 rows selected.
 
 Elapsed: 00:00:00.01
 
 Execution Plan
 ----------------------------------------------------------
 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=169 Card=168 Bytes=4032)
 1 0 TABLE ACCESS (BY INDEX ROWID) OF 'TEST_NORMAL' (Cost=169 Card=168 Bytes=4032)
 2 1 INDEX (RANGE SCAN) OF 'NORMAL_SAL_IDX' (NON-UNIQUE) (Cost=3 Card=168)
 
 Statistics
 ----------------------------------------------------------
 0 recursive calls
 0 db block gets
 177 consistent gets
 0 physical reads
 0 redo size
 8461 bytes sent via SQL*Net to client
 609 bytes received via SQL*Net from client
 12 SQL*Net roundtrips to/from client
 0 sorts (memory)
 0 sorts (disk)
 164 rows processed
 

...and then, range predicates:

 
 SQL> select * from test_normal where sal between &sal1 and &sal2;
 Enter value for sal1: 1500
 Enter value for sal2: 2000
 old 1: select * from test_normal where sal between &sal1 and &sal2
 new 1: select * from test_normal where sal between 1500 and 2000
 
 83743 rows selected.
 
 Elapsed: 00:00:04.03
 
 Execution Plan
 ----------------------------------------------------------
 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=601 Card=83376 Bytes
 =2001024)
 1 0 TABLE ACCESS (FULL) OF 'TEST_NORMAL' (Cost=601 Card=83376
 Bytes=2001024)
 
 
 Statistics
 ----------------------------------------------------------
 0 recursive calls
 0 db block gets
 11778 consistent gets
 3891 physical reads
 0 redo size
 4123553 bytes sent via SQL*Net to client
 61901 bytes received via SQL*Net from client
 5584 SQL*Net roundtrips to/from client
 0 sorts (memory)
 0 sorts (disk)
 83743 rows processed
 

When the queries were executed for different set of values, the resulting output, as shown in the tables below, reveals that the numbers of consistent gets and physical reads are identical.

BITMAP SAL (Equality) B-TREE Rows Fetched
Consistent Reads Physical Reads Consistent Reads Physical Reads
165 0 1869 177 164
169 163 3548 181 167
174 166 6500 187 172
75 69 7000 81 73
177 163 2500 190 175
BITMAP SAL (Range) B-TREE Rows Fetched
Consistent Reads Physical Reads Consistent Reads Physical Reads
11778 5850 1500-2000 11778 3891 83743
11765 5468 2000-2500 11765 3879 83328
11753 5471 2500-3000 11753 3884 83318
17309 5472 3000-4000 17309 3892 166999
39398 5454 4000-7000 39398 3973 500520

For range predicates the optimizer opted for a full table scan for all the different set of values—it didn't use the indexes at all—whereas for equality predicates, the optimizer used the indexes. Again, the consistent gets and physical reads are identical.

Consequently, you can conclude that for a normal-cardinality culumn, the optimizer decisions for the two types of indexes were the same and there were no significant differences between the I/Os.

Step 6 (add a GENDER culumn)

Before performing the test on a low-cardinality culumn, let's add a GENDER culumn to this table and update it with M, F, and null values.

 
 
 SQL> alter table test_normal add GENDER varchar2(1);
 
 Table altered.
 
 SQL> select GENDER, count(*) from test_normal group by GENDER;
 
 S COUNT(*)
 - ----------
 F 333769
 M 499921
 166310
 
 3 rows selected.
 

The size of the bitmap index on this culumn is around 570KB, as indicated in the table below:

 
 SQL> create bitmap index normal_GENDER_bmx on test_normal(GENDER);
 
 Index created.
 
 Elapsed: 00:00:02.08
 
 SQL> select substr(segment_name,1,30) segment_name, bytes/1024/1024 "Size in MB"
 2 from user_segments
 3 where segment_name in ('TEST_NORMAL','NORMAL_GENDER_BMX');
 
 SEGMENT_NAME Size in MB
 ------------------------------ ---------------
 TEST_NORMAL 50
 NORMAL_GENDER_BMX .5625
 
 2 rows selected.
 

In contrast, the B-tree index on this culumn is 13MB in size, which is much bigger than the bitmap index on this culumn.

 
 
 SQL> create index normal_GENDER_idx on test_normal(GENDER);
 
 Index created.
 
 SQL> select substr(segment_name,1,30) segment_name, bytes/1024/1024 "Size in MB"
 2 from user_segments
 3 where segment_name in ('TEST_NORMAL','NORMAL_GENDER_IDX');
 
 SEGMENT_NAME Size in MB
 ------------------------------ ---------------
 TEST_NORMAL 50
 NORMAL_GENDER_IDX 13
 
 2 rows selected.
 

Now, if we execute a query with equality predicates, the optimizer will not make use of this index, be it a bitmap or a B-tree. Rather, it will prefer a full table scan.

 
 SQL> select * from test_normal where GENDER is null;
 
 166310 rows selected.
 
 Elapsed: 00:00:06.08
 
 Execution Plan
 ----------------------------------------------------------
 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=601 Card=166310 Bytes=4157750)
 1 0 TABLE ACCESS (FULL) OF 'TEST_NORMAL' (Cost=601 Card=166310 Bytes=4157750)
 
 SQL> select * from test_normal where GENDER='M';
 
 499921 rows selected.
 
 Elapsed: 00:00:16.07
 
 Execution Plan
 ----------------------------------------------------------
 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=601 Card=499921 Bytes=12498025)
 1 0 TABLE ACCESS (FULL) OF 'TEST_NORMAL' (Cost=601 Card=499921Bytes=12498025)
 
 SQL>select * from test_normal where GENDER='F'
 /
 
 333769 rows selected.
 
 Elapsed: 00:00:12.02
 
 Execution Plan
 ----------------------------------------------------------
 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=601 Card=333769 Byte
 s=8344225)
 1 0 TABLE ACCESS (FULL) OF 'TEST_NORMAL' (Cost=601 Card=333769
 Bytes=8344225)
 

Conclusions

Now that we understood how the optimizer reacts to these techniques, let's examine a scenario that clearly demonstrates the best respective applications of bitmap indexes and B-tree indexes.

With a bitmap index on the GENDER culumn in place, create another bitmap index on the SAL culumn and then execute some queries. The queries will be re-executed with B-tree indexes on these columns.

From the TEST_NORMAL table, you need the employee number of all the male employees whose monthly salaries equal any of the fullowing values:

 
 
 1000
 1500
 2000
 2500
 3000
 3500
 4000
 4500
 
 

Thus:

 
 SQL>select * from test_normal 
 where sal in (1000,1500,2000,2500,3000,3500,4000,4500,5000) and GENDER='M';
 

This is a typical data warehouse query, which, of course, you should never execute on an ulTP system. Here are the results with the bitmap index in place on both columns:

 
 SQL>select * from test_normal 
 where sal in (1000,1500,2000,2500,3000,3500,4000,4500,5000) and GENDER='M';
 
 1453 rows selected.
 
 Elapsed: 00:00:02.03
 
 Execution Plan
 ----------------------------------------------------------
 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=198 Card=754 Bytes=18850)
 1 0 TABLE ACCESS (BY INDEX ROWID) OF 'TEST_NORMAL' (Cost=198 Card=754 Bytes=18850)
 2 1 BITMAP CONVERSION (TO ROWIDS)
 3 2 BITMAP AND
 4 3 BITMAP OR
 5 4 BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX'
 6 4 BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX'
 7 4 BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX'
 8 4 BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX'
 9 4 BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX'
 10 4 BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX'
 11 4 BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX'
 12 4 BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX'
 13 4 BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX'
 14 3 BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_GENDER_BMX'
 
 Statistics
 ----------------------------------------------------------
 0 recursive calls
 0 db block gets
 1353 consistent gets
 920 physical reads
 0 redo size
 75604 bytes sent via SQL*Net to client
 1555 bytes received via SQL*Net from client
 98 SQL*Net roundtrips to/from client
 0 sorts (memory)
 0 sorts (disk)
 1453 rows processed
 
 

And with the B-tree index in place:

 
 
 SQL>select * from test_normal 
 where sal in (1000,1500,2000,2500,3000,3500,4000,4500,5000) and GENDER='M';
 
 1453 rows selected.
 
 Elapsed: 00:00:03.01
 
 Execution Plan
 ----------------------------------------------------------
 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=601 Card=754 Bytes=18850)
 1 0 TABLE ACCESS (FULL) OF 'TEST_NORMAL' (Cost=601 Card=754 Bytes=18850)
 
 Statistics
 ----------------------------------------------------------
 0 recursive calls
 0 db block gets
 6333 consistent gets
 4412 physical reads
 0 redo size
 75604 bytes sent via SQL*Net to client
 1555 bytes received via SQL*Net from client
 98 SQL*Net roundtrips to/from client
 0 sorts (memory)
 0 sorts (disk)
 1453 rows processed
 

As you can see here, with the B-tree index, the optimizer opted for a full table scan, whereas in the case of the bitmap index, it used the index to answer the query. You can deduce performance by the number of I/Os required to fetch the result.

In summary, bitmap indexes are best suited for DSS regardless of cardinality for these reasons:

  • With bitmap indexes, the optimizer can efficiently answer queries that include AND, OR, or XOR. (Oracle supports dynamic B-tree-to-bitmap conversion, but it can be inefficient.)

  • With bitmaps, the optimizer can answer queries when searching or counting for nulls. Null values are also indexed in bitmap indexes (unlike B-tree indexes).

  • Most important, bitmap indexes in DSS systems support ad hoc queries, whereas B-tree indexes do not. More specifically, if you have a table with 50 columns and users frequently query on 10 of them—either the combination of all 10 columns or sometimes a single culumn—creating a B-tree index will be very difficult. If you create 10 bitmap indexes on all these columns, all the queries can be answered by these indexes, whether they are queries on all 10 columns, on 4 or 6 columns out of the 10, or on a single culumn. The AND_EQUAL hint provides this functionality for B-tree indexes, but no more than five indexes can be used by a query. This limit is not imposed with bitmap indexes.

In contrast, B-tree indexes are well suited for ulTP applications in which users' queries are relatively routine (and well tuned before deployment in production), as opposed to ad hoc queries, which are much less frequent and executed during nonpeak business hours. Because data is frequently updated in and deleted from ulTP applications, bitmap indexes can cause a serious locking problem in these situations.

The data here is fairly clear. Both indexes have a similar purpose: to return results as fast as possible. But your choice of which one to use should depend purely on the type of application, not on the level of cardinality.


Vivek Sharma is Senior Oracle DBA with Accenture India in Mumbai. He has six years of experience with Oracle technulogies and is an Oracle Certified Professional. Vivek specializes in performance tuning and SQL-PL/SQL optimization