Catalog Index - A New Oracle Text Indextype
   
What is the Catalog Index?
Oracle Text's CONTEXT index type is designed for collections of news articles, technical reports, resumes -- relatively large, coherent documents. This does not describe the majority of database text, which is usually short text fragments -- names, item descriptions, addresses, etc. While a CONTEXT index can be applied to these text fragments, it's usually overkill.

Text fragments need text searching. Since the CONTEXT index is not optimal for fragments, Oracle Text provides a new index type -- a catalog index (CTXCAT). Using a different index structure, the catalog index provides text indexing that is free from maintenance overhead and text searching that has structured query support and faster response time than the context index. CTXCAT indexes work best when text is in "small chunks" and searches need to restrict and/or sort the result set according to certain structured criteria.

Creating a Catalog Index
For example, lets consider an on-line auction site. Each item for sale has a short description, a current bid price and dates for the start and end of the auction. A user might want to see all the records with "antique cabinet" in the description, with a current bid price less than $500 sorted by auction start time.

create table auction (

item_id NUMBER PRIMARY KEY, -- auction item identifier
item_desc VARCHAR2(80), -- free-form item description
price NUMBER, -- current price of the item
start_time DATE -- end time of the auction
end_time DATE -- end time of the auction

)

A CTXCAT index (like a CONTEXT index) is a domain index and supports the "PARAMETERS" clause. A number of possible parameter settings are shared with CONTEXT indexes. These are: LEXER, MEMORY, STOPLIST, STORAGE and WORDLIST. There is a new parameter for CTXCAT: INDEX SET. INDEX SET defines the structured columns that are to be included in the CTXCAT index. Let's say we want to create an index on the item_desc column, but we need to be able to limit the search results by price, and sort by start_time. We create a new INDEX SET, and adding the structured columns to it.

ctx_ddl.create_index_set ('auction_set');
ctx_ddl.add_index ('auction_set', 'price');
ctx_ddl.add_index ('auction_set', 'start_time');

Note: that the item_desc column is NOT part of the INDEX SET. item_desc is only mentioned when we come to create the actual index:

CREATE INDEX auction_index ON auction (item_desc)
INDEXTYPE IS CTXCAT
PARAMETERS ('INDEX SET auction_set');

The Catalog Query Language
We'll be using the CATSEARCH operator instead of the CONTAINS operator used for a CONTEXT index. We want to find all auction items which contains the words "toy" and "dog" but not the phrase "live animal":

SELECT item_id, item_desc FROM auction
WHERE CATSEARCH (item_desc, '(toy dog) | "live animal"', null) > 0;

ANDed terms do not need the word AND or even an "&" operator. AND is assumed between any adjacent terms. NOT is represented by "|" (OR, not used here, is represented by "|") Parentheses can be used to group terms Double quotes are used to surround a phrase (otherwise "live animal" would have been read as "live AND animal". The "null" in the query above is a placeholder for a structured clause. There is no default - if you don't provide any structured clause you MUST put "null" in here. The structured clause allows you to restrict, or sort, your results. If we wanted to extend the query above to find only items costing less than $100, we could use:

SELECT item_id, item_desc FROM auction
WHERE CATSEARCH (item_desc, '(toy dog) | "live animal"', 'price < 100') > 0

If we wanted the results with the newest items first, we could use:

SELECT item_id, item_desc FROM auction
WHERE CATSEARCH (item_desc, '(toy dog) | "live animal"','price < 100 order by start_time desc') > 0

More Info
Oracle Text
Oracle Text Discussion Forum
Oracle9i Text Reference
Oracle9i Text Application Developers' Guide

Oracle9i Daily Features
Archives
   
Oracle Is The Information Company About Oracle | Oracle RSS Feeds | Careers | Contact Us | Site Maps | Legal Notices | Terms of Use | Privacy