|
What
is the Catalog Index?
Oracle Text's CONTEXT index type
is designed for collections of news articles, technical reports,
resumes -- relatively large, coherent documents. This does
not describe the majority of database text, which is usually
short text fragments -- names, item descriptions, addresses,
etc. While a CONTEXT index can be applied to these text fragments,
it's usually overkill.
Text fragments need text searching. Since
the CONTEXT index is not optimal for fragments, Oracle Text
provides a new index type -- a catalog index (CTXCAT). Using
a different index structure, the catalog index provides text
indexing that is free from maintenance overhead and text searching
that has structured query support and faster response time
than the context index. CTXCAT indexes work best when text
is in "small chunks" and searches need to restrict and/or
sort the result set according to certain structured criteria.
Creating
a Catalog Index
For example, lets consider an on-line
auction site. Each item for sale has a short description,
a current bid price and dates for the start and end of the
auction. A user might want to see all the records with "antique
cabinet" in the description, with a current bid price less
than $500 sorted by auction start time.
create
table auction (
| item_id |
NUMBER
PRIMARY KEY, |
--
auction item identifier |
| item_desc |
VARCHAR2(80), |
--
free-form item description |
| price |
NUMBER,
|
--
current price of the item |
| start_time |
DATE |
--
end time of the auction |
| end_time
|
DATE |
--
end time of the auction |
)
A CTXCAT index (like a CONTEXT index) is
a domain index and supports the "PARAMETERS" clause. A number
of possible parameter settings are shared with CONTEXT indexes.
These are: LEXER, MEMORY, STOPLIST, STORAGE and WORDLIST.
There is a new parameter for CTXCAT: INDEX SET. INDEX SET
defines the structured columns that are to be included in
the CTXCAT index. Let's say we want to create an index on
the item_desc column, but we need to be able to limit the
search results by price, and sort by start_time. We create
a new INDEX SET, and adding the structured columns to it.
ctx_ddl.create_index_set
('auction_set');
ctx_ddl.add_index ('auction_set', 'price');
ctx_ddl.add_index ('auction_set', 'start_time');
Note: that the item_desc column is
NOT part of the INDEX SET. item_desc is only mentioned when
we come to create the actual index:
CREATE
INDEX auction_index ON auction (item_desc)
INDEXTYPE IS CTXCAT
PARAMETERS ('INDEX SET auction_set');
The
Catalog Query Language
We'll be using the CATSEARCH operator
instead of the CONTAINS operator used for a CONTEXT index.
We want to find all auction items which contains the words
"toy" and "dog" but not the phrase "live animal":
SELECT
item_id, item_desc FROM auction
WHERE CATSEARCH (item_desc, '(toy dog) | "live animal"',
null) > 0;
ANDed terms do not need the word AND or even
an "&" operator. AND is assumed between any adjacent terms.
NOT is represented by "|" (OR, not used here, is represented
by "|") Parentheses can be used to group terms Double quotes
are used to surround a phrase (otherwise "live animal" would
have been read as "live AND animal". The "null" in the query
above is a placeholder for a structured clause. There is no
default - if you don't provide any structured clause you MUST
put "null" in here. The structured clause allows you to restrict,
or sort, your results. If we wanted to extend the query above
to find only items costing less than $100, we could use:
SELECT
item_id, item_desc FROM auction
WHERE CATSEARCH (item_desc, '(toy dog) | "live animal"',
'price < 100') > 0
If we wanted the results with the newest items
first, we could use:
SELECT
item_id, item_desc FROM auction
WHERE CATSEARCH (item_desc, '(toy dog) | "live animal"','price
< 100 order by start_time desc') > 0
More Info
Oracle9i
Daily Features
|