How do I create a CTXCAT index?
The simplest case is just
CREATE INDEX indexname ON table(column)
INDEXTYPE IS CTXCAT;
However, if you do this you're not taking advantage of the ability to
include structured information within the index itself.
Let's consider our online auction site mentioned previously:
create table auction (
item_id NUMBER PRIMARY KEY, -- auction item identifier
item_desc VARCHAR2(80), -- free-form item description
price NUMBER, -- current price of the item
start_time DATE -- end time of the auction
end_time DATE -- end time of the auction
)
A CTXCAT index (like a CONTEXT index) is a "domain" index. Therefore
it supports the "PARAMETERS" clause. A number of possible parameter
settings are shared with CONTEXT indexes. These are: LEXER, MEMORY,
STOPLIST, STORAGE and WORDLIST (no other CONTEXT parameters are
supported). Howevwer, the most important parameter is a new one: INDEX
SET.
INDEX SET defines the structured columns that are to be included in
the CTXCAT index. Let's say I want to create an index on the item_desc
column, but I need to be able to limit my search results by price, and
sort by start_time. I do this by creating a new INDEX SET, and adding
the structured columns to it. New (for 8.1.7) procedures in the
CTX_DDL package allow me to do this:
ctx_ddl.create_index_set('auction_set');
ctx_ddl.add_index ('auction_set', 'price');
ctx_ddl.add_index ('auction_set', 'start_time');
Note that the item_desc column is NOT part of the INDEX SET. item_desc
is only mentioned when we come to create the actual index:
CREATE INDEX auction_index ON auction (item_desc)
INDEXTYPE IS CTXCAT
PARAMETERS ('INDEX SET auction_set');
And how do I do a search?
First of all, we'll be using the CATSEARCH operator instead of the
CONTAINS operator used for a CONTEXT index. Let's ignore the
structured columns for now and look at the basic query syntax. I want
to find all auction items which contains the words "toy" and "dog" but
not the phrase "live animal":
SELECT item_id, item_desc
FROM auction
WHERE CATSEARCH (item_desc, '(toy dog) | "live animal"', null) > 0;
A few points to note:
- ANDed terms do not need the word AND or even an "&" operator. AND is assumed between any adjacent terms.
- NOT is represented by "|" (OR, not used here, is represented by "|")
- Parentheses can be used to group terms
- Double quotes are used to surround a phrase (otherwise "live animal" would have been read as "live AND animal".
The "null" in the query above is a placeholder for a structured
clause. There is no default - if you don't provide any structured
clause you MUST put "null" in here.
The structured clause allows you to restrict, or sort, your
results. If I wanted to extend the query above to find only items
costing less than $100, I could use
WHERE CATSEARCH (item_desc, '(toy dog) | "live animal"', 'price < 100') > 0
and if I wanted the results with the newest items first, I could use
WHERE CATSEARCH (item_desc, '(toy dog) | "live animal"',
'price < 100 order by start_time desc') > 0
Note that there are restrictions on what can be used in the structured
clause. Of course any columns that are mentioned MUST be included in
the index set. Additionally, you cannot use functions, expressions,
sub-selects and so on. For full details of the restrictions, see the
8.1.7 documentation addendum.
Some combinations of structured criteria and order by might not be able to
be processed efficiently by the catsearch index. When this happens, the
query will fail with the error:
DRG-10844: index cannot execute this structured predicate efficiently
although note that the most common reason for getting this message is
that you have failed to include a referenced column in the INDEX SET!
Have much index space is used?
The amount of space used by a CTXCAT index will vary according to how
many structured columns are included in the index set. Without
getting too deep into the technical detail, each structured column
requires both more space in the main index table, AND an extra b-tree
index on the index table itself.
However, the figures tend to be large.
Tests show that in an auction table as above, with two structured
columns in the index set, the final amount of storage required is
between 10 and 15 times the size of the base table. This can be
compared to CONTEXT indexes, which are typically 25-100% of the
base table size.
How much time to create the index?
Again, this varies considerably depending on the number of
structured columns in the index set. As an example, creating
a CTXCAT index on 5 million short records, with three structured
columns, took me approximately 4 hours on a multi-CPU Sun Solaris
server with 2GB of memory.
It is worth noting that the creation of the B-Tree indexes will be
considerably quicker if the SORT_AREA_SIZE kernel parameter is
increased. The default is 64K - a very low figure. Increasing this
to 1MB will have a very significant effect - I've seen index-creation
times reduce by a factor of 10 by doing this. However note the specified
amount of memory will be used by EVERY process connecting, so great care
should be taken with this parameter if the database is shared with many
other users. It might be possible to increase it just for index creation,
then reduce it later.
Last Modified: 18 May 2001 roger.ford@oracle.com