Using CTX_REPORT to generate XML output of Index Information
Using CTX_REPORT to generate XML output of Index Information
CTX_REPORT was introduced in Oracle 9i in order to help users and
Oracle Support find out information about an existing Oracle Text
index.
In Oracle 10g, CTX_REPORT has been enhanced with the option to
produce XML output for many of its procedures. This makes it much
easier to incorporate information gained from CTX_REPORT into
other applications - such as perhaps a custom system-monitoring
application.
This paper walks through the process of generating the report, and
processing the XML output using some of Oracle's XML capabilities.
The paper assumes a basic knowledge of Oracle Text and XML, but little
or no knowledge of Oracle's XML capabilities. It is hoped that the
paper will double as a primer on using some of Oracle's XML capabilities
to convert from XML to SQL.
The reader is strongly encouraged to try all of the examples as they are
presented. In general, the output of the examples is not shown
in this document, in the assumption that the readers will be trying the
examples for themselves. You will need an Oracle 10g database,
and a SQL*Plus session - no extra software or products are necessary.
The examples are run on a Text index called "TI", owned
by the user "TUSER". You could create a suitable test index by
running the following SQL as a DBA user:
create user tuser identified by tuser default tablespace users
temporary tablespace temp;
grant connect,resource,ctxapp to tuser;
connect tuser/tuser
create table testtab(text varchar2(2000));
create index ti on testtab(text) indextype is ctxsys.context;
- the rest of the document assumes you are running in SQL*Plus
as user TUSER.
Getting the Total Index Size
To find out the size of the index, we can use the function
CTX_REPORT.INDEX_SIZE. This is a function returning a CLOB. Probably
the simplest way to read it is in SQL*Plus using SELECT ... FROM DUAL,
as follows:
SQL> select ctx_report.index_size('ti') from dual;
The chances are we'll only see a couple of lines here. That's because we need to
tell SQL*Plus to print more data, using
SQL> set long 50000
This time when we re-run the select, we get much more text, including a final line
which says:
TOTAL BYTES USED: 311,296 (304.00 KB)
This is our key value. But in order to make use of that value within a program,
we'd have to fetch the output from CTX_REPORT, parse it for that final value,
then convert that value into a proper number. A lot of work.
So let's ask for an XML report instead:
SQL> select ctx_report.index_size('ti', null, 'XML') from dual;
(note the second arg - partition name - is not needed but must be supplied since you cannot use named parameters in a function which is part of a SELECT).
This will give us a heap of XML output, including sizes for each
object.
Here's a section of the output:
<SIZE_OBJECT_NAME>
TUSER.SYS_IL0000051565C00006$$
</SIZE_OBJECT_NAME>
<SIZE_OBJECT_TYPE>
INDEX (LOB)
</SIZE_OBJECT_TYPE>
<SIZE_TABLE_NAME>
TUSER.DR$TI$I
</SIZE_TABLE_NAME>
<SIZE_TABLESPACE_NAME>
USERS
</SIZE_TABLESPACE_NAME>
<SIZE_BLOCKS_ALLOCATED>
8
</SIZE_BLOCKS_ALLOCATED>
<SIZE_BLOCKS_USED>
4
</SIZE_BLOCKS_USED>
<SIZE_BYTES_ALLOCATED>
65536
</SIZE_BYTES_ALLOCATED>
<SIZE_BYTES_USED>
32768
</SIZE_BYTES_USED>
</SIZE_OBJECT>
<SIZE_OBJECT>
Now for a moment let's just get the information about this on object - the $I table.
We can do this using an the XML DB extract operator, and an XPATH predicate.
To do that bit by bit:
First, rather than directly selecting a CLOB, force it into an XMLTYPE value:
SQL> select xmltype(ctx_report.index_size('ti', null, 'XML')) from dual;
The output from this will actually be the same as before, since SQL is just converting the XMLTYPE back into a CLOB - but it prepares the way for our next step:
Next we'll use a simple XPATH expression to specify what we want to
fetch:
SQL> select extract(xmltype(ctx_report.index_size('ti', null, 'XML')), '//SIZE_OBJECT') from dual;
We're now using the XML DB "extract" function to fetch all of the XML within <SIZE_OBJECT> tags. The syntax '//SIZE_OBJECT' is an XPATH expression meaning "all the XML within a SIZE_OBJECT element anywhere below the root element". A full explanation of XPATH is well beyond the scope of this document, but the reader is encouraged to look at the
tutorial at w3Schools.
Notice that there are new-line characters around each element in the output. This is
actually incorrect - it's a bug. We'll fix that now by using the SQL
"replace" function to remove all line feeds (character 10). XMLType
will automatically reformat our XML to look OK on output.
select
extract(xmltype(replace(ctx_report.index_size('ti', null, 'XML'),chr(10),'')),
'//SIZE_OBJECT')
from dual;
Next we just need to specify that we only want the information for the $I table. We do this with a predicate within the xpath expression:
select
extract(xmltype(replace(ctx_report.index_size('ti', null, 'XML'),chr(10),'')),
'//SIZE_OBJECT[SIZE_OBJECT_NAME="TUSER.DR$TI$I"]')
from dual;
And to take this one stage further, we can get just the size in bytes by adding this to the XPATH expression:
select
extract(xmltype(replace(ctx_report.index_size('ti', null, 'XML'),chr(10),'')),
'//SIZE_OBJECT[SIZE_OBJECT_NAME="TUSER.DR$TI$I"]/SIZE_BYTES_USED')
from dual;
- but we don't really want the tags - just the value. There's two ways of doing this - we can use the text() function within the XPATH, or we can use extractValue rather than extract.
Using text():
select
extract(xmltype(replace(ctx_report.index_size('ti', null, 'XML'),chr(10),'')),
'//SIZE_OBJECT[SIZE_OBJECT_NAME="TUSER.DR$TI$I"]/SIZE_BYTES_USED/text()')
as "Table Size"
from dual;
Using extractValue:
select
extractValue(xmltype(replace(ctx_report.index_size('ti', null, 'XML'),chr(10),'')),
'//SIZE_OBJECT[SIZE_OBJECT_NAME="TUSER.DR$TI$I"]/SIZE_BYTES_USED')
as "Table Size"
from dual;
Notice we slipped in the labels "Table Size" on those two, since the column headings
were getting increasingly unreadable.
We've now managed to extract a single value from the XML into a SQL
environment. But what if we want to fetch ALL the sizes, perhaps to
add them together to get a summary? To do that, we can use
XMLSequence, which returns a collection of XMLType values. We can then
use this function in a TABLE clause to unnest the collection values
into multiple rows. Now that we're generating a "table", we no longer
need to reference DUAL.
To get just the sizes:
select *
from table(xmlsequence(
extract(xmltype(replace(ctx_report.index_size('ti', null, 'XML'),chr(10),'')),
'//SIZE_OBJECT/SIZE_BYTES_USED')));
To get all the object info, one object per row:
select *
from table(xmlsequence(
extract(xmltype(replace(ctx_report.index_size('ti', null, 'XML'),chr(10),'')),
'//SIZE_OBJECT')));
Now lets process that information into individual values:
Set the column formatting for nice output
column "Name" format a33
column "Tablespace" format a30
column "Bytes" format a12
and do the select, using extractValue to fetch individual "columns"
select
extractValue(Column_Value, '/SIZE_OBJECT/SIZE_OBJECT_NAME') as "Name",
extractValue(Column_Value, '/SIZE_OBJECT/SIZE_TABLESPACE_NAME') as "Tablespace",
extractValue(Column_Value, '/SIZE_OBJECT/SIZE_BYTES_USED') as "Bytes"
from table(xmlsequence(
extract(xmltype(replace(ctx_report.index_size('ti', null, 'XML'),chr(10),'')),
'//SIZE_OBJECT')));
And finally, since we've bought the values into the SQL world, we can
use any SQL operators on the values. Here we collect the sum of all the
sizes to get an aggragate total size of all objects used in the index.
select
sum(extractValue(Column_Value, '/SIZE_OBJECT/SIZE_BYTES_USED')) "Total"
from table(xmlsequence(
extract(xmltype(replace(ctx_report.index_size('ti', null, 'XML'),chr(10),'')),
'//SIZE_OBJECT')));
INDEX_STATS
Most of the CTX_REPORT procedures have two forms - a function which returns
a CLOB (as we have been using above) and an alternate form which requires
you to pass in a CLOB to be populated. One procedure - INDEX_STATS - has does
not have a function version. This is because it needs to do a full table scan
of the main index table, which is likely to take some time. Therefore when
calling INDEX_STATS we need to go about things a little differently. First
we'll create a table with an XMLType column. Then we'll call a PL/SQL block
which passes a temporary CLOB to CTX_REPORT, and then inserts that CLOB
into the XMLType column. We can then perform our various XML operations
on that table.
drop table output;
create table output(report XMLType);
declare
tlob clob;
begin
ctx_report.index_stats(
index_name=>'ti', report=>tlob,
list_size=>20, report_format=>'XML');
insert into output values (xmlType (replace(tlob,chr(10),'')) );
commit;
dbms_lob.freetemporary(tlob);
end;
/
We now have a single row in table OUTPUT, which contains our XML
report. First we'll get the estimated row fragmentation:
select extractValue(report,
'//STAT_STATISTIC[@NAME="estimated row fragmentation"]')
as "Fragmentation"
from output;
Now something rather more complicated - get the top three most
frequent tokens. We must make a table out of the STAT_TOKEN
entries, where STAT_TOKEN_LIST has an attribute NAME equal
to "most frequent tokens" - and fetch just the first three using
the position function:
select extract(value(d), '//STAT_TOKEN_TEXT')
from output, table(xmlsequence(extract(report,
'/CTXREPORT/INDEX_STATS/STAT_TOKEN_STATS/STAT_TOKEN_LIST[@NAME="most frequent tokens"]/STAT_TOKEN[position()<4]'))) d;
Conclusion
The XML output mode of CTX_REPORT allows powerful manipulation of your index information.
In order to make full use of these, you need a good understanding of the XML features of the
Oracle database, such as extract, extractValue, XMLSequence, and of XPATH syntax in general.
With the use of XML exploding within the data processing world, such an understanding is
likely to be very useful in the future.
|