Using CTX_REPORT to generate XML output of Index Information

Using CTX_REPORT to generate XML output of Index Information

CTX_REPORT was introduced in Oracle 9i in order to help users and Oracle Support find out information about an existing Oracle Text index.

In Oracle 10g, CTX_REPORT has been enhanced with the option to produce XML output for many of its procedures. This makes it much easier to incorporate information gained from CTX_REPORT into other applications - such as perhaps a custom system-monitoring application.

This paper walks through the process of generating the report, and processing the XML output using some of Oracle's XML capabilities.

The paper assumes a basic knowledge of Oracle Text and XML, but little or no knowledge of Oracle's XML capabilities. It is hoped that the paper will double as a primer on using some of Oracle's XML capabilities to convert from XML to SQL.

The reader is strongly encouraged to try all of the examples as they are presented. In general, the output of the examples is not shown in this document, in the assumption that the readers will be trying the examples for themselves. You will need an Oracle 10g database, and a SQL*Plus session - no extra software or products are necessary.

The examples are run on a Text index called "TI", owned by the user "TUSER". You could create a suitable test index by running the following SQL as a DBA user:

create user tuser identified by tuser default tablespace users
  temporary tablespace temp;
grant connect,resource,ctxapp to tuser;
connect tuser/tuser
create table testtab(text varchar2(2000));
create index ti on testtab(text) indextype is ctxsys.context;
- the rest of the document assumes you are running in SQL*Plus as user TUSER.

Getting the Total Index Size

To find out the size of the index, we can use the function CTX_REPORT.INDEX_SIZE. This is a function returning a CLOB. Probably the simplest way to read it is in SQL*Plus using SELECT ... FROM DUAL, as follows:

SQL> select ctx_report.index_size('ti') from dual;
The chances are we'll only see a couple of lines here. That's because we need to tell SQL*Plus to print more data, using
SQL> set long 50000
This time when we re-run the select, we get much more text, including a final line which says:
TOTAL BYTES USED:				       311,296 (304.00 KB)
This is our key value. But in order to make use of that value within a program, we'd have to fetch the output from CTX_REPORT, parse it for that final value, then convert that value into a proper number. A lot of work.

So let's ask for an XML report instead:

SQL> select ctx_report.index_size('ti', null, 'XML') from dual;
(note the second arg - partition name - is not needed but must be supplied since you cannot use named parameters in a function which is part of a SELECT).

This will give us a heap of XML output, including sizes for each object.

Here's a section of the output:

<SIZE_OBJECT_NAME>
TUSER.SYS_IL0000051565C00006$$
</SIZE_OBJECT_NAME>
<SIZE_OBJECT_TYPE>
INDEX (LOB)
</SIZE_OBJECT_TYPE>
<SIZE_TABLE_NAME>
TUSER.DR$TI$I
</SIZE_TABLE_NAME>
<SIZE_TABLESPACE_NAME>
USERS
</SIZE_TABLESPACE_NAME>
<SIZE_BLOCKS_ALLOCATED>
8
</SIZE_BLOCKS_ALLOCATED>
<SIZE_BLOCKS_USED>
4
</SIZE_BLOCKS_USED>
<SIZE_BYTES_ALLOCATED>
65536
</SIZE_BYTES_ALLOCATED>
<SIZE_BYTES_USED>
32768
</SIZE_BYTES_USED>
</SIZE_OBJECT>
<SIZE_OBJECT>
Now for a moment let's just get the information about this on object - the $I table. We can do this using an the XML DB extract operator, and an XPATH predicate.

To do that bit by bit:

First, rather than directly selecting a CLOB, force it into an XMLTYPE value:

SQL> select xmltype(ctx_report.index_size('ti', null, 'XML')) from dual;
The output from this will actually be the same as before, since SQL is just converting the XMLTYPE back into a CLOB - but it prepares the way for our next step:

Next we'll use a simple XPATH expression to specify what we want to fetch:

SQL> select extract(xmltype(ctx_report.index_size('ti', null, 'XML')), '//SIZE_OBJECT') from dual;
We're now using the XML DB "extract" function to fetch all of the XML within <SIZE_OBJECT> tags. The syntax '//SIZE_OBJECT' is an XPATH expression meaning "all the XML within a SIZE_OBJECT element anywhere below the root element". A full explanation of XPATH is well beyond the scope of this document, but the reader is encouraged to look at the tutorial at w3Schools.

Notice that there are new-line characters around each element in the output. This is actually incorrect - it's a bug. We'll fix that now by using the SQL "replace" function to remove all line feeds (character 10). XMLType will automatically reformat our XML to look OK on output.

select 
  extract(xmltype(replace(ctx_report.index_size('ti', null, 'XML'),chr(10),'')), 
          '//SIZE_OBJECT')
from dual;
Next we just need to specify that we only want the information for the $I table. We do this with a predicate within the xpath expression:
select 
  extract(xmltype(replace(ctx_report.index_size('ti', null, 'XML'),chr(10),'')), 
               '//SIZE_OBJECT[SIZE_OBJECT_NAME="TUSER.DR$TI$I"]')
from dual;
And to take this one stage further, we can get just the size in bytes by adding this to the XPATH expression:
select 
  extract(xmltype(replace(ctx_report.index_size('ti', null, 'XML'),chr(10),'')), 
               '//SIZE_OBJECT[SIZE_OBJECT_NAME="TUSER.DR$TI$I"]/SIZE_BYTES_USED')
from dual;
- but we don't really want the tags - just the value. There's two ways of doing this - we can use the text() function within the XPATH, or we can use extractValue rather than extract.

Using text():

select 
  extract(xmltype(replace(ctx_report.index_size('ti', null, 'XML'),chr(10),'')), 
               '//SIZE_OBJECT[SIZE_OBJECT_NAME="TUSER.DR$TI$I"]/SIZE_BYTES_USED/text()')
  as "Table Size"
from dual;
Using extractValue:
select 
  extractValue(xmltype(replace(ctx_report.index_size('ti', null, 'XML'),chr(10),'')), 
               '//SIZE_OBJECT[SIZE_OBJECT_NAME="TUSER.DR$TI$I"]/SIZE_BYTES_USED')
  as "Table Size"
from dual;
Notice we slipped in the labels "Table Size" on those two, since the column headings were getting increasingly unreadable.

We've now managed to extract a single value from the XML into a SQL environment. But what if we want to fetch ALL the sizes, perhaps to add them together to get a summary? To do that, we can use XMLSequence, which returns a collection of XMLType values. We can then use this function in a TABLE clause to unnest the collection values into multiple rows. Now that we're generating a "table", we no longer need to reference DUAL.

To get just the sizes:

select *
  from table(xmlsequence(
    extract(xmltype(replace(ctx_report.index_size('ti', null, 'XML'),chr(10),'')), 
               '//SIZE_OBJECT/SIZE_BYTES_USED')));
To get all the object info, one object per row:
select * 
  from table(xmlsequence(
    extract(xmltype(replace(ctx_report.index_size('ti', null, 'XML'),chr(10),'')), 
               '//SIZE_OBJECT')));
Now lets process that information into individual values:

Set the column formatting for nice output

column "Name" format a33
column "Tablespace" format a30
column "Bytes" format a12
and do the select, using extractValue to fetch individual "columns"
select 
   extractValue(Column_Value, '/SIZE_OBJECT/SIZE_OBJECT_NAME') as "Name",
   extractValue(Column_Value, '/SIZE_OBJECT/SIZE_TABLESPACE_NAME') as "Tablespace",
   extractValue(Column_Value, '/SIZE_OBJECT/SIZE_BYTES_USED') as "Bytes"
   from table(xmlsequence(
    extract(xmltype(replace(ctx_report.index_size('ti', null, 'XML'),chr(10),'')), 
               '//SIZE_OBJECT')));
And finally, since we've bought the values into the SQL world, we can use any SQL operators on the values. Here we collect the sum of all the sizes to get an aggragate total size of all objects used in the index.
select 
   sum(extractValue(Column_Value, '/SIZE_OBJECT/SIZE_BYTES_USED')) "Total"
   from table(xmlsequence(
    extract(xmltype(replace(ctx_report.index_size('ti', null, 'XML'),chr(10),'')), 
               '//SIZE_OBJECT')));

INDEX_STATS

Most of the CTX_REPORT procedures have two forms - a function which returns a CLOB (as we have been using above) and an alternate form which requires you to pass in a CLOB to be populated. One procedure - INDEX_STATS - has does not have a function version. This is because it needs to do a full table scan of the main index table, which is likely to take some time. Therefore when calling INDEX_STATS we need to go about things a little differently. First we'll create a table with an XMLType column. Then we'll call a PL/SQL block which passes a temporary CLOB to CTX_REPORT, and then inserts that CLOB into the XMLType column. We can then perform our various XML operations on that table.
drop table output;
create table output(report XMLType);

declare
  tlob clob;
  begin
    ctx_report.index_stats(
       index_name=>'ti', report=>tlob, 
       list_size=>20, report_format=>'XML');
    insert into output values (xmlType (replace(tlob,chr(10),'')) );
    commit;
  dbms_lob.freetemporary(tlob);
end;
/
We now have a single row in table OUTPUT, which contains our XML report. First we'll get the estimated row fragmentation:
select extractValue(report, 
         '//STAT_STATISTIC[@NAME="estimated row fragmentation"]')
       as "Fragmentation"
from output;
Now something rather more complicated - get the top three most frequent tokens. We must make a table out of the STAT_TOKEN entries, where STAT_TOKEN_LIST has an attribute NAME equal to "most frequent tokens" - and fetch just the first three using the position function:
select extract(value(d), '//STAT_TOKEN_TEXT')
from output, table(xmlsequence(extract(report,
      '/CTXREPORT/INDEX_STATS/STAT_TOKEN_STATS/STAT_TOKEN_LIST[@NAME="most frequent tokens"]/STAT_TOKEN[position()<4]'))) d;

Conclusion

The XML output mode of CTX_REPORT allows powerful manipulation of your index information. In order to make full use of these, you need a good understanding of the XML features of the Oracle database, such as extract, extractValue, XMLSequence, and of XPATH syntax in general. With the use of XML exploding within the data processing world, such an understanding is likely to be very useful in the future.
E-mail this page
Printer View Printer View
Oracle Is The Information Company About Oracle | Oracle RSS Feeds | Careers | Contact Us | Site Maps | Legal Notices | Terms of Use | Privacy