Technical Note

Melding Oracle Text and Discoverer

Author: Brian Carr, founding member of Oracle Giants, Oracle Certified Professional, Oracle ACE
Publication Date: September 2005

If you're looking for a way to query the vast amount of data stored in documents, spreadsheets, and other file formats, than Oracle Text is the answer. If you're looking for a way to query a database, then Oracle Business Intelligence Discoverer is a great solution.

In this Technical Note, I will describe a way to harness the power of Oracle Text and deliver the results from Discoverer—providing your users with a one-stop shop for all their ad-hoc reporting needs. This approach is ideal for developers who want to give users a consistent interface across both sets of functionality.

I will provide step-by-step instructions and a PL/SQL routine to pull these two powerful technologies together. When you have completed these steps your users will be able to:

  • Search documents for specific data
  • Return the list of files that match their criteria in Discoverer Viewer, Plus, or Desktop
  • Drill-down and open the documents right from Discoverer
As it is not possible to put conditions on a LOB field in Discoverer, I will describe a different means of searching documents for specific words and returning results to the user.

Step 1: Build Your Bridge

Oracle Text has the ability to crawl various document types and index the content for fast searching at a later time. One use of Oracle Text—and the foundation of our example here—is to create a document management system to which users can add and then search documents; on a periodic basis, you would build the Oracle Text index.

Before you can get started you need to have an Oracle Text index already created. (See Oracle documentation for details.) Behind the scenes, when you create an Oracle Text index another table is generated in the same schema. This table contains all the words found in your set of documents. What you need to do here is match these words to the specific documents they came from and store this pair in a new table called WORDS_BY_FILE. This table will serve as your bridge between Discoverer and Text.

Table WORDS_BY_FILE contains two fields: FILE_NAME and WORD. Structuring your table in this manner allows you to use conditions in Discoverer to filter your dataset. Obviously, this table could potentially have many rows.

CREATE TABLE LIBRARY.WORDS_BY_FILE (
 FILE_NAME VARCHAR2(1024),
 WORD VARCHAR2(64)
);

Step 2: Populate Your Bridge Table

Now that you have your bridge table created you need to populate it. Use the following PL/SQL routine.

CREATE OR REPLACE PROCEDURE OracleText_WordsByFile IS
   cursor WordList_cursor is
     --create a recordset of all the words
     --contained in this specific Oracle Context Index
     select '{' || replace(token_text,'-','\-') || '}' 
        token_text 
	from LIBRARY.dr$idxLibraryLitBinary$i 
	group by token_text;
   varTokenText LIBRARY.dr$idxLibraryLitBinary$i.TOKEN_TEXT%TYPE;
BEGIN
 open WordList_cursor;
 loop
   varTokenText := NULL;
   fetch WordList_cursor into varTokenText;
   exit when WordList_cursor%notfound;
      insert into LIBRARY.WORDS_BY_FILE 
	  select TRIM(A.FILE_NAME), 
          SUBSTR(TRIM(varTokenText),2,LENGTH(TRIM(varTokenText))-2)
	  from library.literature a
	  where contains(text,varTokenText,1) > 0;
	commit;
 end loop;
EXCEPTION
 WHEN OTHERS THEN
	DBMS_OUTPUT.PUT_LINE(DBMS_UTILITY.FORMAT_CALL_STACK);
	DBMS_OUTPUT.PUT_LINE(DBMS_UTILITY.FORMAT_ERROR_STACK);
 close WordList_cursor;
END;
/
 

The execution time of running this procedure will depend on the number and size of documents you are indexing.

set timing on;
set serveroutput on;
execute OracleText_WordsByFile;	  
After you've executed this procedure you're ready to go into Discoverer and create your business area.

Step 3: Create a Discoverer Business Area

From Discoverer Administrator, choose to "create a new business area." Include the WORDS_BY_FILE table in your new business area. You're creating a standard business area here with one exception: In Discoverer Administrator, make the FILE_NAME field a "Content Type" of File (see below). The Content Type field describes the content of the multimedia data used when drilling. So by setting this parameter to File, Windows will handle this filetype as if you were to open it from Windows Explorer.

Figure 1
Figure 1 Making the FILE_NAME field a "Content Type" of File

Step 4: Query and Drill from Discoverer

Now that you have a bridge table and business area, you're ready to bring it all together and see the results.

In Discoverer Desktop, create a new worksheet. Next, filter your data by creating a condition. For example, let's say you want to see all documents that contain both the words "refills" and "product". To do that you would create a condition where the column 'Word' IN ('REFILLS','PRODUCT'). To confirm that you are only getting documents that contain both those words, you need a second condition of COUNT(1) = 2. (If you were trying to find documents that contained three words, you would change this condition to be COUNT(1) = 3 and so on.) In essence, what you did here was create an AND query.

Figure 2
Figure 2 Files containing two specific words (AND statement)

Figure 3
Figure 3 Results from our query in Desktop; double-click on Adobe PDF icon to view the full PDF

To do an OR query is even simpler: You just need to remove (uncheck) your COUNT(1) = 2 condition. Now you are seeing any documents that contain the words "refills" or "product".

Figure 4
Figure 4 Files containing one of two words (OR statement)

Figure 5
Figure 5 Results from our query in View; left-click on globe icon to view the actual file (e.g. in Word document)

Congratulations, you have just created an application that allows users to perform ad hoc queries on text documents via Discoverer!

E-mail this page
Printer View Printer View
Oracle Is The Information Company About Oracle | Oracle RSS Feeds | Careers | Contact Us | Site Maps | Legal Notices | Terms of Use | Privacy