Database
Database 11g
Oracle Text is part of the Oracle9i Standard and Enterprise Editions. Oracle Text uses standard SQL to index, search, and analyze text and documents stored in the Oracle database, in files, and on the Web. Oracle Text can perform linguistic analysis on documents; search text using a variety of strategies oncluding keyword searching, context queries, Boolean operations, pattern matching, mixed thematic queries, HTML/XML section searching, etc. Oracle Text can render search results in various formats including unformatted text, HTML with highlighting, and original document format. Oracle Text supports multiple languages inlcuding Japanese, Korean, Traditional and Simplified Chinese.
| [Back to Top] |
interMedia integrates all of the features and functions of the former Oracle ConText product with image, audio and video, and geographic location services for Web Content Management applications built with the Oracle8i Internet Platform. interMedia Text has been completely re-architected, enhanced and tightly integrated with Oracle8i to provide up to an order of magnitude better search performance, greater ease of use, and new capabilities like thematic searches.
| [Back to Top] |
ConText is a cartridge for Oracle 8.0.x. interMedia Text is a feature of Oracle8i. Oracle Text is a feature of Oracle9i.
| [Back to Top] |
The official page of Oracle Text in the Oracle Technology Network:
http://technet.oracle.com/products/text/
There is also an Oracle Text forum for posting questions:
http://technet.oracle.com/support/htdocs/discussions.htm
| [Back to Top] |
Content-based retrieval on free text with both literal (word) predicates and thematic predicates. The main features of Oracle Text include: a comprehensive range of operators and index preferences (e.g. Boolean, exact phrase match, proximity, section searching, fuzzy, stemming, wildcard, thesaurus, stopwords, case sensitivity, and search scoring), "about" search, structured search, broad document format support and multi-language support. Other features are classification, catalog indexing, and XML XPath support.
| [Back to Top] |
That depends on the language - which is determined from the environment variable NLS_LANG. If NLS_LANG is not set, or the language component is set to "American" (e.g. NLS_LANG="American_America.ISO8859P1), then both theme and word indexes are created automatically. For other language settings, a theme index is not built.
| [Back to Top] |
Oracle9i Text Reference Guide and Oracle9i Text Application Developer's Guide. There is also a chapter on the "Oracle9i Application Developer's Guide - XML" reference.
| [Back to Top] |
"Expert One on One: Oracle" by Tom Kyte, Wrox Press. Chapter on interMedia Text by Joel Kallman.
"Building Oracle XML Applications" by Steve Muench, O'Reilly. Chapter 13 on interMedia Text.
"Document Warehousing and Text Mining" by Dan Sullivan, Wiley. Extensive analysis of Oracle Text search and services in document warehousing and text mining. Compares Oracle with IBM and, to some extent, Thunderstone.
"Oracle8i Tips & Techniques" by Douglas Scherer et al., Oracle Press. Chapter 11 on interMedia Text.
"Oracle8i, The Complete Reference" by Kevin Loney and George Koch, Oracle Press. Chapter 24: Using interMedia Text for Text Searches.
"Oracle XML Handbook" by Ben Chang, Mark Scardina et al., Oracle Press. Chapter 6: Searching XML Documents with interMedia Text.
| [Back to Top] |
Migrating your pre-8i ConText applications to Oracle8i interMedia Text involves the following steps:
a. Upgrading the database to Oracle8i. This is a mechanical process performed by the DBA.
b. Rebuilding your Text indexes. The API for Text index creation and maintenance is different, and simpler in Oracle8i, so you will need new scripts. interMedia Text ships with utilities to help generate these scripts.
c. Migrating your application code. The API for Text queries is different, and MUCH simpler in Oracle8i, so you will need to change the application code that makes these calls.
See the Oracle8i interMedia Text Migration Guide for details.
Does interMedia Text create theme indexes by default? That depends on the language - which is determined from the environment variable NLS_LANG.
If NLS_LANG is not set, or the language component is set to "American" (e.g. NLS_LANG="American_America.ISO8859P1), then both theme and word indexes are created automatically. For other language settings, a theme index is not built.
| [Back to Top] |
In all languages, ABOUT queries increase the number of relevant documents returned by a query. In English, the ABOUT query returns documents based on the concepts of your query, not only the exact word or phrase you specify (e.g. you can issue a query that finds all documents about the subject politics, not just the documents that contain the word politics).
| [Back to Top] |
Working code samples can be accessed from the Oracle Technology Network (OTN) web site at: http://technet.oracle.com/products/text
| [Back to Top] |
A.
cd $ORACLE_HOME/ctx/admin
B. as SYS:
@dr0csys.sql ctxsys system temp
where ctxsys is the ctxsys password system is the default tablespace for ctxsys temp is the temporary tablespace for ctxsys
C. as CTXSYS:
@dr0inst.sql
/oracle/db/dev118/ctx/lib/libctxx8.so
(on Solaris) where /oracle/db/dev118 is $ORACLE_HOME
D. as CTXSYS:
@defaults/drdefus.sql
Note: this is assuming you want U.S. to be your language default, otherwise run the corresponding drdef for your language.
| [Back to Top] |
A. Configure an IPC listener address. For example, change:
LISTENER =
(ADDRESS_LIST=
(ADDRESS=
(PROTOCOL=tcp) (HOST=ap118sun.us.oracle.com) (PORT=1521)
)
)
to:
LISTENER =
(DESCRIPTION_LIST =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = IPC) (KEY = EXTPROC0))
)
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP) (HOST = ap118sun) (PORT = 1521))
)
)
)
This shows the full LISTENER definition in your listener.ora. Note: EXTPROC0 ends in a zero.
B. Add a system identifier (SID) name of PLSExtProc and a program name of EXTPROC in the server's LISTENER.ORA file. For example, in the SID_LIST_LISTENER definition, insert:
SID_LIST_LISTENER =
(SID_LIST =
(SID_DESC = ...
-
-
-
(SID_DESC =
(SID_NAME=PLSExtProc)(ORACLE_HOME=/oracle/db/dev118)
(PROGRAM=extproc)
)
-
-
-
C. Add a net service name description entry for EXTPROC0 in the server's tnsnames.ora file, using SID rather than SERVICE_NAME in the CONNECT_DATA section. For example, add this to the end of tnsnames.ora:
extproc_connection_data =
(DESCRIPTION=
(ADDRESS_LIST =
(ADDRESS=(PROTOCOL=IPC)(KEY=EXTPROC0))
)
(CONNECT_DATA=
(SID=PLSExtProc)
(PRESENTATION = RO)
)
)
This entry should be added exactly as you see it here (do NOT substitute your SID anywhere, do NOT change the case of any text). Note: EXTPROC0 ends in a zero, RO ends in an Oh
| [Back to Top] |
lsnrctl status
should give you the following:
LSNRCTL for Solaris: Version
8.1.5.0.0 - Production on 31-MAR-99 18:57:49
(c) Copyright 1998 Oracle Corporation. All rights reserved.
Connecting to
(DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=EXTPROC0)))
STATUS of the LISTENER
------------------------
Alias LISTENER
Version TNSLSNR for Solaris: Version 8.1.5.0.0 - Production
Start Date 30-MAR-99 15:53:06
Uptime 1 days 3 hr. 4 min. 42 sec
Trace Level off
Security OFF
SNMP OFF
Listener Parameter File
/private7/oracle/oracle_home/network/admin/listener.ora
Listener Log File
/private7/oracle/oracle_home/network/log/listener.log
Services Summary...
PLSExtProc has 1 service handler(s)
oco815 has 3 service handler(s)
The command completed successfully
The important information to understand here is the service handler for PLSExtProc.
B. Create a user/table/index/query thus: As SYS or SYSTEM:
create user ctxtest identified by ctxtest ;
grant connect, resource, ctxapp to ctxtest ;
Do any other grants, quotas, tablespace etc. for the new user. As CTXTEST:
create table quick
(
quick_id number primary key,
text varchar(80)
);
insert into quick ( quick_id, text )
values ( 1, 'The cat sat on the mat' );
insert into quick ( quick_id, text )
values ( 2, 'The quick brown fox jumped over the lazy dog' );
commit;
create index quick_text on quick ( text )
indextype is ctxsys.context;
At this point, if your Net8 setup is NOT correct you will get:
create index quick_text on quick ( text )
*
ERROR at line 1:
ORA-29855: error occurred in the execution of ODCIINDEXCREATE routine
ORA-20000: ConText error:
DRG-50704: Net8 listener is not running or cannot start external procedures
ORA-28575: unable to open RPC connection to external procedure agent
ORA-06512: at "CTXSYS.DRUE", line 122
ORA-06512: at "CTXSYS.TEXTINDEXMETHODS", line 34
ORA-06512: at line 1
If everything is OK, you will get:
Index created.
Now try some queries:
SQL> select quick_id from quick where contains(text, 'cat')>0;
QUICK_ID
----------
1
SQL> select quick_id from quick where contains(text, 'fox')>0;
QUICK_ID
----------
2
Congratulations !! You now have a successful interMedia Text V8.1 installation.
| [Back to Top] |
You will need to check the index errors table CTX_INDEX_ERRORS. This is owned by the user CTXSYS, and most users do NOT have # SELECT privilege to it by default.
| [Back to Top] |
In addition to the search capabilities of Oracle Text, a number of other features are provided to simplify application development:
Document Format Support - In order to index documents stored in a variety of native formats, such as Word, Excel, PowerPoint, WordPerfect, HTML, and Acrobat/PDF, interMedia supplies a broad variety of "filters" that allow documents stored in their native formats to be indexed.
Document Viewing and Highlighting - Oracle Text Services can convert any supported document format to either plain text or formatted text (an HTML approximation retaining as much as possible of the original formatting; available for all formats except PDF). Both plain text and HTML versions may be viewed in a standard browser, allowing maximum flexibility in deployment, especially for public Web applications. Both plain text and HTML versions may be viewed with or without highlighting and navigation of the search words or themes. It is also possible to download the original document in its native form for viewing in either the original authoring system (Word, PowerPoint etc.) or in any third-party viewer. For viewing PDF documents, the Acrobat viewer should be used.
Document Storage - Oracle Text Services supports several document storage options. The most straightforward option is "direct" storage where documents are kept in an Oracle database table. Where this is not possible, documents may be indexed directly from a file system or URL address by simply storing a file pointer or URL in the indexed column.
Text Manager - Oracle Text supplies an administration tool, integrated with the Oracle. Enterprise Manager, through which all major text maintenance and administration functions may be performed.
| [Back to Top] |
Oracle8i (interMedia) and Oracle9i Text uses the Inso Chicago Corporation's filtering technology, which enables the system to automatically detect and filter most document formats. With this single filtering technology, the system can index single or mixed format columns. Inso's filtering technology is also used to create plain-text and HTML output for document presentation. Oracle will receive updates from Inso and will pass these on to customers in new product releases and in patch releases as appropriate. Third-party or custom filters may also be used through an external filter capability.
| [Back to Top] |
The filter set that we have licensed covers over 100 document formats and is currently available on MS Windows on Intel, Solaris, HP/UX, IBM AIX, DEC UNIX, Linux, and SGI. See the Reference manual for a complete list of supported platforms and supported formats.
| [Back to Top] |
ORA-29855: error occurred in the execution of ODCIINDEXCREATE routine
ORA-20000: ConText error:
ORA-06520: PL/SQL: Error loading external library
ORA-06522: ld.so.1: extprocPLSExtProc: fatal: libskgxp8.so: open failed:
No such file or directory
ORA-06512: at "CTXSYS.DRUE", line 122
ORA-06512: at "CTXSYS.TEXTINDEXMETHODS", line 34
ORA-06512: at line 1
A. Stop your SQL*Net listener. Make sure that your LD_LIBRARY_PATH environment variable contains the following library directories:
$ORACLE_HOME/lib
$ORACLE_HOME/ctx/lib
Then restart your SQL*Net listener.
B. Check your tnsnames.ora file (usually in $ORACLE_HOME/network/admin), and look for something like:
sid_list_listener=...
(ENVS=LD_LIBRARY_PATH=/oracle/product/8.1.5/ctx/lib:)
(program=extproc)
If you have LD_LIBRARY_PATH in here, it MUST include both of the following paths(spelled out):
$ORACLE_HOME/lib and
$ORACLE_HOME/ctx/lib
| [Back to Top] |
ERROR at line 1:
ORA-29855: error occurred in the execution of ODCIINDEXCREATE routine
ORA-20000: ConText error:
DRG-50704: Net8 listener is not running or cannot start external procedures
ORA-28575: unable to open RPC connection to external procedure agent
ORA-06512: at "CTXSYS.DRUE", line 122
ORA-06512: at "CTXSYS.TEXTINDEXMETHODS", line 34
The SQL*Net listener is unable to start the "extproc" process which does the actual index creation. See your SQL*Net documentation for more details.
| [Back to Top] |
ORA-00955: name is already used by an existing object
You must drop the index first. Occasionally, it may be necessary to do the following:
DROP INDEX indexname FORCE;
The reason behind this is that unlike standard B-tree indexes, it is possible for an Oracle Text (interMedia Text) index to be partially complete. Maybe some rows were not indexed properly, or maybe the index creation failed at the last moment and we can continue where we left off. Hence, once you have started an index creation, the kernel will assume that index exists until told otherwise.
| [Back to Top] |
Releases from 8.1.6 and higher allow attribute indexing. See the following site:
http://otn.oracle.com/products/text
| [Back to Top] |
CTXSYS/CTXSYS is the default username/password for the Oracle Text schema owner. Oracle8i Lite does not support Oracle Text
| [Back to Top] |
Oracle Text (intermedia Text) is a feature of Oracle8i and Oracle9i.
| [Back to Top] |
Yes you can. You have two options: Use the USER_DATASTORE object to create a concatenated field on the fly during indexing. Concatenate your fields and store them in an extra CLOB field in one of your tables. Then create the index on the CLOB field. If you're using Oracle8i Release 2( 8.1.6) or higher, then you also have the option of placing XML tags around each field prior to concatenation. This gives you the capability of searching WITHIN each field.
| [Back to Top] |
Oracle Text (interMedia Text) can create a full-text index on 9 million web pages - and pretty quickly. In a benchmark on a large Sun box, we indexed 100Gig of web pages (about 15 million) in 7 hours. We can also do partial indexing via regular DML or (in 9i) via partitioning. You can do "indexing light" to some extent - you can disable theme indexing, you dont need to filter documents if they are already ASCII/HTML/XML, and most common expansions - fuzzy, stemming, proximity - are done at query time.
| [Back to Top] |
If you want to designate a specific page as the preferred answer for queries on certain terms, we typically recommend creating a keyword section and putting all relevant keywords in that section. After the documents are indexed with proper section groups, search ranking can be controlled by the query. For example:
We determine that http://www.oracle.com/employment/ is the preferred page for queries on any of these terms: employment, job, jobs, recruitment. We modify the document to contain the following section:
<YOUR_SECTION>employment, job, jobs, recruitment</YOUR_SECTION>
Now assuming the document has been indexed with a section defined YOUR_SECTION, the following query will make sure that all documents where the term employment occurs in YOUR_SECTION will show up before other documents
(employment WITHIN YOUR_SECTION)*2 , employment*1
Note that the operator we are using to connect the subqueries is ACCUMULATE. This technique is being used by Oracle.com and Oracle UltraSearch among others.
| [Back to Top] |
As CTXSYS (or another DBA user):
SQL> select * from ctx_version; VER_D ----- 8.1.7
| [Back to Top] |
From the command line (as DBA):
sql>alter user ctxsys account unlock identified by <password>;
You can also unlock the account from OEM (Oracle Enterprise Manager):
unix>oemapp dbastudio
| [Back to Top] |
Fact ONE
A domain index can exist and have USER_INDEXES.STATUS = 'VALID' even when its CREATE statement raises an exception.
create table quick
( id number constraint quick_pk primary key,
text varchar2(80) );
insert into quick ( id, text )
values ( 1, 'The cat sat on the mat' );
commit;
create index quick_text on quick ( text )
indextype is ctxsys.context
parameters ( 'dog' );
...which gives DRG-11000: invalid keyword DOG.
This CREATE INDEX statement is syntactically correct SQL, and so causes a row to be created in the table under USER_INDEXES. The operation fails on the Text side of the Extensible Indexing API 'cos the inner language of the PARAMETERS clause has wrong syntax. The row in USER_INDEXES has...
STATUS = 'VALID'
DOMIDX_STATUS = 'VALID'
DOMIDX_OPSTATUS = 'FAILED'
Note, under these circumstances the failed index is NOT represented in CTX_USER_INDEXES.
While this state persists, you can do queries against the table - unless of course the WHERE clause has a CONTAINS against the column with the failed ConText index, which gives...
DRG-10599: column is not indexed
But you cannot do insert, delete or update for ANY field to the table. You get...
ORA-29861: domain index is marked LOADING/FAILED/UNUSABLE
The only way to recover from this situation is to do...
drop index quick_text;
...and then reissue the CREATE INDEX without error.
Fact TWO
If you do "shutdown abort" while a CREATE INDEX ... CONTEXT job is running (which otherwise would've completed without error) then you get...
STATUS = 'INPROGRS'
DOMIDX_STATUS = 'VALID'
DOMIDX_OPSTATUS = 'VALID'
...in USER_INDEXES (ie different from in Fact ONE). There IS a row in CTX_USER_INDEXES and it has...
IDX_STATUS = 'POPULATE'
As above, while this state persists, you can do queries against the table - unless of course the WHERE clause has a CONTAINS against the column with the failed ConText index, which again gives...
DRG-10599: column is not indexed
...and again you cannot do insert, update or delete to the table. You get...
ORA-29861: domain index is marked LOADING/FAILED/UNUSABLE
Neither can you do this...
alter index my_index rebuild parameters ( 'resume' );
You get...
ORA-29868: cannot issue DDL on a domain index marked as LOADING
Even drop index my_index causes this exception. You have to use...
drop index my_index force;
Fact THREE
When CREATE INDEX ... CONTEXT aborts 'cos a resource runs out (do this by assigning the $I table to a tablespace where the user has a v.small quota, and use "memory 1k" so you see something in the $I), eg...
ORA-29855: error occurred in the execution of ODCIINDEXCREATE routine
ORA-20000: interMedia Text error:
DRG-50857: oracle error in dreii0fsh
ORA-01653: unable to extend table CLASS_SYS.DR$DOCS_TEXT$I by 48 in tablespace DROP_ME
...then you get...
STATUS = 'VALID'
DOMIDX_STATUS = 'VALID'
DOMIDX_OPSTATUS = 'FAILED'
...in USER_INDEXES and...
IDX_STATUS = 'POPULATE'
...in CTX_USER_INDEXES. At this point, non-text queries work fine, text queries give DRG-10599, and attempted dml to ANY field in the Text indexed table gives ORA-29861. Then fix the quota and do this...
alter index docs_text rebuild parameters ( 'resume memory 10M' );
This completes without exception and gives a viable text index with...
STATUS = 'VALID'
DOMIDX_STATUS = 'VALID'
DOMIDX_OPSTATUS = 'VALID'
...in USER_INDEXES and...
IDX_STATUS = 'INDEXED'
...in CTX_USER_INDEXES.
Fact FOUR(a)
If you do "shutdown abort" while a ALTER INDEX ... SYNC job is running, then after restart you get...
STATUS = 'INPROGRS'
DOMIDX_STATUS = 'VALID'
DOMIDX_OPSTATUS = 'VALID'
...in USER_INDEXES (ie different from in Fact ONE). There IS a row in CTX_USER_INDEXES and it has...
IDX_STATUS = 'INDEXED'
Now you can do both queries (incl Text) and DML to the table. But any attempt to do alter index fails with...
ORA-29868: cannot issue DDL on a domain index marked as LOADING
(Even alter index docs_text rebuild online parameters ( 'dog' ) causes this, showing that the test is done generically at the SQL level before even trying to pass the parameters clause to Text.)
As with Fact Two, the only way aout of this is DROP INDEX ... FORCE.
Fact FOUR(b)
If you do "shutdown abort" while a Ctx_Ddl.Sync_Index job is running, then after restart everything is OK. (USER_IDEXES.STATUS is 'VALID'.) You can submit Ctx_Ddl.Sync_Index again and it'll finish what it hadn't done before.
Ditto "optimize".
Note: You CAN do DML in a table where SYNC is running - whether from SQL of PL/SQL. But you cannot do DML to the rows currently being sync's. Depending how you try, you either just hang until sync has finished or you get "ORA-00054: resource busy..." Of course any DML you do while SYNC is running does NOT get SYNC'd by that run.
Fact FIVE(a)
If you try ALTER INDEX ... SYNC from a second session while it's running from a first, you get "ORA-29868". This is reasonable, since at the Text level, you can't have more than one concurrent sync on the same index.
Fact FIVE(b)
If you try Ctx_Ddl.Sync_Index from a second session while SYNC is running from a first (via either API), you get a hang with no message until the first session finishes. (Though it's rather hard to verify that this is actually happenning.) This is because as stated, at the Text level, you can't have more than one concurrent sync on the same index. But it's bad there's no warning. There should a WAIT/NOWAIT flavor to the API.
Fact SIX(a)
If you try ALTER INDEX ... OPTIMIZE from a second session while SYNC is running from a first, you get "ORA-29868". This is bad, since at the Text level, you can have a concurrent SYNC and OPTIMIZE.
Fact SIX(b)
If you try Ctx_Ddl.Optimize_Index from a second session while SYNC is running from a first first (via either API) it runs fine. But it's rather hard to verify that the two jobs are actually doing good work concurrently!
| [Back to Top] |