Hitlist Highlighting with Oracle Text
Get a zip archive of all the files for this example.
Introduction
The result of a free-text search is commonly a list of documents, or
"hit list", that match the search. The user will typically scan this
list looking for relevant items. To make this process efficient, the
hitlist should present as much useful information as possible.
Sometimes the title of the document is enough. But often it helps to
show the section, or sections, of the main text that best match the
search, with the search terms highlighted.
This technique is sometimes known as "Key Words In Context", or KWIC.
This paper will demonstrate how to generate such a KWIC hitlist.
It uses a Java bean to do the main work, and demonstrates calling that
bean from a JavaServer Page (JSP). The bean, or its component code,
could easily be called from other Java environments, or from PL/SQL
as a Java stored procedure.
The application illustrated is a simplified version of one used within
Oracle to search a mailing list archive. It should be easily adaptable
to customers' own requirements.
The Algorithm
The full algorithm is presented in the file algorithm.html
in the zip file (see the top of this document). It uses the concepts
of "relevance" and "novelty". Relevance is a measure of the number of
words which match in a particular segment. Novelty is the number of
new words which have not been found in previous segments.
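To make the two measures concrete, here is a minimal sketch of how they might be computed for a sequence of text segments. It is not the code from KWIC.java or algorithm.html: the class and method names are hypothetical, and it assumes that relevance counts every occurrence of a search term in a segment, while novelty counts only distinct search terms not already seen in earlier segments. It also assumes a Java version with generics.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration only; not taken from KWIC.java.
class SegmentScoreSketch {

    /** Lower-cases a segment and splits it into words on non-letter, non-digit characters. */
    static List<String> words(String segment) {
        List<String> out = new ArrayList<>();
        for (String w : segment.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!w.isEmpty()) {
                out.add(w);
            }
        }
        return out;
    }

    /** Relevance (assumed definition): number of word occurrences that match a search term. */
    static int relevance(String segment, Set<String> terms) {
        int count = 0;
        for (String w : words(segment)) {
            if (terms.contains(w)) {
                count++;
            }
        }
        return count;
    }

    /**
     * Novelty (assumed definition): number of distinct search terms in this
     * segment that were not seen in earlier segments. Newly found terms are
     * added to "seen" so later segments do not count them again.
     */
    static int novelty(String segment, Set<String> terms, Set<String> seen) {
        int count = 0;
        for (String w : words(segment)) {
            if (terms.contains(w) && seen.add(w)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Set<String> terms = new HashSet<>(Arrays.asList("quick", "brown"));
        Set<String> seen = new HashSet<>();
        String[] segments = { "the quick quick fox", "a quick brown dog" };
        for (String s : segments) {
            System.out.println(s + " -> relevance=" + relevance(s, terms)
                    + ", novelty=" + novelty(s, terms, seen));
        }
    }
}
```

Under these assumptions both segments have a relevance of 2, and each has a novelty of 1: the first introduces "quick" and the second introduces "brown". A segment that only repeated words already found would have a novelty of 0.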
Testing the Application
Create and populate a simple test table using Create.sql
from the download zip file. Run it initially as a
DBA user such as SYS or SYSTEM. The script creates a user called
"testuser", connects as that user, then creates a table and
populates it with two rows of nonsense text. Note that if your
environment requires a connect string, you will need to add it to the
testuser/testuser username/password pair.
Then edit test.jsp, substituting in the necessary parameters for your
database in the lines:
ods.setServerName ("eddie");
ods.setPortNumber (1521);
ods.setDatabaseName ("eddi10b");
The first value is the server machine name, the second is the SQL*Net
listener port number (the default is 1521), and the third is the
database SID. If you are unsure of these values, check your
"tnsnames.ora" or "listener.ora" file or consult your database
administrator.
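As a cross-check, the same three values are the ones that appear in an Oracle JDBC thin-driver URL of the SID form, jdbc:oracle:thin:@host:port:SID. The helper below is a hypothetical illustration, not part of test.jsp, and it assumes the thin driver is being used. It simply shows how the sample values fit together.

```java
// Hypothetical helper; not part of test.jsp or KWIC.java.
class ThinUrlSketch {

    /** Builds a thin-driver URL in the SID form: jdbc:oracle:thin:@host:port:SID */
    static String thinUrl(String serverName, int portNumber, String sid) {
        return "jdbc:oracle:thin:@" + serverName + ":" + portNumber + ":" + sid;
    }

    public static void main(String[] args) {
        // Sample values copied from the test.jsp lines shown above.
        System.out.println(thinUrl("eddie", 1521, "eddi10b"));
        // prints: jdbc:oracle:thin:@eddie:1521:eddi10b
    }
}
```

If a tool such as SQL*Plus can already connect to the database with the same host, port and SID, those values should also be the right ones for test.jsp.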
Now copy test.jsp into your Oracle webserver's HTDOCS directory. The
default, on Unix, will be $ORACLE_HOME/Apache/Apache/htdocs.
Now compile the Java source:
javac KWIC.java
If you don't already have "javac" in your PATH, you will find it
(on Unix) in $ORACLE_HOME/jdk/bin.
If all goes well, this will generate KWIC.class. This must be
copied into the directory
$ORACLE_HOME/Apache/Apache/htdocs/WEB-INF/classes (or the equivalent
on non-Unix systems). It will probably also be necessary to copy the
file $ORACLE_HOME/jdbc/lib/classes12.jar into the same directory.
Now go to your web browser, and type in
http://yourservername/test.jsp
If your Apache web server is running on a different port, such as
8888, you would need to say:
http://yourservername:8888/test.jsp
If all goes well you should get a web page consisting of a search
box and a "number of rows" box. Enter "quick and brown" in the search
box and hit "submit".
You should now see a hitlist with two items, and the search terms
highlighted in context.
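To illustrate what "highlighted" means in the output, the sketch below wraps each occurrence of a search term in HTML bold tags. It is not the KWIC bean's own code: the class and method names are hypothetical, it assumes that "and" in the sample query acts as a query operator rather than a search term, and it does not HTML-escape the text, which a real page would need to do.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration only; not taken from KWIC.java.
class HighlightSketch {

    /** Wraps whole-word, case-insensitive matches of any term in <b>...</b>. */
    static String highlight(String text, List<String> terms) {
        if (terms.isEmpty()) {
            return text;
        }
        // Build an alternation such as \Qquick\E|\Qbrown\E, quoting each term
        // so that any regex metacharacters in it are treated literally.
        StringBuilder alternation = new StringBuilder();
        for (String t : terms) {
            if (alternation.length() > 0) {
                alternation.append('|');
            }
            alternation.append(Pattern.quote(t));
        }
        Pattern p = Pattern.compile("\\b(" + alternation + ")\\b",
                Pattern.CASE_INSENSITIVE);
        // $1 re-inserts the matched text, so the original capitalisation is kept.
        return p.matcher(text).replaceAll("<b>$1</b>");
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("quick", "brown");
        System.out.println(highlight("The quick brown fox", terms));
        // prints: The <b>quick</b> <b>brown</b> fox
    }
}
```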
Customizing
To work with your own table, it shouldn't be necessary to change the
KWIC bean at all. You should be able to make all the necessary
modifications in the JSP file.
Important: if you change the name of "test.jsp" you MUST change the
line |