Last updated: 03/23/04
Contents
Top Questions
Crawl and Query Performance
Search/Result Page Questions
Advanced Features
Troubleshooting, Installation and Configuration
Integration with other Oracle products
Top 8 General Questions
1. What types of data sources can be made searchable with Ultra Search?
Out of the box, Ultra Search crawls Web pages, databases (Oracle and
non-Oracle), IMAP mail servers, Oracle Portal, and file systems. It also
provides a Java extensibility API for extending the crawler to other
data sources.
2. How can I obtain Ultra Search?
Ultra Search is included, at no additional license cost, in the Oracle9i
database, in Oracle9iAS (as a subcomponent of Portal, including a
search-and-result portlet), and in Oracle Collaboration Suite.
3. What are the different file types that Ultra Search can crawl and index?
Ultra Search can find and index over 100 different vendor-specific
file types on your Web servers, in file systems, in Oracle Portal, and in
email attachments. It uses Oracle Text and its filter technology (see also
the Oracle Text Reference Manual), which provides more than 100 filters for
the most common file types. A filter is a piece of software that converts
a vendor-specific, proprietary file type (e.g. doc, xls, or pdf) to a standard
file type that Ultra Search can index (HTML). Ultra Search decides
which filter to invoke based on the filename extension.
Limitations: Indexing of XML files, picture files, audio files,
and video files is not currently supported.
4. Does Ultra Search support a web interface?
Yes, all functionality can be accessed through a browser
interface. Ultra Search runs on any Java-compliant Web server,
including the Oracle HTTP Server (Apache) included with the Oracle server
and the Oracle Container (OC4J).
5. Where does Ultra Search store its fulltext index?
Ultra Search uses the Oracle9i database as its content
repository. Database releases older than Oracle9i can be crawled, but the
Ultra Search infrastructure database must be Release 9i or higher.
6. How can I embed Ultra Search into my Web pages or applications? Is there
an API?
Ultra Search includes two out-of-the-box sample query applications,
written as JavaServer Pages (JSP), and a complete, documented
Java Query API. The Java API gives programmers control over all the
features of the product; use it to initiate Ultra Search searches,
retrieve and display search results, and control the number of search result
hits.
7. How long will it take to install and configure Ultra Search?
Installation and post-installation steps take about half a day. Crawl time
depends on the data (the data sources must be identified first). A default
search screen is provided out of the box; add time for any customizations
you want.
8. How do I go about installing and configuring the product?
Read the Ultra Search documentation at this site for a detailed
overview of installation and configuration. Please print and read both the
installation and post-installation sections for your configuration before
attempting to install.
Outline of the process:
- Installation - Install the Ultra Search binaries and a Web server (Oracle
HTTP Server or OC4J) on your system. This is performed automatically for
you in the database release (through the Oracle installer), the Portal
release (through the Portal installer), and the Collab Suite.
Next, create a new Ultra Search instance (a few SQL commands) and edit the
Ultra Search configuration files. You will need basic configuration
information for your system, including the hostname, port, and SID of your
Ultra Search infrastructure database.
Note: When editing ultrasearch.properties, you need to specify an absolute
path name. The setting
admin.srchome=$ORACLE_HOME/ultrasearch/webapp/isearch_admin
does NOT work; use a fully expanded path instead, for example:
admin.srchome=/u01/app/oracle/product/9.2.0.1.0/ultrasearch/webapp/isearch_admin
Then start the Web server and test Ultra Search.
- Configuration - Access the Ultra Search admin interface, enter the basic
settings (Crawler page), define data sources and schedules, and start
crawling.
For installing Ultra Search with Portal, please see also Rich
Soule's Portal technote.
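The note about absolute path names in ultrasearch.properties can be checked mechanically before starting the Web server. The following is a minimal sketch, not part of Ultra Search: the function name and the decision to check only the admin.srchome key are assumptions made for illustration, and it assumes Unix-style paths.

```python
def find_non_absolute_paths(properties_text, keys=("admin.srchome",)):
    """Return (key, value) pairs whose value is not a plain absolute Unix path.

    Hypothetical helper: flags values that contain an unexpanded
    variable such as $ORACLE_HOME or that do not start with '/'.
    """
    problems = []
    for raw_line in properties_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without a key=value pair.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key in keys and ("$" in value or not value.startswith("/")):
            problems.append((key, value))
    return problems


bad = "admin.srchome=$ORACLE_HOME/ultrasearch/webapp/isearch_admin"
good = ("admin.srchome=/u01/app/oracle/product/9.2.0.1.0/"
        "ultrasearch/webapp/isearch_admin")

print(find_non_absolute_paths(bad))   # one flagged entry
print(find_non_absolute_paths(good))  # empty list
```

Other settings that hold paths could be added to the keys argument if your configuration uses them.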
Performance
Peak queries?
Using this technology, Oracle.com serves 50,000 queries and 1.2
million page hits per day.
What is the indexing throughput of Ultra Search?
It depends on the amount of CPU available, the size of the documents, the
degree of filtering, disk read speed, the network, and many other factors.
Assuming no bandwidth limitations, an Ultra Search instance can crawl in the
low hundreds of MB per hour on a moderately equipped Sun Ultra 60. This can
be scaled up if more powerful machines are available for crawling.
Typically, we find that after the initial crawl is done, only 50-100 MB of
new content enters a portal corpus every day. We therefore caution against
provisioning too much crawling horsepower.
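To see why modest hardware is usually enough, the figures above can be turned into rough crawl times. This is back-of-the-envelope arithmetic only: the rate of 200 MB per hour is an assumed stand-in for "low hundreds", and the 10 GB initial corpus is a hypothetical size, not a figure from this FAQ.

```python
def crawl_hours(megabytes, mb_per_hour):
    """Hours needed to crawl a given volume at a steady rate."""
    return megabytes / mb_per_hour


ASSUMED_RATE_MB_PER_HOUR = 200.0   # assumption: "low hundreds of MB per hour"
HYPOTHETICAL_CORPUS_MB = 10 * 1024  # assumption: a 10 GB initial corpus

# Daily incremental load of 50-100 MB, as quoted above.
daily_low = crawl_hours(50, ASSUMED_RATE_MB_PER_HOUR)
daily_high = crawl_hours(100, ASSUMED_RATE_MB_PER_HOUR)
initial = crawl_hours(HYPOTHETICAL_CORPUS_MB, ASSUMED_RATE_MB_PER_HOUR)

print(f"daily increment: {daily_low:.2f} to {daily_high:.2f} hours")
print(f"initial crawl:   {initial:.1f} hours")
```

Under these assumptions the daily increment takes well under an hour, so the initial crawl, not the steady state, determines how much crawling capacity is worth provisioning.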
What is the impact of the crawler on my intranet servers?
It depends on the amount of data to be crawled. The crawler observes
robots.txt, and it crawls breadth-first, which distributes the load.
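The two behaviors named above, honoring robots.txt and visiting pages breadth-first, can be illustrated with a small simulation. This is a sketch of the general technique, not the Ultra Search crawler: the hosts, link graph, robots.txt rules, and user-agent name are all made up, and nothing is fetched over the network.

```python
from collections import deque
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

# Hypothetical intranet: each URL maps to the links found on that page.
LINKS = {
    "http://a.example/": ["http://a.example/docs", "http://b.example/"],
    "http://a.example/docs": ["http://a.example/private/x"],
    "http://b.example/": ["http://b.example/news"],
    "http://b.example/news": [],
    "http://a.example/private/x": [],
}

# Hypothetical robots.txt content per host.
ROBOTS_TXT = {
    "a.example": ["User-agent: *", "Disallow: /private/"],
    "b.example": ["User-agent: *", "Disallow:"],
}


def build_rules(robots_by_host):
    """Parse each host's robots.txt lines into a RobotFileParser."""
    rules = {}
    for host, lines in robots_by_host.items():
        parser = RobotFileParser()
        parser.parse(lines)
        rules[host] = parser
    return rules


def crawl_order(seed, links, rules, agent="example-crawler"):
    """Return URLs in breadth-first order, skipping disallowed ones."""
    seen = {seed}
    queue = deque([seed])
    order = []
    while queue:
        url = queue.popleft()
        parser = rules.get(urlsplit(url).netloc)
        if parser is not None and not parser.can_fetch(agent, url):
            continue  # robots.txt forbids this URL
        order.append(url)
        for link in links.get(url, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order


visited = crawl_order("http://a.example/", LINKS, build_rules(ROBOTS_TXT))
print(visited)
```

Because the queue is first-in, first-out, pages at the same link depth on different hosts are interleaved rather than one host being crawled to exhaustion first.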
Search/Result Page Questions
When displaying hits using the supplied search.jsp, the server estimates
more than 200 hits but displays only 200 of them. Why?
Showing more hits is very resource-intensive on the server, so a limit
of 200 hits is set out of the box. You can change this limit by editing
wk0qry.pkh in $ORACLE_HOME/ultrasearch/admin, where MAX_HITS is defined as
200. After changing the limit, reload wk0qry.pkh into the wksys schema
using SQL*Plus.
Advanced Features
Is it possible to do Thesaurus searches with Ultra Search?
Ultra Search does not utilize the Oracle Text thesaurus functionality
by default, but can be extended to use Text's thesaurus operator if the
Ultra Search instance provides 'custom query expansion' (query expansion
means that for each user's search string, Ultra Search expands it to a
query in the underlying Oracle Text SQL extension language).
Each Ultra Search instance can customize the query that Ultra Search
sends to Text by providing a couple of stored procedures. Use the 'synonym'
operator to get Thesaurus functionality. The procedure to customize query
expansion is described here.
To give an example, say the current expansion for "hello world" is:
({hello world})*2,(({hello};{world})*2,({hello},{world}))
To use the thesaurus feature, you need to change the expansion to:
({hello world})*2,(({hello};{world})*2,({hello},{world},SYN(hello),SYN(world)))
Note: 'SYN' is the Text synonym operator. Please check the Oracle Text
Reference for the exact syntax.
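The difference between the two expansions is easier to see when the string is built programmatically. The sketch below only reproduces the shape of the two example strings above; it is written in Python for illustration, whereas the actual customization is done through stored procedures, and the function name is hypothetical.

```python
def expand_query(terms, use_thesaurus=False):
    """Build an expansion string shaped like the examples above.

    Illustration only: mirrors the sample strings in this FAQ and is
    not the expansion logic shipped with Ultra Search.
    """
    phrase = "{" + " ".join(terms) + "}"
    braced = ["{" + term + "}" for term in terms]
    second_group = ";".join(braced)
    third_group = list(braced)
    if use_thesaurus:
        # Append one SYN(...) clause per term, as in the second example.
        third_group += ["SYN(" + term + ")" for term in terms]
    return "(%s)*2,((%s)*2,(%s))" % (phrase, second_group, ",".join(third_group))


default_expansion = expand_query(["hello", "world"])
thesaurus_expansion = expand_query(["hello", "world"], use_thesaurus=True)
print(default_expansion)
print(thesaurus_expansion)
```

Only the last group changes: the synonym clauses are appended there, so the weighting of the exact-phrase and all-terms matches stays as it was.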
Does Ultra Search support "natural language" searches?
Ultra Search believes that stemming and thesaurus technologies
are very effective for information retrieval. For natural language queries,
Oracle partners with third parties to deliver NL front-ends on Oracle Text
and Ultra Search.
How does Ultra Search detect the language of a document?
The language recognizer is trained statistically using trigram
data from documents in various languages (8 provided: Danish, Dutch, English,
French, German, Italian, Portuguese, and Spanish). It starts with the
hypothesis that the given document does not belong to any language and then
refutes this hypothesis for a particular language where possible. It
currently works for any language using the Latin-1 alphabet and any language
with a deterministic Unicode range of characters (Chinese, Japanese, Korean,
etc.). It also imposes a 500-character cutoff (if a language is recognized
by that point). Though not formally supported yet, the LanguageRecognizer
does provide a method for adding other Latin-1 languages by compiling new
training data.
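The general idea of trigram-based recognition with a "no language" default can be sketched in a few lines. This is a toy illustration, not the LanguageRecognizer: it uses a simple trigram-overlap score instead of a real statistical test, the training sentences and the 0.3 threshold are made up, and only the 500-character cutoff is taken from the description above.

```python
from collections import Counter


def trigrams(text):
    """Split text into overlapping 3-character sequences."""
    padded = " " + " ".join(text.lower().split()) + " "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def train(samples):
    """Build one trigram profile per language from sample text."""
    return {lang: Counter(trigrams(text)) for lang, text in samples.items()}


def recognize(text, profiles, cutoff=500, threshold=0.3):
    """Return the best-matching language, or None if nothing fits.

    Starts from "no language" and only names a language when enough of
    the document's trigrams appear in that language's profile.
    """
    grams = trigrams(text[:cutoff])  # look at the first 500 characters only
    if not grams:
        return None
    best_lang, best_score = None, 0.0
    for lang, profile in profiles.items():
        score = sum(1 for g in grams if g in profile) / len(grams)
        if score > best_score:
            best_lang, best_score = lang, score
    return best_lang if best_score >= threshold else None


# Tiny, made-up training data; a real recognizer needs far more text.
PROFILES = train({
    "english": "the quick brown fox jumps over the lazy dog and then "
               "the dog runs away from the fox",
    "german": "der schnelle braune fuchs springt über den faulen hund "
              "und dann läuft der hund vor dem fuchs weg",
})

print(recognize("the dog and the fox", PROFILES))
print(recognize("der hund und der fuchs", PROFILES))
print(recognize("zzzz qqqq xxxx", PROFILES))
```

Adding a language in this sketch means adding one more entry to the training dictionary, which loosely mirrors the idea of compiling new training data.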
Troubleshooting, Installation and Configuration
After installing Ultra Search 9.2.x, I cannot get the Ultra Search Query
page (search.jsp) to work. I get the following error:
oracle.ultrasearch.query.SearchException: WKG17005: connection failure.
What can I do?
Ultra Search 9.2.x "hardcodes" the connection information used
by the sample query application search.jsp to connect to the Ultra Search
database into the file $OH/ultrasearch/sample/query/common_customize_instance.jsp.
Update this file and force recompilation to fix your problem:
- Edit common_customize_instance.jsp and enter the correct information
for your Ultra Search instance. Put the connection information into the
string m_connection_string, including the hostname, database port number,
and SID of the Ultra Search "backend" database instance (Ultra Search's own
database instance, in which it keeps the fulltext index and its dictionary).
Also replace the strings in m_instance_schema with the user ID and password
of your newly created Ultra Search instance. (Note: The Ultra Search
installation requires you to create a new database user, schema, tablespace,
and instance as part of its post-installation steps.)
- Force recompilation of search.jsp by removing the JSP cache directory,
usually named _pages, in the directory $OH/ultrasearch/sample.
Alternatively, use the Unix touch command on search.jsp, or copy search.jsp
over itself, to achieve the same result.
When displaying hits using the supplied search.jsp, the server estimates
more than 200 hits but displays only 200 of them. Why?
For the answer to this question, see the Search/Result Page Questions
section.
Integration With Other Oracle Products
Oracle Collaboration Suite
How do I set up Search in the Collaboration Suite?
In version 1 of the Collaboration Suite, the search application
is part of the Files component and provides search over two other components
of the Suite, Files and Mail. In addition, Ultra Search can be configured
to provide search over Intranet Web pages.
Files and Mail search should work out-of-the-box after these components
have been installed with the Suite. To configure Web Search, first configure
Ultra Search (Ultra Search gets installed by the Portal installer when
the Collab Suite Portal is installed) and then point the Collab Suite search
application to the Ultra Search instance. To do this, find the "Federated
Search" configuration page in the configuration section of the Files
component in Enterprise Manager. Enter the correct connection information
on this page. (Note: The "Mail" parameter can be ignored; please do not
alter "Webmail base URL".)
Note: Errors generated by the Collab Suite search application are written
into the files application log.
What files am I authorized to find and see in the Collaboration Suite
"Search" application?
Collaboration Suite Search will find all private files of the
logged-in user (based on the Collab Suite login) and all files in
'workspaces' of which that user is a member.
Oracle Files (formerly Oracle iFS)
Is Ultra Search integrated with Oracle Files?
Ultra Search will be integrated with Files in Oracle Collaboration
Suite Version 2. In version 2, the Files component of the Collaboration
Suite will interact with Ultra Search through a new 'searchlet' interface
(a Searchlet is a Java program that will interface between Ultra Search
and other applications through a new, standardized API). The Searchlet
API will allow Ultra Search to send search requests to Oracle Files without
the need for crawling. Files manages its own fulltext index - documents
get indexed immediately upon insertion into Files - and returns search
results to Ultra Search.
You can crawl Oracle Files with Ultra Search today through
HTTP - it is just another web data source - or through database crawling.
However, only public documents can be crawled through HTTP.