Secure Search Returns Best Results
By Ron Hardman
Oracle Secure Enterprise Search provides the right intranet search.
If your company is like most, data—some meant to be shared and some not—is stored in multiple locations. Those locations may include
The only thing more difficult than keeping track of all these different datasources is finding one specific piece of data in them when you need it. I recently challenged a group of students at a university to find that day's lunch menu. I knew that it was available for everyone to see on the university intranet, and the printed copy of the menu was in my hand. They had five minutes to find it online—plenty of time for an entire class to find one current document.
Only two people found it. The reason: The students had too many potential datasources to look through, and the storage location for the menu had recently changed. This task challenged 30 students who had grown up with computers and were computer science majors to look for five minutes for something very specific. How much more difficult is it for people in your organization who perhaps did not grow up with computers or may even be somewhat intimidated by them to have to look for something about a general topic saved somewhere roughly two years ago?
One way to tie many of these datasources together and make everything accessible is to use a crawler. Search companies such as Google and Ask.com enable people to access the internet by using a Web crawler to discover content, index it, and make it available through simple search screens. The crawler can go through most of the datasources mentioned above. With a single interface, everyone can find what they need—almost.
There are a couple of problems with the crawler solution. The first is that access to some data is restricted. This means that certain intranet sites, document repositories, databases, and applications require authentication and restrict access based on assigned responsibility. You wouldn't want everyone in the organization to have access to all of your HR records, for example. The second problem is that not all datasources can be crawled. Some applications have closed datasources, and the only way to access their data is through their own search interfaces.
To address these issues, Oracle has introduced a product called Oracle Secure Enterprise Search, and this article provides a feature overview of the product, discusses its architecture, and walks through the process of crawling and indexing a datasource.
All About Search
Oracle Secure Enterprise Search can index all of the datasources discussed and more than 100 document formats, including PDF, Microsoft Word, Excel, HTML, and XML. It supports indexing and searching in all major languages, all through the same search interface.
Oracle Secure Enterprise Search supports administration through a Web page created during installation. From it you can create datasources, set crawler preferences (such as the number of crawler agents to run and the maximum size for documents), modify the types of documents supported in each datasource, set the crawling schedule, and modify security settings.
As soon as you create a datasource, a schedule is automatically created for one or more Java crawlers to discover new documents or pages. When the indexing is complete, users can search through all of the datasources simultaneously or individually through the user search page. The search page is also created during installation. The easy-to-use interface requires no customization, and it is also possible to use query APIs to embed search results in your own custom application. The default search page includes basic and advanced interfaces, with the advanced search allowing for attribute-specific and source-specific searches.
Synchronization can be scheduled at any interval to ensure that current information is presented to users. For example, if a crawler is scheduled to run on a particular datasource every night at 11 p.m., the crawler will examine the source files at that time to see which changes, if any, have been made since the last crawl. Any new or modified pages or documents are indexed.
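The change check described above can be pictured with a minimal sketch. It assumes the crawler can read a last-modified timestamp for each document; the document names, dates, and the changed_since helper are hypothetical and are not part of Oracle Secure Enterprise Search.

```python
from datetime import datetime

def changed_since(documents, last_crawl):
    """Return the names of documents modified after the previous crawl."""
    return sorted(name for name, modified in documents.items() if modified > last_crawl)

# Hypothetical source listing: document name -> last-modified timestamp.
documents = {
    "menu.html": datetime(2024, 1, 2, 9, 30),
    "handbook.pdf": datetime(2023, 12, 15, 14, 0),
    "news.html": datetime(2024, 1, 2, 18, 45),
}

# The previous crawl ran at 11 p.m. the night before.
last_crawl = datetime(2024, 1, 1, 23, 0)

# Only documents that are new or modified since then need to be indexed again.
to_reindex = changed_since(documents, last_crawl)
print(to_reindex)
```

Comparing timestamps is only one way to detect changes; a crawler could also compare checksums or rely on change notifications from the source.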
Other major features include the following:
Connections to secure datasources. Oracle provides secure source types, or connectors, to several known datasources and also supports the development of custom connectors to additional datasources. Figure 1 shows a list of available connectors.
More connectors are in development, so refer to the online documentation for the most recent connector list.
Authentication. Oracle Secure Enterprise Search supports LDAP and synchronization with Active Directory, so it aligns with your existing user management system. Employing existing user permissions/roles, it can restrict access to datasources according to your existing security policies.
Federated search. Some datasources may already have their own indexing mechanism. Instead of reindexing them, you can use a federated search to pass terms from Oracle Secure Enterprise Search to the other repository and then get the results back from it for display. The search user does not know that the search was executed elsewhere.
Java API. Oracle Secure Enterprise Search supplies the following APIs:
Analytics. The administration interface includes reports showing common search terms, common misspellings, click-through rates, and other statistics that can be used to improve the user experience.
Google Desktop Enterprise Integration. You can integrate Oracle Secure Enterprise Search with Google Desktop Enterprise. Users can then search their desktops in the interface in which they search other file servers.
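The federated search feature listed above can be illustrated with a small sketch. It assumes each backend is simply a function that accepts a search term and returns its own hits; the two backend functions, their URLs, and their data are hypothetical stand-ins, not product APIs.

```python
def search_local_index(term):
    """Stand-in for a query against the locally built index (hypothetical data)."""
    local_index = {"budget": ["http://intranet/finance/budget.html"]}
    return [{"url": url, "origin": "local index"}
            for url in local_index.get(term.lower(), [])]

def search_remote_repository(term):
    """Stand-in for a repository that keeps its own index and answers queries itself."""
    remote_hits = {"budget": ["docrepo://plans/budget-review.doc"]}
    return [{"url": url, "origin": "remote repository"}
            for url in remote_hits.get(term.lower(), [])]

def federated_search(term, backends):
    """Pass the same term to every backend and return one combined result list."""
    combined = []
    for backend in backends:
        combined.extend(backend(term))
    return combined

results = federated_search("Budget", [search_local_index, search_remote_repository])
for hit in results:
    # The user sees a single list and need not know which backend produced a hit.
    print(hit["url"])
```

The remote repository is never reindexed here; only the search term travels to it and only its results come back.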
Oracle Secure Enterprise Search includes Oracle Database 10g Enterprise Edition (Release 10.1.0.5.0) and installs with the Oracle Text option, Oracle's full-text retrieval technology, which it uses for indexing and search operations. It does not require execution of SQL scripts for index creation or maintenance. All administration is handled through the administration user interface, which executes in an Oracle Containers for J2EE (OC4J) runtime environment.
In addition to the database and the administration interface, Oracle Secure Enterprise Search includes a multithreaded Java crawler. The crawler is governed by settings in the administration interface, including its scheduled execution time and frequency, number of threads to run, crawler time-out, maximum document size, and the depth of the crawl (the number of nested links to follow beyond the defined datasource). It also allows you or your administrator to set exclusion rules. For example, if an intranet site references sites on the internet that you do not want to include, you can add the sites as exclusions. These links are not indexed and not included in searches.
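To make the depth and exclusion settings concrete, here is a minimal sketch of a depth-limited crawl over an in-memory link graph. The crawl function, the URLs, and the prefix-based exclusion rule are assumptions chosen for illustration; the product's crawler is configured through the administration interface, not through code like this.

```python
from collections import deque

def crawl(start, links, max_depth, excluded_prefixes=()):
    """Breadth-first crawl that stops following links beyond max_depth
    and skips any URL that starts with an excluded prefix."""
    seen = {start}
    visited = []
    queue = deque([(start, 0)])
    while queue:
        url, depth = queue.popleft()
        visited.append(url)
        if depth == max_depth:
            continue  # Depth limit reached: do not follow links from this page.
        for target in links.get(url, []):
            if target in seen or target.startswith(tuple(excluded_prefixes)):
                continue
            seen.add(target)
            queue.append((target, depth + 1))
    return visited

# Hypothetical link graph: page -> pages it links to.
links = {
    "http://intranet/home": ["http://intranet/hr", "http://www.example.com/news"],
    "http://intranet/hr": ["http://intranet/hr/forms"],
    "http://intranet/hr/forms": ["http://intranet/hr/forms/archive"],
}

pages = crawl(
    "http://intranet/home",
    links,
    max_depth=2,
    excluded_prefixes=["http://www.example.com/"],
)
print(pages)
```

In this run the external site is never visited because of the exclusion, and the archive page is skipped because it sits three links away from the starting URL.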
Oracle Secure Enterprise Search also includes a Web services API that allows you to connect custom interfaces. If the default administration or query applications do not suit your needs, the Web services API provides full administration, basic query, and advanced query capabilities to your custom code.
All components are bundled in a single install disk or download file, and no additional software configuration is required to get up and running. It took me less than 20 minutes to download Oracle Secure Enterprise Search to my laptop and install it, and only another 10 to define my first datasource, complete the crawl, and perform my first search.
Enterprise searches with advanced security requirements and multiple datasource configurations obviously take more time. Any extra configuration time will be a product of corporate security requirements and the number of datasources to be configured. Keep in mind that Oracle Secure Enterprise Search can be up and running for a portion of your datasources while others are still being added.
Try It Out
To get started, download the software from OTN for your platform, unzip the file, and run Setup.exe in the top-level directory.
When the installation is finished, two URLs are provided for the user search page and administration login interface. They should look similar to the following:
To try out Oracle Secure Enterprise Search, configure the crawler to index a URL and try the search. The following steps guide you through this process:
1. Open the administration login page in your browser, and enter the password you created during installation.
2. The Information message on the General page shows that no datasources are defined. Click the Sources subtab at the top of the screen, as shown in Figure 2.
3. Select Web as the source type and click Create, as shown in Figure 3.
4. Enter Code as the source name. The starting URL can be virtually anything, including your organization's internet site. For this example, point to a code repository for a PL/SQL programming book, www.peakretrieval.com/plsql/plsql_programming_index.html, as shown in Figure 4.
5. Click Create. A schedule is created automatically, and crawling starts immediately.
6. To view the status of the crawl, click the Schedules link at the top of the page. The Code schedule is shown, along with the status of the last (or current) crawl. Click the link in the Status column to view the log.
7. Click the Statistics link (the pencil icon in the Statistics column) to display what was indexed. The crawler statistics for this datasource appear.
8. Click the Search link (not the Search tab) at the top right of the Administration page to open the search window. Figure 5 shows the default search interface.
9. Perform a search for CTXSYS.CONTEXT (an Oracle Text index type). The search is case-insensitive by default. Figure 6 shows the search results.
10. Click any link to display the indexed document—a script in this case—from its original source. Note that the keywords are highlighted in the search results and that the familiar Cached and Links options are available for each record returned. If users find that the documents they were looking for have been moved or deleted, they can still find what they need by clicking the Cached link. The Links hyperlink shows pages or documents that are referenced (linked) by this page as well as pages or documents that link to this page.
The result also shows how many document matches there were for the search terms. In this case, there are two results.
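For readers curious why a search for CTXSYS.CONTEXT matches regardless of case, the following toy inverted index shows the general idea. It is a sketch only and does not reflect how Oracle Text is implemented; the file names and file contents are invented for the example.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into tokens, keeping dots inside names."""
    tokens = (t.strip(".") for t in re.findall(r"[a-z0-9_.]+", text.lower()))
    return [t for t in tokens if t]

def build_index(docs):
    """Map each token to the set of documents that contain it."""
    index = defaultdict(set)
    for name, text in docs.items():
        for token in tokenize(text):
            index[token].add(name)
    return index

def search(index, query):
    """Return the documents that contain every term in the query."""
    terms = tokenize(query)
    if not terms:
        return []
    return sorted(set.intersection(*(index.get(term, set()) for term in terms)))

# Invented sample content, not the files from the article's code repository.
docs = {
    "create_index.sql": "CREATE INDEX book_idx ON books(text) INDEXTYPE IS CTXSYS.CONTEXT;",
    "drop_index.sql": "DROP INDEX book_idx;",
    "notes.txt": "The ctxsys.context index type supports full-text queries.",
}

index = build_index(docs)
print(search(index, "CTXSYS.CONTEXT"))
print(search(index, "ctxsys.context"))
```

Because both the documents and the query are lowercased before matching, the uppercase and lowercase searches return the same documents.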
The preceding Web source example shows how you might index an internet or corporate intranet site and make it available for everyone to search. No authentication is required, and there are no restrictions regarding who has access to the data. Of course, not all datasources are meant to be open for public access.
Some datasources require authentication prior to allowing access. Oracle Database is a great example of this. To index a column in an HR database, for example, the crawler must be able to get access to that source, and this requires authentication information.
To set up a sample CONTACTS database for access from Oracle Secure Enterprise Search, download the sample data for this article, and follow these steps:
1. Log in to the Oracle Secure Enterprise Search database as SYS or SYSTEM, and run the Create_User.sql script.
2. Connect as the CONTACTS user (the default password is oracle).
3. To create the MY_PRIVATE_CONTACT_LIST table and insert the seed data, run the Create_Table.sql script.
With the sample CONTACTS schema in place on your local machine, create a new datasource on the Administration screen, as follows:
1. Click the Sources subtab, select Table as the source type, and click Create.
2. Enter the following information to define the source:
3. Click Create.
4. Click the Schedules subtab, and click the link in the Status column for the Contacts datasource.
5. Click the Statistics icon next to the Contacts source and log filename.
6. Confirm that 10 documents are indexed, and click Finish.
The crawler indexed the secure datasource you just set up (Contacts) because you provided the username and password to the crawler.
Provided you also set up the Web source example (named Code), there are two datasources indexed now. With the current configuration, clicking the Search link at the top right side of the Administration page brings up a single search screen for both sources.
To separate the sources, click the Search tab in the Administration window and click the Source Groups subtab. Perform the following steps to create two source groups, starting with the Code datasource created earlier:
1. Click Create.
2. Enter PL/SQL as the source group name, and click Proceed to Step 2.
3. Choose Code from the list of available sources, and click the >> button.
4. Click Finish in the upper right corner.
To create the source group for the Contacts datasource, do the following:
1. Click Create.
2. Enter Contacts as the source group name, and click Proceed to Step 2.
3. Choose Table from the list, and click Go.
4. Choose Contacts from the list of available sources, and click the >> button.
5. Click Finish in the upper right corner.
Click the Search link at the top right side of the Administration page, and note the datasources listed above the search box. By default, the search is against all source groups. To search only the Contacts source group, click Contacts, enter the search term diesel, and click Search. Oracle Secure Enterprise Search returns the matching record from the Contacts datasource.
Note that the source group is displayed with the result set. Click the source group name to browse the rest of the contents in the source group.
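The effect of source groups on a query can be sketched in a few lines. The group names below follow the two groups created in this article, but the records, the search function, and the substring matching are invented placeholders and are not the article's sample data or the product's query logic.

```python
# Source groups map a group name to the datasources it contains.
source_groups = {"PL/SQL": {"Code"}, "Contacts": {"Contacts"}}

# Invented placeholder records, one dictionary per indexed document.
records = [
    {"source": "Code", "title": "engine_pkg.sql", "text": "procedure start_diesel_engine"},
    {"source": "Contacts", "title": "Contact 7", "text": "Diesel mechanic, evenings only"},
    {"source": "Contacts", "title": "Contact 3", "text": "Plumber"},
]

def search(term, group=None):
    """Search every record, or only records whose source belongs to the given group."""
    allowed = None if group is None else source_groups[group]
    term = term.lower()
    return [
        (record["source"], record["title"])
        for record in records
        if term in record["text"].lower()
        and (allowed is None or record["source"] in allowed)
    ]

print(search("diesel"))               # all source groups
print(search("diesel", "Contacts"))   # only the Contacts source group
```

Searching without a group looks at both datasources, while naming a group narrows the same query to the sources assigned to it.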
The indexes created by Oracle Secure Enterprise Search for all datasources are secured in the Oracle Secure Enterprise Search database, so unauthorized users cannot misuse the indexes to gain access to secure information. After you set up the two datasources described in this article, however, there is still no restriction on who can search either datasource. Oracle provides three ways to filter search results according to individual or group permissions: a centralized scheme (such as Oracle Internet Directory [OID], UNIX accounts, or Microsoft Active Directory), access control lists, and query-time authorization. For information on using these security schemes, refer to the links in the "Next Steps" box.
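As a rough illustration of filtering by an access control list, the sketch below attaches a set of permitted groups to each result and drops results the current user may not see. The result titles, group names, and the filter_by_acl helper are hypothetical; the product's actual security configuration is described in its documentation.

```python
def filter_by_acl(results, user_groups):
    """Keep results that are public (no ACL) or that share a group with the user."""
    visible = []
    for result in results:
        acl = result["acl"]
        if acl is None or acl & user_groups:
            visible.append(result["title"])
    return visible

# Hypothetical search results; "acl" is the set of groups allowed to see each one.
results = [
    {"title": "Lunch menu", "acl": None},
    {"title": "Salary review", "acl": {"hr"}},
    {"title": "Build script", "acl": {"engineering", "hr"}},
]

print(filter_by_acl(results, {"engineering"}))
print(filter_by_acl(results, set()))
```

A user in the engineering group sees the public menu and the build script, while a user with no group memberships sees only the public document.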
For many years, Oracle Database has been the leader in the database market, storing ever-increasing amounts of data. Oracle Secure Enterprise Search now extends data management beyond the database to every archive, file server, e-mail server, and desktop in the enterprise. It is not just about storing more data but also about managing and extracting relevant information when users need it, and that's the need Oracle Secure Enterprise Search fills.
Ron Hardman works with Academy District 20 schools in Colorado Springs, Colorado, and is the founder of 5-Mile Software, a software company delivering assessment and back-office solutions for K-12 schools. He is coauthor of Oracle Database 10g PL/SQL Programming and Expert PL/SQL, both from Oracle Press, and is an Oracle ACE.