This application demonstrates how to implement a data cartridge, load data into an Oracle table using SQL*Loader, create domain indexes on the loaded data, and use these indexes in queries.
The indexing scheme used in this example is based on the paper "Searching on the Secondary Structure of Protein Sequences" by Hammel and Patel. Details of a protein's secondary structure, including loops, sheets, and helices, can help life scientists determine the function of the protein. Life scientists would like to perform queries based on the type and length of the secondary structure segments that appear in a protein. For example, the query condition <E 3 5><H 4 4> matches all proteins that have a sheet of length 3 to 5, followed immediately by a helix of length 4. More details on the query language and the indexing scheme can be found in the Hammel and Patel paper, as well as in the presentation links below.
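To make the query semantics concrete, here is a minimal Python sketch that evaluates a condition such as <E 3 5><H 4 4> by scanning a single protein's secondary structure. It is not the Hammel and Patel indexing scheme and not the cartridge's actual implementation; it is a naive linear scan under several assumptions: the structure is a per-residue string over a hypothetical three-letter alphabet (H = helix, E = sheet, L = loop), each query term must match one whole maximal run of identical codes, and consecutive query terms must match consecutive runs. The function names parse_query, segments, and matches are illustrative only.

```python
from __future__ import annotations

import re
from itertools import groupby

# Assumed query syntax: one or more <TYPE MIN MAX> terms, TYPE in {H, E, L}.
QUERY_TERM = re.compile(r"<\s*([HEL])\s+(\d+)\s+(\d+)\s*>")


def parse_query(query: str) -> list[tuple[str, int, int]]:
    """Parse '<E 3 5><H 4 4>' into [('E', 3, 5), ('H', 4, 4)]."""
    terms = [(t, int(lo), int(hi)) for t, lo, hi in QUERY_TERM.findall(query)]
    # Anything left after removing well-formed terms means the query is malformed.
    if QUERY_TERM.sub("", query).strip():
        raise ValueError(f"malformed query: {query!r}")
    return terms


def segments(structure: str) -> list[tuple[str, int]]:
    """Collapse per-residue codes, e.g. 'LLEEEHHHH' -> [('L', 2), ('E', 3), ('H', 4)]."""
    return [(code, len(list(run))) for code, run in groupby(structure)]


def matches(structure: str, query: str) -> bool:
    """True if some run of consecutive segments satisfies every query term in order."""
    terms = parse_query(query)
    segs = segments(structure)
    n = len(terms)
    # Slide a window of n consecutive segments over the protein (naive scan, no index).
    for start in range(len(segs) - n + 1):
        window = segs[start:start + n]
        if all(code == t and lo <= length <= hi
               for (code, length), (t, lo, hi) in zip(window, terms)):
            return True
    return False


if __name__ == "__main__":
    print(matches("LLEEEEHHHHLL", "<E 3 5><H 4 4>"))   # sheet of 4, then helix of 4
    print(matches("LLEEEEHHHHHLL", "<E 3 5><H 4 4>"))  # helix of 5 is too long
```

A scan like this touches every segment of every protein for every query; avoiding that full scan is the motivation for building a domain index over the segments instead.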
The sample application uses a dataset of proteins with known secondary structures called Stride, which is distributed with the Predator protein secondary structure prediction utility. If you would like more protein secondary structure data, you can obtain it from the PDB (Protein Data Bank), or you can generate it from the primary structure using Predator. Predator is available by FTP at:
UNIX Version: ftp://ftp.ebi.ac.uk/pub/software/unix/predator
DOS Version: ftp://ftp.ebi.ac.uk/pub/software/dos/predator
For further details about the Extensible Indexing framework, and how to build your own indexing scheme in Oracle, please see the Oracle Data Cartridge Developer's Guide.
Depending on hardware, this command could take up to several minutes to complete.