Protein Secondary Structure demo
Protein Secondary Structure Sample Application
This sample application
(available in this prot_idxtype.zip
file) shows how you can use the Extensibility Framework to speed
up queries on the secondary structure of proteins stored in an Oracle
table.
The Extensibility Framework can be used to speed up queries in
the
Life Sciences domain, Multimedia domain , Spatial domain, and any other
domain that you can conceive an indexing scheme for.
This application demonstrates how
you can implement a data cartridge, load data into an Oracle table
using
SQL*Loader, create domain indexes on indexed data, and make use of these indexes in queries.
The indexing scheme used in this
example is based on the
paper entitled "Searching on the Secondary Structure of Protein
Sequences"
by Hammel and Patel.
Some details of a protein's secondary structure, including Loops,
Sheets,
and Helices, can aid Life Scientists in determining the function of the
protein. Life Scientists would like to be able to perform queries
based on the type and length of secondary structure segments appearing
in a protein. For example, the query condition <E 3 5><H
4
4> would match all proteins having a Sheet of between length 3 and
5, followed
immediately by a Helix of length 4. More details on the query
language
and the indexing scheme can be found in the Hammel and Patel paper, as
well as in the presentation links below.
The sample application makes
use of a dataset of proteins
with known secondary structures called Stride,
which is distributed with the Predator
protein secondary structure prediction
utility. If you would like more
secondary structure protein data, you can obtain it from PDB (the
"Protein Data Bank"), or you
can generate it from the primary structure using Predator.
Predator is
available by FTP at:
UNIX Version: ftp://ftp.ebi.ac.uk/pub/software/unix/predator
DOS Version: ftp://ftp.ebi.ac.uk/pub/software/dos/predator
Presentation
Information
- A presentation walking through this demo is available for download.
Extensible
Indexing Information
For further details about the
Extensible Indexing framework,
and how to build your own indexing scheme in Oracle, please see the
Oracle
Data Cartridge Developer's Guide.
Prerequisites
- You must have Oracle 9i (or
later) installed.
Description of the
Files in the Sample
Application
- protss.sql
- Creates all of the
extensible indexing objects necessary to run the sample code, including
indextype, operators, metadata table, and statistics type. Also
creates
a schema, protss, to serve as the "indextype owner".
- createtable.sql
- Creates a
schema to hold some sample table data. This schema
is known as the "index owner". A Proteins table is created to
hold
primary and secondary protein information.
- load.ctl
- SQL*Loader control file,
used to load the sample data.
- createindex.sql
- Creates the
domain index on the Proteins table.
- query.sql
- Some sample queries.
- utlxplan.sql
- Creates the plan_table
to be used by the "Explain Plan" command to show optimizer query
execution
plans.
- showplan.sql
- Prints the execution
plan stored in plan_table. Assumes that the plan_table has
already
been defined.
- stride_sql.dat
- The Stride data in SQL*Loader form, converted from the Stride
database available with Predator.
Installation
Notes
- From the protein_eix directory,
connect to the database as
SYSTEM or SYS using SQLPlus. Execute protss.sql to create the
Secondary
Structure objects. At the SQL prompt, type:
- Connect to the database as
SYSTEM or SYS again. Execute
createtable.sql. This creates the index owner, and the Proteins
table.
At the SQL prompt, type:
- Populate the Proteins
table. First, exit SQLPlus, and
go to a command prompt, pointed to the protein_eix directory.
Execute
SQL*Loader to load the sample data.
From the command prompt, execute:
C:\protein_eix> sqlldr
protss_user/protss_user load.ctl log=xyz.log
direct=true - Connect to the database again,
as SYSTEM or SYS using SQLPlus.
Execute createindex.sql to create the domain index. At the SQL
prompt,
type:
Depending on hardware, this command
could take up
to several minutes to complete. - The demo setup is now
complete. You can run some sample
queries by executing query.sql in SQLPlus. At the SQL prompt,
type:
|