Place
the cursor on this icon to display all the screenshots. You can also place the
cursor on each individual icon in the following steps to see only the screenshot
that is associated with that step.
Sequence alignment is without doubt one of the most commonly
performed bioinformatics tasks. The most widely used sequence alignment algorithm
is BLAST (Basic Local Alignment Search Tool). BLAST is a method for rapid sequence
comparison that was introduced in 1990 by Stephen Altschul. It is typically
used to compare a query nucleotide or amino acid sequence against a database
of sequences. Its success comes from its combination of speed, sensitivity,
and statistical assessment of the results.
BLAST is a heuristic method to find the high-scoring locally
optimal alignments between a query sequence and a database. The BLAST algorithm
and family of programs rely on the statistics of gapped and ungapped sequence
alignments. The statistics allow the probability of obtaining an alignment with
a particular score to be estimated.
Over the last few years, as more sequence data has become
available as a result of large scale sequencing efforts, BLAST has started to
be used for an increasing range of sequence-alignment activities, which include
functional annotation, gene discovery, across-organism and across-species analysis,
and genome assembly and completion.
Oracle10g BLAST Functions
A version of BLAST, which is very similar to NCBI BLAST 2.0,
has been implemented in the database using table functions. This enables users
to perform BLAST queries against data that is held directly inside an Oracle
database. Because the algorithms are implemented as table functions, parallel
computation is intrinsically supported.
The five core variants of NCBI BLAST have been implemented:
BLASTN compares a nucleotide query sequence
against a nucleotide database.
BLASTP compares an amino acid sequence against
a protein sequence database.
BLASTX compares the six-frame conceptual
translation products of a nucleotide query sequence (both strands) against
a protein sequence database.
TBLASTN compares a protein query sequence
against a nucleotide sequence database dynamically translated in all six
reading frames (both strands).
TBLASTX compares the six-frame translations
of a nucleotide query sequence against the six-frame translations of a nucleotide
sequence database.
Two BLAST table functions have been implemented:
MATCH
Returns the following:
q_seq_id: identifier of the query sequence
t_seq_id: identifier of the matched (target)
sequence (for example: the NCBI accession number)
score: score of the match
value: the expect value
ALIGN
Returns information about the alignment including the
following:
q_seq_id: identifier of the query sequence
t_seq_id: identifier of the matched (target)
sequence (for example: the NCBI accession number)
pct_identity: percentage of the query sequence
that identically matches with the database sequence
alignment_length: the length of the alignment
mismatches: the number of base-pair mismatches
between the query and database sequences
gap_openings: the number of gaps opened
in gapped alignment
gap_list: the list of offsets where a gap
is opened
q_start: the position where the gap alignment
starts
q_end: the position where the gap alignment
ends
s_start: the position where the database
sequence alignment starts
s_end: the position where the database sequence
alignment ends
expect: the expect value of the alignment
score: the score of the alignment
BLAST queries can be invoked directly using the SQL interface
or through an application. The results of queries can also be displayed either
by using the SQL interface or through an application. When the SQL interface
is used, the user has the flexibility to decide how the results will be displayed.
The introduction of similarity search functionality in the
database means that users no longer have to export sequence data and transform
it into a BLAST data set prior to doing similarity searches.
Complex BLAST Queries
As BLAST is implemented in the database, and can be invoked
by SQL, you can pre-process any queries, as well as perform any required post-processing.
The ability to preprocess data, so that you are performing the BLAST search
only on a subset of the data, means that queries should be highly performant.
The post-processing capability means that it is now possible to integrate the
output of a sequence similarity search with the data mining, text mining, and
other analytical features of the database.
Specify the User Name as sys and the Password.
Choose SYSDBA from the Connect As drop-down list, and then click
Login.
2.
Click the Administration tab.
3.
Under Storage, select Tablespaces.
4.
Click Create.
5.
Enter DMTBLSP as the name and under Datafiles,
click Add.
6.
Enter dmtblsp01.dbf as the file name and 20
MB for the file size. Make sure Automatically extend datafile when
full (AUTOEXTEND) is selected and enter 10 KB in the Increment
field. Then click Continue.
7.
Click OK to create the tablespace.
8.
Your tablespace has been created. Click your database
locator link to return to the Administration page.
You will perform a BLASTN query against a human DNA
database using the MATCH and ALIGN functions. Perform the following:
1.
You will perform a BLAST search of the given query sequence
against a human DNA database and return the seq_id, score, and expect
value of matches that score > 25. Open a terminal window and execute
the following commands:
cd wkdir
sqlplus odm/odm
@nblast01
The query in the nblast01.sql
script is as follows:
The output displays the sequence ID, expect value and
score for all nucleotide sequences in the ecoli10 database that have a
similarity score of greate than 25 compared to the input nucleotide query
sequence.
2.
Now perform a BLAST search of the given query sequence
against a human DNA database and return the full alignment information
of matches that have an expect value score > 25. From your terminal
window, execute the following commands:
@nblast02
The query in the nblast02.sql
script is as follows:
The results show the sequence ID, expect value, score
and full alignment information for all nucleotide sequences in the ecoli10
database that have a sequence similarity score of greater than 25 compared
to the input nucleotide query sequence.
You will perform a BLASTP query against all human proteins
in SwissProt using the MATCH and ALIGN functions. Perform the following:
1.
Perform a BLAST search of the given query sequence against
all human proteins in SwissProt and return the seq_id, score, and expect
value of matches that score > 25. From your terminal window, execute
the following commands:
@pblast01
The query in the pblast01.sql
script is as follows:
Select T_SEQ_ID, score, EXPECT as evalue from TABLE(BLASTP_MATCH ( (select sequence from query_db), -- query_sequence CURSOR(SELECT seq_id, seq_data FROM swissprot WHERE organism = 'Homo sapiens (Human)'), -- seqdb_cursor 1, -- subsequence_from -1, -- subsequence_to 0, -- FILTER_LOW_COMPLEXITY 0, -- MASK_LOWER_CASE
'BLOSUM62', -- SUB_MATRIX 10, -- EXPECT_VALUE 0, -- OPEN_GAP_COST 0, -- EXTEND_GAP_COST 0, -- WORD_SIZE 0, -- X_DROPOFF 0)) -- FINAL_X_DROPOFF t where t.score > 25;
The output displays the sequence ID, expect value, and
score for all human amino acid sequences in the SwissProt database that
have a similarity score of greater than 25 compared to the input amino
acid query sequence.
2.
Perform a BLAST search of the given query sequence against
all of the human proteins in SwissProt created after 01-Jan-90 and return
the full alignment information of matches with an expect value score >
25. From your terminal window, execute the following commands:
@pblast02
The query in the pblast02.sql
script is as follows:
Select T_SEQ_ID, ALIGNMENT_LENGTH, Q_SEQ_START, Q_SEQ_END, Q_FRAME, T_SEQ_START, T_SEQ_END, T_FRAME, score, EXPECT as evalue from TABLE(BLASTP_ALIGN ( (select sequence from query_db), -- query_sequence CURSOR(SELECT seq_id, seq_data FROM swissprot WHERE organism = 'Homo sapiens (Human)' AND creation_date > '01-Jan-90'), -- seqdb_cursor 1, -- subsequence_from -1, -- subsequence_to 0, -- FILTER_LOW_COMPLEXITY 0, -- MASK_LOWER_CASE
'BLOSUM62', -- SUB_MATRIX 10, -- EXPECT_VALUE 0, -- OPEN_GAP_COST 0, -- EXTEND_GAP_COST 3, -- WORD_SIZE 0, -- X_DROPOFF 0)) -- FINAL_X_DROPOFF t where t.score > 25;
The results show the sequence ID, expect value, score
and full alignment information for all human amino acid sequences in the
SwissProt database that have a sequence similarity score of greater than
25 compared to the input amino acid query sequence.
You will perform TBLAST queries using the ALIGN function.
Perform the following:
1.
Perform a BLAST search of the given nucleotide query
sequence against an amino acid database and return the full alignment
information of matches with expect values that score > 25. From your
terminal window, execute the following commands:
@tblast01
The query in the tblast01.sql
script is as follows:
The results show the sequence ID, expect value, score
and full alignment information for all amino acid sequences in the protdb
database that have a sequence similarity score of greater than 25 compared
to the input nucleotide query sequence.
2.
Perform a BLAST search of an amino acid query sequence
against a translated nucleotide database and return the full alignment
information of matches with an expect value > 25. From your terminal
window, execute the following commands:
@tblast02
The query in the tblast02.sql
script is as follows:
The results show the sequence ID, expect value, score
and full alignment information for all nucleotide sequences in the ecoli10
database that have a sequence similarity score of greater than 25 compared
to the input amino acid sequence.
3.
Perform a BLAST search of the given nucleotide query
sequence against a nucleotide database and return the full amino acid
alignment information of matches with expect values that score > 25.
From your terminal window, execute the following commands:
@tblast03
The query in the tblast03.sql
script is as follows:
The results show the sequence ID, expect value, score
and full alignment information for all translated nucleotide sequences
in the ecoli10 database that have a sequence similarity score of greater
than 25 compared to the input translated nucleotide query sequence.
Place
the cursor on this icon to hide all screenshots.