Oracle helps the Swiss Institute of Bioinformatics to manage vast volumes of protein data
The Swiss Institute of Bioinformatics (SIB) maintains several important biological databases, including UniProtKB/Swiss-Prot, the world's most widely used source of protein sequence and functional information. Managing and updating these databases is getting more difficult by the day, and the sophisticated data management features of Oracle Database 10g are helping to ease the burden.
With biomedical researchers increasingly reliant on protein sequence information to design new drugs and treatments, there is high demand for up-to-date information and for increasingly sophisticated querying tools to interrogate that data. There is also growing demand for the many protein sequence databases around the world to be linked, so that the information they contain can be interrogated as if it were a single database.
This is far easier said than done. First, all the databases in question are constantly changing as new information comes to light. Second, different databases use different data models and terminologies to store, classify and annotate the data they contain. Third, when large protein sequence databases like UniProtKB/Swiss-Prot were first developed, they generated an immediate demand for associated databases to classify and map the properties of those sequences. This gave rise to a second generation of databases that look for and record patterns in the raw data contained in the first. InterPro member databases such as Prosite and Pfam continually mine UniProtKB/Swiss-Prot and similar databases for motifs and patterns, so any change to UniProtKB/Swiss-Prot must automatically be made available to these and other similar systems.
Linking all of these databases while preserving the vital relationships and dependencies between them presents major data management challenges. To address some of them, SIB developed HitKeeper, a sophisticated database management system that tracks the constantly changing relationships between multiple databases. HitKeeper also acts as the database engine behind MyHits, a web-based information service that allows users to interrogate the databases managed by HitKeeper and to run queries on pre-selected subsets of protein sequence data.
An open source application published under the GNU General Public License, HitKeeper was initially built on the MySQL open source database. However, the increasing volume and complexity of the biological datasets it stored, combined with a number of limitations in the MySQL software, led the SIB team to seek a more advanced solution. Their search led them to Oracle Database 10g, the market-leading relational database management system.
"There were several advantages that made us decide to migrate from MySQL to Oracle, including database scalability, management tools, intelligent locking and query optimisations," says Marco Pagni of SIB. "We conducted the migration in incremental steps, from adapting the schema and the SQL code to the new specifications, to replacing some routines that were initially written in Perl with their counterparts in PL/SQL."
"As a result, we have obtained a more robust and scalable version of HitKeeper, and plenty of new possibilities to develop it further in future using PL/SQL," says Pagni's colleague Dmitry Kuznetsov.
SIB is also investigating how it can exploit new semantic technologies available in Oracle Database 11g to query the data available in the UniProtKB/Swiss-Prot database. The Resource Description Framework (RDF), for example, is a standard way of representing semantic data. Oracle now provides a means to store RDF data within the database, bringing the industrial-strength capabilities of the Oracle platform to the world of RDF.
For more information about activities and projects of the Swiss Institute of Bioinformatics, contact:
Dmitry Kuznetsov, Ioannis Xenarios and Marco Pagni
Swiss Institute of Bioinformatics
Vital-IT Group
Quartier Sorge - Batiment Genopode
CH-1015 Lausanne
Switzerland