This tutorial shows how to use the semantic data management capability of Oracle Spatial Option for Oracle Database 11g to store and query the National Cancer Institute (NCI) Semantic Network Ontology.
Approximately 1 hour
This tutorial covers the following topics:
| Overview | |
| Scenario | |
| Prerequisites | |
| Creating the Infrastructure and Model to Hold the RDF Data | |
| Loading the Patients Table | |
| Building Semantic Indexes | |
| Summary |
Place
the cursor over this icon to load and view all
the screenshots for this tutorial. (Caution: This action loads all screenshots
simultaneously, so response time may be slow depending on your Internet connection.)
Note: Alternatively, you can place the cursor over an individual icon in the following steps to load and view only the screenshot associated with that step. You can hide an individual screenshot by clicking it.
Overview of Semantic Data Management Using Oracle Spatial Option for Oracle Database 11g
As part of Oracle Spatial 11g, an option for Oracle Database 11g Enterprise Edition, Oracle delivers an advanced semantic data management capability not found in any other commercial or open source triple store. With native support for RDF/RDFS/OWL standards, this semantic data store enables application developers to benefit from an open, scalable, secure, integrated, efficient platform for RDF and OWL-based applications.
These semantic database features enable:
| storing, loading, and DML access to RDF/OWL data and ontologies | ||
| inference using OWL, RDFS semantics and user-defined rules | ||
| SQL querying of RDF/OWL data and ontologies using SPARQL-like graph patterns | ||
| ontology-assisted querying of enterprise (relational) data | ||
This tutorial assumes that you are familiar with the major concepts associated with RDF and OWL, such as {subject, predicate, object} triples, URIs, blank nodes, plain and typed literals, and ontologies. This demo does not explain these concepts in detail, but focuses instead on how to setup the demo in the Oracle Database to start working with the ontology. The database contains semantic data and ontologies (RDF/OWL models), as well as traditional relational data. To load semantic data, bulk loading is the most efficient approach, although you can load data incrementally using transactional INSERT statements. You will perform bulk loading in this demo.
Introduction to the NCI Semantic
Network
This example works with the NCI Thesaurus Produced by the Enterprise Vocabulary System (EVS), a collaborative project by the NCI Center for Bioinformatics and the NCI Office of Communications. National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, U.S.A.
The OWL version of the NCI Thesaurus is produced by the EVS in collaboration with Jim Hendler and Jen Golbeck, Maryland Information and Network Dynamics Lab, Semantic Web Agents Project (Mindswap) University of Maryland, College Park, Maryland, USA
Ontology URI: http://www.mindswap.org/2003/nciOncology.owl
Version: 03.09d
Default namespace: http://www.mindswap.org/2003/nciOncology.owl#
Namespace Prefixes:
The 17 first level Classes are:
You will create a schema user called nciuser which will own three tables:
Data will be loaded into the semantic model called nci. This will load data into table nci_rdf_data. You will then populate PATIENTS_DATA table with random data to simulate patients information, then use Semantic Operator to query on the model and patients table.
Before starting this tutorial, you should:
| 1. |
Install Oracle Database 11g. Install Oracle Examples Media 11g. |
| 2. | Run the required procedure for Semantic Technologies
Support in 11g. Follow the steps in the following note on https://metalink.oracle.com:
Note 452989.1: Required Procedure for Semantic Technologies Support in 11g
|
| 3. |
Install the 11.1.0.6.0 Semantic Technologies patch. Note: The patch is NOT necessary if the Oracle Database 11.1.0.7.0 patch set or higher is installed. Log into Metalink at https://metalink.oracle.com and click on the Patches & Updates tab. Select "Simple Search" and in the Search by drop down list select Patch Number(s). Enter the Patch Number, 7032734. Select your Platform or Language. Click GO. To install the patch, read the instructions in the file, jenadrv_patch_ig.txt.
|
| 4. | Setup the database to support Semantic Networks.
If it has not been setup, follow the steps in the following note on https://metalink.oracle.com:
Note 454371.1: How To Set Up The Database For Semantic Networks Support?
|
| 5. |
Make sure the user account, mdsys, is unlocked. Log in as a sys user and then issue the following statement to unlock mdsys. alter user mdsys identified by mdsys account unlock; |
As a prerequisite, you must have already created and configured a tablespace, rdf_tablespace, that stores the semantic network support. In the following steps, you will create two additional tablespace: the rdf_users tablespace will store user objects (default tablespace for nciuser schema) and rdf_temp will be used by the operations done by nciuser schema (temporary tablespace for nciuser). In this topic, you will also create the nciuser and the model.
|
1. |
Start SQL*Plus. At the Enter username prompt, enter the following to log-in as a privileged user: sys/oracle as sysdba
|
|
2. |
Create the rdf_users tablespace by running the following command: Note: In the datafile path, replace the <user_name> and <SID> with appropriate folder name of your ORACLE_HOME. create tablespace rdf_users
|
| 3. |
Create the rdf_temp tablespace by running the following command: Note: In the tempfile path, replace the <user_name> and <SID> with appropriate folder name of your ORACLE_HOME. create temporary tablespace rdf_temp
|
| 4. |
Create the nciuser and grant the required privileges to the user by entering the following commands on the SQL*Plus prompt: create user nciuser identified
by nciuser grant create session, resource to nciuser;
|
| 5. |
Connect as nciuser/nciuser username/password. Create the table, nci_rdf_data, the model table that stores the codes for the triples in the Model named NCI Enter the following commands on the SQL*Plus prompt: connect nciuser/nciuser; create table nci_rdf_data (id number, triple sdo_rdf_triple_s); execute sem_apis.create_sem_model('nci', 'nci_rdf_data', 'triple');
|
In this topic, you will load the staging table, stable, using SQL*Loader utility.
| 1. |
Assuming you are still connected as nciuser, create a staging table stable. To create the table, enter the following command: create table stable (
RDF$STC_sub varchar2(4000) not null,
RDF$STC_pred varchar2(4000) not null,
RDF$STC_obj varchar2(4000) not null,
RDF$STC_sub_ext varchar2(64),
RDF$STC_pred_ext varchar2(64),
RDF$STC_obj_ext varchar2(64),
RDF$STC_canon_ext varchar2(64)
);
|
| 2. |
Grant insert privilege on the nci_rdf_data table to mdsys. Also, grant select, update privileges on the stable table to mdsys schema. Run the following: grant insert on nci_rdf_data to mdsys;
grant SELECT, UPDATE(RDF$STC_sub_ext,RDF$STC_pred_ext,RDF$STC_obj_ext,RDF$STC_canon_ext)
on stable to MDSYS;
|
| 3. |
Open a command window. Use sqlloader to load nci_z.nt file into the staging table, stable. Note: Make sure you have downloaded and unzipped file nci_z.zip so you have nci_z.nt ready before executing the sqlldr command (note that nci_z.nt file is around 80MB in size). For this tutorial, find the nci_z.zip file here. Note: Change to the folder that contains the nci_z.nt file. Set ORACLE_HOME to appropriate path.
This will take some time. Wait until you receive a message: Load completed - logical record count 464841.
|
| 4. |
Load the data into the Model from the staging table. The syntax for
the bulk_load_from_staging_table
procedure is: exec sem_apis.bulk_load_from_staging_table('<model_name>',
'<owner of the staging table>',
'<staging table name>');
Run the following:
exec sem_apis.bulk_load_from_staging_table('nci','nciuser','stable');
|
| 5. |
Create the rules index by running the following statements: execute sem_apis.create_entailment('nci_idx',sem_models('nci'), sem_rulebases('owlprime'),0, null);
|
| 1. |
Before you run the script that creates the build_patients procedure, connect as mdsys and then set serveroutput on. connect mdsys/mdsys set serveroutput on
|
| 2. |
Run the build_patients.sql script. Extract the build_patients.sql script from the build_patients.zip file located here. This script creates a procedure, build_patients. To run the script, save the script on your local machine and then run it from the sql prompt as follows: @c:\<path where you saved the script>\build_patients.sql; Note: Replace <path where you saved the script> with the path of the folder where you saved the build_patients.sql file. |
| 3. |
If the procedure compiled with errors, enter show errors to view the errors. Otherwise, grant execute privileges on the build_patients procedure to public. Run the following statement. grant execute on build_patients to public; |
| 4. |
Connect as nciuser and run the build_patients procedure. Run the following statement: connect nciuser/nciuser; exec mdsys.build_patients(100000, 1, 'patients_data');
|
| 5. |
Confirm that the data was loaded in the patients_data table. Run the following query: select count(*) from patients_data; |
| 1. |
Assuming you are still connected as nciuser, create the index, nciIndex on patients_data table. Run the following statements: create index nciIndex on patients_data(diagnosis)
indextype is mdsys.sem_indextype parameters ('ONTOLOGY_MODEL(NCI), RULEBASE(owlprime)');
|
| 2. |
Gather statistics on the table. Run the following statement: execute dbms_stats.gather_index_stats('nciuser','nciIndex');
|
| 1. |
Connect as nciuser/nciuser username/password. The following query uses the SEM_MATCH function to show all the 'Subjects' that are of type Class. These includes all the Classes in the Ontology. select s
from table(sem_match ('(?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<http://www.w3.org/2002/07/owl#Class>)',
sem_models('nci'),
null,
null,
null));
Note: The screenshot shows only the partial output.
|
| 2. |
The following query returns the triples that have Finger_Fracture as the 'Subject'. Note the use of variables 'p' and 'o' meaning 'any predicate' and 'any object'. select p, o |
| 3. |
The following query uses sem_match function to extract all the 'Objects' that are 'subClassOf' Finger_Fracture and it is adding the use of inference with the 'owlprime' rulebase. select o
|
| 4. |
The following query uses SEM_DISTANCE
and SEM_RELATED operators
to relate a relational table (PATIENTS_DATA)
to our NCImodel. It uses
variable 'diagnosis' to represent the DIAGNOSIS
column in PATIENTS_DATA
table and asks for the diagnosis that are a 'subClassOf' Upper_Extremity_Fracture. select diagnosis, SEM_DISTANCE(123) Note: You may get different results as the PATIENTS_DATA table is built with random data. |
| 5. | In previous SELECT statement
you return all the diagnosis from PATIENTS_DATA
table that meet the criteria, this help in case you need to list all the
patients, however, if you just want to know the different diagnosis that
are a 'subClassOf' Upper_Extremity_Fracture, you can add the distinct
operation as in the following query:
select distinct diagnosis, SEM_DISTANCE(123) |
| 6. | The following query shows the patients that have a diagnosis which is
a subClassOf Connective_Tissue_Disorder.
select distinct diagnosis, SEM_DISTANCE(123) |
This tutorial is an introduction to the NCI model and provides an overview on ontology.
In this lesson, you learned how to:
| Create the infrastructure and model to hold the RDF data | ||
| Load RDF data into the model and the patients table | ||
| Create semantic indexes | ||
| Query the model and the patients table to extract information | ||