RDF Semantic Data Management Using the Oracle Spatial 11g Option

Purpose

This tutorial shows how to use the semantic data management capability of Oracle Spatial Option for Oracle Database 11g to store and query the National Cancer Institute (NCI) Semantic Network Ontology.

Time to Complete

Approximately 1 hour

Topics

This tutorial covers the following topics:

 Overview
 Scenario
 Prerequisites
 Creating the Infrastructure and Model to Hold the RDF Data

Loading the RDF Data into the Model

 Loading the Patients Table
 Building Semantic Indexes

Querying the Model

 Summary

Viewing Screenshots

 Place the cursor over this icon to load and view all the screenshots for this tutorial. (Caution: This action loads all screenshots simultaneously, so response time may be slow depending on your Internet connection.)

Note: Alternatively, you can place the cursor over an individual icon in the following steps to load and view only the screenshot associated with that step. You can hide an individual screenshot by clicking it.

Overview

Overview of Semantic Data Management Using Oracle Spatial Option for Oracle Database 11g

As part of Oracle Spatial 11g, an option for Oracle Database 11g Enterprise Edition, Oracle delivers an advanced semantic data management capability not found in any other commercial or open source triple store. With native support for RDF/RDFS/OWL standards, this semantic data store enables application developers to benefit from an open, scalable, secure, integrated, efficient platform for RDF and OWL-based applications.

These semantic database features enable:

 storing, loading, and DML access to RDF/OWL data and ontologies
 inference using OWL, RDFS semantics and user-defined rules
 SQL querying of RDF/OWL data and ontologies using SPARQL-like graph patterns
 ontology-assisted querying of enterprise (relational) data

This tutorial assumes that you are familiar with the major concepts associated with RDF and OWL, such as {subject, predicate, object} triples, URIs, blank nodes, plain and typed literals, and ontologies. This demo does not explain these concepts in detail, but focuses instead on how to setup the demo in the Oracle Database to start working with the ontology. The database contains semantic data and ontologies (RDF/OWL models), as well as traditional relational data. To load semantic data, bulk loading is the most efficient approach, although you can load data incrementally using transactional INSERT statements. You will perform bulk loading in this demo.

Introduction to the NCI Semantic Network

This example works with the NCI Thesaurus Produced by the Enterprise Vocabulary System (EVS), a collaborative project by the NCI Center for Bioinformatics and the NCI Office of Communications. National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, U.S.A.

The OWL version of the NCI Thesaurus is produced by the EVS in collaboration with Jim Hendler and Jen Golbeck, Maryland Information and Network Dynamics Lab, Semantic Web Agents Project (Mindswap) University of Maryland, College Park, Maryland, USA

Ontology URI: http://www.mindswap.org/2003/nciOncology.owl

Version: 03.09d

Default namespace: http://www.mindswap.org/2003/nciOncology.owl#

Namespace Prefixes:


The 17 first level Classes are:

Back to Topic List

 

Scenario

You will create a schema user called nciuser which will own three tables:

Data will be loaded into the semantic model called nci. This will load data into table nci_rdf_data. You will then populate PATIENTS_DATA table with random data to simulate patients information, then use Semantic Operator to query on the model and patients table.

Back to Topic List

Prerequisites

Before starting this tutorial, you should:

1.

Install Oracle Database 11g. Install Oracle Examples Media 11g.

2. Run the required procedure for Semantic Technologies Support in 11g. Follow the steps in the following note on https://metalink.oracle.com:

Note 452989.1 : Required Procedure for Semantic Technologies Support in 11g

 

3.

Install the 11.1.0.6.0 Semantic Technologies patch.

Note: The patch is NOT necessary if the Oracle Database 11.1.0.7.0 patch set or higher is installed.

Log into Metalink at https://metalink.oracle.com and click on the Patches & Updates tab. Select "Simple Search" and in the Search by drop down list select Patch Number(s). Enter the Patch Number, 7032734. Select your Platform or Language. Click GO.

To install the patch, read the instructions in the file, jenadrv_patch_ig.txt.

 

4. Setup the database to support Semantic Networks. If it has not been setup, follow the steps in the following note on https://metalink.oracle.com:

Note 454371.1 : How To Set Up The Database For Semantic Networks Support?

 

5.

Make sure the user account, mdsys, is unlocked. Log in as a sys user and then issue the following statement to unlock mdsys.

                                 
alter user mdsys identified by mdsys account unlock;
                              

Back to Topic List

Creating the Infrastructure and Model to Hold the RDF Data

As a prerequisite, you must have already created and configured a tablespace, rdf_tablespace, that stores the semantic network support. In the following steps, you will create two additional tablespace: the rdf_users tablespace will store user objects (default tablespace for nciuser schema) and rdf_temp will be used by the operations done by nciuser schema (temporary tablespace for nciuser). In this topic, you will also create the nciuser and the model.

1.

Start SQL*Plus. At the Enter username prompt, enter the following to log-in as a privileged user:

                                 
sys/oracle as sysdba
                              

 

2.

Create the rdf_users tablespace by running the following command:

Note: In the datafile path, replace the <user_name> and <SID> with appropriate folder name of your ORACLE_HOME.

create tablespace rdf_users
datafile 'c:\app\ <user_name>\oradata\ <SID>\rdf_users01.dbf' size 128M reuse
autoextend on next 64M maxsize unlimited
segment space management auto;


3.

Create the rdf_temp tablespace by running the following command:

Note: In the tempfile path, replace the <user_name> and <SID> with appropriate folder name of your ORACLE_HOME.

create temporary tablespace rdf_temp
tempfile 'c:\app\ <user_name>\oradata\ <SID>\rdf_temp01.dbf' size 128M reuse
autoextend on next 32M maxsize unlimited;

 

4.

Create the nciuser and grant the required privileges to the user by entering the following commands on the SQL*Plus prompt:

create user nciuser identified by nciuser
default tablespace rdf_users
temporary tablespace rdf_temp;

grant create session, resource to nciuser;

 

5.

Connect as nciuser/nciuser username/password.

Create the table, nci_rdf_data, the model table that stores the codes for the triples in the Model named NCI Enter the following commands on the SQL*Plus prompt:

                                 
connect nciuser/nciuser;
create table nci_rdf_data
(id number,
triple sdo_rdf_triple_s);
                              
                                 
execute sem_apis.create_sem_model('nci', 'nci_rdf_data', 'triple');          
      
                              

Back to Topic List

Loading the RDF Data into the Model

In this topic, you will load the staging table, stable, using SQL*Loader utility.

1.

Assuming you are still connected as nciuser, create a staging table stable. To create the table, enter the following command:

                               
create table stable ( 
RDF$STC_sub varchar2(4000) not null, 
RDF$STC_pred varchar2(4000) not null, 
RDF$STC_obj varchar2(4000) not null, 
RDF$STC_sub_ext varchar2(64), 
RDF$STC_pred_ext varchar2(64), 
RDF$STC_obj_ext varchar2(64), 
RDF$STC_canon_ext varchar2(64) 
); 
        
                            

 

2.

Grant insert privilege on the nci_rdf_data table to mdsys. Also, grant select, update privileges on the stable table to mdsys schema. Run the following:

                               
grant insert on nci_rdf_data to mdsys; 

grant SELECT, UPDATE(RDF$STC_sub_ext,RDF$STC_pred_ext,RDF$STC_obj_ext,RDF$STC_canon_ext) 
on stable to MDSYS; 
        
                            

 

3.

Open a command window. Use sqlloader to load nci_z.nt file into the staging table, stable.

Note: Make sure you have downloaded and unzipped file nci_z.zip so you have nci_z.nt ready before executing the sqlldr command (note that nci_z.nt file is around 80MB in size).

For this tutorial, find the nci_z.zip file here.

Note: Change to the folder that contains the nci_z.nt file. Set ORACLE_HOME to appropriate path.


sqlldr userid=nciuser/nciuser control=%ORACLE_HOME%\md\demo\network\rdf_demos\bulkload.ctl data=nci_z.nt direct=true skip=0 load=1000000 discardmax=0 bad=d0.bad discard=d0.rej log=d0.log errors=100000000

This will take some time. Wait until you receive a message: Load completed - logical record count 464841.


4.

Load the data into the Model from the staging table. The syntax for the bulk_load_from_staging_table procedure is:

                               
exec sem_apis.bulk_load_from_staging_table('<model_name>',
                                           '<owner of the staging table>',
                                           '<staging table name>'); 
                                 
Run the following:

exec sem_apis.bulk_load_from_staging_table('nci','nciuser','stable');
                              
                            

 

5.

Create the rules index by running the following statements:

execute sem_apis.create_entailment('nci_idx',sem_models('nci'), sem_rulebases('owlprime'),0, null);

Back to Topic

Loading the Patients Table

You will now run a procedure that will insert 100,000 rows into the PATIENTS_DATA table. Note, the procedure assumes that the mdsys user password is ' mdsys'.

1.

Before you run the script that creates the build_patients procedure, connect as mdsys and then set serveroutput on.

                               
connect mdsys/mdsys
set serveroutput on
                            

 

2.

Run the build_patients.sql script. Extract the build_patients.sql script from the build_patients.zip file located here.

This script creates a procedure, build_patients. To run the script, save the script on your local machine and then run it from the sql prompt as follows:

@c:\<path where you saved the script>\build_patients.sql;

Note: Replace <path where you saved the script> with the path of the folder where you saved the build_patients.sql file.

                               
                                  

                              
                            
3.

If the procedure compiled with errors, enter show errors to view the errors. Otherwise, grant execute privileges on the build_patients procedure to public. Run the following statement.

                               
grant execute on build_patients to public; 

                            
4.

Connect as nciuser and run the build_patients procedure. Run the following statement:

                               
connect nciuser/nciuser;
exec mdsys.build_patients(100000, 1, 'patients_data');

                                  
                              
                            

 

5.

Confirm that the data was loaded in the patients_data table. Run the following query:

                               
select count(*) from patients_data;

                                  

                              
                            

Back to Topic List

Building Semantic Indexes

When using the SEM_RELATED operator, you can create a semantic index of type MDSYS.SEM_INDEXTYPE on the column that contains the ontology terms. Creating such an index will result in more efficient execution of the queries. The CREATE INDEX statement must contain the INDEXTYPE IS MDSYS.SEM_INDEXTYPE clause, to specify the type of index being created.

1.

Assuming you are still connected as nciuser, create the index, nciIndex on patients_data table. Run the following statements:

                               
create index nciIndex on patients_data(diagnosis)
indextype is mdsys.sem_indextype parameters ('ONTOLOGY_MODEL(NCI), RULEBASE(owlprime)'); 

                                  

                              
                            
2.

Gather statistics on the table. Run the following statement:

execute dbms_stats.gather_index_stats('nciuser','nciIndex');

Back to Topic List

Querying the Model

You can use semantic operators to query relational data in an ontology-assisted manner, based on the semantic relationship between the data in a table column and terms in an ontology. The SEM_RELATED semantic operator retrieves rows based on semantic relatedness. The SEM_DISTANCE semantic operator returns distance measures for the semantic relatedness, so that rows returned by the SEM_RELATED operator can be ordered or restricted using the distance measure. Below we have some miscellaneous simple queries, using SEM_MATCH and using SEM_RELATED operators:

1.

Connect as nciuser/nciuser username/password. The following query uses the SEM_MATCH function to show all the 'Subjects' that are of type Class. These includes all the Classes in the Ontology.

                               
select s 
from table(sem_match ('(?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
                             <http://www.w3.org/2002/07/owl#Class>)', 
                             sem_models('nci'), 
                             null,
                             null, 
                             null)); 
   
                               
Note: The screenshot shows only the partial output.
                            

 

2.

The following query returns the triples that have Finger_Fracture as the 'Subject'. Note the use of variables 'p' and 'o' meaning 'any predicate' and 'any object'.

select p, o
                              
from table(sem_match
( '(<http://www.mindswap.org/2003/nciOncology.owl#Finger_Fracture> ?p ?o )',
sem_models('nci'),
null, null, null));
                               
                              
                                  
                                
                              
          
       
                            
3.

The following query uses sem_match function to extract all the 'Objects' that are 'subClassOf' Finger_Fracture and it is adding the use of inference with the 'owlprime' rulebase.

select o
                              
from table(sem_match ( ' (<http://www.mindswap.org/2003/nciOncology.owl#Finger_Fracture>
<http://www.w3.org/2000/01/rdf-schema#subClassOf>
?o)',
sem_models('nci'),
sem_rulebases('owlprime'),
null,
null));


4.

The following query uses SEM_DISTANCE and SEM_RELATED operators to relate a relational table ( PATIENTS_DATA) to our NCImodel. It uses variable 'diagnosis' to represent the DIAGNOSIS column in PATIENTS_DATA table and asks for the diagnosis that are a 'subClassOf' Upper_Extremity_Fracture.

                               
select diagnosis, SEM_DISTANCE(123)         
                                
from patients_data
where sem_related (diagnosis, '<http://www.w3.org/2000/01/rdf-schema#subClassOf>',
'<http://www.mindswap.org/2003/nciOncology.owl#Upper_Extremity_Fracture>',
sem_models('nci'),
sem_rulebases('owlprime'), 123) = 1
ORDER BY SEM_DISTANCE(123) asc;

Note: You may get different results as the PATIENTS_DATA table is built with random data.
                               
                                  
                              

        
                            
5. In previous SELECT statement you return all the diagnosis from PATIENTS_DATA table that meet the criteria, this help in case you need to list all the patients, however, if you just want to know the different diagnosis that are a 'subClassOf' Upper_Extremity_Fracture, you can add the distinct operation as in the following query:
select distinct diagnosis, SEM_DISTANCE(123)
                              
from patients_data
where sem_related(diagnosis, '<http://www.w3.org/2000/01/rdf-schema#subClassOf>',
'<http://www.mindswap.org/2003/nciOncology.owl#Upper_Extremity_Fracture>',
sem_models('nci'),
sem_rulebases('owlprime'), 123) = 1
ORDER BY SEM_DISTANCE(123) asc;  Note: You may get different results as the PATIENTS_DATA table is built with random data.
6. The following query shows the patients that have a diagnosis which is a subClassOf Connective_Tissue_Disorder.
select distinct diagnosis, SEM_DISTANCE(123)
                              
from patients_data
where sem_related(diagnosis,
'<http://www.w3.org/2000/01/rdf-schema#subClassOf>',
'<http://www.mindswap.org/2003/nciOncology.owl#Connective_Tissue_Disorder>',
sem_models('nci'),
sem_rulebases('owlprime'), 123) = 1
ORDER BY SEM_DISTANCE(123) asc;

Back to Topic List

Summary

This tutorial is an introduction to the NCI model and provides an overview on ontology.

In this lesson, you learned how to:

 Create the infrastructure and model to hold the RDF data
 Load RDF data into the model and the patients table
 Create semantic indexes
 Query the model and the patients table to extract information

Back to Topic List

 Place the cursor over this icon to hide all screenshots.

 

 

 

Left Curve
Popular Downloads
Right Curve
Untitled Document