Converting Relational Data Into RDF Format

This tutorial shows the two different methods of converting or materializing relational data into RDF graph data format.

Approximately 1 hour

Topics

This tutorial covers the following topics:

Using the D2RQ based method

Using the SQL based method

Place the cursor over this icon to load and view all the screenshots for this tutorial. (Caution: This action loads all screenshots simultaneously, so response time may be slow depending on your Internet connection.)

Note: Alternatively, you can place the cursor over an individual icon in the following steps to load and view only the screenshot associated with that step. You can hide an individual screenshot by clicking it.

Overview

This tutorial describes two different methods for converting or materializing relational data into RDF graph data format. The RDF data that is generated complies with the W3C standard. It can be processed by any W3C compliant tool or RDF data store, such as Oracle Database 11g.

The D2RQ-based approach is recommended because D2RQ provides customizable mapping files that allow users to specify the generation of URIs, enabling the reuse of URIs across different columns, tables, schemas, or even databases.

URI reuse here means using the same URI to represent the same resource. Note, the SQL-based approach, as provided in this tutorial, does not handle escapes, character encoding, new lines, tabs, and other special characters.

Back to Topic List

Prerequisites

Before starting this tutorial, you should:

1.

Install Oracle Database 11g Enterprise Edition. Ensure the Oracle Spatial 11g Option, Oracle Partitioning Option, and the Oracle Advanced Compression Option are installed.

 

2.

Run the required procedure for Semantic Technologies Support in 11g. Follow the steps in the following note on https://metalink.oracle.com:

Note 452989.1: Required Procedure for Semantic Technologies Support in 11g

 

3. Install the 11.1.0.6.0 Semantic Technologies patch.

Note: The patch is NOT necessary if the Oracle Database 11.1.0.7.0 patch set or higher is installed.

Log into Metalink at https://metalink.oracle.com and click on the Patches & Updates tab. Select "Simple Search" and in the Search by drop down list select Patch Number(s). Enter the Patch Number, 7032734. Select your Platform or Language. Click GO.

To install the patch, read the instructions in the file, jenadrv_patch_ig.txt.

 

4. Setup the database to support Semantic Networks. If it has not been setup, follow the steps in the following note on https://metalink.oracle.com:

Note 454371.1: How To Set Up The Database For Semantic Networks Support?

 

5. Install D2RQ. It can be downloaded from the following site: http://www4.wiwiss.fu-berlin.de/bizer/d2rq/

Click the Download link on the left side of the home page. Click D2RQ and D2R Server. Then click D2RQ API for Jena and Sesame ( version v0.5.1). Download either the .gz or .zip format (both are platform independent).

Once downloaded, unzip in a directory on your machine.

For more information on D2RQ and the D2R server, refer http://www4.wiwiss.fu-berlin.de/bizer/d2rq/spec/

Note: Referring to the documentation on the D2RQ site is not necessary for this tutorial.

Back to Topic List

Using the D2RQ based method

This topic demonstrates the use of D2RQ with a small example.

In this topic you perform the following:

Setup the user and the table
Generate the mapping file in N3 format
Generate the N-Triple data file using the mapping file
Optionally modify the N3 file to meet your needs

 

Setup the user and the table

The following SQL statements create a simple employee table that models an employee hierarchy. A few sample rows are inserted in the table. In the following topics, you start by generating a mapping file in N3 [4] format using D2RQ v0.5.1.

1.

Create a new user, test. Grant connect, resource privilege to the user.

Log in as sys user and then run the following SQL statements:

grant connect, resource to test; 

 

2.

Connect as test user. Create a simple table, emp, that models an employee hierarchy.

To create the emp table, run the following statement:

conn test/test
create table emp (ID integer primary key,
name varchar2(30),
sex varchar2(1),
birth_year int,
report_to integer
);

 

3.

Insert a few sample rows in the emp table. Run the following INSERT statements:

insert into emp values(1, 'John', 'M', 1980, 2); 
insert into emp values(2, 'Mary', 'F', 1981, 3);
insert into emp values(3, 'Jack', 'M', 1982, 3);

Commit the inserted rows by using the following command:

commit;

 

 

Back to Topic

Generate the mapping file in N3 format

In the following steps, you generate the mapping file in N3 format using D2RQ v0.5.1.

1.

Download the batch_files.zip from here. Extract the two batch files (gen_map.bat, dump_rdf.bat) into a folder on your local machine.

Open the gen_map.bat file and edit the file to customize it according to your setup.

Generic syntax (d2rq.generate_mapping class):

<jdk_path>\bin\java –classpath <classpath> d2rq.generate_mapping
-u <username>
-p <password>
-d oracle.jdbc.driver.OracleDriver
jdbc:oracle:thin:@HOSTNAME:PORT:SID > output_file.n3


Examine the following components in the gen_map.bat file:

%ORACLE_HOME%\jdk\bin\java This is the java executable. Most likely you do not need to modify it. Make sure ORACLE_HOME is set.
<ORACLE_HOME>\jdbc\lib\ojdbc5.jar This is the jar file for Oracle's jdbc. Substitute with your ORACLE_HOME string value
<PATH_TO_D2RQ>\d2rq-0.5.1 This is the directory where D2RQ was installed. Substitute with the directory where you unzipped D2RQ. Replace in all the occurrences.
d2rq.generate_mapping This is the D2RQ main class you are using. Do not change this.
-u test -p test Assuming the username and password is test, do not change this.
-d oracle.jdbc.driver.OracleDriver This is the Class used for the jdbc connection to the database. Do not change this.
jdbc:oracle:thin:@hostname:port:SID > employee.n3

This is the jdbc database connection.

Change hostname to your machine name. Change port to your listener port. Change SID to your instance SID.

employee.n3 is the name of the output file, Leave it as is.

Note: The placeholders marked with a red rectangle should be edited according to your setup. There are multiple occurrences of <PATH_TO_D2RQ> in the file. Ensure you replace all of them correctly.

 

2.

Open a command window. Change to the folder where you saved the gen_map.bat file. Run the batch file. It outputs a mapping into file employee.n3 that is encoded using N3 syntax.

Note: Before you run this batch file, ensure ORACLE_HOME is set. 

Note: The "1" before >employee.n3 gets added by default when executing the batch file in the Windows environment. Do not add a "1" before >employee.n3 while editing the gen_map.bat file.

This will generate a employee.n3 file in the folder from which you ran the batch file.

 

Back to Topic

Generate the N-Triple data file using the mapping file

You can modify the N3 format file but in this tutorial you use it as it is to generate an RDF file (n-triple format) using the following steps. Note that N-TRIPLE is specified as the output format. Also a base URI is provided for resolving relative URI patterns.

1.

Open the dump_rdf.bat. Edit the file to customize according to your setup.

Generic syntax (d2rq.dump_rdf class):

<jdk_path>\bin\java –classpath <classpath> d2rq.dump_rdf
-u <username>
-p <password>
-d oracle.jdbc.driver.OracleDriver
-j jdbc:oracle:thin:@HOSTNAME:PORT:SID
-m employee.n3
-f N-TRIPLE
-b http://test.com/
> emp_table.ntriple

Apart from the components shown the step 1 of previous topic, the extra components are:

d2rq.dump_rdf This is the main java class used for the conversion
-f N-TRIPLE This is the output format. Do not change this.
-b http://test.com/ This is the base URI. Do not change for the purposes of this tutorial.

Note that in this case before the jdbc:oracle:thin:@hostname:port:SID line, you need a -j parameter that is not required in previous java command.

Note: The placeholders marked with a red box should be edited according to your setup. There are multiple occurrences of <PATH_TO_D2RQ> in the file. Ensure you replace all of them correctly.

 

2.

In the command window, change to the folder where you saved the dump_rdf.bat file. Run the batch file.

Note: Ensure ORACLE_HOME is set to appropriate path before running the batch file.

Note: The "1" before >emp_table.ntriple gets added by default when executing the batch file in the Windows environment. Do not add a "1" before >emp_table.ntriple while editing the dump_rdf.bat file.

 

3.

Open the generated emp_table.ntriple file in Notepad. The screenshot shows the output.

 

 

Back to Topic

Optionally modify the N3 file to meet your needs

These generated triples may be acceptable depending on the requirements of your applications. However, that is not always the case. For example, an obvious improvement can be made to the first triple. It uses a typed literal as the object value for the “report_to” property (relationship) that corresponds to the original “report_to” column in the relational table. It is more desirable to use a URI, instead of a generic decimal value, as the object value for (highlighted by a red box in the screen shot above) the “report_to” property, so that the generated triple set forms a more efficient linked graph.

1.

Open the employee.n3 file in Notepad. Find the following description in the mapping file employee.n3:

map:EMP_REPORT_TO a d2rq:PropertyBridge;

Replace the whole description by the following:

FROM:

map:EMP_REPORT_TO a d2rq:PropertyBridge;
d2rq:belongsToClassMap map:TEST_EMP;
d2rq:property vocab:EMP_REPORT_TO;
d2rq:column "TEST.EMP.REPORT_TO";
d2rq:datatype xsd:decimal;
.


TO:

map:EMP_REPORT_TO a d2rq:PropertyBridge;
d2rq:belongsToClassMap map:TEST_EMP;
d2rq:property vocab:report_to;
d2rq:uriPattern "TEST.EMP/@@TEST.EMP.REPORT_TO@@";
.

The highlighted text shown in the screenshot changes the typed literal to a URI as the object value.

Save the file.

 

2.

Re-run the dump_rdf.bat batch file to generate a new N-Triple file containing the following snippet.

The triples certainly model the original relational data better than the one generated using default mapping file.

Examine the screenshot.

If you are not satisfied with the fact that the schema name (“TEST” as shown in above snippets) is included in the URI, it is possible to modify the mapping file to skip the schema name. Refer to the D2RQ documentation for details: http://www4.wiwiss.fu-berlin.de/bizer/d2rq/spec/

 

Back to Topic

Back to Topic List

Using the SQL based method

SQL statements can be used to generate RDF data directly from a relational table. You can write a simple query to generate triples that capture the "report_to" relationships.

In this topic, you perform the following:

Use SELECT statement
Use UNPIVOT in SELECT statement

Use SELECT statement

1.

Connect as test/test username/password.

Run the following simple query to generate triples that capture the "report_to" relationships.

Note: The following SQL is straightforward. However, it is non-trivial to ensure the generated data conforms to N-Triple syntax. Refer to Section 3 of RDF Test Cases document on http://www.w3.org/TR/2004/REC-rdf-testcases-20040210/ for a clearly specified grammar.

select  '<http://test.com/TEST.EMP/' || ID || '> ' 
|| '<http://test.com/vocab/report_to> '
|| '<http://test.com/TEST.EMP/' || REPORT_TO || '> '
|| ' .'
from emp;

 

 

 

Back to Topic

Use UNPIVOT in SELECT statement

1.

Following the same approach, you can write individual SELECT statements to generate triples for each relevant column.

It may become tedious if there are many columns in the relational tables. A more compact way is to use UNPIVOT to move values from multiple columns into multiple output rows.

For example, the following SQL unpivots four columns (ID, REPORT_TO, BIRTH_YEAR, NAME) into four separate rows, each row representing a triple.

Run the following statements:

col subject format a33
col object format a33

select * from (
   select '<http://test.com/TEST.EMP/' || id || '>' as subject,
           to_char(id) as id,
           to_char(report_to) as report_to,
           to_char(birth_year) as birth_year,
           name as name
   from emp
  )
 UNPIVOT (object for predicate 
    in (id as 'ID',
        report_to as 'REPORT_TO',
        birth_year as 'BIRTH_YEAR',
        name as 'NAME'
       ));

 

Each of the rows in the previous screen shot maps to a single triple for each of the three column values.

Additional data processing may be necessary including proper URI/literal generation, actual character escaping, and N-Triple format conformance.

 

 

Back to Topic

Back to Topic List

In this lesson, you've learned how to:

Back to Topic List

Notation3 (N3): http://www.w3.org/DesignIssues/Notation3.html

Back to Topic List

Place the cursor over this icon to hide all screenshots.