Python Data Persistence with Oracle Database

By Yuli Vasiliev

Learn how to build your own Python app backed by Oracle Database, utilizing Oracle XML DB, Python scripts, and PL/SQL stored procedures.  

Published January 2009

While Python is quickly gaining popularity among developers, Oracle Database has been No. 1 among enterprise-level databases for a long time. Coupling these two in an efficient way is an interesting topic for discussion, while this actually can be a real challenge, because both have a lot to offer.

Be warned, though; this article is not an overview of the most outstanding Python and Oracle Database features, providing a set of detached examples. Instead, with the help of a single sample, it tries to give you a taste of how these two technologies can be used together in a complementary way. In particular, the article walks you through the creation of an Oracle-backed Python application that implements the business logic in both Python and inside the database, utilizing stored PL/SQL procedures whose invocations are orchestrated from within a Python script.

As you’ll learn in this article, even the lightweight Oracle Database 10g Express Edition (XE) can be efficiently utilized as the database back end for your data-driven Web application whose front-end tier is built with Python. In particular, Oracle Database XE supports Oracle XML DB, a set of Oracle Database XML technologies that are often needed when it comes to building Web applications.

Sample Application

Collecting information about what the user does while using your application is becoming a popular mechanism for receiving user feedback. Often, a click-tracking tool incorporated in your online application can give you much more ideas about users’ preferences than any survey asking users to explicitly express a preference.

Taking a simple example, suppose you want to pick up the headlines of the three latest Oracle Technology Network (OTN) articles from the OTN – New Articles RSS page and place the links on your site. Then you want to collect information about how many times users have followed each of those links on your site. That’s what our sample is going to do. Now let’s try to figure out how all this functionality might be implemented. To start with, you have to decide how business logic will be distributed among application layers. In fact, deciding how to distribute business logic among application tiers is likely the most challenging part of planning a database-driven application. While there is usually more than one way to implement business logic, your job is to find the most effective one. As a general rule of thumb, when planning a database-driven application, you should seriously consider implementing key data-processing logic inside the database. This approach can help you cut down network overhead associated with sending data between the Web server and the database, as well as reduce the burden on the Web server.

Projecting all this theory onto our sample, you might, for example, put the burden of obtaining the article details being inserted into the database, on the stored procedures created in the database, thus saving the Web server the trouble of processing the task related to maintaining the data integrity. What this means in practice is that you won’t need to write the Python code responsible for keeping track of whether a record related to an article whose link is being clicked already exists in your database or not, and if not, inserting that record, obtaining all the required details from the OTN - New Articles RSS page. By letting the database itself keep track of such things, you get a much more scalable and less error-prone solution. In this case, the Python code will be responsible for just obtaining the article links from the RSS page and sending a message to the database when a user clicks an article link.

Figure 1 gives a graphical depiction of how the sample components interact with each other, as well as with external sources.

 

Figure 1
Figure 1: A high-level view of how the sample application works.


The rest of this article discusses how to implement the sample app. For a concise description of how to set up and launch the sample, you can refer to the readme.txt file in the root directory of the sample code.

Preparing Your Working Environment

To build the sample discussed here, you need to have the following software components installed (see Downloads portlet) and working properly in your system:

  • Apache HTTP Server 2.x
  • Oracle Database 10g Express Edition
  • Python 2.5 or later
  • mod_python module
  • cx_Oracle module


For a detailed description of how to install the above components, you can refer to another OTN article, “ Build a Rapid Web Development Environment for Python Server Pages and Oracle,” by Przemyslaw Piotrowski.

Designing the Underlying Database

It is generally a good idea to start with designing the underlying database. Provided that you have a user schema created and granted all the privileges needed to create and manipulate schema objects, your first step is to create underlying tables. In this particular case, you will need the only table called otn_articles_rss, created as follows:

CREATE TABLE otn_articles_rss (
 guid VARCHAR2(100) PRIMARY KEY,
 title VARCHAR2(200),
 pubDate VARCHAR2(32),
 link VARCHAR2(200),
 clicks INTEGER
);

The next step is to design a stored procedure called count_clicks that will be invoked from Python code, updating data in the otn_articles_rss table. Before you can proceed to the count_clicks procedure, though, you have to answer the following question: What will happen when count_clicks tries to update the clicks field of the article record that has not yet been inserted into the otn_articles_rss table? Let’s say that a new item was just added to the RSS page and then a link to that item appeared on your site. When someone clicks the link, the count_clicks PL/SQL procedure is invoked from within the Python code responsible for handling clicks performed on the links to OTN articles. Obviously upon processing the first click, the UPDATE statement being issued within the count_clicks procedure will fail, because there is no row to update yet.

To accommodate such cases, you can implement an IF block in the count_clicks procedure, working if the SQL%NOTFOUND attribute is set to TRUE due to the UPDATE’s failure to find the specified record. Within that IF block, you first can insert a new row into the otn_articles_rss table, providing only the guid and the number of clicks specified. After that, you should commit the changes so that becomes immediately available to the other user sessions that may be already in need of updating the clicks field of the newly inserted article record. Finally, you should update this record, setting its title, pubDate, and link fields. This logic may be implemented as a separate procedure, say, add_article_details, which you can create as follows:

CREATE OR REPLACE PROCEDURE add_article_details (gid VARCHAR2, clks NUMBER) AS
  item XMLType;
  heading VARCHAR2(200);
  published VARCHAR2(32);
  url VARCHAR2(200);
BEGIN
  SELECT extract(httpuritype.createuri(
                   'http://feeds.delicious.com/v2/rss/OracleTechnologyNetwork/otntecharticle').getXML(),
                   '//item[contains(guid, "'||gid||'")>0]') 
       INTO item FROM DUAL;
  SELECT extractValue(item, '//title'), 
         extractValue(item, '//pubDate'), 
         extractValue(item, '//link') 
       INTO heading, published, url FROM DUAL; 
  UPDATE otn_articles_rss SET
   title = heading,
   pubDate = published,
   link = url,
   clicks = clicks + clks
  WHERE guid = gid;
END;
/

As you can see, the procedure accepts two arguments. gid is the guid of the article whose link is being clicked. clks is the amount by which to increase the number of total article’s views. In the procedure body, you obtain the required portion of the RSS document as an XMLType instance, and then extract the information that is then immediately used to fill up the otn_articles_rss’ record associated with the RSS item being processed.

With add_article_details in place, you can move on and create the count_clicks procedure as follows:

CREATE OR REPLACE PROCEDURE count_clicks (gid VARCHAR2, clks NUMBER) AS
BEGIN
  UPDATE otn_articles_rss SET
    clicks = clicks + clks
   WHERE guid = gid;
  IF SQL%NOTFOUND THEN
    INSERT INTO otn_articles_rss(guid, clicks) VALUES(gid, 0);
    COMMIT;
    add_article_details (gid, clks);
  END IF;
  COMMIT;
END;
/

Transaction Considerations

In the count_clicks stored procedure shown in the above listing, note the use of COMMIT that follows immediately after the INSERT statement. What’s really important, though, is that it is followed by a call to add_article_details, whose execution may take much time. By committing at this stage, you make the newly inserted article record immediately available for other possible updates, which would otherwise be waiting for the completion of add_article_details.

Consider the following example. Suppose that the RSS page was just updated and a fresh new article link has become available. Next, two different users load your page and click this new link at almost the same time. As a result, two simultaneous calls to count_clicks will be made. In this case, the call that happens first will insert a new record into the otn_articles_rss table, and then it will call add_article_details. While the add_article_details execution is in progress, the other call to count_clicks could successfully perform the update operation, increasing the total click counts. However, if COMMIT were omitted here, the second call would fail to find a row for updating and, therefore, try to perform another insert. In fact, that would not lead to predictable results. What it would lead to, though, is a unique constraint-violated error and the loss of an update to be made by the second count_clicks call.

The most interesting part here is performing another COMMIT operation in the end of the count_clicks procedure body. As you might guess, committing at this stage is required to remove a lock from the record being updated, making it immediately available for updates performed by other sessions. Some may argue that this approach reduces flexibility, depriving the client of the ability to commit or roll back the transaction at its discretion. In this particular case, however, this is not a big issue because the transaction started upon a call to count_clicks should be committed immediately anyway. This is because count_clicks is always called when a user clicks an article link, thus leaving your page.

Building the Front-End Tier

Now that you have the stored procedures created and ready to be used in the application, you have to figure out how to orchestrate the entire flow of operations performed by all those pieces of application logic implemented inside the database, from within the front-end tier. This is where Python comes into play.

Let’s start with a simple implementation. In order to begin, you need to write some Python code that will be responsible for obtaining data from the OTN – New Articles RSS page. Then, you will need to develop code that will process clicks performed on the OTN articles’ links placed on your Web page. Finally, you will need to build that Web page itself. For that, you might use one of Python’s server-side technologies, say, Python Server Pages (PSP), which makes it possible to embed Python code into HTML.

To write Python code, you can use your favorite text editor, such as vi or Notepad. Create a file called oraclepersist.py and then insert the following code there, saving the file where the Python interpreter can find it:

import cx_Oracle
import urllib2
import xml.dom.minidom
def getRSS(addr):
    xmldoc = xml.dom.minidom.parseString(urllib2.urlopen(addr).read())
    items = xmldoc.getElementsByTagName('item')
    return items
def getLatestItems(items, num):
    latest=[]
    inxs = ['title','guid','pubDate','link']
    myitems = [item for index, item in enumerate(items) if index < num]
    for  item in myitems:
      latest.append(dict(zip(inxs,[item.getElementsByTagName(inx)[0].firstChild.data for inx in inxs])))
    return latest

As you might guess, the getRSS function shown above will be used to pick up data from the RSS page, returning it as a DOM object. getLatestItems is intended to process that DOM document, transforming it into a Python dictionary object.

In the getLatestItems function, note the use of list comprehensions—a new Python language feature that provides a great way to significantly simplify coding when it comes to data-processing tasks.

The next step involves creating the code that will handle clicks performed on the links to the OTN articles, which have been obtained from the OTN – New Articles RSS page and placed on your Web page. For that, you might develop another custom Python function—say, processClick— which will be invoked each time a user clicks an OTN article link on your page. To implement processClick, add the following code to oraclepersist.py:

def processClick(guid, clks = 1):
    db = cx_Oracle.connect('usr', 'pswd', '127.0.0.1/XE')
    c = db.cursor()
    c.execute('''call count_clicks(:guid, :clks)''', {'guid':guid, 'clks':clks})
    db.close()

The above code provides a simple example of cx_Oracle in action. The first thing it does is connect to the underlying database. Then, it obtains a Cursor object whose execute method is then used to invoke the count_clicks stored procedure discussed in the “Designing the Underlying Database” section earlier.

Now you can move on and build the Web page. Because this is only a demonstration application, the page can be very simple, including only links obtained from the RSS page. Within the APACHE_HOME/htdocs directory, create a file called clicktrack.psp and then insert the following code into it:

<html>
<head>
 <meta http-equiv="Content-Type"  content="text/html; charset=UTF-8">
 <title>latest OTN articles</title>
</head>
<body>
<h2>Three most latest OTN  articles</h2>
<%import oraclepersist
url =  'http://feeds.delicious.com/v2/rss/OracleTechnologyNetwork/otntecharticle'
doc = oraclepersist.getRSS(url)
articles =  oraclepersist.getLatestItems(doc, 3)
for article in articles:
%>
<% import urllib %>
<a href=<%= str(article['link'])  %> 
    onclick = "this.href =  '/dispatcher.psp?url=<%=urllib.quote_plus(article['link'])+str('&guid=')
           +urllib.quote_plus(article['guid'])  %>'">
       <%=str(article['title']) %>
</a><br/>
<% 
%>
</body>
</html>

As you can see, the above document contains several embedded Python code blocks. In the first block, you invoke functions from the oraclepersist module created as described in this section earlier, obtaining an instance of the list whose items represent the three most recent OTN articles. Then, you loop through this list in the for loop, generating a link for each article item presented in the list. It’s interesting to note that although each of these links refers to the corresponding OTN article address, the link’s onclick handler will dynamically modify the link’s destination to the dispatcher.psp page that you will also need to create in the APACHE_HOME/htdocs directory. You append two parameters to each dynamically generated link, namely guid and url, providing dispatcher.psp with the information about the article being loaded.

Here is the code for the dispatcher.psp:

<html>
<body>
<%
import oraclepersist
import urllib
from mod_python import util
params = util.FieldStorage(req)
oraclepersist.processClick(urllib.unquote_plus(params['guid'].value),  1)
psp.redirect(urllib.unquote_plus(params['url'].value))
%>
</body>
</html>

In the above code, you access parameters appended to the URL with the help of the FieldStorage class from the util module described in the Mod_python Manual available at the mod_python Web page. Then, you invoke the processClick function from our oraclepersist custom module, passing the guid extracted from the URL as the first parameter and 1, meaning one click, as the second. Finally, you redirect your browser to the location of the article to be loaded.

You are now ready to test the application. Because you’re dealing with live data, you must be connected to the internet. Once a connection is established, point your browser to http://localhost/clicktrack.psp. As a result, a simple Web page containing three links to the most-recent OTN articles should appear. This might look like Figure 2.

 

Figure 2
Figure 2: This is what the application page might look like when loaded.


Click any article link and see what happens. From the user’s point of view, you will see only the article being loaded into your browser, as shown in Figure 3.

 

Figure 3
Figure 3: The only thing a user sees when following an article link on the application page is the article itself.


The code responsible for collecting information about the click made will run in the background. To make sure it has done so, you can connect to the underlying database and issue the following query:

SELECT * FROM otn_articles_rss;

Even before the article document is fully loaded, the above should output a row containing information about the article being loaded, showing 1 in the clicks field. Each subsequent click to the link will increase the value of the clicks field by 1.

Do It in a Pythonic Way

The structure of the code you wrote in the preceding section doesn't seem much like the one implemented in a Pythonic way. In particular, you implemented a set of functions to be then invoked from within the code embedded in HTML, in a certain order, utilizing results returned by one function as parameters to another. Actually, that’s the way in which you could structure your code in any other scripting language—say, PHP.

The real power of Python, though, lies in its ability to hide boring implementation details, coming up with a simple, elegant, yet efficient coding solution. Dictionaries, lists, and list comprehensions are among the popular of Python’s built-in types, allowing you to significantly simplify your code when it comes to dealing with structured data. Turning back to the oraclepersist.py script discussed in the preceding section, let’s upgrade it to get the most out of these outstanding Python language tools. To avoid confusion, you can save the revision in a separate file called oraclepersist_list.py:

import cx_Oracle
  import urllib2
  import xml.dom.minidom
  url =  'http://feeds.delicious.com/v2/rss/OracleTechnologyNetwork/otntecharticle'
  inxs =  ['title','guid','pubDate','link']
  num = 3
  def getRSS(addr):
      xmldoc =  xml.dom.minidom.parseString(urllib2.urlopen(addr).read())
       items = xmldoc.getElementsByTagName('item')
       return items
   
                                
articles = [dict(zip(inxs,[item.getElementsByTagName(inx)[0].firstChild.data for inx in inxs])) for index, item in enumerate(getRSS(url)) if index < num]
def processClick(guid, clks = 1): db = cx_Oracle.connect('usr', 'pswd', '127.0.0.1/XE') c = db.cursor() c.execute('''call count_clicks(:guid, :clks)''', {'guid':guid, 'clks':clks}) db.close()

As you can see in the above code, utilizing a list comprehension—a very efficient mechanism of structuring application data—does a good job of reducing the overall code size. Furthermore, it eliminates the need for a client to explicitly make calls to the module functions. So, you can now rewrite the embedded Python code block in clicktrack.psp, as discussed in the preceding section, as follows:

...
<%import oraclepersist_list
for article in oraclepersist_list.articles:
%>
...

Although it is much shorter now, nothing changes from the user’s standpoint.

However, some may argue that this truly is not a flexible way of connecting code in a PSP page with its backend. With this new syntax, for example, the number of links to be displayed, as well as the RSS address to be used, is hardcoded in the oraclepersist_list.py script—you cannot change these parameters dynamically when needed. To address this issue, you can wrap the list comprehension in a function in the oraclepersist_list.py script, as follows:

...
def getLatestItems(num = 3, url = 'http://feeds.delicious.com/v2/rss/OracleTechnologyNetwork/otntecharticle'):
    inxs =  ['title','guid','pubDate','link']
    return  [dict(zip(inxs,[item.getElementsByTagName(inx)[0].firstChild.data for inx in  inxs])) for index, item in enumerate(getRSS(url)) if index < num]
...

As you can see, the above still takes advantage of the advanced syntax based on using list comprehensions, lists, and dictionaries, while allowing the parameters to be dynamically changed from within the clicktrack.psp page. The following snippet illustrates how you can now explicitly specify the number of article links to be displayed:

...
<%import oraclepersist_list
for article in  oraclepersist_list.getLatestItems(5):
%>
...

Using the Object-Oriented Approach

Although object-oriented programming (OOP) in Python is entirely optional, utilizing this paradigm can minimize redundancy, efficiently customizing existing code. Like other modern languages, Python lets you use classes for packaging logic and data, simplifying data-definition and data-manipulation operations.

Going back to the oraclepersist_list.py script discussed in the preceding section, let’s replace the processClick function with the HandleClick class shown below:

...
class HandleClick:
   def __init__(self, usrname='usr', password ='pswd', orcldb='127.0.0.1/XE'):
      self.dbconn = cx_Oracle.connect(usrname, password, orcldb)
   def __del__(self):
      self.dbconn.close()
   def processClick(self,guid,clks):
      self.dbconn.cursor().execute('''call count_clicks(:guid, :clks)''', {'guid':guid, 'clks':clks})

Assuming that you save the revision in the oraclepersist_class.py file, the updated dispatcher.psp now might look like this:

...
<%
import oraclepersist_class
import urllib
from mod_python import util
params = util.FieldStorage(req)
h = oraclepersist_class.HandleClick()
h.processClick(urllib.unquote_plus(params['guid'].value),  1)
psp.redirect(urllib.unquote_plus(params['url'].value))
%>
...

Here you create an instance of the HandleClick class and then call its processClick method, passing parameters exactly as you did before.

In the HandleClick class discussed here, particularly interesting is the use of special class methods __init__ and __del__. Like the other special methods, you never invoke them directly. Instead, Python implicitly invokes them in response to certain events happening during the instance lifetime. So __init__, a constructor, is invoked upon creation of the instance, and __del__, a destructor, is called just before the instance is destroyed.

In the above example, you connect to the database in the constructor and close the connection in the destructor. In some situations, however, it can be interesting to implement more operations in these methods. For example, you might want to issue a SQL statement from within the destructor just before the instance is destroyed. The following snippet illustrates how you could rewrite the HandleClick class so that the count_clicks stored procedure is invoked from within the destructor, rather than from a class method that is called explicitly:

...
class HandleClick:
   def __init__(self, usrname='usr', password ='pswd', orcldb='127.0.0.1/XE'):
      self.dbconn = cx_Oracle.connect(usrname, password, orcldb)
      self.params ={}
   def __del__(self):
      self.dbconn.cursor().execute('''call count_clicks(:guid, :clks)''', self.params)
      self.dbconn.close()
   def addArticleClick(self,guid,clks):
      self.params['guid']=guid
      self.params['clks']=clks

As you can see, there is no processClick in the updated HandleClick class anymore. Instead, client code is supposed to call addArticleClick, which populates the params dictionary, an attribute of the class, with the parameters to be passed to the count_clicks stored procedure that will be invoked from within the destructor. So, you can now rewrite the Python code block embedded in the dispatcher.psp page as follows:

...
<%
import oraclepersist_class
import urllib
from mod_python import util
params = util.FieldStorage(req)
h = oraclepersist_class.HandleClick()
h.addArticleClick(urllib.unquote_plus(params['guid'].value), 1)
del h
psp.redirect(urllib.unquote_plus(params['url'].value))
%>
...

Notice here the use of the del statement to unbind the h variable holding a reference to an instance of the HandleClick class. Since that’s the only reference to that instance, Python will then implicitly delete the instance, using a mechanism called garbage collection. Upon deletion, the __del__ destructor will be automatically triggered, executing the SQL statement and then closing the connection.

The above is a good example of how you can benefit from using special methods when developing object-oriented code in Python. In this particular example, the client code is responsible only for setting parameters for the query to be issued against the database, while Python does the rest implicitly.

Conclusion

As you learned in this article, developing a scalable database-driven Web application requires good planning. Before you can proceed to building application components and writing code, you first have to decide how much application logic can be implemented inside the database and what you might implement in the front-end tier.

When designing the article sample, you put some data-processing logic inside the database, implementing a couple of PL/SQL stored procedures. Here you learned how Oracle XML DB features can be used to obtain XML data from the Web and then extract the desired pieces of information from the obtained XML document. Then, you built some Python code that orchestrates the entire flow of operations performed by the stored procedures. This Python code in turn is invoked from within PSP pages that you built to implement the application’s front-end tier. As a result, you got the application that picks up some live data from the Web and keeps track of user activity on your site, storing this information in the database. On the Python side, you saw how to obtain, hold, and manipulate structured data, using the Python language’s built-in tools: lists, dictionaries, and list comprehensions. You also looked at how to take advantage of Python’s object-oriented features when packaging application logic and data in classes.


Yuli Vasiliev is a software developer, freelance author, and consultant currently specializing in open source development, Java technologies, databases, and service-oriented architecture (SOA). He is the author of Beginning Database-Driven Application Development in Java EE: Using GlassFish (Apress, 2008).