| Developer: Open Source
Drupal + Oracle: Inside the OraDrup Project
by Gavriel Pedros
Do the adjectives "open source" and "standard" go hand-in-hand? As this effort to combine Drupal with Oracle Database XE attests, the answer is: not necessarily.
Published January 2007
Imagine the following scenario: While doing research for an implementation-related project, you read a blog or article about certain open source software that is "database independent" and that should work with "any" ANSI SQL compliant database, such as Oracle Database.
After some investigation, you find that the software is an impressive piece of work. Yet after a little more digging, you discover that it is in fact available for only an extremely limited number of database management systems—not for any standard database, as the blogger/author claimed. Consequently, you drop the project.
This issue is common across open-source software, which for all its advantages does not necessarily take a "standard" approach. In this case study, I will describe the OraDrup Project, which has the goal of combining the popular open-source Drupal CMS with Oracle Database—to make the former a viable option for the huge Oracle developer community, and vice-versa.
What is Drupal?
Open-source CMSs have been around for quite some time. They come in various forms and are implemented using a wide spectrum of technology including Java, PERL, Python, and the ubiquitous PHP. Some CMSs focus solely on one particular type of content such as the written word with maybe a smattering of image handling. Others can boast a feature list as long as your arm that includes audio, video, arbitrary documents, structured information, and events.
Drupal is an example of an advanced content management framework. With Drupal, it is possible to manage a variety of content and to do so in a consistent and well-organized manner. A Drupal site is multi-user oriented and can function as a blog, discussion forum, a collaborative authoring tool, directory, general community site, or any combination of them. It can be a very powerful tool for building complex Web sites or for incorporating a HTTP-based interface to an existing system.
Drupal started out in the year 2000 as a simple, nameless internal news site with a built-in
bulletin board allowing a group of student friends to leave each other messages. It was not long after that when the site established itself on the World Wide Web. It received the name it has today and was released as open source software. It captured the interest of many people and soon began to take on a life of its own, incorporating ideas such as collaboration, content management, work-flow, syndication, distributed authentication, and more. Today, there are four hundred or so people world-wide working on various aspects of the system and this number continues to grow.
Drupal's construction is highly modular. It has a slim kernel that supports a set of core modules that are responsible for providing application services. For example, the System module is responsible for managing the general functioning of the site, the Book module allows users to collaboratively author a book, and the Search module is responsible for indexing content and for undertaking keyword-based searches. The core modules are numerous and you are free to enable or disable them according to requirements. This makes the system very flexible indeed.
Looking outward, extending Drupal's capability beyond that provided by the core package becomes a matter of building and plugging in additional modules. In this particular domain, there is no shortage of contributed modules. For example, if software management happens to be a requirement, the Project module can be installed and activated. The Project Issue Tracking module extends the Project module's capability further. Modules are being actively maintained and improved by the Drupal community. They cover areas as diverse as geospatial schematization, OpenID, mind mapping, and even English to "pirate-speak" translation.
Another Drupal strength is its theming system. A theme is the term given to the look and feel of a Drupal site. In other CMSs, this feature is sometimes referred to as templating or "skinning." Drupal is very amenable to such customization.
The beauty of Drupal theming is that the action of radically customizing a site's look and feel does not require the modification of existing code. The generation of displayable items such as pages, images, form elements, and so on is handled using a collection of baseline rendering functions called "theme functions." The theme system allows a developer to step in and override these functions by placing alternative theme functions in the theme's directory. For example, a menu with options that were previously rendered as an unordered list can be transformed easily into a collection of animated tabs for a particular theme.
Why Bring Oracle and Drupal Together?
So where does Oracle fit in? To date, Drupal has principally been a MySQL-oriented system. A very popular deployment choice for Drupal installations is the LAMP stack (Linux-Apache-MySQL-PHP). Of course, Drupal is equally at home running under any platform capable of hosting MySQL and PHP.
This is all well and good for the average implementation but sometimes the "M" in LAMP is not an option. Some organizations may not have the luxury of being able to incorporate another database server into their operations. In other cases, MySQL simply might not be the right solution for the task. In such circumstances, the adoption of Drupal is unlikely.
Furthermore, with Drupal having such an active developer community, it is difficult to see why it should not expand its horizons to include Oracle technology. In doing so, Drupal will become a viable option for Oracle-based developers and vice-versa. Were that to happen, it would be very exciting to see it taking advantage of or benefitting from such Oracle features as:
- Data partitioning and parallelism
- Replication and queuing technology
- Real Application Clusters
- Transparent Application Failover and other high availability features
- Fine Grained Access Control at the database level
- Automatic segment space management
Furthermore, with Oracle Database having such a huge presence, it would offer the opportunity for Drupal to gain greater exposure where, previously, it would have been overlooked.
Much has been said about Drupal's potential to support an arbitrary number of database platforms. Indeed, support for PostgreSQL is packaged alongside that for MySQL. Why, then, has there been no reported sighting of Drupal running under Oracle? Despite their good intentions, projects embarking on the path of building Oracle compatibility seem to end up in a "Bermuda Triangle." For that reason, an Oracle-based Drupal implementation has attained almost mythical status—nobody has ever seen one.
I decided to discover the origins of this myth first-hand. How easy or difficult would it be to build Oracle support into Drupal 4.7.0? I founded the OraDrup Project to find out.
Examining the Source Code
The first subject under the microscope was the way Drupal interacts with the underlying RDBMS. One of the first things I noted when examining the code is that Drupal uses its own simple abstraction layer with which to communicate with the database. Naturally enough, the interface is divided along generic and proprietary lines. High-level functions are built into the generic layer with the core code talking directly to this layer. Vendor-specific implementations are conveniently hidden behind API functions found in the latter category.
An example of where this approach shines is the case of paging queries, where you wish to fetch only a subset of rows from a query's results. You will not find the core code peppered with proprietary LIMIT clauses. Instead, the request is fulfilled by way of an API function, db_query_range(), which is implemented for each target database. Platform-specific features remain localized and hidden behind such logical operations.
This conveniently brings us to the next point. The use of platform-neutral SQL throughout the code is encouraged in an effort to keep Drupal database-independent. This approach may well be subject to debate. It could be argued that in the attempt to find consensus across all database platforms, the SQL that ends up being used will tend toward the lowest common denominator. It is quite possible that the same query will run perfectly happily on one database while performing dreadfully on another. Past experience has shown that working with generic SQL-based platforms will almost inevitably involve modifications to one target platform or another in order to sort out performance problems when demands are high. Nevertheless, these are the facts on the ground.
A small issue came to light when I discovered that some column names in Drupal's data model were problematic. Given the user-oriented nature of Drupal, there happened to be many tables with foreign keys to the users table. Unfortunately, the primary key column in this table happens to be named uid, which collides head-on with a member of Oracle's reserved word list. Other column names of interest include session, comment, and mode, not to mention the existence of a table named "access." The fabled platform-neutral SQL approach couldn't prevent this conflict; there was no escaping the fact that queries needed to be modified to suit the target platform.
Speaking of queries, a small handful of queries were discovered that were not suitable for running on Oracle. A few examples include:
- SELECT expression (with no FROM clause and table name)
- Queries which involve the comparison of a character column with the empty string. By definition, the empty string is null in Oracle Database and these queries would therefore not behave as intended.
- GROUPing BY an unqualified, ambiguous column. This is non-standard SQL.
Some of these have since been rectified in later releases of Drupal but probably the most interesting issue was the use of long text. Drupal's data model makes liberal use of very long text columns. Whilst Oracle's VARCHAR2 data type goes well beyond accommodating most needs, it was clear that VARCHAR2's 4000-character limit would not be appropriate under these circumstances. After all, if Drupal could manage content of arbitrarily large size under MySQL, you certainly would want to do the same under Oracle.
Naturally enough, Oracle's Character Large Object (CLOB) data type is a good candidate for modeling such columns. However, the choice of CLOBs introduces a few challenges in this context.
First, Drupal's database queries like to do things with long text columns that would be objectionable on some platforms—including issuing SQL that compares one long text value (potentially up to 1Mb in length) with another using the equality operator. Another is grouping by such columns in some queries. These operations are forbidden by Oracle when CLOB and other Large Object (LOB) columns are involved.
The platform-neutral SQL syntax philosophy, as admirable as its intentions were, could not help to bypass this issue. By now, if there remained any thoughts of avoiding SQL modification, they were quickly fading away. Sometimes, developers tend to focus on their challenges a little more than they do upon the more positive things they come across. I'll be honest and say that if this task was completely without its challenges then there would be little gratification upon completing it.
The key to overcoming these challenges wasn't so much that the majority of SQL in Drupal was compatible with Oracle—it was the disciplined way in which all queries were issued. When examining the code base of similar systems, source code inspection is sometimes accompanied by screams of horror upon seeing queries containing parameter values embedded directly into the SQL body. Developer trauma associated with visions of avoidable hard-parsing, library cache redundancy, and resource wastage is common.
Thankfully, Drupal goes nowhere near such territory. It is delightful to see that queries and their arguments are segregated, with the queries themselves containing placeholders for their parameters. A nod in the direction of JDBC, perhaps? Regardless of the motivation, this fact makes that task of introducing bind variables almost trivial—and discovering it was the pivotal moment in deciding whether it was worth proceeding with the project or abandoning it.
To recap, the following discoveries had been made thus far:
- Oracle reserved words were present in the data model and appeared in many queries
- Unorthodox and non-standard SQL incompatible with Oracle was being issued
- Queries and their arguments were kept separate as a matter of policy.
Given the inevitable need to resolve the issues, the choice was simple: Dive straight into core code and tailor the queries inline, or hook into the database API and patch queries dynamically.
Instinct says that the former is the preferred approach: if there is a problem somewhere then the sensible place to put things right is at its source. Unfortunately, this implies directly modifying core code and diverging from the main code base. The aim of the project was to show that Drupal could run against Oracle and preferably without the need to hack chunks of core code to pieces. It was desirable to conceive of a method whereby not just core Drupal could run under Oracle, but contributed modules as well. If they too could leverage the solution then so much the better. It was worth the effort to see if this was a practical solution.
Implementation Decisions
A first-cut Oracle schema was put together based upon the MySQL data model. The following simplifications were made, leaving room for refinements and elimination of wastage later on:
- All MySQL integral types, regardless of specification and domain, will map to the INTEGER type.
- All MySQL floating point types will map to the FLOAT type.
- The CLOB type will be used liberally.
Querying was to take advantage of parameter segregation to slot bind variable placeholders into the SQL. Owing to this approach, the generic database API function, db_query(), needed re-implementing as a consequence. This was unavoidable and the experience suggests that it wasn't as generic a function as it was intended to be.
To reduce the number of trips to the database, it was decided that, when fetching results, to request all rows in the result set, buffering them locally for consumption by the session. This was based upon the following reasoning:
- The number of rows would not be particularly large.
- All requested rows would be processed eventually.
- The OCI8 extension to PHP could not report the number of rows in a result set.
Requesting them all and buffering them locally allows this information to be obtained.
Next came the query rewriting mechanism. On the subject of reserved word elimination, a simple approach would be to build an inspector into the database API implementation for Oracle and identify and quote occurrences of these words. A simple search and replace did the trick, though it must be noted that this was a regex search, which does not come cheap. Nevertheless, it was fairly straightforward.
The trickier task was to identify the problematic queries that needed rewriting. To assist, a simple QueryRewriter class was constructed that could take a string of SQL and determine whether it was clean or required modification. The approach would be to create one instance per "rogue" query and delegate the transformation task to it upon identification. This way, all Drupal queries passing through the Oracle database API could be vetted for compatibility. Furthermore, the grouping of QueryRewriter instances by Drupal module made it possible to reduce the number of inspectors to just those of the active modules. The disadvantage of adopting this approach is quite prominent. The price to be paid for this level of flexibility is measured in CPU cycles. Since the majority of Drupal's SQL queries pass through this process without being changed, there is a lot of avoidable work being undertaken. This lends additional weight to the argument of directly fixing SQL in the core code. Nevertheless, curiosity gave momentum to the re-writing approach.
Putting It All Together
Development got under way using Oracle9i Database and PHP4. The general idea was to use this environment as the starting point and then move on to Oracle Database 10g and PHP5 later. This initial move was not without its challenges, but most issues were rectified by using a more current version of OCI8.
The data model was created and set up in an initial state. Building the Drupal database API was the usual formality and was soon completed. Following that, a Drupal installation was created and all non-essential modules in Drupal were disabled. It was a painstaking process building the query rewriters, given that some of the rogue queries were not easy to spot. In the end it became a combination of analyzing source code and ad-hoc testing. Each passing day saw a new core module join the Oracle club until the list of modules was complete. Theoretically, Drupal was now compatible with Oracle.
A simple PERL script was written which populated the tables with articles of varying length. One thousand calls to the fortune program later, Drupal under Oracle9i was ready to simulate a realistic installation. Additional articles were created by hand and filters were applied to them. The articles were commented upon and the comments, in turn, received replies. RSS feeds were generated from the content and so on. The same was done when the site was switched from English to French and from French to Portuguese. Everything functioned very nicely indeed. During the course of this testing, it didn't take very long to become apparent that the overall response time of the system was respectable but hardly exciting. Considering the amount of work with which PHP was tasked, this was quite understandable. At the same time, given that this was a first-cut implementation, there was plenty of optimism for increasing the application's performance across the board.
But a pleasant surprise was on the horizon. After upgrading to PHP5, the response time became noticeably quicker, probably due to a combination of reasons including the newer, refactored release of the OCI8 extension. Drupal pages were being returned in well under a second. Turning on Drupal's caching mechanism improved response time still further. Bearing in mind that the entire system was running on a sole 1GHz Pentium III that was playing host to X, Web, MySQL, and Oracle servers, the system was doing very well indeed.
However, the best was yet to come. It was time to try out Drupal under Windows 2000, PHP5, and Apache. The installation of Oracle Database 10g Express Edition (XE), the free edition of Oracle Database 10g, was a breeze. (It was rather amusing to find that it took more time to get PHP5 and Apache up and running properly than the DBMS.) Running on the same hardware as before, a Drupal installation in this particular configuration was shockingly impressive. The response time could only be described as breathtaking. Eventually it became necessary to verify Drupal's configuration to ensure that it wasn't somehow mistakenly talking to a MySQL database. It wasn't. Running Drupal under Oracle Database XE was everything that was envisioned at the start of the project. The myth was finally laid to rest.
Where Are We Now?
Following an extended period of testing out all aspects of the system, the time had come to increase the number of eyes looking out for functional problems. The source code was made available for download and a request was made for volunteers. As of January 2007, this is the current state of the project. Hopefully, following a successful run of substantial user testing, issue capture and resolution, the next stage in the process is to focus on streamlining performance with an eye on scalability.
By the time development had reached completion, the following conclusions had been reached:
"Platform-neutral" SQL is not a good idea. The approach of using platform-neutral SQL is well-intentioned but painfully impractical for complex software. The experience in this project backs up this claim. Syntax is only part of the story. I haven't even mentioned the subtleties brought about by different locking models and the impact that the assumption that "all databases generally work in the same way" can have on correctness and scalability. It seems rather an obvious statement to make, but an architecture that fully embraces different database platforms in a flexible manner akin to the way that Drupal theming operates would have made this project relatively painless to implement. More importantly, it would open the doors to getting the best out of the target database management system. In my opinion, if a heavy-duty DBMS is at your disposal, it seems a great shame if you are discouraged from using the many custom features that it offers. That said, Drupal is young and still evolving. Its design has got a lot of things right and there are exciting features being talked about for future releases. It would not come as a surprise to see some innovative minds advocating such changes to the database API.
Clean communication with the database is important. The strategic decision to systematically segregate all SQL queries and their arguments, as employed by Drupal, may seem rather inconsequential. The approach came about as part of the way Drupal cleans query arguments to prevent SQL injection attacks. Rather than leaving this responsibility with each module developer, the system performs the cleansing as a standard part of its query processing. It was primarily this particular design decision that made the building of an Oracle compatible database API significantly easier simply because the use of bind variables would have been the first thing to come to mind when constructing the Oracle database API. If the query had to be programmatically disentangled or otherwise separated from its arguments, the chances are high that this project would have been abandoned before it had even begun. It is true that Oracle provides some very useful features such as the configuration settings CURSOR_SHARING = FORCE and CURSOR_SHARING = SIMILAR to work against the negative impact of such queries. This would not have been enough to convince me to continue. It was preferable to see the correct approach being taken without asking the database to metaphorically brush such issues under the carpet. Thankfully, Drupal admirably stands out from the crowd with its clean and disciplined approach.
The database model will only get better. The first-cut database model used for this Oracle implementation was very basic indeed. There is plenty of room for refinement in order to reduce waste and improve database performance as a result. No effort was made to strategically organize the tables into specific tablespaces, review the indexing, carry out partitioning or work of that nature. The primary goal was to see if a functionally correct Drupal implementation was possible. It was assumed that there would be plenty of scope for a comprehensive review, database administration and tuning exercise at a later stage.
Performance enhancements should be investigated. On the subject of performance gains that are easy to implement, a PHP intermediate code cache and optimizer can be a tremendous help. All development was undertaken without the use of any such software. The extent to which such tools can improve Drupal's performance is an unknown quantity at present. It should be mentioned that Drupal itself is undergoing further development in the area of performance enhancement. The forthcoming release of Drupal 5.0, for example, promises improved session handling and access checking to reduce the load they place on the database. It also makes an "aggressive" caching policy available as a configuration option. There are certainly plenty of options to investigate.
Another area which should be addressed under the general area of performance enhancement is connection management. The use of a connection pool in favor of opening and closing a connection per HTTP request would be of significant benefit. Recently it has come to light that database resident connection pooling may become available in a future release of Oracle. This would deliver a major boost for the performance of Web-based applications. There are a few avenues worth exploring, but these are for another project.
The dynamic query rewriting mechanism was adopted for flexibility and, in particular, for easier site-admin maintenance. Ideally, Drupal administration should not require the need to patch or replace core files when introducing a new database platform. Achieving the same result by installing modules is seen as far more acceptable approach. Another reason this was adopted was that Drupal contributors may only be familiar with MySQL and thus may not be able to offer Oracle compatibility "tweaks" where necessary. This approach allows for the creation of independently-created interim "module compatibility packs". In spite of adopting the query re-writing approach, it is acknowledged that queries should really not require correction on-the-fly, but should be valid at source. This is unquestionably the best approach, avoiding many CPU cycles lost on string manipulation. Perhaps the next evolutionary step for Drupal's database API is to gain the ability to issue vendor-specific SQL?
Conclusion
It remains to say that the project's deliverables comprise of a small number of drop-in files that make up the database API plus a Drupal module that hooks directly into the system.
So what is likely to be achieved now that it has been established that Drupal and Oracle can work together? That remains to be seen. This is just the first rung on the ladder and my hope is to have fueled sufficient interest to give the project enough momentum to move to the next level. Given that this combination of technologies is finally a possibility, it surely won't be long before a little ambition and some imagination join forces to yield some very interesting results.
The blog and source code for the OraDrup project can be found at bitfine.com/oradrup.
Gavriel Pedros is an information technology specialist working with emerging technology and open source software. He has a strong background that includes design, development and administration. Gavriel's interests include performance enhancement, Java, C++, web technology and musical composition. He has been working with database technology since Oracle 5.
Send us your comments
|