As Published In
Oracle Magazine
January/February 2007

FEATURE


Go XML

By David Baum

Oracle Database 10g gives enterprises a way to manage content-rich applications and diverse data with XML.

The increased use of content-rich applications and the internet, particularly for business transactions, has put new demands on enterprise databases. One way to meet these demands is to make greater use of XML. As a markup language capable of describing many different kinds of data, XML is used to store and exchange business information, including structured, unstructured, and semistructured data.

At the heart of Oracle's XML strategy is Oracle XML Database (Oracle XML DB), a unique feature of Oracle Database 10g that allows for generating, storing, retrieving, querying, and managing massive volumes of XML data. Oracle XML DB has been adopted by many enterprises that are seeking a way to take advantage of their database infrastructure and skills to manage diverse data.

Storage Options

One good example of how this works is CIMIRe, a government agency in Belgium that manages retirement, disability, and survivor benefits for workers and their families. The agency maintains information about 12.8 million Belgian citizens and uses Oracle XML DB to manage more than 200 million XML documents. That collection grows by 70 million documents each year, which makes CIMIRe's repository one of the biggest XML databases in Belgium. "We are dealing with high volumes of rather complex XML data that need to be online for 45 years," says Philippe Delcourt, IT manager for CIMIRe. "The only way to get the best-possible response time is to have a database that does not treat XML as just text but rather understands XML."

According to Delcourt, developers at the agency need to be able to store and access XML data without compromising performance. "We want to avoid having two databases—one for relational content and another for XML data," Delcourt says. "Since both data structures are stored in our Oracle database, we do not have to spend time integrating, maintaining, and backing up two different data stores."

Additionally, CIMIRe needs to be able to exchange information in XML format with other social security offices. Using XML in the database allows the agency to send data to these partners without making any transformations. And because Oracle's XML functionality is accessible through the standard SQL interface, CIMIRe can use many applications to query its XML data.

"We can use one interface to query and update both relational and XML content," Delcourt says. "The complex XPath searches are very fast, with subsecond response times, and we have maintained this superb performance despite our growing volume of data."

Volume Simplified 

Structural Details


There are two primary ways to store XML content in a relational database—structured and unstructured.

Structured storage entails decomposing the content of the XML document into a set of database objects. One benefit of this approach is that the data can then be accessed by applications that understand only relational technology.

When an XML schema is registered with Oracle XML DB, the required type definitions are generated automatically from the schema, so documents that conform to it can be decomposed and stored in the database without any loss of information. This allows Oracle XML DB to leverage the full power of standard SQL interfaces while reducing storage space and memory requirements.

With unstructured storage, the entire XML document is stored intact as a character large object (CLOB) inside the database. This approach yields the best throughput when inserting and retrieving whole documents, and it keeps each document exactly as received, which is increasingly important for digital signatures and authentication.

Oracle Database 10g supports both storage approaches and, for each, uses standard access methods for navigating and querying XML based on the World Wide Web Consortium (W3C) XML data model.
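To make the trade-off concrete, here is a minimal sketch that stores the same document both ways. It is not Oracle XML DB. It uses Python's built-in sqlite3 and xml.etree.ElementTree as a stand-in, and the document, table names, and element names are all hypothetical. The unstructured copy returns the original text unchanged. The structured copy is decomposed into typed columns that a purely relational tool can query.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample document; element names are illustrative only.
DOC = '<benefit id="42"><citizen>Jan Peeters</citizen><amount>1250.50</amount></benefit>'

def store_unstructured(conn, doc_id, xml_text):
    """Unstructured storage: keep the whole document as one text (CLOB-like) value."""
    conn.execute("INSERT INTO benefit_docs (id, body) VALUES (?, ?)", (doc_id, xml_text))

def store_structured(conn, xml_text):
    """Structured storage: decompose the document into typed relational columns."""
    root = ET.fromstring(xml_text)
    conn.execute(
        "INSERT INTO benefits (id, citizen, amount) VALUES (?, ?, ?)",
        (int(root.get("id")), root.findtext("citizen"), float(root.findtext("amount"))),
    )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE benefit_docs (id INTEGER PRIMARY KEY, body TEXT)")
conn.execute("CREATE TABLE benefits (id INTEGER PRIMARY KEY, citizen TEXT, amount REAL)")

store_unstructured(conn, 42, DOC)
store_structured(conn, DOC)

# Whole-document retrieval returns the original text byte for byte.
original = conn.execute("SELECT body FROM benefit_docs WHERE id = 42").fetchone()[0]
# A relational-only tool can query the decomposed columns directly.
amount = conn.execute(
    "SELECT amount FROM benefits WHERE citizen = ?", ("Jan Peeters",)
).fetchone()[0]
print(original == DOC, amount)  # prints: True 1250.5
```

Note that the structured copy no longer preserves formatting details such as the trailing zero in "1250.50". This is why intact (unstructured) storage matters when a document carries a digital signature.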

The State of California Office of Legislative Counsel finds that Oracle XML DB extends the power of Oracle Database 10g while reducing storage space and memory requirements. These capabilities are particularly appealing to the Office because it maintains a massive document repository of about 350,000 bills, constitutional amendments, resolutions, and other legislative measures.

The Office is responsible for drafting all legislation for the California State Assembly and State Senate. It also maintains the California statutes and codes, a compilation of all the bills that have been passed and signed into law, and Oracle Database 10g serves as the central repository for them. "We selected Oracle because we needed a database that is highly available and dependable and that can support relational structures along with flexible XML features as part of our bill-drafting system," says Mendora Servin, product manager for the Office of Legislative Counsel.

Servin and her team maintain the bill-drafting system for the State Assembly and State Senate. They also index all the documents to simplify searching and research. "Our drafting system stores content in Oracle XML DB in a schema-based system, according to a carefully defined structure," she says. "Using Oracle XML DB to create and store structured documents lets us automate a lot of the process."

The Office worked with Oracle, Linsonic, and the Xcential Group to establish a data model by leveraging the schema-based aspects of the Oracle XML DB technology. "Being able to create a very complex and powerful schema—and leverage [Oracle] XML DB to implement it in an object-relational fashion—was a big time-saver," explains Don Neithinger, a consultant with Linsonic. "Oracle XML DB virtually eliminated the pain of data modeling, since the data model evolved directly from the schema. The application is tightly coupled with the schema and allows users to mark up the documents the way they need to."

With the initial database development work complete, the Office is now focused on making hundreds of thousands of documents searchable by the public. "Most of our users are attorneys who don't have time to learn complex information systems," says the Office of Legislative Counsel's Servin. "Our Oracle-based system makes it easy for them to find the exact information they need."

Snapshots


CIMIRe

www.cimire.fgov.be
 Location: Brussels, Belgium
 Industry: Public sector
 Oracle products: Oracle Database 10g, Oracle XML DB

State of California Office of Legislative Counsel

www.leginfo.ca.gov
 Location: Sacramento, California
 Industry: Public sector
 Oracle products: Oracle Database 10g, Oracle XML DB

British Sky Broadcasting Group

www.sky.com
 Location: Isleworth, U.K.
 Industry: Media and entertainment
 Oracle products: Oracle Database 10g, Oracle XML DB

While users enjoy the simplicity, Servin's team appreciates Oracle Database 10g for its performance, stability, and rapid-development facilities. "Oracle Database 10g has been very reliable," Linsonic's Neithinger says. "On top of that, Oracle Database 10g has superior repository features. No other vendor gives the same degree of flexibility in how to store XML content natively in the database."

Neithinger says that fragment-level data retrieval from XML documents is practically instantaneous in Oracle Database 10g, even when very large documents are involved. "A document hierarchy is created within the folder hierarchy that allows users to navigate very quickly," he says. "It's a convenient way for users to find specific parts of the California code."

Neithinger also looks forward to what comes next. "We are also very excited about the next-generation XML storage capabilities in the works, because they further enable efficient XML search and retrieval," he adds.

XML Evolution

As the popularity of XML grows, more enterprises are deploying XML-capable databases as back-end repositories, says Noel Yuhanna, an analyst at Forrester Research. "Oracle offers native XML support to store both unstructured and semistructured data into databases, including images, faxes, movies, XML, content, e-mail, and other complex datatypes," he says. "Customers like to use databases for this content because of the strong data management capabilities that they offer."

Of course, users don't care whether data is structured or unstructured, as long as the content is managed and stored efficiently. That's why Oracle works closely with standards bodies on ways to retrieve structured and unstructured data in a unified way.

"Oracle has expanded the capacity of the database to handle text as well as media, images, videos, spatial data, and other kinds of information," says Vishu Krishnamurthy, Oracle's senior director of XML development. "We have progressively optimized speed and intelligence across datatypes and enhanced the SQL syntax to handle domain-specific information."

Oracle has made XML a fundamental datatype. As part of Oracle XML DB, Oracle created the XMLType datatype to process XML documents and messages, and XMLType instances can use either structured or unstructured storage. In Oracle Database 10g Release 2, Oracle added standards-based XQuery capabilities, a schema-based resource metadata facility, a set of SQL functions for modifying XML data, and much more.

"Traditionally, unstructured information is stored in a file system that includes files, folders, and all associated metadata," says Krishnamurthy, "but this paradigm is inherently insecure and unreliable and certainly doesn't scale well. That's why Oracle provides the same access mechanism for all types of content that can be stored in a secure, centralized repository."

Media Convergence

This approach to managing content is ideal for British Sky Broadcasting (BSkyB), the largest digital pay-television platform in the United Kingdom and Ireland and a leading broadcaster of sports, movies, entertainment, and news. The company manages real-time XML data feeds, which contain rich customer information—everything from the purchase of personal video recorders and high-definition television services to a specific movie preference in the video-on-demand program. To support its burgeoning subscriber base, BSkyB deployed an Oracle data warehouse.

"We're using Oracle to capture a great deal of information about subscribers—not just basic contact information but also case-management data arising from technical inquiries," says Dave Crichton, a senior developer on BSkyB's Customer Marketing Business Intelligence team.

Crichton is responsible for maintaining a 3TB data warehouse of customer relationship management (CRM) information about all of BSkyB's 8.2 million subscribers, such as customer name, contact information, location details, and onsite equipment. The information is used to cross-sell and up-sell additional products and services to these established customers, as well as to reach out to new customers throughout the United Kingdom.

Each night, BSkyB's data warehouse is refreshed with information from its CRM system, which arrives as XML data feeds. BSkyB uses Oracle XML DB to parse the messages and break them into relational structures, which are then loaded into the data warehouse. Because its relational tables can also hold XMLType data, the growing media company can keep schema-less XML documents in its Oracle database as well.
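The parse-and-decompose step can be illustrated with a small, self-contained sketch. BSkyB does this inside Oracle XML DB. The version below is only an analogy using Python's sqlite3 and xml.etree.ElementTree, and the feed layout, element names, and table name are invented for illustration. Each `<subscriber>` element becomes one relational row.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical nightly CRM feed; element and attribute names are invented for illustration.
FEED = """
<subscribers>
  <subscriber id="1001"><name>A. Smith</name><region>London</region><box>HD</box></subscriber>
  <subscriber id="1002"><name>B. Jones</name><region>Leeds</region><box>PVR</box></subscriber>
</subscribers>
"""

def shred_feed(xml_text):
    """Yield one (id, name, region, box) tuple per <subscriber> element."""
    root = ET.fromstring(xml_text.strip())
    for sub in root.iter("subscriber"):
        yield (
            int(sub.get("id")),
            sub.findtext("name"),
            sub.findtext("region"),
            sub.findtext("box"),
        )

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE subscriber (id INTEGER PRIMARY KEY, name TEXT, region TEXT, box TEXT)"
)
# INSERT OR REPLACE lets a re-run of the nightly load refresh existing rows
# instead of failing on duplicate keys.
conn.executemany("INSERT OR REPLACE INTO subscriber VALUES (?, ?, ?, ?)", shred_feed(FEED))
conn.commit()

rows = conn.execute("SELECT id, region FROM subscriber ORDER BY id").fetchall()
print(rows)  # prints: [(1001, 'London'), (1002, 'Leeds')]
```

Using a generator keeps memory use flat regardless of how many subscribers a feed contains, because rows are parsed and inserted one at a time.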

Crichton likes several things about the XML functionality in Oracle Database 10g—namely, a notable performance improvement, better stability, and plenty of room to scale to 10 million customers. "Oracle has always given us the performance we need, and the XML functionality is becoming progressively more robust," he says.

Next Steps


READ "Getting Started with Oracle XML DB"

DOWNLOAD /technetwork/software/products/database/oracle10g

Versatile Frameworks

XML has become a popular way to store and exchange complex information. Next-generation application development stacks will let developers build XML-based content-management applications that are much more versatile and dynamic.

"That's when the technology will move from the edge to the core," Oracle's Krishnamurthy predicts, "as businesses start using XML to store all types of business information and to build applications that include voice mail interfaces, videos, new sets of connections and associations, and collaborative work models."

BSkyB's Crichton agrees with that assessment. He says the company plans to use Oracle XML DB to produce output for third parties, such as customer viewing profiles for informing targeted marketing campaigns.

"Oracle's built-in XML functions give us the option to produce XML data files in a single SQL query, which simplifies development and maintenance," Crichton explains. "Oracle is definitely at the forefront when it comes to building XML functionality." 
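In Oracle, the SQL/XML functions such as XMLELEMENT and XMLAGG assemble the XML document inside the database as part of the query itself. The following Python sketch only mimics that idea on the client side. It runs one aggregate query with sqlite3 and turns the result into a single XML document with xml.etree.ElementTree. The `viewing` table, its columns, and the output layout are hypothetical and are not BSkyB's actual schema.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE viewing (subscriber_id INTEGER, genre TEXT, hours REAL)")
conn.executemany(
    "INSERT INTO viewing VALUES (?, ?, ?)",
    [(1001, "sport", 12.0), (1001, "film", 3.5), (1002, "news", 6.0)],
)

def viewing_profiles_xml(conn):
    """Build one <profiles> document from a single aggregate query (hypothetical schema)."""
    root = ET.Element("profiles")
    profiles = {}
    query = (
        "SELECT subscriber_id, genre, SUM(hours) FROM viewing "
        "GROUP BY subscriber_id, genre ORDER BY subscriber_id, genre"
    )
    for sub_id, genre, hours in conn.execute(query):
        # Create one <profile> element the first time each subscriber appears.
        if sub_id not in profiles:
            profiles[sub_id] = ET.SubElement(root, "profile", id=str(sub_id))
        ET.SubElement(profiles[sub_id], "genre", name=genre, hours=str(hours))
    return ET.tostring(root, encoding="unicode")

print(viewing_profiles_xml(conn))
```

Because the query is ordered by subscriber, each profile's genres come out grouped and sorted. The dictionary lookup guards against creating duplicate `<profile>` elements.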


David Baum (david@dbaumcomm.com) is a freelance business writer based in Santa Barbara, California.

