Integration using Data Services

by Manu Madhusudhanan R
06/20/2005

Abstract

Diverse, distributed sources of information pose many challenges for the architects responsible for creating and maintaining applications that integrate and aggregate data from these sources. In the past, developers have had to write custom code against the API of each disparate data source. This approach is costly and fragile: any change to a source means modifying the custom code.

The challenge for all application architects and data experts is to easily deliver data to the applications while ensuring reusability of the work done to integrate, aggregate, and transform data. Enterprise Information Integration (EII) solutions solve this problem. These solutions typically provide a Data Service Layer, which radically simplifies access and integration of distributed data.

This article looks at a solution based on the Data Service Layer as well as the role of XQuery in solving integration problems in business data. It also shows how the level of abstraction provided by this type of solution makes disparate data sources transparent to end users, so it is very easy to compose additional services.

The Service Oriented Architecture Way

The trend toward Service Oriented Architecture (SOA) is driven not by buzzwords but by real needs, and SOA applies to the data integration space too. The Data Service Layer acts as an abstraction layer that talks to the underlying resources and hides data location, type, and administration from the application, leaving behind a virtual data source. To an application developer, a virtual data source means focusing on the data problem at hand instead of rewriting the plumbing needed to access heterogeneous data. One of the motivations behind SOA is a loosely coupled system; data layering provides this in the data realm.

In addition, data services (which are explained later) expose public functions that consumers may be interested in and disentangle users from underlying details.
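As a rough illustration of the virtual data source idea, the following Python sketch puts one lookup function in front of two differently shaped sources. All class and function names here (TableSource, XmlSource, DataServiceLayer, get_customer) are hypothetical and chosen for this example only; they are not part of any product API, and the in-memory data merely stands in for a real database and a real XML feed.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Protocol


class Source(Protocol):
    """Hypothetical contract every underlying source adapter fulfils."""

    def lookup(self, key: str) -> Optional[dict]: ...


class TableSource:
    """Stand-in for a relational table: rows are (name, city) tuples keyed by id."""

    def __init__(self, rows):
        self._rows = rows

    def lookup(self, key):
        row = self._rows.get(key)
        if row is None:
            return None
        name, city = row
        return {"id": key, "name": name, "city": city}


class XmlSource:
    """Stand-in for an XML feed made of <customer id="..."> elements."""

    def __init__(self, xml_text):
        self._root = ET.fromstring(xml_text)

    def lookup(self, key):
        for node in self._root.findall("customer"):
            if node.get("id") == key:
                return {
                    "id": key,
                    "name": node.findtext("name"),
                    "city": node.findtext("city"),
                }
        return None


class DataServiceLayer:
    """Virtual data source: callers never learn which source answered."""

    def __init__(self, sources):
        self._sources = list(sources)

    def get_customer(self, key):
        for source in self._sources:
            record = source.lookup(key)
            if record is not None:
                return record
        raise KeyError(key)


layer = DataServiceLayer([
    TableSource({"C1": ("Ann", "Austin")}),
    XmlSource(
        "<customers><customer id='C2'>"
        "<name>Raj</name><city>Pune</city>"
        "</customer></customers>"
    ),
])
```

The application calls layer.get_customer("C1") or layer.get_customer("C2") in exactly the same way, although one record comes from tuples and the other from XML; that is the loose coupling described above, on a very small scale.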

When designing a data access layer infrastructure, many factors need to be addressed to provide an efficient, scalable solution. The following are some of the most important:

  • Everyone hates having one more standard around EII, so we have to choose an existing standard that will solve the problem and make life easier for developers and data experts.
  • Metadata is represented by different data sources in different forms; we need a unified representation of metadata for all sources. I think you can guess what we will use for this.
  • The format of data returned by the data sources is another challenge when different types of data sources are involved. XML comes in handy in such situations.

The rest of this article looks at how these ideas can be applied to the data service layer.

A Data Service Layer

The Data Service Layer helps to formulate a unified data catalog. Once we have this “data catalog,” it can be logically segmented according to the needs of the application, with each segment represented as a “data service” that can be accessed through a standard API. The data catalog therefore becomes an interface that is easily reused by other developers and applications. The architecture of a Data Service Layer-based solution is shown in Figure 1.

Having a single catalog comes with a number of advantages:

  1. Centralized control of data makes management of data easy.
  2. Real-time data is available, in contrast to Extract, Transform, and Load (ETL) tools, which work on copies of the data.
  3. An integrated virtual data source makes report generation easy.
  4. Data access is simplified, as a uniform interface now replaces multiple heterogeneous APIs that would otherwise be needed.

Figure 1: The architecture of a Data Service Layer

A data service encapsulates the business logic of the application. Therefore a data service is a self-contained module, with each data service being considered an XML-based class modeled after an entity in an E-R diagram. The entities can be related by functions.

To make the data services self-contained, ideally they should contain the following information.

  1. Metadata about the data source
  2. Connection information
  3. Interface Functions for reading/navigating and updating data sources
  4. Relation information among data services
  5. Security information
  6. Metadata about the data service
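The six items above can be pictured as the fields of a single descriptor. The Python sketch below is only one possible way to model them; the class name DataServiceDescriptor, its field names, and the missing_parts() helper are assumptions made for illustration, not a format defined by any product.

```python
from dataclasses import dataclass, field


@dataclass
class DataServiceDescriptor:
    """Hypothetical container for the six parts of a self-contained data service."""

    name: str
    source_metadata: dict = field(default_factory=dict)   # 1. metadata about the data source
    connection: dict = field(default_factory=dict)        # 2. connection information
    functions: dict = field(default_factory=dict)         # 3. interface functions, by name
    relations: dict = field(default_factory=dict)         # 4. function name -> related service
    security: dict = field(default_factory=dict)          # 5. security information
    service_metadata: dict = field(default_factory=dict)  # 6. metadata about the data service

    def missing_parts(self):
        """Return the names of the parts that are still empty, in list order."""
        parts = [
            "source_metadata", "connection", "functions",
            "relations", "security", "service_metadata",
        ]
        return [part for part in parts if not getattr(self, part)]


orders = DataServiceDescriptor(
    name="Orders",
    functions={"getOrder": lambda order_id: {"id": order_id}},
    relations={"getItems": "LineItems"},
)
```

Calling orders.missing_parts() lists the parts that have not been filled in yet, which is a simple way to check whether a descriptor is really self-contained.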

Therefore, a data service acts as the logical interface to the underlying data infrastructure; it can be called a virtual view. The data service contains enough information to look up, execute, and transform information into formats useful to client applications.

Having defined the requirements of the data service infrastructure, we still have to implement it with the design factors mentioned earlier in mind. An emerging design pattern uses XQuery to provide unified views of disparate data sources. XQuery is a powerful language that was fostered by industry leaders such as IBM, BEA, and Oracle. It is a very expressive functional language with a simple, familiar syntax and an organic connection to XML data structures. Data services can leverage XQuery's inherent ability to work with both relational and non-relational data sources, so the interface functions exposed in data services can be implemented using XQuery. The availability of excellent XQuery engines and tools makes it a natural choice as an integrating language over disparate data sources. The XQuery engine provides the core of the data integration.

Since XML data can be read universally by other applications, or converted to Java objects via XMLBeans or other technologies, clients can access the information using industry standards. This makes it an attractive solution for modern enterprise requirements.
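To make the idea of turning XML results into typed objects concrete, here is a small Python analogy. It is not XMLBeans, which is a Java technology; the Customer class, the customers_from_xml() function, and the element names are hypothetical and exist only for this sketch, which relies on the standard library's xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Customer:
    """Hypothetical typed view of one <customer> element."""

    customer_id: str
    name: str


def customers_from_xml(xml_text):
    """Bind every <customer> element in the document to a Customer object."""
    root = ET.fromstring(xml_text)
    return [
        Customer(customer_id=node.get("id"), name=node.findtext("name"))
        for node in root.iter("customer")
    ]


result = customers_from_xml(
    "<customers>"
    "<customer id='7'><name>Ann</name></customer>"
    "<customer id='8'><name>Raj</name></customer>"
    "</customers>"
)
```

Client code then works with result[0].name rather than walking the XML tree itself.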

AquaLogic Data Services Platform

BEA’s AquaLogic Data Services Platform embodies this approach. The platform provides an abstract Data Service Layer over the underlying data sources and exploits the power of XQuery: BEA uses XQuery to implement the interface functions exposed in data services, which can be either read functions or relationship functions. Relationship functions establish a relationship with other data services. For example, an Orders data service can be related to a Line Items data service using a getItems() function. In addition, one data service can talk with another data service using XPath, greatly increasing the scope for integration.
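The difference between a read function and a relationship function can be sketched in a few lines of Python. In the platform itself these functions are written in XQuery; the classes below (OrdersService, LineItemsService) and the get_by_order() helper are hypothetical stand-ins with in-memory data, and only get_items mirrors the getItems() function named above.

```python
class LineItemsService:
    """Stand-in for a Line Items data service."""

    def __init__(self, items):
        self._items = items  # list of dicts with "order_id", "sku", and "qty"

    def get_by_order(self, order_id):
        """Read function (hypothetical): all line items of one order."""
        return [item for item in self._items if item["order_id"] == order_id]


class OrdersService:
    """Stand-in for an Orders data service."""

    def __init__(self, orders, line_items):
        self._orders = orders          # order id -> order record
        self._line_items = line_items  # the related data service

    def get_order(self, order_id):
        """Read function: returns data owned by this service."""
        return self._orders[order_id]

    def get_items(self, order_id):
        """Relationship function: navigates to the related Line Items service."""
        return self._line_items.get_by_order(order_id)


line_items = LineItemsService([
    {"order_id": "O1", "sku": "A-100", "qty": 2},
    {"order_id": "O2", "sku": "B-200", "qty": 1},
])
orders_service = OrdersService({"O1": {"id": "O1", "customer": "C1"}}, line_items)
```

A caller of orders_service.get_items("O1") never touches the Line Items service directly; the relationship is part of the Orders service's public interface.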

Let's say you have a Customer data service that talks with an Oracle database and a PurchaseOrder data service that talks with a Web service. The Customer data service will expose functions like getCustomerByID() or getCustomers(), and the PurchaseOrder data service will have functions like getPOStatus(). We can use a third data service, CustomerSummary, which talks to the Customer and PurchaseOrder data services to provide integrated information.
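The composition just described might look like the following Python sketch. It is an illustration under stated assumptions rather than the platform's XQuery implementation: dictionaries stand in for the Oracle database and the Web service, the method names are Python-style renderings of getCustomerByID(), getCustomers(), and getPOStatus(), and get_po_ids_for() and get_summary() are hypothetical helpers that do not appear in the scenario above.

```python
class CustomerService:
    """Stand-in for the Customer data service (database-backed in the scenario)."""

    def __init__(self, rows):
        self._rows = rows  # customer id -> name

    def get_customer_by_id(self, customer_id):
        return {"id": customer_id, "name": self._rows[customer_id]}

    def get_customers(self):
        return [self.get_customer_by_id(cid) for cid in self._rows]


class PurchaseOrderService:
    """Stand-in for the PurchaseOrder data service (Web-service-backed in the scenario)."""

    def __init__(self, orders):
        self._orders = orders  # PO id -> (customer id, status)

    def get_po_status(self, po_id):
        return self._orders[po_id][1]

    def get_po_ids_for(self, customer_id):
        """Hypothetical helper: PO ids that belong to one customer."""
        return [po for po, (cid, _) in self._orders.items() if cid == customer_id]


class CustomerSummaryService:
    """Third data service: built only on the other two services, not on the sources."""

    def __init__(self, customers, purchase_orders):
        self._customers = customers
        self._purchase_orders = purchase_orders

    def get_summary(self, customer_id):
        customer = self._customers.get_customer_by_id(customer_id)
        orders = [
            {"po": po, "status": self._purchase_orders.get_po_status(po)}
            for po in self._purchase_orders.get_po_ids_for(customer_id)
        ]
        return {**customer, "orders": orders}


summary_service = CustomerSummaryService(
    CustomerService({"C1": "Ann", "C2": "Raj"}),
    PurchaseOrderService({"PO1": ("C1", "SHIPPED"), "PO2": ("C2", "OPEN")}),
)
```

Because CustomerSummaryService depends only on the public functions of the other two services, either underlying source could be replaced without changing the summary logic.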

Figure 2 shows an abstracted form of what a data service looks like. Here, return types of functions are modeled by schema. The pragma is used here to represent metadata information, which may include connection information for the data sources. This connection information is used by the XQuery engine when it fetches the data. The figure shows how XQuery is used to implement the functions. If you are not familiar with XQuery, you should still be able to understand the implementation by reading through the code of the function implementation. That is the power of XQuery. You can see how relations are established by referring from one Data Service to another.

Figure 2: A skeleton implementation of a data service

XQuery can return extracted data as XML, but this makes a programmer’s life difficult, since raw XML may not be a useful form to present in user interfaces. In BEA's implementation, the data is returned as Service Data Objects (SDO), which are based on a joint specification by BEA and IBM (JSR 235). BEA’s SDO implementation is an extension of XMLBeans.

SDO is a programming model that supports a disconnected application programming architecture and uses data graphs to represent persistent data. A data graph is a tree-structured collection of data objects. It can be thought of as a collection of value objects that can be disconnected from the data source; any changes made while disconnected are persisted with the help of mediators when a connection is reestablished. These disconnected data sets make localized updates possible. See the References section for more information about this technique.
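The disconnected pattern can be sketched in Python as follows. This is not the SDO API, which is defined for Java; the DataObject, DataGraph, and Mediator classes below only borrow the vocabulary, a dictionary stands in for the data source, and the conflict check in apply() is an assumption added for this example rather than behavior taken from the specification.

```python
class DataObject:
    """A value object that records every change made to it."""

    def __init__(self, key, values, log):
        self.key = key
        self._values = dict(values)  # private copy: the source is not touched
        self._log = log

    def get(self, name):
        return self._values[name]

    def set(self, name, value):
        old = self._values[name]
        if old != value:
            self._log.append((self.key, name, old, value))
            self._values[name] = value


class DataGraph:
    """A disconnected collection of data objects plus a summary of their changes."""

    def __init__(self, records):
        self.change_summary = []
        self.objects = {
            key: DataObject(key, values, self.change_summary)
            for key, values in records.items()
        }


class Mediator:
    """Moves data between the source and a disconnected data graph."""

    def __init__(self, store):
        self._store = store  # dict of dicts standing in for the data source

    def load(self):
        return DataGraph(self._store)

    def apply(self, graph):
        """Write the recorded changes back; returns how many were applied."""
        for key, name, old, new in graph.change_summary:
            # Assumed optimistic check: refuse to overwrite a value that
            # changed in the source while the graph was disconnected.
            if self._store[key][name] != old:
                raise RuntimeError(f"conflicting update for {key}.{name}")
            self._store[key][name] = new
        applied = len(graph.change_summary)
        graph.change_summary.clear()
        return applied


store = {"C1": {"city": "Austin"}}
mediator = Mediator(store)
graph = mediator.load()
graph.objects["C1"].set("city", "Boston")  # local change; the store is unchanged
```

Until mediator.apply(graph) runs, the store still holds the old value; the edit lives only in the graph and its change summary, which is what makes localized, offline updates possible.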

Conclusion

Future development efforts will focus on the integration of data, and the importance of EII tools is increasing. In this article we looked at a Data Service Layer-based solution, and the role of XQuery in solving integration problems in business data. We also showed how the level of abstraction provided by such solutions makes disparate data sources transparent to end users, making it very easy to compose additional services.

References

Manu Madhusudhanan R is a member of the Liquid Data team at BEA Systems. His expertise includes design and analysis of enterprise applications. His areas of interest include Design Patterns, Grid Computing and Agile Technologies.