sCrawler: SOA Dependency Tracker
by Sandeep Phukan
Published July 2009
SOA processes often involve orchestrations that span hundreds of processes. There is an increasing need to track the dependencies among these processes and represent them in graphical models that are easy to understand and efficient enough to handle the large number of processes involved. Given an arbitrary process, it is difficult to identify the other processes with which it interacts. Moreover, it should be possible to track such dependencies without knowing the technical intricacies of the SOA integration. The sCrawler utility attempts to achieve this. This article discusses some of the problems inherent in the SOA service life cycle and shows how automated dependency tracking can help to analyze and alleviate them.
Disclaimer: Readers should note that sCrawler is an experimental Graph Theory application for analyzing SOA Artifacts, and as such is not a supported product. Please see the appendix in this document for a list of resources including fully-supported Oracle products that provide many of the features available in sCrawler.
This paper is for SOA Developers and Architects who want to design or implement a SOA solution. It can also be used by anyone who wants to know how Process Tracking can be used in general to alleviate some of the problems in performing an impact analysis within an existing SOA implementation. It is assumed that the reader has a fair amount of SOA knowledge and has had some experience in designing or developing SOA implementations based on BPEL and ESB.
A SOA-based integration consists of many loosely coupled units, each performing a well-defined activity. These units are linked together to achieve a business function, and there may be hundreds of them. Identifying the dependencies of each unit is difficult, so analyzing the impact of any change becomes a time-consuming effort. Architects may also require verification that the implementation is in sync with the original design, and currently there is no easy way to obtain it. Developers have to delve into the actual code behind these business processes in order to identify the dependencies. While some utilities provide a textual representation of these dependencies, they fall short of an effective solution: text-based representations are likely to make sense only to someone who understands SOA's intricacies. Furthermore, a developer working on an IBM-based solution cannot easily identify the dependencies within an Oracle SOA implementation.
sCrawler is a utility that tracks dependencies and presents information about them in an application-agnostic manner. The graphical representations are intuitive and can scale to thousands of processes. Its use requires no knowledge of Oracle SOA Suite, and any user can easily identify dependencies and perform an impact analysis.
Service Lifecycle Problems
There are many interpretations of the SOA service life cycle. However, almost all of these variations can be categorized under three broad phases: Design Time, Runtime, and Change Time.
Design Time is the phase in which the loosely coupled services are conceived and put together to form a business application. This phase consists of several related subphases: defining service contracts, building or acquiring the services, and testing and deploying them. At this stage the services exist as passive processes in a container. One of the main challenges in this stage is identifying bottlenecks, such as fan-ins and fan-outs, as well as making sure that the SOA service implementation is aligned with the planned architectural blueprint. Furthermore, incremental testing of the processes is required, and end-to-end testing of scenarios is a costly operation. Automated deployment of processes is also difficult, especially when a process is dependent on another process. Finally, important information, such as Message Exchange Patterns, is hidden from the developer. These challenges become even more daunting for complicated processes.
Runtime is when the SOA Implementation starts executing and the business activity begins. At this stage, the processes are instantiated in the deployed container. One of the main challenges in this phase is to be able to map design time with runtime. If a particular execution route was chosen, other possible execution routes may not be immediately apparent. Furthermore, there may be no direct knowledge of the Message Exchange Pattern between the processes.
Change Time is the phase when inevitable alterations are made as business requirements change, whether to take advantage of SOA's promise of increased agility or to optimize a previously delivered implementation. This is perhaps the single largest phase of the entire life cycle. Given a particular business scenario, it is difficult to identify the points of change as well as the impact that a change might have on the existing implementation. A single point of change might have a ripple effect, and often it is not immediately known which processes and sub-processes might be affected.
Dependency Based Solutions
There is no single, all-encompassing solution to the SOA service life cycle problems discussed above. However, dependency tracking can provide a more intuitive means of analyzing and solving them.
Dependency Tracking Problems
Most of the information required to track dependencies is already present in the container in which the SOA processes are deployed. The problem, however, is that this information is usually in XML format, and manually parsing it can be time-consuming.
XML is not very intuitive. A manager or novice who wants a quick view of the deployed processes might not be very receptive to XML terminology. The main problem is that most SOA process containers provide raw XML. As such, the required approach is to graphically present not the XML itself but the information it encapsulates.
A manual approach to tracking dependencies can be quite complicated. Consider a typical use case in which one BPEL process invokes another BPEL process via an ESB, and the called BPEL process uses an adapter to fetch results from a relational database. A manual approach to finding all the dependencies in this case would be to start from the initial parent BPEL process in the Oracle BPEL Console or JDeveloper and find the ESB callout URL. It would then be necessary to move to the ESB Console or the ESB code to identify the BPEL process to which the request is being routed, and then return to the BPEL Console or JDeveloper source code to discover which adapter is being called. All of the partner link URLs would have to be searched manually, a time-consuming task. The main problem is that the developer would spend more time finding the dependencies than focusing on the actual problem. Another problem is that if a BPEL process calls some other process via the ESB, the console doesn't show which ESB process is being used; in this case, the developer's only recourse is to go back to the original source code. Moreover, using the ESB Console still requires an understanding of XSLT in order to identify process routings.
What is really required is to abstract the user away from the source code and XML semantics entirely. This would not only make dependency tracking more intuitive, but would also allow the information to be shared with other applications that do not understand the SOA container or implementation semantics.
Design Time Dependency
Design Time Dependency refers to the dependencies of a process as it exists after deployment but before execution. Oracle Application Server provides runtime tracking of instances of the running processes. Runtime dependency tracking is best suited for testing individual use cases, but it doesn't provide a holistic view of the process. Although an instance might execute as expected for the use case, it might still be tied to several other, rarely invoked processes, and there is no way to identify these related processes. This makes impact analysis difficult.
Design time tracking, on the other hand, provides a comprehensive image of the actual process. Currently, there is no way to get this image without reverting to the actual source code and manually recursing through all the related processes. This is a difficult task, especially if there is a high fan-out or fan-in. Moreover, a single BPEL process might use the ESB to exchange information with other BPEL processes or adapters, and there is currently no way to show the entire end-to-end image. Also, at the ESB end, the message interaction patterns are not obvious and require drilling down. sCrawler attempts to fill this gap and present the information in a fast, intuitive, and scalable manner via graphs.
Graph-Based Dependency And GraphML
Graphs are an ideal abstract data type for representing dependencies. This is because graphs are non-hierarchical and allow for individual elements to be interconnected in complex ways. In the proposed representation, information is represented via Directed Graphs, in which nodes represent processes (BPEL/ESB/Adapters/Third Party Web Services), while the edges represent the Message Exchange Patterns.
Graph representation is intuitive. It is possible to restrict the amount of information displayed by confining the number of child processes or nodes related to any parent. This allows for complicated scenarios involving hundreds of processes.
The graph-based dependency information should also be shareable across applications. For this reason, we propose GraphML. GraphML is an XML-based standard notation for graph data structures. In the simplest terms, a GraphML document consists of XML elements for nodes and edges: nodes are identified by node_ids, and edges connect these node_ids. GraphML is also extensible and can refer to external sources as well as application-specific attributes. Any application that understands GraphML can consume the information generated by the dependency tracking tool.
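As a concrete illustration, a minimal GraphML document for the use case described earlier (a BPEL process routing through an ESB service to a database adapter) might look like the following. The node ids are invented for the example; the element names and the edgedefault attribute come from the GraphML schema.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <!-- edgedefault="directed": dependency edges point from caller to callee -->
  <graph id="dependencies" edgedefault="directed">
    <node id="OrderBPEL"/>       <!-- hypothetical parent BPEL process -->
    <node id="OrderESB"/>        <!-- hypothetical ESB routing service -->
    <node id="OrderDBAdapter"/>  <!-- hypothetical database adapter -->
    <edge source="OrderBPEL" target="OrderESB"/>
    <edge source="OrderESB" target="OrderDBAdapter"/>
  </graph>
</graphml>
```

Because the document is plain XML, any GraphML-aware viewer can render it without knowing anything about BPEL or the ESB.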
Dependency Tracking Approach
Any number of process callouts or routings can be initiated from BPEL or ESB. In BPEL, a process callout is performed via partnerlinks. An immediate map of all the partnerlinks of a BPEL process can be represented as the immediate neighbors of the BPEL node in a graph model. A crawler then takes each of the partnerlinks from the first map and discovers all the partnerlinks it invokes. In this manner, a linearly recursive algorithm produces the entire graph of partnerlinks to which a BPEL process refers.
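The discovery loop described above can be sketched as follows. This is an illustrative sketch in Python, not sCrawler's actual implementation; `fetch_partnerlinks` stands in for whatever call retrieves a process's partnerlink targets from the container.

```python
def crawl(start, fetch_partnerlinks):
    """Discover every process reachable from `start`.

    `fetch_partnerlinks(process)` must return the list of processes the
    given process calls out to (its immediate graph neighbors).
    Returns the dependency graph as an adjacency dict.
    """
    graph = {}                      # process -> list of callees
    pending = [start]               # discovered but not yet crawled
    while pending:
        process = pending.pop()
        if process in graph:        # already crawled; also breaks cycles
            continue
        neighbors = fetch_partnerlinks(process)
        graph[process] = neighbors
        pending.extend(neighbors)   # crawl each newly found partnerlink in turn
    return graph

# Toy container: A calls B and C, B calls C, C calls nothing.
links = {"A": ["B", "C"], "B": ["C"], "C": []}
print(crawl("A", links.get))  # {'A': ['B', 'C'], 'C': [], 'B': ['C']}
```

The `if process in graph` guard is what keeps the recursion linear: each process is fetched exactly once, even when processes invoke each other in a cycle.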
In the ESB, all routings are denoted by routing rules, which are stored as plain XML in the Application Server. An ESB crawler first discovers all of the processes and then linearly recurses through this initial list to identify all related processes at level 2, and so on. By the time the crawler returns, it has the full graph of related processes.
For any process, if neither the ESB crawler nor the BPEL crawler returns a valid process link, that link is checked for an adapter. If no adapter is identified, the process is returned as unknown and cannot be crawled further.
Initially, two separate crawler threads are spawned: one for BPEL and one for ESB. These crawlers are also able to identify adapters and third-party Web services. The threads crawl independently and are joined after they return. The merged information is then converted to the GraphML format and fed to a graphical console that contains a GraphML engine. The engine converts the GraphML representation into Graphics2D objects, which are then displayed on the graphical console.
sCrawler is a dependency tracking utility based on the approach outlined in the previous sections. It maps the design time dependencies of deployed SOA artifacts in an OC4J container. It makes analysis intuitive and less time-consuming, and presents the information in a well-structured manner. It combines BPEL, ESB, adapters, and third-party Web services into a single whole and presents a holistic image of all dependencies.
sCrawler was designed for ease of use. Connecting to the Application Server requires a separate configuration. sCrawler always connects to the default connection on start-up (Figure 2). There can be only one default connection, and that connection must be valid. While configuring a connection, sCrawler connects to the Application Server (BPEL PM and ESB DT) and verifies that the connection is valid. If it is not, sCrawler displays a prompt to edit the configuration (Figure 3).
Figure 2: Connection Console
Figure 3: Failed Connection Example
Upon successful connection setup, the user is presented with the graphical console. The graphical console contains three main panels:
sCrawler enables constraint-based search of sub-graphs with immediate child processes. This can be convenient for viewing a complicated mesh of processes.
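The constraint-based search can be thought of as extracting a node and its immediate children from the full dependency graph. A sketch under the same adjacency-dict assumption as before; the function name is invented for the example.

```python
def immediate_subgraph(graph, root):
    """Return the sub-graph containing `root` and its immediate children only."""
    children = graph.get(root, [])
    # Child nodes keep only the edges that stay inside the sub-graph.
    return {root: list(children),
            **{c: [n for n in graph.get(c, []) if n in children or n == root]
               for c in children}}

full = {"A": ["B", "C"], "B": ["C", "D"], "C": [], "D": ["A"]}
print(immediate_subgraph(full, "A"))  # {'A': ['B', 'C'], 'B': ['C'], 'C': []}
```

Restricting the view this way is what keeps the display readable when the full mesh contains hundreds of interconnected processes.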
Figure 9: Simple View with Textual Search
Figure 10 illustrates the full-fledged front end.
Figure 10: sCrawler Front End
Currently sCrawler focuses on finding and displaying dependencies. However, as previously discussed, this dependency information has a number of other uses. The following features will be added to sCrawler in the future.
Sandeep Phukan has been working on Java and Integration technologies since 2004. He currently works with Oracle SSI in Bangalore on SOA and Integration. His main focus areas are Data Structures and Algorithms, Reliable Multicast Transports, and Highly Scalable Distributed Computing.