
sCrawler: SOA Dependency Tracker

by Sandeep Phukan

Published July 2009


SOA processes often involve orchestrations that span hundreds of processes. There is an increasing need to track the dependencies among these processes and represent them in graphical models that are easy to understand and that scale to the large number of processes involved. Given any arbitrary process, it is difficult to identify the other processes with which it interacts. Moreover, it should be possible to track such dependencies without knowing the technical intricacies of the SOA integration. The sCrawler utility attempts to achieve this. This article discusses some of the problems inherent in the SOA service life cycle and shows how automated dependency tracking can help to analyze and alleviate them.

Disclaimer: Readers should note that sCrawler is an experimental Graph Theory application for analyzing SOA Artifacts, and as such is not a supported product. Please see the appendix in this document for a list of resources including fully-supported Oracle products that provide many of the features available in sCrawler.

Intended Audience

This paper is for SOA Developers and Architects who want to design or implement a SOA solution. It can also be used by anyone who wants to know how Process Tracking can be used in general to alleviate some of the problems in performing an impact analysis within an existing SOA implementation. It is assumed that the reader has a fair amount of SOA knowledge and has had some experience in designing or developing SOA implementations based on BPEL and ESB.


A SOA-based integration consists of many loosely coupled units, each performing some well-defined activity. These units are linked together to achieve a business function, and they often number in the hundreds. It is difficult to identify the dependencies of each of these units, so analyzing the impact of any change becomes a time-consuming effort. Architects may also require verification that the implementations are in sync with the original design. Currently there is no easy way to do this. Developers have to delve into the actual code behind such business processes in order to identify the dependencies. While some utilities provide a textual representation of these dependencies, they fall short of an effective solution, and text-based representations are likely to make sense only to someone who understands SOA's intricacies. Furthermore, a developer working on an IBM-based solution cannot easily identify the dependencies within an Oracle SOA implementation.

sCrawler is a utility that tracks dependencies and presents information about them in an application-agnostic manner. The graphical representations are intuitive and can scale to thousands of processes. Its use requires no knowledge of Oracle SOA Suite at all, and any user can easily identify dependencies and perform an impact analysis.

Service Lifecycle Problems

There are many interpretations of the SOA service life cycle. However, almost all of these variations can be categorized under three broad headings: Design Time, Runtime, and Change Time.

Design Time is the phase in which the loosely coupled services are conceived and put together to form a business application. This phase consists of several related subphases: defining service contracts, building and acquiring the services, and testing and deploying them. At this stage the services exist as passive processes in a container. One of the main challenges in this stage is identifying bottlenecks, such as fan-ins and fan-outs, as well as making sure that the SOA service implementation is aligned with the planned architectural blueprint. Furthermore, incremental testing of the processes is required, because end-to-end testing of scenarios is a costly operation. Automated deployment of processes is also difficult, especially when a process is dependent on another process. Finally, important information, such as Message Exchange Patterns, is hidden from the developer. These challenges become even more daunting for complicated processes.

Runtime is when the SOA Implementation starts executing and the business activity begins. At this stage, the processes are instantiated in the deployed container. One of the main challenges in this phase is to be able to map design time with runtime. If a particular execution route was chosen, other possible execution routes may not be immediately apparent. Furthermore, there may be no direct knowledge of the Message Exchange Pattern between the processes.

Change Time is the phase when inevitable alterations are made as business requirements change, whether to take advantage of SOA's promise of increased agility or to optimize a previously delivered implementation. This is perhaps the single largest phase of the entire life cycle. Given a particular business scenario, it is difficult to identify the points of change as well as the impact that a change might have on the existing implementation. A single point of change might have a ripple effect, and often it is not immediately known which processes and sub-processes might be affected.

Dependency Based Solutions

There is no single, all-encompassing solution to the SOA service life cycle problems discussed above. However, dependency tracking can provide a more intuitive means of analyzing and solving them.

  1. Bottleneck Detection: A SOA-based mesh of loosely coupled processes might have several bottlenecks, including fan-ins and fan-outs, cyclic processes, and redundant calls. Dependency tracking leads to the discovery of these bottlenecks.
  2. Impact Analysis: A process may have any number of sub-process call-outs to other loosely coupled services. Automated dependency tracking recurses through these and creates a graph of all the service call-outs. This graph helps to identify child processes that would be candidates for change given a change in the parent process, which in turn helps in approximating the required changes.
  3. Missing Process or Services: Dependency tracking always relates a parent to a child in a graph. This helps to identify missing processes or services. Normally, a parent would always end in either an adapter or a third-party provider service. If it ends in neither, a missing process has been identified.
  4. Hot Spot Detection: Dependency tracking can help identify the core processes in an end-to-end SOA implementation. This helps to identify which processes may be changed without a drastic effect on the implementation. If a change is required owing to changing business requirements, alternate execution routes can be selected as candidates for change.
  5. Message Exchange Patterns: Dependency tracking extracts the message exchange patterns between any two applications. This can be used to identify potential issues with an implementation, such as a synchronous requestor paired with an asynchronous provider.
  6. Automated Incremental Testing: Functional testing of a process involves testing its boundary conditions based on the inbound and outbound messages and the existence or absence of all the dependent services the parent calls. Dependency tracking can help to identify all the dependent processes that should be tested prior to testing the parent service. Since the dependency graph also lists all messages exchanged by the process, it becomes possible to configure automated tests that use boundary values or averages thereof.
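As a sketch of how a dependency graph supports impact analysis, the following Java fragment treats the impacted processes as those reachable from the changed parent. This is illustrative only, not sCrawler's actual code; the class and process names are invented.

```java
import java.util.*;

// Minimal sketch: impact analysis as reachability over a directed
// dependency graph. All process names here are hypothetical.
class ImpactAnalysis {
    // adjacency list: parent process -> child processes it calls out to
    private final Map<String, List<String>> deps = new HashMap<>();

    void addDependency(String parent, String child) {
        deps.computeIfAbsent(parent, k -> new ArrayList<>()).add(child);
    }

    // Every process reachable from 'changed' is a candidate for review.
    Set<String> impactedBy(String changed) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>(List.of(changed));
        while (!queue.isEmpty()) {
            for (String child : deps.getOrDefault(queue.poll(), List.of())) {
                if (seen.add(child)) queue.add(child);   // breadth-first walk
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        ImpactAnalysis ia = new ImpactAnalysis();
        ia.addDependency("OrderBPEL", "RoutingESB");     // hypothetical processes
        ia.addDependency("RoutingESB", "DbAdapter");
        System.out.println(ia.impactedBy("OrderBPEL"));
    }
}
```

The same traversal, run on the graph sCrawler builds, would enumerate the child processes to re-test after a change to the parent.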

Dependency Tracking Problems

Usually, most of the information required to track dependencies is already present in the container in which the SOA processes are deployed. The problem, however, is that this information is usually in XML format. Manually parsing this information can be time-consuming.

XML is not very intuitive. A manager or novice who wants a quick view of the deployed processes might not be very receptive to learning XML terminology. The main problem is that most SOA process containers provide raw XML. The required approach, then, is to graphically present not the XML itself but the information it encapsulates.

A manual approach to tracking dependencies can be quite complicated. Take a typical use case in which one BPEL process invokes another BPEL process via an ESB, and the called BPEL process uses an adapter to fetch results from a relational database. A manual approach to finding all the dependencies would be to start from the initial parent BPEL process in the Oracle BPEL Console or JDeveloper and find the ESB callout URL. It would then be necessary to move to the ESB Console or the ESB code to identify the BPEL process to which the call is being routed, and then return to the BPEL Console or JDeveloper source code to discover which adapter is being called. All of the partner link URLs would have to be searched manually, a time-consuming task. The developer would end up spending more time finding the dependencies than focusing on the actual problem. Another problem is that if BPEL makes a call to some other process via ESB, the console does not show which ESB process is being used; the developer's only recourse is to go back to the original source code. Moreover, using the ESB Console still requires an understanding of XSLT in order to identify process routings.

What is really required is to abstract the user from the source code and XML semantics entirely. This would not only make dependency tracking more intuitive but also allow the information to be shared with other applications that do not understand the SOA container or implementation semantics.

Design Time Dependency

Design Time Dependency refers to the dependencies of a process as it exists after deployment but before execution. Oracle Application Server provides runtime tracking of instances, that is, of the running processes. Runtime dependency tracking is best suited for testing individual use cases, but it does not provide a holistic view of the process. Although an instance might be executing as per the use case, it might still be tied to several other rarely invoked processes, and there is no way to identify these related processes. This makes impact analysis difficult.

Design time tracking, on the other hand, provides a comprehensive image of the actual process. Currently, there is no way to get this image without going back to the actual source code and manually recursing through all the related processes. This is a difficult task, especially if there is high fan-out or fan-in. Moreover, a single BPEL process might use ESB for exchanging information with other BPEL processes or adapters, and there is currently no way to show the entire end-to-end image. Also, at the ESB end, the message interaction patterns are not obvious and require drilling down. sCrawler attempts to fill this gap and present the information in a fast, intuitive, and scalable manner via graphs.

Graph-Based Dependency And GraphML

Graphs are an ideal abstract data type for representing dependencies. This is because graphs are non-hierarchical and allow for individual elements to be interconnected in complex ways. In the proposed representation, information is represented via Directed Graphs, in which nodes represent processes (BPEL/ESB/Adapters/Third Party Web Services), while the edges represent the Message Exchange Patterns.

Graph representation is intuitive. It is possible to restrict the amount of information displayed by confining the number of child processes or nodes related to any parent. This allows for complicated scenarios involving hundreds of processes.

The graph-based dependency information should also be shareable across applications. For this reason, we propose GraphML, an XML-based standard notation for graph data structures. In the simplest terms, a GraphML document consists of XML elements for nodes and edges: the nodes are identified by node ids, while the edges connect these node ids. GraphML is also extensible and can refer to external sources as well as application-specific attributes. Any application that understands GraphML can consume the information generated by the dependency tracking tool.
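As a minimal illustration, the dependency of one hypothetical process on another, together with its Message Exchange Pattern, might be encoded in GraphML like this (the node ids and the mep attribute are invented for this example, not sCrawler's actual schema):

```xml
<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <!-- application-specific attribute: the Message Exchange Pattern of an edge -->
  <key id="mep" for="edge" attr.name="mep" attr.type="string"/>
  <graph id="dependencies" edgedefault="directed">
    <node id="OrderBPEL"/>
    <node id="RoutingESB"/>
    <edge source="OrderBPEL" target="RoutingESB">
      <data key="mep">request-response</data>
    </edge>
  </graph>
</graphml>
```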

Figure 1

Dependency Tracking Approach

Any number of process callouts or routings can be initiated from BPEL or ESB. In BPEL, a process callout is performed via partner links. An immediate map of all the partner links for a BPEL process can be represented as the immediate neighbors of the BPEL node in a graph model. A crawler then takes each of the partner links from this first map and discovers all the partner links it invokes. In this manner a linearly recursive algorithm yields the entire graph of partner links to which a BPEL process refers.

In ESB, all routings are denoted by routing rules, which are stored as plain XML in the Application Server. An ESB crawler first discovers all of the processes and then linearly recurses through this initial list to identify all related processes at level 2, and so on. By the time the crawler returns, it holds the full graph of related processes.

For any process, if neither the ESB crawler nor the BPEL crawler returns a valid process link, that link is checked for an adapter. If no adapter is identified, the process is returned as unknown and cannot be crawled further.

Initially, two separate crawler threads are spawned: one for BPEL and one for ESB. These crawlers can also identify adapters and third-party Web services. The threads crawl independently and are joined after they return. The merged information is then converted to GraphML format and fed to a graphical console that contains a GraphML engine. The engine converts the GraphML representation into Graphics2D objects, which are then displayed on the console.
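The recursion described above can be sketched in Java as follows. This is an illustrative outline, not sCrawler's actual code: the lookup function is a hypothetical stand-in for the container queries (BPEL partner links and ESB routing rules), and the process names are invented.

```java
import java.util.*;
import java.util.function.Function;

// Sketch of the recursive design-time crawl. 'lookup' stands in for the
// container queries; an empty child list marks an adapter or unknown leaf.
class DependencyCrawler {
    private final Function<String, List<String>> lookup;
    private final Map<String, List<String>> graph = new LinkedHashMap<>();

    DependencyCrawler(Function<String, List<String>> lookup) {
        this.lookup = lookup;
    }

    Map<String, List<String>> crawl(String root) {
        visit(root);
        return graph;
    }

    private void visit(String process) {
        if (graph.containsKey(process)) return;        // already crawled; also breaks cycles
        List<String> children = lookup.apply(process); // partner links / routing rules
        graph.put(process, children);
        for (String child : children) visit(child);
    }

    public static void main(String[] args) {
        Map<String, List<String>> container = Map.of(  // hypothetical deployed processes
            "OrderBPEL", List.of("RoutingESB"),
            "RoutingESB", List.of("InvoiceBPEL", "DbAdapter"));
        Map<String, List<String>> graph = new DependencyCrawler(
            p -> container.getOrDefault(p, List.of())).crawl("OrderBPEL");
        System.out.println(graph.keySet());
    }
}
```

In sCrawler, two such crawls (BPEL and ESB) run on separate threads and their results are merged before conversion to GraphML.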

sCrawler Features

sCrawler is a dependency tracking utility based on the approach outlined in the previous sections. It maps the design time dependencies of SOA artifacts deployed in an OC4J container. It makes analysis intuitive and less time-consuming, and presents information in a well-structured manner. It combines BPEL, ESB, adapters, and third-party Web services into a single whole and presents a holistic image of all dependencies.

  1. Intuitive Graphical Representation: sCrawler extracts all the process dependencies from the Oracle Application Server and presents them as graphs in a 2D graphical console. It obtains information on all the deployed processes and creates a selection tree; for each leaf in the process tree, it recursively builds the dependencies. The end user need not know anything about SOA or the Application Server to find the dependencies.
  2. Based on the GraphML Standard: sCrawler uses a built-in GraphML engine. This engine converts all the process-related information, such as Message Exchange Patterns, WSDL endpoints, and process type, into GraphML format. Any application that understands GraphML can use this information to render the process dependencies in any graphical format, allowing third-party applications to consume the dependency information without communicating directly with the SOA container.
  3. Layout Flexibility: sCrawler's graphical engine is built on Java Graphics2D. The dependency graphs can be focused, highlighted, panned, resized, rotated, and relocated easily, which makes analysis of complicated SOA processes very convenient. The layout can be exported as images or as GraphML.
  4. Constrained Searches and Highlighting: sCrawler has content-based and contextual searches built in. A process group can be searched for and highlighted along with the immediate child processes that depend on the matched process.
  5. Single Console for End-to-End Visibility: sCrawler presents all BPEL, ESB and Third Party Web Services in a single console. This makes it possible to see all the process execution routes end to end.
  6. XML-Based Configuration: sCrawler maintains all Application Server connection properties in XML format, which can be easily edited. All dependency information is stored in standard GraphML format, which can be easily shared with any XML-aware application, such as Flex.

sCrawler Usage

sCrawler was designed for ease of use. Connecting to the Application Server requires a separate configuration. sCrawler always connects to the default connection on start-up (Figure 2); there can be only one default connection, and that connection must be valid. While the connection is being configured, sCrawler connects to the Application Server (BPEL PM and ESB DT) and verifies that the connection is valid. If it is not, sCrawler displays a prompt to edit the configuration (Figure 3).

Figure 2: Connection Console
Figure 3: Failed Connection Example

Upon successful connection setup, the user is presented with the graphical console. The graphical console contains three main panels:

  • Process Explorer Panel - Displays all the BPEL and ESB processes that are currently deployed on the Oracle Application Server, allowing users to select processes of interest.

    Figure 4: Process Explorer Panel
  • Dependency Console - Displays the dependency graph for the processes. This console contains all the features for text-based search of the processes, allows highlighting of immediate children, and includes pan, zoom, and drag controls.

    sCrawler provides two different graphical representations: Simple View and Complex View. In Simple View all nodes are rendered identically; it is better suited for immediate-neighbor highlighting and uses less memory. Complex View does not support immediate-neighbor highlighting, but it does support hiding and showing immediate neighbors.

    Figure 5 illustrates the notation used to identify node types in Complex View.
    Figure 5: Types of Processes in Complex View
    Figure 6: Complex View with Message Exchange Patterns
    Figure 7: Simple View with Immediate Neighbor Highlight

  • Execution Log Panel - Shows the execution log of sCrawler. Information shown includes the time to complete each task related to crawling the ESB and BPEL design times, converting objects, and rendering the graphical data in the console.
    Figure 8: Execution Log Panel

sCrawler enables constraint-based searches that extract subgraphs together with their immediate child processes. This can be convenient for viewing a complicated mesh of processes.

Figure 9: Simple View with Textual Search

Figure 10 illustrates the full-fledged front end.

Figure 10: sCrawler Front End

Future Enhancements

Currently sCrawler focuses on finding and displaying dependencies. However, as previously discussed, this dependency information has a number of other uses. The following features will be added to sCrawler in the future.

  1. Automated Testing: The information present in the GraphML can be used to gather further details, such as the messages exchanged between the processes (cardinality, type, etc.). The messages will be tested against preconfigured values or ranges in an incremental manner. The GraphML representation will also be used to construct the necessary stubs when individual processes need to be tested.
  2. Connectivity with WebLogic: Currently sCrawler supports only Oracle Containers for J2EE. We intend to add WebLogic and OSB connectivity to sCrawler later.
  3. Automated Deployment: sCrawler's dependency tracking can be extended to find dependencies in SOA process source code. The idea is to deploy processes starting from the edge vertices with the fewest edges and move toward the vertices with the most edges. Subgraphs may also be found that can be deployed independently, making it possible to perform certain deployments in parallel.
  4. JDeveloper Extension: sCrawler will be made available as a JDeveloper extension. This will give developers an end-to-end view while they work on an existing SOA ecosystem.
  5. ANTlibs for sCrawler: For cases where users intend to use a third-party graph renderer, a separate Ant library will be made available to generate the GraphML documents.


Sandeep Phukan has been working on Java and Integration technologies since 2004. He currently works with Oracle SSI in Bangalore on SOA and Integration. His main focus areas are Data Structures and Algorithms, Reliable Multicast Transports, and Highly Scalable Distributed Computing.