by Jürgen Kress, Berthold Maier, Hajo Normann, Danilo Schmeidel, Guido Schmutz, Bernd Trops, Clemens Utschig-Utschig, Torsten Winterberg
Understanding the implications of transactions and compensation within a service-oriented architecture.
Part of the Industrial SOA article series
Some of the most important SOA design patterns that we have successfully applied in projects will be described in this article. These include the Compensation pattern and the UI mediator pattern, the Common Data Format pattern and the Data Access pattern. All of these patterns are included in Thomas Erl's book, "SOA Design Patterns" [REF-1], and are presented here in detail, together with our practical experiences. We begin our "best of" SOA pattern collection with the Compensation pattern.
Compensation is required in error situations in an SOA, as multiple atomic service operations cannot generally be linked with classic transactions; doing so would violate the principle of loose coupling. Such error situations occur particularly when service operations are combined into processes or new services during orchestration or by applying the Composite pattern, and the transaction bracket has to be expanded as a result. We need mechanisms to undo the effects of individual services (the status changes in the overall system) and to ensure that a consistent system state is maintained at all times, so as to preserve system integrity.
For the Compensation pattern, we would like to address the following questions:
By definition, services are atomic units that provide a clearly defined business function. If functional or technical errors occur when using a service, it is the responsibility of the service to deal with these errors, send corresponding messages to the caller of the service, and ensure that the integrity of the overall system is not harmed.
There will be situations in which the service alone cannot ensure integrity. This is the case when multiple services are interconnected with an orchestration engine to form a process or a composite service. If an error occurs in one of the services, the changes made by the preceding services have to be undone. In principle, the service operation or the connected transaction coordinator would then have to communicate with all service operations involved in the orchestration and undo the processing steps. The same applies if the runtime platform encounters errors during an orchestration process. In this case as well, it must be ensured that changes to previously called services can be undone. The classic approach to this problem would be to use a transaction coordinator according to the ACID principle (Atomicity, Consistency, Isolation, Durability).
The use of transactions would mean that all services involved must be capable of undoing their changes via a rollback. To enable this, the back-end resources used, such as database records or other data sources, must generally be locked for the entire processing period. This often blocks parallel access, and in very long-running processes the IT resources could remain locked for several weeks.
However, services are loosely coupled in our SOA [REF-2]. A transaction represents a logical bracket around a range of services and therefore places the services in a dependency, which is not desirable in an SOA. For this reason, an alternative solution approach is required:
A compensation can be conceived as a "logical undo." For a modifying functionality, corresponding undo logic must be provided. Specifically, this means that each service operation, such as "create order," that causes a change in the system state requires a corresponding undo-service operation that can reset the state to its previous status.
This undo-service operation is generally not a simple reverse-service operation, such as "delete order," in which an entry is removed from a database table. A more complex result of a service operation usually has to be undone, and the process must be documentable. This corresponds to a common procedure long practiced in the business world: rather than being "eradicated," erroneous postings are reversed by a cancellation posting, and the corresponding actions are documented.
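The difference between a plain delete and a documented reversal can be sketched as follows. This is a minimal illustration only: `OrderService`, `create_order`, `cancel_order`, the status values, and the in-memory journal are hypothetical names chosen for the example, not part of any product or of the pattern definition.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class OrderService:
    """Hypothetical in-memory order service, used only for illustration."""
    orders: Dict[str, str] = field(default_factory=dict)  # order id -> status
    journal: List[str] = field(default_factory=list)      # documented actions

    def create_order(self, order_id: str) -> None:
        """Modifying operation: changes the system state."""
        self.orders[order_id] = "CREATED"
        self.journal.append(f"created {order_id}")

    def cancel_order(self, order_id: str, reason: str) -> None:
        """Undo operation: reverses the order instead of deleting the record."""
        if order_id not in self.orders:
            raise KeyError(f"unknown order {order_id}")
        self.orders[order_id] = "CANCELLED"
        self.journal.append(f"cancelled {order_id}: {reason}")


service = OrderService()
service.create_order("A-1")
service.cancel_order("A-1", reason="payment failed")
```

After the cancellation, the order record still exists with the status CANCELLED, and the journal documents both the original action and its reversal.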
This concept of compensation is required for service orchestrations and compositions. A situation may arise in which various service operations have been successfully performed in one process sequence. A subsequently called service operation then raises an error, which requires the reversal of the preceding service operations that have changed the state of the system and belong to the same service context.
Instead of using a technical transaction to ensure the integrity of the system, a "compensation handler" function is added to service orchestration. The "compensation handler" can be used to define the undo-service operations to be called if an error occurs.
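Outside a process engine, the idea of a compensation handler can be approximated in a few lines. The sketch below assumes that each step is supplied as a pair of operation and undo operation and that the undo operations of completed steps are called in reverse order when a later step fails. The function name `run_with_compensation` and the example steps are hypothetical.

```python
from typing import Callable, List, Tuple

# Each step pairs a service operation with its undo-service operation.
Step = Tuple[Callable[[], None], Callable[[], None]]


def run_with_compensation(steps: List[Step]) -> bool:
    """Run each operation; on failure, call the undo operations of the
    completed steps in reverse order. Returns True if all steps succeeded."""
    completed: List[Callable[[], None]] = []
    for operation, undo in steps:
        try:
            operation()
        except Exception:
            for compensate in reversed(completed):
                compensate()
            return False
        completed.append(undo)
    return True


log: List[str] = []


def fail_shipping() -> None:
    raise RuntimeError("shipping service unavailable")


ok = run_with_compensation([
    (lambda: log.append("create order"), lambda: log.append("cancel order")),
    (lambda: log.append("reserve stock"), lambda: log.append("release stock")),
    (fail_shipping, lambda: log.append("cancel shipping")),
])
```

Only the steps that actually completed are compensated; the failed shipping step is assumed to have left no state behind, so its undo operation is not called.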
The back-propagation of an error in a chain is achieved by requiring each service or process to provide a fault operation. This is not a problem for synchronous services: in the event of an error, a corresponding fault is returned. For all asynchronous service calls, however, an alternative solution must be found. Within service orchestration, implementation of the asynchronous Request-Response pattern is one possibility. In this case, the calling process waits until feedback is issued. If an error occurs, the called service sends an error message to the caller instead of the "normal" feedback, whereupon the corresponding compensation handler is activated to perform the logical rollback of the necessary steps. In BPEL, the pick activity would be used instead of a receive activity, in order to react to either the response or the error. In BPMN, the event-based gateway can be used to describe the different responses that come from an asynchronous service call.
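The behavior of waiting for either the response or the fault of an asynchronous call can be illustrated without a process engine. The following sketch uses a plain in-process queue as a stand-in for the callback channel; the `Message` type, the message kinds, and the decision to treat a missing callback as a reason to compensate are assumptions made for this example and are not prescribed by BPEL or BPMN.

```python
import queue
from dataclasses import dataclass


@dataclass
class Message:
    kind: str      # "response" or "fault" (hypothetical message kinds)
    payload: str


def await_callback(inbox: "queue.Queue[Message]", timeout_s: float) -> str:
    """Wait for the callback of an asynchronous call and react to whichever
    message arrives: the normal response or a fault."""
    try:
        message = inbox.get(timeout=timeout_s)
    except queue.Empty:
        # Assumption: a missing callback is handled like a fault.
        return "compensate: no callback received in time"
    if message.kind == "fault":
        return f"compensate: {message.payload}"
    return f"continue: {message.payload}"


inbox: "queue.Queue[Message]" = queue.Queue()
inbox.put(Message("fault", "credit check failed"))
outcome = await_callback(inbox, timeout_s=1.0)
```

In a real orchestration, the "compensate" branch would activate the compensation handler rather than return a string.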
The BPEL/BPMN language constructs for processing compensation can be applied intuitively. In an exception handler, the occurring error is "caught" and, using the "compensate" activity where necessary, the corresponding compensation handler is triggered to perform the logical rollback [REF-4].
As locks cannot be placed on the corresponding data sources without a transaction monitor, the functional processes often work with attributes, such as "orderstatus=RELEASED." These attributes must also be manipulated accordingly in the event of errors.
Areas of Application
Compensation is used in service compositions if multiple service operations are called that change the state of the overall system independently of one another. Compensation cannot be used if services change data implicitly, e.g. by activating database triggers.
When encapsulating "legacy systems" through Web services and providing their functions as service operations, use of the compensation pattern immediately suggests itself as the use of a transaction manager is rarely a consideration here. Often, the only option is to reset a service operation by compensation in the event of an error because the legacy system does not offer the possibility of restoring the original status via a rollback.
Bear the following effects in mind when using compensation:
Rather than being automatically available for use, as in database rollbacks, compensation logic is handled entirely by the service designers and must be taken into account during the design. As this relates to the integrity of the overall system, this task is particularly important and ultimately has a direct effect on the quality of the services.
For example, if undo-service operations are structured in such a way that they reset everything indiscriminately without consideration of the business logic, this can lead to the assumption that the compensation has been performed cleanly, even though the overall system is no longer in a consistent state. This can occur if the designer of the composition does not have a detailed description of the logic of the called service or its undo-service operations.
In the compensation phase, all service calls are undone in reverse order by corresponding undo-service operations. The limitations of this design are encountered in some cases, such as when service calls have been rejected by conditional logic in the process sequence. For example, a rule service determines whether the operation may functionally be performed at the time of the call. At the time of design, a compensation handler would not know which undo operations it should call.
Two use patterns can be considered for this case. Ideally, services and undo-service operations, in particular, are implemented idempotently. A call of all undo-service operations in the process would then have no consequences, apart from the system load. Alternatively, all calls must be logged in a separate variable. If using a BPEL or BPMN process engine, the variables are persisted and remain available for a subsequent activation/rollback. Without a process engine, or if processes only implement the "Fire and Forget" Message Exchange pattern, the logging of all calls in a separate header area of the message is a popular but unattractive solution for undoing the calls made using a separate compensation process.
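Both approaches can be shown side by side in a small sketch. `StockService`, its operations, and the format of the call log are hypothetical and serve only to illustrate the idea.

```python
from typing import List, Set


class StockService:
    """Hypothetical service whose undo operation is idempotent."""

    def __init__(self) -> None:
        self.reserved: Set[str] = set()

    def reserve(self, order_id: str) -> None:
        self.reserved.add(order_id)

    def release(self, order_id: str) -> None:
        # discard() does nothing if the reservation does not exist, so
        # repeated or unnecessary calls leave the state unchanged.
        self.reserved.discard(order_id)


stock = StockService()
call_log: List[str] = []  # second approach: record the calls actually made

stock.reserve("A-1")
call_log.append("reserve:A-1")

# Approach 1: call the undo operation unconditionally, even repeatedly and
# even for an order that was never reserved.
stock.release("A-1")
stock.release("A-1")
stock.release("B-2")

# Approach 2: derive the required undo calls from the log, in reverse order.
undo_calls = [entry.replace("reserve:", "release:") for entry in reversed(call_log)]
```

With idempotent undo operations, the compensation logic does not need to know which branch of the process was taken; with a call log, only the operations that were actually executed are reversed.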
However, there are also situations and use cases in which it is vital to perform transactions. The classic example of this is a bank transfer, for which relying on compensation is problematic from a business process perspective. In this case, the SOA principle of loose coupling [REF-2] clashes with the concept of integrity.
Alternative solutions can be developed in these cases, of which two are described.
If there are compelling reasons for not using the Compensation pattern, you can convert the process into a "compound service" [REF-1]. This extends the functional scope of the service (transferMoneyFromAccountToAccount(…)) and handles the transaction within this service.
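A compound service of this kind might look like the following sketch, which uses Python's built-in sqlite3 module and an in-memory database purely for illustration. The table layout, the account identifiers, and the insufficient-funds rule are assumptions made for the example; the essential point is that debit and credit run inside one local transaction owned by the service.

```python
import sqlite3


def transfer_money_from_account_to_account(
    conn: sqlite3.Connection, source: str, target: str, amount: int
) -> None:
    """Debit and credit inside one local transaction: both succeed or neither."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, source)
        )
        (balance,) = conn.execute(
            "SELECT balance FROM account WHERE id = ?", (source,)
        ).fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, target)
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 0)])
conn.commit()

transfer_money_from_account_to_account(conn, "A", "B", 30)
try:
    # This transfer overdraws account A and is rolled back as a whole.
    transfer_money_from_account_to_account(conn, "A", "B", 500)
except ValueError:
    pass

balances = dict(conn.execute("SELECT id, balance FROM account"))
```

Callers of the compound service see a single operation and need no undo-service operation for its internal steps.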
Another alternative is to use the proprietary mechanisms supplied with certain SOA suites that allow transaction contexts to be managed between service calls for ESB and BPEL integration processes. This enables a composite transaction to be undone via a rollback in the event of an error, without having to provide corresponding undo operations.
To recap, the use of a classic transaction bracket creates a dependency between the service operations (issue, receive) and thereby artificially extends the technical interface agreement to include the details of the transaction bracket. This circumvents certain SOA principles, such as reuse or loose coupling.
Instead of shifting transaction responsibility to the service operation, Thomas Erl considers the use of a transaction coordinator, such as WS-TX or XA [REF-5]. For services, this means that they register with the transaction coordinator once an operation is executed and so delegate their transaction management responsibility.
As IT resources must not be blocked for an extended period, this implementation is only recommended in the case of transient transactions and so is often found in high-performance integration processes.
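The delegation to a coordinator can be illustrated with a strongly simplified two-phase commit, the protocol family on which XA and WS-AtomicTransaction are based. The classes below are hypothetical and run in a single process; they omit everything a real coordinator needs, such as durable logging, timeouts, and recovery.

```python
from typing import List


class Participant:
    """Hypothetical service-side resource taking part in a transaction."""

    def __init__(self, name: str, can_commit: bool = True) -> None:
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self) -> bool:
        """Vote on whether the local changes can be committed."""
        return self.can_commit

    def commit(self) -> None:
        self.state = "committed"

    def rollback(self) -> None:
        self.state = "rolled back"


class Coordinator:
    def __init__(self) -> None:
        self.participants: List[Participant] = []

    def register(self, participant: Participant) -> None:
        self.participants.append(participant)

    def complete(self) -> bool:
        """Phase 1: ask the participants to prepare. Phase 2: commit all if
        every vote was positive, otherwise roll back all."""
        if all(p.prepare() for p in self.participants):
            for p in self.participants:
                p.commit()
            return True
        for p in self.participants:
            p.rollback()
        return False


coordinator = Coordinator()
debit = Participant("debit")
credit = Participant("credit", can_commit=False)
coordinator.register(debit)
coordinator.register(credit)
committed = coordinator.complete()
```

Because one participant votes against the commit, both are rolled back. In a real system the resources stay locked between the two phases, which is why this approach only suits short-lived transactions.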
When discussing SOA or attempting an initial implementation, the topic of fault handling is often set aside to focus on the "happy paths." However, it is not possible to avoid addressing the concepts of fault-handling, as some adaptations to the architecture will be required. Every architect must therefore deal with the implications of transactions and compensation within an SOA, in order to be able to estimate the effects.
Alternatives, such as shifting the transaction to the service implementation or using distributed transactions, are available. However, these alternatives contradict fundamental SOA principles, particularly the principle of loose coupling, and should be avoided.
Thought must be given to compensation as early as in the service design phase. Service and process designers must define the scope of undo-service operations. This is required both for services that encapsulate legacy applications and for newly developed services.
Compensation handlers are provided in a standards-compliant form by many process engines on the market, and they enable the robust implementation of systems, even in the environment of complex enterprise systems.
[REF-1] Thomas Erl. "SOA Design Patterns." http://www.soapatterns.com/
[REF-2] Berthold Maier, Hajo Normann, Bernd Trops, Clemens Utschig-Utschig, Torsten Winterberg. "Lose Kopplung." Java Magazin 3, 2009
[REF-3] OASIS Standard: Web Services Business Process Execution Language Version 2.0. "Compensation Handlers": http://docs.oasis-open.org/wsbpel/2.0/OS/wsbpel-v2.0-OS.html#_Toc164738526
[REF-4] Cross Service Transaction: http://www.servicetechspecs.com/ws-atomictransaction
[REF-5] OMG. "Business Process Model and Notation (BPMN) Version 2.0": http://www.omg.org/spec/BPMN/2.0