BPEL Knowledge Base - Technical Note #007

Managing BPEL Run-time Exceptions

The BPEL spec provides fault handling via the "faultHandlers" construct, which lets BPEL programmers deal with different faults in catch-and-handle fashion. Some faults, however, especially those raised by the invoke activity, occur because of system misconfiguration or network instability. It would be tedious for the programmer to handle these kinds of faults at each and every invoke activity. The Oracle BPEL Process Manager therefore provides the following features to help the developer deal with these errors:

  • failover: Allows multiple service implementations to be configured for a given partnerLink. If a retryable runtime fault (discussed in the following section) occurs, the server will try other implementations.
  • retry: The server retries later, based on a user-specified retry interval and retry count.

However, there are other runtime faults that these two mechanisms cannot resolve, for example when a remote service has been upgraded and its interface has changed. We call this kind of fault a "bindingFault". The strategies for dealing with a bindingFault are to escalate it to a human administrator via the built-in TaskManager service, or to place the document in a dead letter queue via a JMS service. The scenario here shows how to do the latter, while the LoanFlowPlus demo sample illustrates the former.

Taxonomy of BPEL Faults

A BPEL fault has a fault name, which is a QName (a name qualified with a namespace), and optionally a messageType. There are two categories of faults in BPEL: business faults and runtime faults.

Business faults are application-specific faults. They occur when an explicit <throw> activity is executed or when an <invoke> activity receives a fault as its response. The fault name of a business fault is specified by the BPEL process, and the messageType, if one exists, is defined in the WSDL.
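
The <throw> case can be sketched as the BPEL 1.1 fragment below. It is an illustration only: the fault name tns:loanRejected and the variables rejection and creditScore are hypothetical, and are assumed to be declared (and rejection populated) elsewhere in the process.

  <!-- Sketch only: tns:loanRejected, rejection and creditScore are hypothetical names -->
  <scope name="CheckCredit">
    <faultHandlers>
      <!-- Matches a fault named tns:loanRejected whose data fits the type of "rejection" -->
      <catch faultName="tns:loanRejected" faultVariable="rejection">
        <!-- Placeholder for the real handling logic -->
        <empty/>
      </catch>
    </faultHandlers>
    <switch>
      <case condition="bpws:getVariableData('creditScore') &lt; 500">
        <!-- Business fault: the name is chosen by the process itself -->
        <throw faultName="tns:loanRejected" faultVariable="rejection"/>
      </case>
    </switch>
  </scope>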

Runtime faults are not user-defined and do not appear in the WSDL for a process or service. The BPEL spec defines the following standard faults: selectionFailure, conflictingReceive, conflictingRequest, mismatchedAssignmentFailure, joinFailure, forcedTermination, correlationViolation, uninitializedVariable, repeatedCompensation and invalidReply. They are all in the namespace "http://schemas.xmlsoap.org/ws/2003/03/business-process/" and are typeless, meaning they have no associated messageTypes. The Oracle BPEL Process Manager introduces two more runtime faults: bindingFault and remoteFault. They are in the namespace "http://schemas.oracle.com/extension" and are associated with an Oracle-defined messageType, "RuntimeFaultMessage". The WSDL that defines the RuntimeFaultMessage messageType is available in the c:/orabpel/system/xmllib directory.

RemoteFault is retryable. It has the following possible fault codes:

FaultCode            Reason
ConnectionRefused    The remote server is not up
WSDLReadingError     Failed to read WSDL
GenericRemoteFault   Generic remote fault

BindingFault is not retryable. It has the following possible fault codes:

FaultCode                         Reason
VersionMismatch                   The processing party found an invalid namespace for the SOAP Envelope element
MustUnderstand                    An immediate child element of the SOAP Header element that was either not understood or not obeyed by the processing party contained a SOAP mustUnderstand attribute with a value of "1"
Client.GenericError               Generic error at client side
Client.WrongNumberOfInputParts    Input message part number mismatch
Client.WrongNumberOfOutputParts   Output message part number mismatch
Client.WrongTypeOfInputPart       Input message part type error
Client.WrongTypeOfOutputPart      Output message part type error
Server.GenericError               Generic error at server side
Server.NoService                  Server is up but no service
Server.NoHTTPSOAPAction           Request is missing HTTP SOAP Action
Server.Unauthenticated            Request is not authenticated
Server.Unauthorized               Request is not authorized


If a faultVariable (of messageType "RuntimeFaultMessage") is used when catching the fault, the fault code can be queried from the faultVariable, along with the fault summary and detail.
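
As a minimal sketch of that pattern, the fragment below wraps an invoke in a scope that catches bindingFault and copies the fault code into a string variable. It assumes that the prefix "ext" is bound to the extension namespace given above, and that the parts of RuntimeFaultMessage are named "code", "summary" and "detail"; check the WSDL in the xmllib directory for the actual part names. The partner link, port type and input/output variable names are hypothetical and assumed to be declared elsewhere in the process.

  <!-- Sketch only: "ext" prefix, part name "code" and all partner/variable names are assumptions -->
  <scope name="CallPartner">
    <variables>
      <variable name="runtimeFault" messageType="ext:RuntimeFaultMessage"/>
      <variable name="faultCode" type="xsd:string"/>
    </variables>
    <faultHandlers>
      <catch faultName="ext:bindingFault" faultVariable="runtimeFault">
        <!-- Read the fault code out of the caught fault message -->
        <assign>
          <copy>
            <from variable="runtimeFault" part="code"/>
            <to variable="faultCode"/>
          </copy>
        </assign>
      </catch>
    </faultHandlers>
    <invoke partnerLink="PartnerService" portType="svc:PartnerService"
            operation="process" inputVariable="request" outputVariable="response"/>
  </scope>

A second <catch> for ext:remoteFault could reuse the same faultVariable, since the note above associates both faults with the same messageType.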

ResilientFlow

We have created a BPEL process, "ResilientFlow", to showcase the fault handling features and strategies. The following is the diagram of the process and its interaction with two Axis web services.

Installation

This code has been tested with Oracle BPEL Process Manager 2.0 RC8.

The test suite is packaged into a zip file. To install it in your environment:

  1. Go to c:/orabpel/samples/demos/ResilientDemo.
  2. Start Tomcat.
  3. Go to ResilientDemo/AxisService and edit ant.cmd (or ant.sh) to set the AXIS_HOME and ANT_HOME variables. Copy everything under the classes directory to TOMCAT_HOME/webapps/axis/WEB-INF/classes/, then run ant.cmd (or ant.sh).
  4. Change into the ResilientDemo directory and run obant. This will package and deploy the rest of the modules.

Use Case #1 : Testing Failover

The first <invoke> activity in the process shows the failover feature. The partnerLink of this <invoke> has two possible implementations and they are configured in the deployment descriptor as follows:

  <properties id="RatingService">
    <!-- Two candidate implementations; the server fails over to the other on a retryable fault -->
    <property name="wsdlLocation">
      http://localhost:8080/axis/services/RatingService1?wsdl
      http://localhost:8080/axis/services/RatingService2?wsdl
    </property>
  </properties>
  

We configure the WSDL of RatingService1 to have a nonexistent soapAddress, while RatingService2 is correct and should work. So when the <invoke> tries to call RatingService1, a remoteFault with the fault code "ConnectionRefused" occurs. Since this is a retryable fault, the BPEL server automatically tries the other service, RatingService2.

  • Step 1: Start your Tomcat server.
  • Step 2: Log in to the BPEL Console and click on ResilientFlow. Type in an SSN such as "123456789" and invoke the process.
  • Step 3: Go to the audit view of the process. You will see two events under the "RatingService (getRating)" invoke activity. One is a remoteFault and the other is a successful invocation.

Use Case #2 : Testing Retry

The second <invoke> activity in the process shows how the system retry works. The partnerLink of this <invoke> is configured as follows:

 
  <properties id="FlakyService">
    <property name="wsdlLocation">http://localhost:8080/axis/services/FlakyService?wsdl</property>
    <!-- Endpoint on port 2222, where nothing listens until the TCP tunnel is started -->
    <property name="location">http://localhost:2222/axis/services/FlakyService</property>
    <!-- Retry twice, 60 seconds apart -->
    <property name="retryCount">2</property>
    <property name="retryInterval">60</property>
  </properties>
  

Suppose the service is not listening at port 2222. This invoke then fails with a remoteFault whose fault code is "ConnectionRefused". Since this is a retryable fault and retryCount and retryInterval are set to 2 and 60 respectively, the server retries twice, at 60-second intervals. If a TCP tunnel is started to connect port 2222 to your Tomcat port before a retry runs, the <invoke> activity will succeed on that retry.

  • Step 1: Refresh the audit trail view of the previous instance. You will see that the "FlakyService (getAccountId)" activity is in the "pending" state and that there are two events under it. One is a remoteFault and the other is a "schedule retry".
  • Step 2: Wait for the first retry to execute, which will result in another "remoteFault".
  • Step 3: Before the second retry takes place, start the TCP tunnel to connect port 2222 to port 8080 (the Tomcat port).
  • Step 4: After the second retry, refresh the audit trail view. You will see that the second retry completes successfully.

Use Case #3 : Testing Escalation to User Task

If failover and retry do not solve the problem, the faultHandler of the second <invoke> creates a user task.

  • Step 1: Turn off the TCP tunnel. Start a new instance of the ResilientFlow.
  • Step 2: Let the second <invoke> be retried twice and fail. Refresh the audit trail view and you will see the process is pending on activity "ExceptionManagementManager (onTaskResult)".
  • Step 3: Go to the BPEL Console and you will see a TaskManager instance is created. Go to the custom JSP at http://localhost:9700/ExceptionUI. Here you can take on the role of an administrator or customer service rep and review the content of the fault and complete the task.
  • Step 4: After completion of the task the process should finish successfully.
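
The shape of such an escalation handler might look like the BPEL 1.1 fragment below. This is a sketch under stated assumptions, not the ResilientFlow source: only the partner link names FlakyService and ExceptionManagementManager and the operations getAccountId and onTaskResult come from the audit trail labels above. The initiateTask operation, the port types, the prefixes and all variable names are hypothetical.

  <!-- Sketch only: initiateTask, the port types and all variable names are hypothetical -->
  <scope name="GetAccountId">
    <faultHandlers>
      <!-- Reached only after the server has exhausted the configured retries -->
      <catchAll>
        <sequence>
          <!-- Create the user task for an administrator -->
          <invoke partnerLink="ExceptionManagementManager" portType="task:TaskManager"
                  operation="initiateTask" inputVariable="exceptionTask"/>
          <!-- The process stays pending here until the task is completed -->
          <receive partnerLink="ExceptionManagementManager" portType="task:TaskManagerCallback"
                   operation="onTaskResult" variable="exceptionTaskResult"/>
        </sequence>
      </catchAll>
    </faultHandlers>
    <invoke partnerLink="FlakyService" portType="flaky:FlakyService"
            operation="getAccountId" inputVariable="accountRequest" outputVariable="accountResponse"/>
  </scope>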

Use Case #4 : Sending to a JMS "Dead Letter" Queue

In case the above mechanisms all fail or are not appropriate, one other strategy is to send the failed messages to a dead letter queue. The ResilientFlow has a process level catchAll which sends failed requests to a JMS queue.

  • Step 1: Stop your Tomcat server.
  • Step 2: Go to your ResilientTest/ExceptionQueue directory and run obant runwl. This will start a process that listens to the dead letter queue.
  • Step 3: Start a new ResilientFlow instance. In this case the first activity will fail even with the failover feature. The catchAll then catches the fault and sends a JMS message, which you will see in the window running the queue listener.
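
A process-level catchAll of this kind could be sketched as follows. The partner link DeadLetterQueue, the port type, the operation name and the variable failedRequest are all hypothetical; the actual ResilientFlow definitions may differ.

  <!-- Sketch only: DeadLetterQueue, jms:JMSService, send and failedRequest are hypothetical names -->
  <faultHandlers>
    <!-- Declared directly under <process>, so it catches any fault not handled by an inner scope -->
    <catchAll>
      <!-- One-way invoke that hands the failed request to the JMS service -->
      <invoke partnerLink="DeadLetterQueue" portType="jms:JMSService"
              operation="send" inputVariable="failedRequest"/>
    </catchAll>
  </faultHandlers>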

 
