BPEL Knowledge Base - Technical Note #007
Managing BPEL Run-time Exceptions
The BPEL spec provides fault handling through the "faultHandlers"
construct, which lets BPEL programmers deal with different faults in catch-and-handle
fashion. Some faults, however, especially those raised by the invoke activity, occur
because of system misconfiguration or network instability. Handling these kinds of
faults at the level of each and every invoke activity would be tedious for the
programmer. The Oracle BPEL Process Manager therefore
provides the following features to help the developer deal with these errors:
- failover: Allows multiple service implementations to be configured for a given partnerLink.
If a retryable runtime fault (discussed in the following section) occurs, the
server will try other implementations.
- retry: The server retries later, based on a user-specified retry interval and retry
count.
However, some runtime faults cannot be resolved by either of these mechanisms,
for example when a remote
service has been upgraded and its interface has changed. We call this kind of fault a
"bindingFault". The strategies for dealing with a bindingFault are to escalate
it to a human administrator via the built-in TaskManager service, or to place the document in
a dead letter queue via a JMS service. The scenario here shows how to
do the latter, while the LoanFlowPlus demo sample illustrates the former.
Taxonomy of BPEL Faults
A BPEL fault has a fault name, which is a QName (a name qualified with a namespace),
and optionally a messageType.
There are two categories of faults in BPEL: business faults and runtime faults.
Business faults are application-specific faults. They occur when
an explicit <throw> activity is executed or when an <invoke> activity receives a fault
as its response. The fault name of a business fault is specified by the BPEL process, and
the messageType, if one exists, is defined in the WSDL.
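As an illustration, a business fault raised by an explicit <throw> and handled in the same scope might look like the following BPEL 1.1 fragment. This is only a sketch: the tns namespace, the loanRejected fault name, the loanRejectedMessage messageType and the rejectionInfo variable are hypothetical names, not taken from any sample.

```xml
<!-- Sketch only. Hypothetical names: tns namespace, loanRejected,
     loanRejectedMessage, rejectionInfo. -->
<scope xmlns="http://schemas.xmlsoap.org/ws/2003/03/business-process/"
       xmlns:tns="http://example.com/loan">
  <variables>
    <!-- The messageType of a business fault is defined in the WSDL. -->
    <variable name="rejectionInfo" messageType="tns:loanRejectedMessage"/>
  </variables>
  <faultHandlers>
    <!-- Catch the business fault by its qualified fault name. -->
    <catch faultName="tns:loanRejected" faultVariable="rejectionInfo">
      <empty name="handleRejection"/>
    </catch>
  </faultHandlers>
  <!-- Raise the business fault explicitly. -->
  <throw name="rejectLoan" faultName="tns:loanRejected"
         faultVariable="rejectionInfo"/>
</scope>
```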
Runtime faults are not user-defined and do not appear in the WSDL for a process or
service. The BPEL spec defines 10 standard faults: selectionFailure, conflictingReceive, conflictingRequest, mismatchedAssignmentFailure,
joinFailure, forcedTermination, correlationViolation, uninitializedVariable,
repeatedCompensation and invalidReply. They are all in the namespace "http://schemas.xmlsoap.org/ws/2003/03/business-process/"
and are typeless, meaning they have no associated messageTypes.
The Oracle BPEL Process Manager introduces two additional runtime faults: bindingFault and remoteFault.
They are in the namespace "http://schemas.oracle.com/extension"
and are associated with an Oracle-defined messageType, "RuntimeFaultMessage".
The WSDL that defines the RuntimeFaultMessage messageType is available in the c:/orabpel/system/xmllib directory.
RemoteFault is retryable. It has the following possible fault codes:
| FaultCode | Reason |
|---|---|
| ConnectionRefused | The remote server is not up |
| WSDLReadingError | Fail to read WSDL |
| GenericRemoteFault | Generic remote fault |
BindingFault is not retryable. It has the following possible fault codes:
| FaultCode | Reason |
|---|---|
| VersionMismatch | The processing party found an invalid namespace for the SOAP Envelope element |
| MustUnderstand | An immediate child element of the SOAP Header element that was either not understood or not obeyed by the processing party contained a SOAP mustUnderstand attribute with a value of "1" |
| Client.GenericError | Generic error at client side |
| Client.WrongNumberOfInputParts | input message part number mismatch |
| Client.WrongNumberOfOutputParts | output message part number mismatch |
| Client.WrongTypeOfInputPart | input message part type error |
| Client.WrongTypeOfOutputPart | output message part type error |
| Server.GenericError | Generic error at server side |
| Server.NoService | Server is up but no service |
| Server.NoHTTPSOAPAction | Request is missing HTTP SOAP Action |
| Server.Unauthenticated | Request is not authenticated |
| Server.Unauthorized | Request is not authorized |
If a faultVariable (of messageType "RuntimeFaultMessage") is used
when catching the fault, the fault code can be queried from the faultVariable,
along with the fault summary and detail.
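A minimal sketch of catching a bindingFault and reading its fault code is shown here in BPEL 1.1 syntax. It rests on several assumptions that should be checked against the RuntimeFaultMessage WSDL in the xmllib directory: that the messageType can be referenced with the same namespace prefix as the faults, and that the message part holding the fault code is named "code". The variable names, the svc namespace and portType, and the input and output variables of the invoke are hypothetical.

```xml
<!-- Sketch only. Assumed: part name "code" and the namespace of
     RuntimeFaultMessage. Hypothetical: variable names, svc prefix. -->
<scope xmlns="http://schemas.xmlsoap.org/ws/2003/03/business-process/"
       xmlns:bpelx="http://schemas.oracle.com/extension"
       xmlns:svc="http://example.com/rating"
       xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <variables>
    <variable name="runtimeFault" messageType="bpelx:RuntimeFaultMessage"/>
    <variable name="faultCode" type="xsd:string"/>
  </variables>
  <faultHandlers>
    <!-- The faultVariable receives the fault data when the fault is caught. -->
    <catch faultName="bpelx:bindingFault" faultVariable="runtimeFault">
      <assign name="readFaultCode">
        <copy>
          <!-- Copy the fault code part into a plain string variable. -->
          <from variable="runtimeFault" part="code"/>
          <to variable="faultCode"/>
        </copy>
      </assign>
    </catch>
  </faultHandlers>
  <!-- The invoke whose runtime faults are handled by the catch above. -->
  <invoke name="getRating" partnerLink="RatingService"
          portType="svc:RatingService" operation="getRating"
          inputVariable="ratingRequest" outputVariable="ratingResponse"/>
</scope>
```

The fault summary and detail could presumably be read the same way from their own parts.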
ResilientFlow
We have created a BPEL process, "ResilientFlow", to showcase the fault
handling features and strategies. The following diagram shows the process
and its interaction with two Axis web services.

Installation
This code has been tested with Oracle BPEL Process Manager 2.0 RC8.
The test suite is packaged into a zip file. To install it in your environment:
- Go to c:/orabpel/samples/demos/ResilientDemo.
- Start Tomcat.
- Go to ResilientDemo/AxisService and modify ant.cmd (or ant.sh) to configure
the AXIS_HOME and ANT_HOME variables. Copy everything under the classes directory to
TOMCAT_HOME/webapps/axis/WEB-INF/classes/ and then run ant.cmd (or ant.sh).
- Change to the ResilientDemo directory and run obant. This will package and deploy
the rest of the modules.
Use Case #1 : Testing Failover
The first <invoke> activity in the process shows the failover feature.
The partnerLink of this <invoke> has two possible implementations and
they are configured in the deployment descriptor as follows:
<properties id="RatingService">
  <property name="wsdlLocation">
    http://localhost:8080/axis/services/RatingService1?wsdl
    http://localhost:8080/axis/services/RatingService2?wsdl
  </property>
</properties>
We configure RatingService1's WSDL with a nonexistent soapAddress, while
RatingService2 is correct and should work. When the <invoke> tries to call RatingService1,
a remoteFault with the fault code "ConnectionRefused" will occur. Since this is
a retryable fault, the BPEL server will automatically try to call the other
service, RatingService2.
- Step 1: Start your Tomcat server.
- Step 2: Log in to the BPEL Console and click on ResilientFlow. Type in an SSN such
as "123456789" and invoke the process.
- Step 3: Go to the audit view of the process. You will see two events under
the "RatingService (getRating)" invoke activity. One is a remoteFault
and the other is a successful invocation.
Use Case #2 : Testing Retry
The second <invoke> activity in the process shows how the system retry
works. The partnerLink of this <invoke> is configured as follows:
<properties id="FlakyService">
  <property name="wsdlLocation">http://localhost:8080/axis/services/FlakyService?wsdl</property>
  <property name="location">http://localhost:2222/axis/services/FlakyService</property>
  <property name="retryCount">2</property>
  <property name="retryInterval">60</property>
</properties>
Suppose the service is not listening on port 2222. This invoke will then fail
with a remoteFault whose fault code is "ConnectionRefused". Since this is a retryable fault
and retryCount and retryInterval are set to 2 and 60 respectively,
the server will retry twice, waiting 60 seconds before each attempt.
If a TCP tunnel is started to connect port 2222 to your Tomcat port before a retry runs,
the <invoke> activity will succeed.
- Step 1: Refresh the audit trail view of the previous instance. You will
see that the "FlakyService (getAccountId)" activity is in the "pending"
state and that there are two events under it. One is a remoteFault and the other
is a "schedule retry".
- Step 2: Wait for the 1st retry to execute, which will result in another
"remoteFault".
- Step 3: Before the 2nd retry takes place, start the TCP tunnel to connect
port 2222 to port 8080 (tomcat port).
- Step 4: After the 2nd retry, refresh the audit trail view. You will see
that the second retry completes successfully.
Use Case #3 : Testing Escalation to User Task
The faultHandler of the second invoke creates a user task. If failover
and retry do not solve the problem, this user task is created.
- Step 1: Turn off the TCP tunnel. Start a new instance of the ResilientFlow.
- Step 2: Let the second <invoke> be retried twice and fail. Refresh
the audit trail view and you will see the process is pending on activity "ExceptionManagementManager
(onTaskResult)".
- Step 3: Go to the BPEL Console and you will see a TaskManager instance is created.
Go to the custom JSP at http://localhost:9700/ExceptionUI. Here you can take on
the role of an administrator or customer service rep and review
the content of the fault and complete the task.
- Step 4: After completion of the task the process should finish successfully.
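The escalation described above could be sketched as an inline fault handler on the second invoke, using BPEL 1.1 syntax. This is an illustration, not the actual ResilientFlow source: the "initiateTask" operation name, the portType names, the namespaces behind the svc and task prefixes, and all variable names are assumptions. Only the partner link names, the getAccountId operation, and the onTaskResult callback come from the activity names shown in the audit trail.

```xml
<!-- Sketch only. Assumed: initiateTask operation, portTypes, variables.
     runtimeFault is assumed to be declared with the
     RuntimeFaultMessage messageType in the enclosing scope. -->
<invoke xmlns="http://schemas.xmlsoap.org/ws/2003/03/business-process/"
        xmlns:bpelx="http://schemas.oracle.com/extension"
        xmlns:svc="http://example.com/flaky"
        xmlns:task="http://example.com/taskmanager"
        name="getAccountId" partnerLink="FlakyService"
        portType="svc:FlakyService" operation="getAccountId"
        inputVariable="accountRequest" outputVariable="accountResponse">
  <!-- Reached only after failover and retries have been exhausted. -->
  <catch faultName="bpelx:remoteFault" faultVariable="runtimeFault">
    <sequence>
      <!-- Create a user task carrying the fault for an administrator. -->
      <invoke name="initiateTask" partnerLink="ExceptionManagementManager"
              portType="task:TaskManager" operation="initiateTask"
              inputVariable="taskRequest"/>
      <!-- Wait for the administrator to complete the task. -->
      <receive name="onTaskResult" partnerLink="ExceptionManagementManager"
               portType="task:TaskManagerCallback" operation="onTaskResult"
               variable="taskResult"/>
    </sequence>
  </catch>
</invoke>
```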
Use Case #4 : Sending to a JMS "Dead Letter" Queue
If the above mechanisms all fail or are not appropriate, another strategy
is to send the failed messages to a dead letter queue. ResilientFlow has a process-level
catchAll that sends failed requests to a JMS queue.
- Step 1: Stop your Tomcat server.
- Step 2: Go to your ResilientTest/ExceptionQueue directory and run obant
runwl. This will start a process that listens to the dead letter queue.
- Step 3: Start a new ResilientFlow instance. In this case the first activity
will fail even with the failover feature. The catchAll then catches the fault
and sends a JMS message, which appears in the window running
the queue listener.
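A process-level catchAll of the kind used in this use case might look roughly like the BPEL 1.1 fragment below. It is a sketch under stated assumptions: the DeadLetterQueue partner link, the jms prefix and portType, the "send" operation, the failedRequest variable, and the targetNamespace are all hypothetical placeholders for whatever the JMS service WSDL actually defines.

```xml
<!-- Sketch only. Hypothetical: DeadLetterQueue partner link, jms portType,
     send operation, failedRequest variable, targetNamespace. -->
<process name="ResilientFlow"
         targetNamespace="http://example.com/resilient"
         xmlns="http://schemas.xmlsoap.org/ws/2003/03/business-process/"
         xmlns:jms="http://example.com/jms">
  <faultHandlers>
    <!-- catchAll handles any fault not caught by a more specific handler. -->
    <catchAll>
      <invoke name="sendToDeadLetterQueue" partnerLink="DeadLetterQueue"
              portType="jms:QueueSender" operation="send"
              inputVariable="failedRequest"/>
    </catchAll>
  </faultHandlers>
  <!-- Placeholder for the main flow (the invokes from the earlier use cases). -->
  <empty name="mainFlowPlaceholder"/>
</process>
```

Because catchAll carries no fault variable, the request to forward has to come from a variable the process already holds, such as its original input.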