Charteris Community Server


Chris Dickson's Blog

BizTalk Orchestration Exception Handling: What's changed?

One thing which has changed significantly between BizTalk Server 2004 and BizTalk Server 2006 is the way that the product behaves when unhandled exceptions occur during execution of an orchestration.

In BizTalk Server 2004, the model is brutally straightforward: any exception which is not caught and handled in an exception handler block puts the orchestration instance into the Suspended (Not Resumable) state, from which it cannot be resurrected, and the XLANG/s engine logs an error event in the Application event log that looks like this:

Event Type: Error
Event Source: XLANG/s
Event Category: None
Event ID: 10034
Date: 21/11/2006
Time: 15:56:20
User: N/A
Computer: CHRISDI-VM1
Description:
Uncaught exception terminated service BTSDefaultExceptions.BizTalk_Orchestration1(b08ac914-74bc-6032-8029-29cd7aeb5541), instance 497a22a5-0e93-4cc4-8fd8-43499e58e3ea

Invalid data: Some text
Exception type: ApplicationException
Source: ExceptionGenerator
Target Site: Void DoNotALot(System.String)
Help Link:
Additional error information:

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

This behaviour creates some major problems for applications:

  1. it is often impossible to tell from the limited information in the event log entry what caused the failure;
  2. the orchestration state cannot be retrieved;
  3. messages associated with the failed orchestration can only be recovered by using HAT to dump them out to a file, or by implementing special recovery code using WMI;
  4. the suspended service instances remain in the MessageBox, clogging it up, until they are purged by administrative action.

As a result, it is imperative when developing orchestrations in BizTalk Server 2004:

  • to adopt a standard pattern which provides a last-resort exception handler: that is, a scope containing catch blocks for both .NET exceptions of type System.Exception and General Exceptions. In these catch blocks, a more informative description of the exception context can be constructed and used as the Message property of a new exception, which is then thrown. This mitigates the first issue above, because the richer context information then appears in the XLANG/s event log entry.
  • to implement recovery logic explicitly for any exception types which can be anticipated. This logic might comprise:
    • compensation of work already committed within atomic scopes;
    • retry loops using the Suspend shape (e.g. to handle temporary resource outages such as network connectivity failures);
    • business process level error handling (e.g. sending a response message containing an error status following a data validation exception);
    • graceful termination of the orchestration with appropriate logging, in the event of unrecoverable exceptions.

Essentially, the aim in BizTalk 2004 must be to avoid the XLANG/s default exception handling behaviour completely.
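The last-resort handler pattern is easier to see in ordinary code than in a description of orchestration shapes. What follows is only a rough analogy in Python, not XLANG/s: the names `OrchestrationError`, `run_with_last_resort_handler`, `validate_order` and the step and message identifiers are all hypothetical, invented for illustration. It shows the essential move: catch everything at the outermost scope, build a richer message, and throw a new exception which carries the original as its cause.

```python
class OrchestrationError(Exception):
    """Hypothetical wrapper exception carrying extra context about a failure."""


def run_with_last_resort_handler(step_name, message_id, action):
    """Run `action`; on any failure, throw a new exception with richer context."""
    try:
        return action()
    except Exception as exc:
        # Build a description of where and why the failure happened.
        context = (
            f"Step '{step_name}' failed while processing message "
            f"'{message_id}': {type(exc).__name__}: {exc}"
        )
        # Chain the original exception so its details are not lost.
        raise OrchestrationError(context) from exc


def validate_order():
    # Stand-in for a helper component that rejects its input.
    raise ValueError("Invalid data: Some text")


try:
    run_with_last_resort_handler("ValidateOrder", "msg-001", validate_order)
except OrchestrationError as err:
    enriched_message = str(err)
    original_cause = err.__cause__
```

Whatever ends up logging the failure now sees the step name and message identifier as well as the original error text, which is the whole point of the pattern.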

BizTalk Server 2006's exception handling model is quite different: an exception which is not caught within the exception handlers of the orchestration causes the orchestration instance to be moved to the Suspended (Resumable) state, rather than Suspended (Not Resumable). An event log error entry is still written by XLANG/s, but it contains more context information, including the name of the orchestration shape where the exception emerged and the stack trace at the exception site (as well as some bad spelling ;-)):

Event Type: Error
Event Source: XLANG/s
Event Category: None
Event ID: 10034
Date:  21/11/2006
Time:  11:20:42
User:  N/A
Computer: VM-WS2K3
Description:
Uncaught exception (see the 'inner exception' below) has suspended an instance of service 'BizTalk_Server_Project1.BizTalk_Orchestration1(872ef22d-51f6-5cde-aaa0-5a7dcc036b3b)'.
The service instance will remain suspended until administratively resumed or terminated.
If resumed the instance will continue from its last persisted state and may re-throw the same unexpected exception.
InstanceId: 014e260d-336c-458e-9d47-829afbe9148a
Shape name: Expression_1
ShapeId: 17d5c466-163b-460e-8372-cf385835b397
Exception thrown from: segment 1, progress 6
Inner exception: We don't like the name 'name_0'!
       
Exception type: ApplicationException
Source: ClassLibrary1
Target Site: Void .ctor(System.String)
The following is a stack trace that identifies the location where the exception occured

   at ClassLibrary1.Class1..ctor(String name)
   at BizTalk_Server_Project1.BizTalk_Orchestration1.segment1(StopConditions stopOn)
   at Microsoft.XLANGs.Core.SegmentScheduler.RunASegment(Segment s, StopConditions stopCond, Exception& exp)

       

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

As indicated in the event log text, the orchestration instance can be resumed (using HAT or WMI), and it will restart from its last persisted state. This goes some way towards resolving the issues from which the BizTalk 2004 model suffers: the event log entry is more informative, and the orchestration state can be recovered much more easily.
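For anyone tempted to consume this richer entry programmatically, here is a minimal Python sketch which pulls the labelled fields out of the description text. It assumes the description has already been obtained as a string (the sample is abridged from the entry above), and that the "Label: value" layout stays as shown. As far as I know that layout is not a documented contract, so scraping of this kind is brittle. The names `FIELD_PATTERN` and `parse_description` are my own.

```python
import re

# Abridged from the BizTalk Server 2006 event log entry shown above.
SAMPLE_DESCRIPTION = """\
Uncaught exception (see the 'inner exception' below) has suspended an instance of service 'BizTalk_Server_Project1.BizTalk_Orchestration1(872ef22d-51f6-5cde-aaa0-5a7dcc036b3b)'.
InstanceId: 014e260d-336c-458e-9d47-829afbe9148a
Shape name: Expression_1
ShapeId: 17d5c466-163b-460e-8372-cf385835b397
Exception thrown from: segment 1, progress 6
Inner exception: We don't like the name 'name_0'!
Exception type: ApplicationException
Source: ClassLibrary1
"""

# Only lines which begin with one of these labels are treated as fields.
FIELD_PATTERN = re.compile(
    r"^(InstanceId|Shape name|ShapeId|Inner exception|Exception type|Source):\s*(.*)$",
    re.MULTILINE,
)


def parse_description(text):
    """Return a dict mapping each recognised label to its value."""
    return {label: value.strip() for label, value in FIELD_PATTERN.findall(text)}


fields = parse_description(SAMPLE_DESCRIPTION)
```

The result gives you the instance identifier and shape name to work with, but nothing in it tells you whether resuming the instance is actually safe.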

But we should not allow ourselves to think that these product changes let us greatly simplify the way we handle exceptions within our orchestrations. The fact that orchestration service instances can be resumed after an unanticipated and unhandled exception may be a saving grace in some circumstances, but it is not a panacea which can be relied upon to replace specific exception handling within the orchestration:

  • for one thing, the resume mechanism is one that you do not want to be invoking routinely. Either a human user must resume the instance with the HAT tool, which means operations personnel have to be alerted appropriately; or it must be done automatically using WMI. Controlling BizTalk via WMI can be problematic in high volumes: the BizTalk WMI providers place considerable load on the MessageBox, are not written to be highly scalable, and may not behave correctly when multiple operations are performed concurrently (particularly those which change service instance states).
  • although the event log entry displays more exception context information than was the case with BizTalk Server 2004, using this context information in any automated recovery mechanism is not very easy.
  • working out where the various persistence points in an orchestration are is not as easy as it might be. Orchestration shapes executed after the last persistence point may have side effects, and if these are not idempotent, automatic resumption from the last persistence point may lead to incorrect system state.
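The idempotency problem in the last point can be shown with a toy simulation. This is a Python sketch under loud assumptions: `Ledger`, `post`, `post_once`, `run_from_checkpoint` and the key "instance-1/payment" are all hypothetical, and a real orchestration would be calling some external system rather than appending to a list. The same steps are replayed from a "persistence point" after a failure, once with a non-idempotent side effect and once with one guarded by a key.

```python
class Ledger:
    """Hypothetical external system which records payments."""

    def __init__(self):
        self.entries = []
        self.seen_keys = set()

    def post(self, amount):
        # Not idempotent: every call adds another entry.
        self.entries.append(amount)

    def post_once(self, key, amount):
        # Idempotent: a repeated call with the same key is ignored.
        if key in self.seen_keys:
            return
        self.seen_keys.add(key)
        self.entries.append(amount)


def run_from_checkpoint(post_payment, fail_after_side_effect):
    """Simulate the steps which follow the last persistence point."""
    post_payment()  # the side effect happens first...
    if fail_after_side_effect:
        # ...then a later step fails before the next persistence point.
        raise RuntimeError("failure after the side effect")


naive = Ledger()
try:
    run_from_checkpoint(lambda: naive.post(100), True)
except RuntimeError:
    # "Resume": the same steps are replayed from the checkpoint.
    run_from_checkpoint(lambda: naive.post(100), False)

guarded = Ledger()
try:
    run_from_checkpoint(lambda: guarded.post_once("instance-1/payment", 100), True)
except RuntimeError:
    run_from_checkpoint(lambda: guarded.post_once("instance-1/payment", 100), False)
```

After the replay, the naive ledger holds the payment twice, while the guarded one holds it once. A blind resume is only safe when everything after the persistence point behaves like the guarded version.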

So in general, I would not advocate trying to incorporate this out-of-the-box exception handling functionality into your routine error handling mechanisms. You still need to analyse all the failure scenarios in your process carefully, catch all anticipatable exceptions in the appropriate places, and provide appropriate compensation blocks and explicit retry loops where necessary. The default exception handling functionality should remain a safety net which you ideally never use. Regarding any XLANG/s event log error as a symptom of a defect in your orchestration is, I think, a good mindset with which to approach the design of orchestration exception handling.
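As a rough illustration of what an explicit retry loop for an anticipated failure means, here is a minimal Python sketch. It is an analogy, not orchestration code: `TransientError`, `call_with_retry` and `flaky_send` are hypothetical names, and in an orchestration the same logic would be built from a loop, a scope with a catch block, and a delay. Only the anticipated exception type is retried, and the number of attempts is bounded. Anything else propagates, to be treated as a defect.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for failures which are worth retrying."""


def call_with_retry(operation, max_attempts=3, delay_seconds=0.0, sleep=time.sleep):
    """Call `operation`, retrying only on TransientError, up to `max_attempts` times."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                # Out of attempts: let the caller's error handling take over.
                raise
            sleep(delay_seconds)


attempts = []


def flaky_send():
    # Stand-in for a send which fails twice and then succeeds.
    attempts.append(1)
    if len(attempts) < 3:
        raise TransientError("network unavailable")
    return "sent"


result = call_with_retry(flaky_send)
```

Because the retry is explicit, the process designer decides how many attempts are made and what happens when they run out, instead of leaving that to an administrator with HAT.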

There is one exception scenario where BizTalk Server 2004 was deficient and workarounds were very difficult, but where BizTalk Server 2006 is very much better: when an exception occurs while BizTalk is executing a persistence point, for example if network problems cause connection errors when the BizTalk host instance running the orchestration tries to communicate with the MessageBox. In BizTalk Server 2004, such errors surfaced as an unhandled exception in the orchestration, and the orchestration service instance was immediately moved to Suspended (Not Resumable). In BizTalk Server 2006, the BTSNtSvc host process is more savvy when it encounters difficulties communicating with the MessageBox: it logs an error to the event log but keeps Active orchestration instances alive while it periodically retries the database call. If a retry succeeds, the orchestration instance proceeds on its way, completely unaware of the temporary hiatus in MessageBox communication.

Published Dec 19 2006, 06:03 PM by chrisdi

Comments

 

djb said:

indeed very useful.

thanks

January 30, 2008 6:48 AM
 

Yossi Dahan said:

Very detailed and useful description. thanks.

There is one slight down side to this change, on which I've blogged here -

www.sabratech.co.uk/.../and-then-just-when-you-actually-needed.html

May 7, 2008 5:13 PM
 

Peter Wullems said:

If someone could just explain what a segment and a progress actually refer to, that would be extremely helpful.

October 31, 2008 3:14 PM
