Mark’s rebuttal on compensations with REST and JBPM


I got what I hoped for. Mark Little responded to my previous blog. Cool, now I’ll really start to learn something…. Let me respond to some of his comments:

Bill: “One of my concerns is the complexity of the WS-* layer and how it affects maintainability and evolvability of a distributed system. WS-* requires both the client and server to have the necessary middleware components and libraries installed to work.”

Mark: Yes, but that’s the case for any distributed system: the endpoints have got to be able to understand each other or go through some intermediary that can do the protocol/data translation. The same is true for HTTP though.
Mark’s right, I said something obvious, but I think the layers you have to keep in sync are very different in size and complexity. While a REST over HTTP web service only really requires an HTTP client, an HTTP server, and maybe an XML parser, WS-* requires those plus all the WS-* libraries, plus any generated stubs. What I like about REST over HTTP is that you can actually tell the server what data format you’re sending (Content-Type) and what format you want back (Accept). Maybe there is something equivalent in the WS soup of specifications, but it doesn’t seem like that is what the industry is pushing as its default framework.

Bill: “What I continually like about REST over HTTP is that there is little to no requirement on either the client or server side for additional libraries or generated stubs other than a http client library and a http server.”

Mark: So let’s assume we can get everyone to agree to use HTTP (not a big assumption, so I’m happy with this). Then we have to get them to agree to the protocol data, the format, representation, etc. that actually flows between the endpoints. It’s no good someone saying “because we agree on PUT or GET that’s the end of the story.” That’s a bit like saying that because we agree on using letters and envelopes to send mail around the world the recipients will be able to understand what’s being sent!

It’s eerie. I gave the same argument to Vinoski over private email months ago, but then I thought about it a bit more. Don’t you have to do the same thing with WS? You still have to agree on the message you are sending: the name of the remote method you are invoking, its parameters, the data format of those parameters, and so on. What allowed me to break the internal conflict I had was this realization: What if you think of PUT, GET, POST, DELETE, and OPTIONS as a set of metadata?

@Put void createOrder(Document theOrder);
@Delete void deleteOrder(int orderNum);
@Get Document getOrder();

Maybe I’m crazy, but these HTTP operations are really nothing more than annotations on a method. In using REST over HTTP, yes, you still have to agree on the data format exchange, but you are providing clearly defined, distinct, exact, metadata on the behavior of your endpoint. Behavior that can be generically communicated to mature tools and services that have been around forever that your operations department knows how to operate and fine tune your application with.
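
To make the “verbs as metadata” idea a bit more concrete, here is a minimal sketch in plain Java. The @Get, @Put, and @Delete annotations, the OrderResource class, and the VerbMetadata helper are all hypothetical names made up for illustration (this is not JAX-RS or any real framework). It only tries to show that a generic tool can discover how each operation behaves by reflection, without knowing anything about orders.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.Map;
import java.util.TreeMap;

class VerbMetadata {
    // Maps each annotated method name to the HTTP verb it is tagged with.
    static Map<String, String> describe(Class<?> endpoint) {
        Map<String, String> verbs = new TreeMap<>();
        for (Method m : endpoint.getDeclaredMethods()) {
            if (m.isAnnotationPresent(Get.class)) verbs.put(m.getName(), "GET");
            else if (m.isAnnotationPresent(Put.class)) verbs.put(m.getName(), "PUT");
            else if (m.isAnnotationPresent(Delete.class)) verbs.put(m.getName(), "DELETE");
        }
        return verbs;
    }

    // HTTP defines GET, PUT and DELETE as idempotent, so a generic
    // intermediary may retry them without understanding the payload.
    static boolean safeToRetry(String verb) {
        return verb.equals("GET") || verb.equals("PUT") || verb.equals("DELETE");
    }

    public static void main(String[] args) {
        System.out.println(describe(OrderResource.class));
    }
}

// Hypothetical marker annotations, one per HTTP verb.
@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.METHOD) @interface Get {}
@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.METHOD) @interface Put {}
@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.METHOD) @interface Delete {}

// Hypothetical endpoint; the payload is a plain String to keep the sketch small.
class OrderResource {
    @Put public void createOrder(String theOrder) {}
    @Delete public void deleteOrder(int orderNum) {}
    @Get public String getOrder() { return "<order/>"; }
}
```

The helper never looks at the order document itself. Everything it learns comes from the verb, which is the kind of information caches, proxies and other HTTP-aware tools rely on.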

Mark: The Web works because the standards/specifications on which it is based were laid down back in the early 1990’s and haven’t really changed. The W3C manages them and everyone agreed to abide by them very early on; there’s strong commercial incentive not to change things. That has nothing to do with REST and everything to do with protocol agreement: same thing has happened with WS-*.

I am officially in my late 30s. I did work through both the short-lived DCE and long-lived CORBA eras that touted the same story of interoperability. Sorry if I am skeptical and worried that the industry will yet again re-invent itself in another 5-10 years.

Mark: Yes, we have interoperability on the WWW (ignoring the differences in HTML syntax and browsers). But we do not have interoperability for transactions, reliable messaging, workflow etc.

A wise man once told me if you run into a wall, redefine the problem. To me, that’s what the whole REST exercise is, redefining the problem in distributed computing. The whole underlying point of my previous blog was that maybe you don’t need interoperability for transactions, reliable messaging, and workflow in a distributed environment, if you let the client drive the application. Yes, you need transactions, reliable messaging and workflow at the client. The idioms I described with jBPM use all three of these subsystems, but they do not require a remote protocol definition. As responsible engineers, we should be questioning the need for WS-*. The amount of investment is just too huge for any organization. There’s too much money and time to be lost.

Mark: Bill’s next point was around flexibility. He mentions the ordering of compensations can sometimes (often?) be important and that you can’t rely on ordering within WS-BA. Unfortunately that’s not correct, as was pointed out several years ago.
Apologies. I didn’t make this mistake intentionally. The article Mark points out was one of the things I read before writing my previous blog. The talk of scopes was a little confusing there. It seemed to be more about nested transactions than the control of ordering. But, even so, I still don’t get the need for it. Why the need for BA if it is not going to do things automatically for you? If the line between your business process and your compensation engine is starting to blur, why not have your bpm and compensation engine be the same thing? That cuts out one piece of infrastructure you have to install and manage.

Mark: So just because I decide to use REST and HTTP doesn’t mean I get instant portability and interoperability. Yes, I get interoperability at the low level, but it says nothing about interoperability at the payload. That’s damn hard to achieve, as we’ve seen over the past 7+ years.

Not so sure this is true. If your content type is text/xml, you’ll probably have an XSD associated with it. If you use the design by contract to implement your web services that people like the Spring guys are pushing, you really have the same thing.

Bill: “You still have strong decoupling in a bpm driven activity. The difference is that instead of compensation being hidden by the server, it is visible, and in your face, embedded directly in the business process. With jBPM, you can look at the process diagram and know exactly the flow of success and failure conditions of your distributed application. With BA, you would have to dive into the documentation and code to determine how your application functions.”

Mark: Yes, that’s right, but no different to what you’d find with a good WS-BPEL implementation.

See? I knew I’d learn something. But when I said “a bpm engine like jBPM might be a better solution to manage compensations.” I meant any bpm engine. What I really want here is to have the client drive the application and simplify the complexity of the distributed protocol. This is why I said I really liked the idea of using BA with a web application. It would all be local and controllable. After reading up on BPEL compensation handlers, I also like the idea of using them with RESTful web services. Again, put all the responsibility on the client for managing the coordination.  Then we don’t have a dual dependency on an in-sync client and server.

Distributed Compensation with REST and jBPM


Over the past few months I’ve been leading up to something. Although I was researching and diving into specifics of compensations, jbpm, and REST, I had a much grander purpose in writing these specific blogs:

These blogs were specifically written as a thought exercise in what it would take to write a coordinated distributed application. What pieces would we really need? Where does WS-* fit in? How about REST and BPM? Do we really need a business activity framework? How can we write scalable and maintainable distributed systems? I want to pull together my thoughts on the answers to these questions in this blog entry as well as ask a few more.

Maintainability: REST over WS

One of my concerns is the complexity of the WS-* layer and how it affects maintainability and evolvability of a distributed system. WS-* requires both the client and server to have the necessary middleware components and libraries installed to work. A distributed application usually means that it is an integration project. You are probably integrating with systems you do not control. There is probably a good chance you would not be able to bring down these systems to install or upgrade their middleware architecture. They may be poorly maintained or even not maintained at all. This is where the idea of REST comes in to simplify things and make things more usable over time. Since it has constrained, well defined interfaces over a mature and stable protocol, it has a higher chance of letting your distributed applications evolve as the WWW has evolved.

What I continually like about REST over HTTP is that there is little to no requirement on either the client or server side for additional libraries or generated stubs other than a http client library and a http server. This doesn’t mean that you cannot use a client and/or server side framework to implement your RESTful services or connect to an existing service. You are free to use whatever you want to make yourself productive, but not at the expense of requiring the client or server to have anything installed other than HTTP support.

Flexibility: BPM vs. WS-Business Activity

Earlier this year Mark Little introduced me to the idea of compensating transactions and a framework that could manage such an application. My first thought was that this stuff was interesting, but my intuition was telling me that compensations were something that might have to be modeled rather than managed. That compensations were more of a business process than something that could be coordinated automatically. My feeling is that compensation could easily become too complex for a transaction manager to handle generically. That if you modified a transaction manager to handle these more complex scenarios it would quickly start to look like a bpm framework.

Take an abort event for example. The coordinator in a compensating framework like WS-Business Activity has to make decisions on what order to perform a compensatory action on the resources participating in the activity. A default LIFO ordering seems very appropriate on the surface, but it falls apart in even the simplest of examples. For instance, let’s revisit the example I gave in my previous blog of the ordering of a computer at an online retailer. The business process might look like this:

  1. Bill credit card
  2. Pull parts from inventory
  3. Assemble computer
  4. Install software
  5. QA
  6. Ship

Let’s say the customer canceled the order after it had reached the QA state, #5. Having the customer wait for a LIFO compensation to receive a refund on their credit card is just not acceptable. They will not want to wait for installed software to be wiped, the computer disassembled and parts sent back to inventory before they receive their refund. Therefore a 1-4-3-2 ordering of compensations might be more appropriate. Let’s expand this scenario even further. What if the customer paid in cash? The cash refund process might take just as long as disassembling the computer would. Maybe it would make more sense to do the refund in parallel with the other compensations? So, state #1 could be done in parallel with 4, 3, and 2, but 4-3-2 still has ordering dependencies. You can’t disassemble the computer before you’ve wiped the hard drive. You can’t send parts back to inventory before you’ve disassembled the computer. So, you see, in even the simplest of business activities, you can have variable and complex compensation ordering. This is why I think a bpm engine like jBPM might be a better solution to manage compensations.
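
The ordering described above (refund in parallel, then 4-3-2 strictly in sequence) can be sketched with nothing more than java.util.concurrent. This is only an illustration of the dependency graph, not how jBPM or WS-BA would execute it; the CompensationPlan class and the step names are hypothetical.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;

class CompensationPlan {
    // Records the order in which compensations actually completed.
    final List<String> log = new CopyOnWriteArrayList<>();

    private void run(String step) { log.add(step); }

    // Cancels an order that has reached QA: the refund (undo of step 1) is
    // independent, while undoing steps 4, 3 and 2 must happen in that order.
    void cancelAfterQa() {
        CompletableFuture<Void> refund =
            CompletableFuture.runAsync(() -> run("refund-payment"));
        CompletableFuture<Void> teardown =
            CompletableFuture.runAsync(() -> run("wipe-software"))
                .thenRun(() -> run("disassemble-computer"))
                .thenRun(() -> run("return-parts-to-inventory"));
        // Wait until both branches of the compensation have finished.
        CompletableFuture.allOf(refund, teardown).join();
    }

    public static void main(String[] args) {
        CompensationPlan plan = new CompensationPlan();
        plan.cancelAfterQa();
        System.out.println(plan.log);
    }
}
```

Where the refund lands in the log can differ from run to run, but the three teardown steps always appear in the same relative order. A plain LIFO coordinator can’t express that shape, while a process diagram can.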

When I had discussed this with Mark, he made a good point that you are just replacing one coordinator(BA) with another (jBPM). You still need a coordination engine. Although he is semantically correct, I think the solutions are very different. With BA you have something generic and decoupled from the business process. In a distributed system, it is also a defined contract that both the client and server must obey. Which brings me to my favorite quote in the O’Reilly RESTful Web Services book:

The client should be in charge of managing its own path through the application

WS-BA, by contrast, puts the onus on both the client and server to have WS libraries that are of the same version and can interact with one another. This all works beautifully in controlled environments. You have a very nice decoupling between compensations and business logic in both client and server side code. But…. What happens when BA versions are out of sync, there are interoperability problems, or even worse, one or more of the services you have to coordinate does not support WS-BA? With jBPM, success and failure paths are a part of the business process. There isn’t necessarily a contract between the client and server as the client is responsible for knowing how to compensate and to perform the compensation. You still have strong decoupling in a bpm driven activity. The difference is that instead of compensation being hidden by the server, it is visible, and in your face, embedded directly in the business process. With jBPM, you can look at the process diagram and know exactly the flow of success and failure conditions of your distributed application. With BA, you would have to dive into the documentation and code to determine how your application functions.

In his blog in late August, Heuristics, one-phase commit and compensations, Mark Little talks about his early compensation framework and its interaction with heuristics:

In both of these problem scenarios what typically happens is that someone (e.g., a system administrator) has to get to grips with the inconsistent data and figure out what was going on in the rest of the application in order to try to impose consistency. One of the important reasons this can’t really happen automatically (at the TM level) is because it required semantic information about the application, that simply isn’t available to the transaction system. They compensate manually.

Until then. What we were proposing was allowing developers to register compensation transactions with the coordinator that would be triggered upon certain events, such as heuristic outcomes or one-phase errors. And to do it opaquely as far as the application developer was concerned. Because these compensations are part of the transaction, they’d get logged so that they would be available during recovery. Plus, a developer could also define whether presumed abort, presumed commit or presumed nothing were the best approaches for the individual transaction to use (it has an effect on recovery and failure scenarios).

Reading this fed more fuel to my idea that compensations really belonged as part of a process model. Transactions that have heuristic outcomes many times require human intervention as well as action by the framework itself. Yes, there’s a lot a transaction manager could do to make it easier to record business activities, but you’re still going to have to interact with a real person to resolve many issues. jBPM is great at modeling work that requires a human.

But what about reliability?

Great, so we’re going to use something like jBPM as both our coordination and compensation engine and REST to expose our services on the network. Can we trust these things to provide us with the reliability and consistency that we expect out of something like WS and WS-BA? If you’re modelling compensations in jBPM, you’re gonna need the ability to reliably perform these compensations. Luckily, I’ve already answered how you can guarantee execution and failure handling in jBPM. But what about REST, what does it add? If you have truly written RESTful services, you have a lot of reliability and consistency built into your application. HTTP GET, PUT, and DELETE are supposed to be idempotent, meaning that it doesn’t matter how many times you’ve invoked them, the result is the same. While jBPM can guarantee you execution and failure handling, it can’t guarantee that these events won’t be duplicated. Since your RESTful services should be idempotent, you won’t have to worry about it! (I know, easier said than done.)
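
Here is a tiny sketch of why idempotency takes the sting out of duplicated jobs. The OrderStore class is a hypothetical in-memory stand-in for a RESTful order resource; its put/delete/get methods mirror the HTTP verbs but involve no real HTTP.

```java
import java.util.HashMap;
import java.util.Map;

class OrderStore {
    private final Map<Integer, String> orders = new HashMap<>();

    // PUT: stores the representation under a client-chosen id.
    // Repeating the same call leaves the store in the same state.
    void put(int id, String order) { orders.put(id, order); }

    // DELETE: removing an order that is already gone is not an error.
    void delete(int id) { orders.remove(id); }

    // GET: reads never change state.
    String get(int id) { return orders.get(id); }

    int size() { return orders.size(); }

    public static void main(String[] args) {
        OrderStore store = new OrderStore();
        store.put(42, "<order>laptop</order>");
        store.put(42, "<order>laptop</order>");   // duplicated delivery
        System.out.println(store.size());
        store.delete(42);
        store.delete(42);                         // duplicated compensation
        System.out.println(store.size());
    }
}
```

The important design choice is that the client picks the id. If the server generated a fresh id on every request (the usual POST style), a duplicated job would create a second order.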

Where does that leave WS-BA?

I did discuss my ideas a bit with Mark. One of his comments that struck home was “Imagine if you had to explicitly tell all of your XA resource managers to prepare and commit?” I’ll admit that in a WS-BA world, writing a client that integrates a bunch of distributed services into one business activity would be uber simple and clean. The work being done with the JBoss Business Activity Framework simplifies the server side greatly as well. A compensation framework like JBoss Business Activity, might work really really well in the controlled environment of a web application. Until I actually write one of these applications or talk to a bunch of users and customers that have used WS-BA to model their integration project, I will remain skeptical. I just hope that I can get the feedback I need to further formulate my thoughts on this subject. Your thoughts and experiences??????

Guaranteed jBPM Failure Handling


jBPM 3.x allows you to define exception handlers in .jpdl. You can define exceptions you want to catch and either recover from the exception, or allow the execution to fail. These exception handlers are incomplete in functionality though. In my last blog, I talked about how asynchronous continuations give you the ability to have guaranteed transitions. If you dive into the asynchronous code though, you see that exception handlers are executed within the same transaction as the command service triggering the node execution. What does this mean? What are the consequences of this?

Let’s say that your action logic runs into an error condition and it needs to roll back any database changes it has made. Since the thread of execution is marked for rollback, you cannot make any changes to the execution context of the business process, such as transitioning to a failure state. Any failure changes you make to the execution context would just be rolled back by the transaction. Another scenario involves the guaranteed transitions I talked about in my last blog. In a rollback situation, what if you want to let your business process decide whether the node execution should be tried again, instead of relying on the simple logic of the MDB’s retry count? We talked a bit about these problems on the jBPM forum. Let me summarize some solutions that were discussed.

Execute in a separate transaction

One approach is to solve the problem by invoking non-jbpm code in its own transaction. Your jbpm action could just delegate to an EJB or Spring bean that uses a REQUIRES_NEW semantic. If the invocation succeeds, populate the appropriate bpm variables and move on. If it fails, transition to a failure state or abort and retry the action.

I don’t think this is a good approach because you wouldn’t be able to use frameworks like Seam which make integrating business processes and business logic much easier. Seam uses annotations to biject (push data to and from) jbpm context data into your beans, freeing you from writing a lot of jbpm specific code.

More importantly, I don’t think this is a solid solution. For instance, what if your non-jbpm code is successful and commits, but the machine crashes before the jbpm context commits? Then your system ends up in an inconsistent state and you are screwed.

Have a Failure State

A better way to solve this is to actually define a transition to a failure state in your process definition:

   <start-state>
     <transition to='business-state-1'/>
   </start-state>
   <state name='business-state-1' asynch='true'>
     <transition name="failure" to='failure-state-1'/>
   </state>
   <state name='failure-state-1' asynch='true'>
   </state>


Your failure processing would be encapsulated within actions of the failure state instead of within exception handlers in the original business state. The failure state should be asynchronous because we want to get guaranteed processing of this state. We need to ensure that either a) the business-state-1 state gets processed, or b) the failure state gets processed. I already discussed in a previous blog how to guarantee (a), but (b) needs some more thought.

Guaranteed failure handling: Dead Letter Queue Processing

One way to do guaranteed failure handling would be to take advantage of dead letter queues. Most MDB containers allow you to specify how many times you want a message redelivered. After the message redelivery count is reached and there still is failure, the container will reroute the message to a dead letter queue. This queue is a normal queue that you can listen on. Although not a standard JMS feature, I looked into it a little, and at least JBoss, ActiveMQ, and MQSeries have the notion of a DLQ.

So, to do jBPM failure handling, you could just register a new MDB to listen on the dead letter queue. Asynchronous continuations post instances of org.jbpm.job.ExecuteNodeJob. The MDB would check to see if the message contained an org.jbpm.job.ExecuteNodeJob and, if it did, check the node for a “failure” transition. If one exists, then signal the process instance to traverse that “failure” transition.

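import javax.annotation.PostConstruct;
import javax.annotation.Resource;
import javax.ejb.ActivationConfigProperty;
import javax.ejb.MessageDriven;
import javax.jms.JMSException;
import javax.jms.Message;
import javax.jms.MessageListener;
import org.jbpm.JbpmConfiguration;
import org.jbpm.JbpmContext;
import org.jbpm.db.JobSession;
import org.jbpm.job.ExecuteNodeJob;
import org.jbpm.job.Job;

@MessageDriven(activationConfig= {
      @ActivationConfigProperty(propertyName="destinationType", propertyValue="javax.jms.Queue"),
      @ActivationConfigProperty(propertyName="messageSelector", propertyValue="jobId IS NOT NULL"),
      @ActivationConfigProperty(propertyName="destination", propertyValue="queue/DLQ")})
public class DLQFailureHandler implements MessageListener {
    JbpmConfiguration jbpmConfiguration = null;
    @Resource(name="JbpmCfgResource") String jbpmCfgResource = null;

    // Assumed lifecycle hook: load the jBPM configuration before the first message arrives.
    @PostConstruct
    public void initJbpm() {
      jbpmConfiguration = JbpmConfiguration.getInstance(jbpmCfgResource);
    }

    public void onMessage(Message message) {
        long jobId = 0;
        try {
            jobId = message.getLongProperty("jobId");
        } catch (JMSException ignored) {
            // message selector confirms existence
        }

        JbpmContext jbpmContext = jbpmConfiguration.createJbpmContext();
        try {
            JobSession jobSession = jbpmContext.getJobSession();
            Job job = jobSession.loadJob(jobId);
            if (!(job instanceof ExecuteNodeJob)) {
                // do some other failure handling
                return;
            }
            ExecuteNodeJob executeNodeJob = (ExecuteNodeJob)job;
            if (executeNodeJob.getNode().hasLeavingTransition("failure")) {
                // take the "failure" transition, then delete the dead job
                executeNodeJob.getToken().signal("failure");
                jobSession.deleteJob(job);
            }
            else {
                // push to a different DLQ?
            }
        } finally {
            jbpmContext.close();
        }
    }
}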

To make sure we only process dead jbpm jobs, the MDB uses a message selector, configured through the messageSelector activation property. The initJbpm() method looks up the jbpm configuration, and onMessage() uses it to create the jbpm context. The loadJob() call extracts the Job. If the Job is not an ExecuteNodeJob, you’ll want to add specific handling there or forward the message to another DLQ. Next, the hasLeavingTransition() check tells us whether the node has a “failure” transition. If it does, we transition and delete the job. If there is no “failure” transition, you might want to repost the message to a different DLQ that you create. If there is a failure in this processing, message redelivery still guarantees that the failure logic gets reprocessed.

The advantage of this approach is that it is totally seamless to your application: both in code and in your process diagram. You are guaranteed to either be in a successful or failure jbpm state. (At least when using the JBoss JMS inflow adapter, but you should probably be ok in other implementations.) The disadvantage of this approach is that you have no knowledge of the error condition and can’t report it to the “failure” state. Your failure logic will not know the cause of the failure. This isn’t always an issue because sometimes, specifically in a machine crash, you may not know why there is a failure.

Guaranteed failure handling: Transaction Synchronization

Another idea that came up in the forum discussion was to register a javax.transaction.Synchronization with the Transaction Manager. In afterCompletion(), do the transition to a failure state if there was a rollback. The advantage of such an approach over DLQ is that you can programmatically add the behavior in action code or write an action that performs the TM synchronization. Because of this, you can propagate any exception or failure reason to the failure state. I’m not going to get into details of this approach as I am not sure it will work, specifically:

  • You must set your redelivery count to 0. The synchronization has no way of knowing whether or not the message had been redelivered or whether or not you should transition the token to the failure state with jBPM code out-of-the-box. Since you don’t want the original JMS message redelivered after you have transitioned to the failure state, you need a retry count of zero.
  • JMS Providers may not be able to guarantee that a message isn’t redelivered. Since we require a redelivery count of zero, this is not an option.
  • Your machine could crash before transitioning to the failure state. Since we cannot redeliver the message, the node will never transition to the failure state. You may say, “well, can’t the same thing happen with the DLQ option?”. With the JBoss Inflow adapter, no. The JBoss adapter performs the DLQ check within the same transaction as the delivered method. Since both the sending to the DLQ and acknowledgement of the message are within the same transaction, you are guaranteed of one or the other succeeding.
  • Probably the most important thing is that Synchronization.afterCompletion() is not guaranteed to be executed on TX recovery, after a machine crash.

So, the winner is DLQ! Of course I may have missed something. This approach would be better if we could work out the issues.

Side Effect: Redelivery handling

In my previous blog on Guaranteed Execution in jBPM, I discussed at the end how jBPM actions currently have no way of knowing whether or not they have been retried by a new transaction. An indirect consequence of the DLQ approach to failure handling is that we now have some ability to recognize and keep track of redeliveries.

   <start-state>
     <transition to='business-state-1'/>
   </start-state>
   <state name='business-state-1' asynch='true'>
     <transition name="failure" to='failure-state-1'/>
   </state>
   <decision name='failure-state-1' asynch='true'>
     <transition name="recover" to='recovery-state-1'>
        <condition expression="retryCount >= 3"/>
     </transition>
     <transition name="retry" to='business-state-1'>
        <condition expression="retryCount &lt; 3"/>
     </transition>
   </decision>



Since failures result in a transition to a failure state, we can increment and store redelivery counts within the token. We can programmatically decide whether or not we are allowed to retry the message.
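
The bookkeeping behind that decision node is small enough to sketch in plain Java. This is not jBPM API: the FailureDecision class is hypothetical, and a plain Map stands in for the variables stored with the token. Only the retryCount name, the threshold of 3, and the “retry” and “recover” transition names come from the process definition above.

```java
import java.util.HashMap;
import java.util.Map;

class FailureDecision {
    static final int MAX_RETRIES = 3;

    // Called on entering the failure state: bumps the counter kept with the
    // token's variables and returns the name of the transition to take.
    static String onFailure(Map<String, Object> tokenVariables) {
        Object stored = tokenVariables.get("retryCount");
        int retryCount = (stored == null ? 0 : (Integer) stored) + 1;
        tokenVariables.put("retryCount", retryCount);
        return retryCount < MAX_RETRIES ? "retry" : "recover";
    }

    public static void main(String[] args) {
        Map<String, Object> vars = new HashMap<>();
        for (int failure = 1; failure <= 3; failure++) {
            System.out.println("failure " + failure + " -> " + onFailure(vars));
        }
    }
}
```

Because the counter lives with the process state rather than in the JMS provider, the retry policy becomes part of the business process and can differ per node.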


So that’s how you can have guaranteed failure handling. I do have a disclaimer though. I’m too lazy to actually try out these ideas. When I do I’ll get back to you in a follow up blog. Please try it out yourself and provide feedback on the jbpm forum topic. We’re still working out the issues here and your thoughts would be most appreciated.

Guaranteed Execution in jBPM


Was thinking a bit about jBPM today. One thought crossed my mind: How can you guarantee that a specific state transition happens in jBPM? We’ll talk about guaranteed state transition in this blog, but first of all, before we even think of jBPM, how could you guarantee that *any* event happens in a general application? There are a few things you have to think about here:

  • You want to be guaranteed that the event is scheduled to be processed
  • You only want the event to be scheduled if your business logic succeeds
  • You want to be guaranteed that somebody processes the event

Being as stupid and slow as I am, it took me a few minutes to realize that JMS provides guaranteed delivery of messages. This is obvious to many people, but let’s walk through what you would have to do in JBoss to have the guaranteed processing of an event.

First things first is to write the sender code. Let’s do this in an EJB 3.0 stateless session bean. I know this is basic stuff, but you may not be familiar with a) EJB 3.0 or b) the JBoss specifics:

@Stateless
public class BusinessLogicBean implements BusinessLogic {
   @Resource(mappedName="java:/JmsXA") ConnectionFactory factory;
   @Resource(mappedName="queue/mylogic") Destination destination;

   public void doSomething() throws JMSException {
      // ... do some logic ...
      Connection con = factory.createConnection();
      // ... do all the JMS sugar: create a session and a producer,
      // set the delivery mode to PERSISTENT, send the message, close ...
   }
}

The doSomething() method is a transactional unit of work. We want every piece of business logic to have succeeded when we deliver the message. In other words, we want the message delivered as part of a transaction. To use a transactionally aware sender, we injected the “java:/JmsXA” connection factory into our SLSB. We also set the delivery mode of the producer to be PERSISTENT. So, we both guaranteed that the message would only be sent if the business logic succeeded and that the message has been scheduled for processing. On to the server:

@MessageDriven(activationConfig= {
      @ActivationConfigProperty(propertyName="destinationType", propertyValue="javax.jms.Queue"),
      @ActivationConfigProperty(propertyName="destination", propertyValue="queue/mylogic"),
      @ActivationConfigProperty(propertyName="DLQMaxResent", propertyValue="5")})
public class MyMessageConsumer implements MessageListener {
   public void onMessage(Message msg) {
      // ... do the work ...
   }
}

Typical message driven bean in EJB 3.0. Since this bean is using the container to manage transactions and being invoked inside of a transaction, the receipt of the message will not be sent back to the JMS provider if the business logic is rolled back. So, if there is a system failure, or some other reason for abort, the message will be redelivered, thus guaranteeing its processing. The “DLQMaxResent” property is JBoss specific and tells the JMS adapter how many times the message should be resent before it is sent to a “dead letter queue” and put to death. There are other configuration items around dead-letter queues. Go read our documentation for additional DLQ settings.
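
If the redelivery behavior is hard to picture, here is a toy, in-memory model of it. It is in no way how JBoss or any JMS provider is implemented, and the RedeliverySketch class is hypothetical. It also assumes that a resend limit of N means one initial delivery plus up to N redeliveries, which is my reading of the setting rather than something taken from the documentation.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.function.Consumer;

class RedeliverySketch {
    final Queue<String> queue = new ArrayDeque<>();
    final List<String> deadLetters = new ArrayList<>();
    final int maxResent;

    RedeliverySketch(int maxResent) { this.maxResent = maxResent; }

    // Delivers every queued message. A listener that throws plays the role of
    // a rolled-back transaction, so the same message is offered again.
    void deliverAll(Consumer<String> listener) {
        String message;
        while ((message = queue.poll()) != null) {
            boolean acknowledged = false;
            // One initial delivery plus up to maxResent redeliveries (assumption).
            for (int attempt = 0; attempt <= maxResent && !acknowledged; attempt++) {
                try {
                    listener.accept(message);
                    acknowledged = true;   // "commit": the receipt is sent
                } catch (RuntimeException rollback) {
                    // "rollback": no receipt, so the message stays pending
                }
            }
            if (!acknowledged) deadLetters.add(message);
        }
    }

    public static void main(String[] args) {
        RedeliverySketch broker = new RedeliverySketch(5);
        broker.queue.add("always-fails");
        broker.deliverAll(m -> { throw new IllegalStateException("rollback"); });
        System.out.println(broker.deadLetters);
    }
}
```

Either the listener eventually succeeds or the message ends up on the dead letter list. It never silently disappears.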

Guaranteed State Transitions in jBPM

Guaranteed state transitions with jbpm are all about using asynchronous continuations and the Enterprise Archive. In jBPM if you mark a node as asynch then jBPM will use its configured messaging system to transition to that state in another thread and transaction. If you have deployed jBPM through the Enterprise Archive, JMS is used to do the transition. Here’s an example .jpdl:

   <start-state>
     <transition to='guarantee'/>
   </start-state>
   <state name='guarantee' asynch='true'/>

So, starting a process instance of this definition will generate a JMS message that transitions to the ‘guarantee’ state. By default, jBPM is configured to use a transactional connection factory, “java:/JmsXA”, described earlier in this blog. So, when the start-state transitions, it will create a persistent JMS message and deliver it to the queue transactionally. Now, the ‘guarantee’ node will be triggered by an MDB that receives the message sent by the jBPM message layer. This MDB uses container managed transactions. This means that if there is a failure within the execution of the state, the message will be redelivered until the transaction completes successfully. This MDB is configured to use JBoss’s old EJB 2.1 container which does not use JCA message inflow. The default message redelivery count is 10. Chapter 5 in the JBoss AS documentation shows how you can define the max redelivery as well as dead letter queue configurations. You’ll need to open up the Enterprise Archive, jbpm-enterprise.ear, to get to the file, jbpm-enterprise.jar. This file contains the ejb-jar.xml and jboss.xml files you need to modify to change the jbpm configuration of the message driven bean.

The only thing missing with this JMS transitioning in jBPM is that the node receiving the transition does not have knowledge on whether the message was redelivered or not. Future versions might incorporate this knowledge. I have pinged the jBPM team and there is some discussion on the jbpm forum.

That’s about it! Yeah, I know a lot of this stuff is basic, hope I didn’t bore you too much. I just wanted to set down some foundation for other topics I want to discuss that may use these JMS and jBPM features.

Must I compensate for everything?


I spent a lot of time over the past few months with the JBoss ESB and jBPM teams. It was my first real hands-on experience with the orchestration of both local and remote “web” services. Usually, as a middleware developer you think in terms of milliseconds. How fast can I transmit information from one point to the next and back again? How can I optimize this query to run faster? The jBPM guys think about things very differently. For them, the processes they model could span minutes, hours, days and even weeks or months. It’s a very different way of looking at things. For instance, what if you have a process that spans days and something goes wrong? How do you handle failure and abort conditions? You sure as hell can’t do long running activities within a JTA transaction. This would hold up things like database resources too long. No, what you really need is the ability to undo operations in abort situations.

In middleware land, undo is more commonly referred to as Compensating Transactions. What’s an example? Cancellation of an order is a common compensation. When a customer makes an order over the internet, the business process usually goes through a set of committed states before the product is delivered. Let’s take the ordering of a computer for instance:

  • Bill credit card
  • Pull parts from inventory
  • Assemble computer
  • Install software
  • QA
  • Ship

You don’t want to build the computer without billing the customer. Once parts are pulled from inventory, inventory counts must be decremented. No way this can be done in a traditional ACID transaction. Each state transition must be its own transaction that is committed. If an order is canceled, there are a number of things that must be done. The buyer must have their credit card credited. The computer must also be disassembled and parts sent back to inventory.
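
A bare-bones way to picture this: every step commits on its own and, as it does, registers the action that would undo it. The sketch below uses plain Java with hypothetical names (CompensationLog, commit, cancel) and simply replays the undo actions most recent first. It is a picture of the idea, not of any particular transaction manager.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class CompensationLog {
    private final Deque<Runnable> undoStack = new ArrayDeque<>();
    final List<String> history = new ArrayList<>();

    // Performs a step that commits immediately and remembers how to undo it.
    void commit(String step, String undo) {
        history.add(step);
        undoStack.push(() -> history.add(undo));
    }

    // Runs the registered compensations, most recent first.
    void cancel() {
        while (!undoStack.isEmpty()) {
            undoStack.pop().run();
        }
    }

    public static void main(String[] args) {
        CompensationLog order = new CompensationLog();
        order.commit("bill credit card", "credit the card");
        order.commit("pull parts from inventory", "return parts to inventory");
        order.commit("assemble computer", "disassemble computer");
        order.cancel();
        order.history.forEach(System.out::println);
    }
}
```

Nothing is rolled back in the ACID sense. The cancel just appends new, committed work that reverses the effect of the earlier steps.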

Long running conversations aren’t just a property of a business process run inside something like jBPM. Web applications are a perfect example of a long running activity where you don’t want to hold up resources. The Hibernate guys like to handle these situations by holding database updates in memory and flushing them when the activity commits. They suggest the combination of optimistic concurrency and flush mode NEVER/MANUAL to do this. Flush mode NEVER holds all database updates within the Hibernate session until you invoke flush(). Optimistic concurrency ensures that your database is in the state you want it to be in when you perform the flush. With this approach compensation is not needed at all, because nothing is committed to the database. The problem is that you might have a resource that can’t be managed by Hibernate or that has a high probability of failing optimistic concurrency checks. In that scenario, compensation is the only thing you can do.
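
To show the mechanics without dragging in Hibernate, here is a small in-memory imitation. The Conversation and Row classes are hypothetical and are not Hibernate API. Updates are buffered until flush(), and a version number on the row plays the part of the optimistic concurrency check.

```java
class Conversation {
    // Stand-in for a database row carrying a version column.
    static class Row {
        String value;
        int version;
        Row(String value) { this.value = value; }
    }

    private final Row row;
    private int versionRead;       // version seen when the conversation started
    private String pendingValue;   // buffered update, not yet written

    Conversation(Row row) {
        this.row = row;
        this.versionRead = row.version;
    }

    // Buffers the change in memory; the shared row is untouched.
    void update(String newValue) { pendingValue = newValue; }

    // Writes the buffered change only if nobody else modified the row meanwhile.
    boolean flush() {
        if (pendingValue == null) return true;          // nothing to write
        if (row.version != versionRead) return false;   // optimistic check failed
        row.value = pendingValue;
        row.version++;
        versionRead = row.version;
        pendingValue = null;
        return true;
    }

    public static void main(String[] args) {
        Row row = new Row("draft");
        Conversation first = new Conversation(row);
        Conversation second = new Conversation(row);
        first.update("paid");
        second.update("cancelled");
        System.out.println(first.flush());
        System.out.println(second.flush());
        System.out.println(row.value);
    }
}
```

Abandoning a conversation before flush() needs no undo at all, since nothing was ever written. The second conversation shows the downside: when the version check fails, the buffered work is lost and has to be redone.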

So, that’s my explanation of what compensating transactions are. In my next blog I want to talk about how middleware interacts with compensating transactions and whether or not you need a special compensating transaction engine or business activity service to enable compensating transactions.
