I spent a lot of time over the past few months with the JBoss ESB and jBPM teams. It was my first real hands-on experience with the orchestration of both local and remote “web” services. Usually, as a middleware developer, you think in terms of milliseconds. How fast can I transmit information from one point to the next and back again? How can I optimize this query to run faster? The jBPM guys think about things very differently. For them, the processes they model could span minutes, hours, days, and even weeks or months. It’s a very different way of looking at things. For instance, what if you have a process that spans days and something goes wrong? How do you handle failure and abort conditions? You sure as hell can’t do long-running activities within a JTA transaction. That would tie up things like database resources for too long. No, what you really need is the ability to undo operations in abort situations.
In middleware land, undo is more commonly referred to as Compensating Transactions. What’s an example? Cancellation of an order is a common compensation. When a customer makes an order over the internet, the business process usually goes through a set of committed states before the product is delivered. Let’s take the ordering of a computer for instance:
- Bill credit card
- Pull parts from inventory
- Assemble computer
- Install software
- QA
- Ship
You don’t want to build the computer without billing the customer. Once parts are pulled from inventory, inventory counts must be decremented. There’s no way this can be done in a single traditional ACID transaction. Each state transition must be its own transaction that is committed. If an order is canceled, a number of things must be done to undo that committed work. The buyer’s credit card must be credited. The computer must also be disassembled and its parts sent back to inventory.
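To make the order example concrete, here is a minimal sketch in plain Java. It is not jBPM code; the `Step`, `OrderProcess`, and `CompensationDemo` names are made up for illustration. Each step stands for a piece of work that commits on its own and is paired with an action that undoes it. Canceling the order runs the undo actions for the completed steps in reverse order.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** One committed step of a long-running process, paired with its undo action. (Hypothetical type.) */
final class Step {
    final String name;
    final Runnable action;        // stands for work that commits in its own short transaction
    final Runnable compensation;  // undoes that committed work if the order is canceled

    Step(String name, Runnable action, Runnable compensation) {
        this.name = name;
        this.action = action;
        this.compensation = compensation;
    }
}

/** Remembers which steps have completed so they can be compensated later. (Hypothetical type.) */
final class OrderProcess {
    private final Deque<Step> completed = new ArrayDeque<>();

    void run(Step step) {
        step.action.run();
        completed.push(step);  // most recent step ends up on top
    }

    /** Undo every completed step, newest first. */
    void cancel() {
        while (!completed.isEmpty()) {
            completed.pop().compensation.run();
        }
    }
}

final class CompensationDemo {
    static List<String> runAndCancel() {
        List<String> log = new ArrayList<>();
        OrderProcess order = new OrderProcess();
        order.run(new Step("bill",
                () -> log.add("bill credit card"),
                () -> log.add("refund credit card")));
        order.run(new Step("pull",
                () -> log.add("pull parts from inventory"),
                () -> log.add("return parts to inventory")));
        order.run(new Step("assemble",
                () -> log.add("assemble computer"),
                () -> log.add("disassemble computer")));
        order.cancel();  // customer cancels before software install, QA, and shipping
        return log;
    }

    public static void main(String[] args) {
        runAndCancel().forEach(System.out::println);
    }
}
```

Keeping the completed steps on a stack is just one simple choice. A real process engine would also have to persist this record so it survives a crash between steps.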
Long running conversations aren’t just a property of a business process run inside something like jBPM. Web applications are a perfect example of a long running activity where you don’t want to hold up resources. The Hibernate guys like to handle these situations by holding database updates in memory and flushing them when the activity commits. They suggest the combination of optimistic concurrency and flush mode NEVER/MANUAL to do this. Flush mode NEVER holds all database updates within the Hibernate session until you invoke flush(). Optimistic concurrency ensures that the database is still in the state you expect when you perform the flush. With this approach compensation is not needed at all, because nothing is committed to the database until the end. The problem is that you might have a resource that can’t be managed by Hibernate or that has a high probability of failing optimistic concurrency checks. In that scenario, compensation is the only thing you can do.
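The idea behind the buffered approach can be sketched without Hibernate at all. The code below is a plain-Java simulation, not Hibernate’s API: `VersionedStore` is an invented stand-in for a table with a version column, and `BufferedConversation` is an invented stand-in for a session that holds its writes until `flush()`. The flush applies the writes only if no row changed since the conversation first saw it.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Stand-in for a database table with a version column per row. (Hypothetical type.) */
final class VersionedStore {
    private final Map<String, Integer> values = new HashMap<>();
    private final Map<String, Integer> versions = new HashMap<>();

    void put(String key, int value) {
        values.put(key, value);
        versions.merge(key, 1, Integer::sum);  // every write bumps the row version
    }

    int value(String key) { return values.get(key); }  // assumes the row exists

    int version(String key) { return versions.getOrDefault(key, 0); }
}

/** Holds updates in memory until flush(), roughly like a session in manual flush mode. (Hypothetical type.) */
final class BufferedConversation {
    private final VersionedStore store;
    private final Map<String, Integer> pending = new LinkedHashMap<>();
    private final Map<String, Integer> versionsSeen = new HashMap<>();

    BufferedConversation(VersionedStore store) { this.store = store; }

    int read(String key) {
        versionsSeen.putIfAbsent(key, store.version(key));
        Integer buffered = pending.get(key);
        return buffered != null ? buffered : store.value(key);
    }

    void update(String key, int value) {
        versionsSeen.putIfAbsent(key, store.version(key));
        pending.put(key, value);  // nothing touches the store yet
    }

    /** Applies all buffered writes, or none if any row changed since we first saw it. */
    boolean flush() {
        for (String key : pending.keySet()) {
            if (store.version(key) != versionsSeen.get(key)) {
                pending.clear();
                versionsSeen.clear();
                return false;  // optimistic check failed; a real ORM would raise an error here
            }
        }
        pending.forEach(store::put);
        pending.clear();
        versionsSeen.clear();
        return true;
    }
}

final class FlushDemo {
    public static void main(String[] args) {
        VersionedStore store = new VersionedStore();
        store.put("inventory", 10);

        BufferedConversation a = new BufferedConversation(store);
        BufferedConversation b = new BufferedConversation(store);
        a.update("inventory", a.read("inventory") - 1);
        b.update("inventory", b.read("inventory") - 2);

        System.out.println("a flushed: " + a.flush());
        System.out.println("b flushed: " + b.flush());
        System.out.println("inventory: " + store.value("inventory"));
    }
}
```

The second conversation loses the race and nothing of it reaches the store, so there is nothing to compensate. That is exactly why this only works for resources that can be buffered and version-checked this way.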
So, that’s my explanation of what compensating transactions are. In my next blog I want to talk about how middleware interacts with compensating transactions and whether or not you need a special compensating transaction engine or business activity service to enable compensating transactions.
Jul 24, 2007 @ 23:03:36
Do you know how seriously hard it is to use flush mode never with more than one developer? Or if said developers are…uh hum…of “various” skill levels.
Jul 24, 2007 @ 23:05:07
However…I don’t get what the two have to do with each other.. You can just not *commit*…it doesn’t matter that you flushed. No compensation is needed.
Jul 25, 2007 @ 10:30:36
That’s what I’m saying(poorly) Andy. If you use flush mode never you don’t need to have a compensating transaction.
Jul 25, 2007 @ 13:46:16
Right but if you DONT use flush mode and you just don’t COMMIT…you ALSO don’t need a compensating transaction. Rolling back works too.
Jul 25, 2007 @ 14:22:03
Andy, this isn’t a good idea. To just not commit. Think of a web application that has a GUI Wizard with 5 steps. You shouldn’t start a transaction at the first step and commit at step 5. You don’t want to hold database resources for seconds, minutes, hours… You need compensation for “long running activities”. Re-read this blog for the example given. In no place in that particular business process can you just “don’t commit”. For web apps, you may have a web flow that needs to interact with the database. The state transitions are faster in a web flow, but still very very slow when thinking of 10s, 100s, 1000s of users on your system.
Jul 25, 2007 @ 17:30:08
No, I read it perfectly, you just didn’t say enough. Right, but flush mode doesn’t fix that either. You mean disconnected session + flush mode. A different beast altogether. However, disconnected session + flush mode doesn’t scale quite as well as you’d think. Why? Because you have to manually evict things that were read into the session or you end up sucking down way too much memory or, worse, replicating it across the cluster. Version columns plus detached objects scales far better, but again at a cost of complexity.
Jul 25, 2007 @ 20:37:53
So what you’ve basically done is further justify the need for compensation. Thanks :)
Jul 26, 2007 @ 06:49:27
In some cases. Compensating transactions wouldn’t ever be my first choice for most stuff, but there are plenty of edge cases. It is a tool, however, dang how many people even use EJB transactions correctly? Crap how many people use basic database transactions correctly? I also don’t know that canceling an order is a great case for compensating transactions as I know them. In a good system these follow accounting rules and there is more of a workflow. The data is maintained but put in a new state rather than undone. I don’t know that this falls into what I call “compensating transactions”…certainly compensation is involved ;-). I kinda like Gavin’s new terminology of “conversation” for this. Certainly JBPM would be a way to map the workflow and states. I just don’t know that I’d call your specific example quintessential compensation per se — but what do I know… I just crashed gnome for the 5th time today.
Jul 26, 2007 @ 13:49:13
Andy, you’re stealing my thunder :( My next blog was going to be about how compensation is better to be handled as a business process through jbpm than handled by a business activity service or compensating transaction coordinator.
Aug 01, 2007 @ 03:49:29
I guess my big whine is your use of “compensating transactions” which I regard as something different. Write this in JBossese: http://blogs.msdn.com/rogerwolterblog/archive/2006/05/24/606184.aspx He more or less explains what I was trying to say and what you were trying to say and hawks BizTalk to boot :)
Aug 01, 2007 @ 21:17:39
First off, all transactions are “compensation transactions” :) The ones we know and love just do backward compensation. What you’re describing is forward compensation (at least that’s what WS-BA/Sagas is all about). Second, I don’t see this as a jBPM versus BA approach. Ultimately you need something that can reliably record who was involved in the “transaction” and hence who needs to be told to undo, even in the presence of arbitrary failures. That’s a coordinator. Plus, some of the participants in the “transaction” may not need to be compensated (c.f., returning VoteReadonly during prepare of 2PC). That’s all WS-BA does: it provides a standard (for Web Services) for the protocol between coordinator and participants. It doesn’t define when or if a “transaction” should be terminated in a successful or failure state. That’s definitely the domain of the business logic. Plus, it doesn’t define how any compensations should be done. Again, that’s an implementation decision. Participants in a forward compensation based transaction are more often than not tied to the business logic/object/service that they’re compensating: they’re rarely re-usable (e.g., forward compensating a flight reservation is different to compensating buying a book). WS-BPEL defines compensation handlers in the same way that jBPM would for the business process. Some implementations use WS-BA, others don’t. But those that don’t have a coordinator somewhere to do the same thing; they just don’t bother with the interoperability. One of the earlier Web Services TX specifications (BTP) tried to mix business logic with the coordinator, but that wasn’t the right approach. The business logic should always be in charge, driving the coordinator appropriately. You might find this useful.
Aug 01, 2007 @ 21:45:50
Mark, seems BA is about further decoupling. With BPM, the process drives and must know about whether to compensate or not. With BA driven services, the client is decoupled from whether the resource must be compensated or not and how it is compensated.
Aug 01, 2007 @ 22:00:35
You know, I wish we had a whiteboard again or I lived in Boston ’cause I’m always sure we’d have good conversations :) Anyway … in BA the client doesn’t know how a service compensates for the work that it did for the client, but the client definitely knows whether the business transaction (for which the BA provides the scope) needs to undo or not. So I don’t quite get the “the client is decoupled from whether the resource must be compensated”. BTW, I’m assuming resource is service/component/object/whatever and not the participant registered with the coordinator? Maybe there’s some disconnect there too: as with traditional tx, there’s a service/participant split in BA too. Clients talk to services/objects/whatevers that encompass the business logic, whereas the BA coordinator talks to participants (registered by the service or on behalf of the service somehow).
Aug 01, 2007 @ 22:04:01
I’m talking about business logic. In the BPM case, which is business logic driven, the business logic invokes all the compensation actions. In the BA case, doesn’t the framework coordinator handle all this? Meaning, all this compensation logic is hidden from the application developer?
Aug 02, 2007 @ 09:19:59
I think the difference is that in the BA case the client doesn’t need to worry about how to invoke a compensator on a service (or a number of services that must be compensated together or not at all). The information about how to compensate is down to the service developer and this is wrapped within a participant that then gets enrolled with the BA coordinator. In the case where you’ve only got a single service to compensate then something like BA is overkill (though an equivalent of onePhaseCommit would help there). But as soon as you enlist more than one service, it becomes too much overhead on the client developer IMO. BTW, BA also supports a “mixed” outcome. By default, the termination of a BA scope is atomic, so either all of the participants compensate for their services or none of them do. But you can change that so a subset are compensated and the remaining aren’t. The coordinator does this reliably too, so we fail safe.
Distributed Compensation with REST and jBPM « Angry Bill
Sep 18, 2007 @ 12:18:27