Over the past few months I’ve been leading up to something. Although I was researching and diving into specifics of compensations, jbpm, and REST, I had a much grander purpose in writing these specific blogs:

These blogs were specifically written as a thought exercise in what it would take to write a coordinated distributed application. What pieces would we really need? Where does WS-* fit in? How about REST and BPM? Do we really need a business activity framework? How can we write scalable and maintainable distributed systems? I want to pull together my thoughts on the answers to these questions in this blog entry as well as ask a few more.

Maintainability: REST over WS

One of my concerns is the complexity of the WS-* layer and how it affects maintainability and evolvability of a distributed system. WS-* requires both the client and server to have the necessary middleware components and libraries installed to work. A distributed application usually means that it is an integration project. You are probably integrating with systems you do not control. There is probably a good chance you would not be able to bring down these systems to install or upgrade their middleware architecture. They may be poorly maintained or even not maintained at all. This is where the idea of REST comes in to simplify things and make things more usable over time. Since it has constrained, well defined interfaces over a mature and stable protocol, it has a higher chance of letting your distributed applications evolve as the WWW has evolved.

What I continually like about REST over HTTP is that there is little to no requirement on either the client or server side for additional libraries or generated stubs other than a http client library and a http server. This doesn’t mean that you cannot use a client and/or server side framework to implement your RESTful services or connect to an existing service. You are free to use whatever you want to make yourself productive, but not at the expense of requiring the client or server to have anything installed other than HTTP support.

Flexibility: BPM vs. WS-Business Activity

Earlier this year Mark Little introduced me to the idea of compensating transactions and a framework that could manage such an application. My first thought was that this stuff was interesting, but my intuition was telling me that compensations were something that might have to be modeled rather than managed. That compensations were more of a business process than something that could be coordinated automatically. My feeling is that compensation could easily become too complex for a transaction manager to handle generically. That if you modified a transaction manager to handle these more complex scenarios it would quickly start to look like a bpm framework.

Take an abort event for example. The coordinator in a compensating framework like WS-Business Activity has to make decisions on what order to perform a compensatory action on the resources participating in the activity. A default LIFO ordering seems very appropriate on the surface, but it falls apart in even the simplest of examples. For instance, let’s revisit the example I gave in my previous blog of the ordering of a computer at an online retailer. The business process might look like this:

  1. Bill credit card
  2. Pull parts from inventory
  3. Assemble computer
  4. Install software
  5. QA
  6. Ship

Let’s say the customer canceled the order after it had reached the QA state, #5. Having the customer wait for a LIFO compensation to receive refund on their credit card is just not acceptable. They will not want to wait for installed software to be wiped, the computer disassembled and parts sent back to inventory before they receive their refund. Therefore a 1-4-3-2 ordering of compensations might be more appropriate. Let’s expand this scenario even further. What if the customer paid in cash? The cash refund process might take just as long as disassembling the computer would. Maybe it would make more sense to do the refund in parallel with the other compensations? So, state #1 could be done in parallel with 4, 3, and 2, but, 4-3-2 still has ordering dependencies. You can’t disassemble the computer before you’ve wiped the hardrive. You can’t send parts back to inventory before you’ve disassembled the computer. So, you see, in even the simplest of business activities, you can have variable and complex compensation ordering. This is why I think a bpm engine like jBPM might be a better solution to manage compensations.

When I had discussed this with Mark, he made a good point that you are just replacing one coordinator(BA) with another (jBPM). You still need a coordination engine. Although he is semantically correct, I think the solutions are very different. With BA you have something generic and decoupled from the business process. In a distributed system, it is also a defined contract that both the client and server must obey. Which brings me to my favorite quote in the O’Reilly RESTful Web Services book:

The client should be in charge of managing its own path through the application

While WS-BA puts the onus on both the client and server to have WS libraries that are of the same version and can interact with one another. This all works beautifully in controlled environments. You have a very nice decoupling between compensations and business logic in both client and server side code. But…. What happens when BA versions are out of sync, there’s interoperability problems, or even worse, one or more of the services you have to coordinate does not support WS-BA? With jBPM, success and failure paths are a part of the business process. There isn’t necessarily a contract between the client and server as the client is responsible for knowing how to compensate and to perform the compensation. You still have strong decoupling in a bpm driven activity. The difference is that instead of compensation being hidden by the server, it is visible, and in your face, embedded directly in the business process. With jBPM, you can look at the process diagram and know exactly the flow of success and failure conditions of your distributed application. With BA, you would have to dive into the documentation and code to determine how your application functions.

In his blog in late August, Heuristics, one-phase commit and compensations, Mark Little talks about his early compensation framework and its interaction with heuristics:

In both of these problem scenarios what typically happens is that someone (e.g., a system administrator) has to get to grips with the inconsistent data and figure out what was going on in the rest of the application in order to try to impose consistency. One of the important reasons this can’t really happen automatically (at the TM level) is because it required semantic information about the application, that simply isn’t available to the transaction system. They compensate manually.

Until then. What we were proposing was allowing developers to register compensation transactions with the coordinator that would be triggered upon certain events, such as heuristic outcomes or one-phase errors. And to do it opaquely as far as the application developer was concerned. Because these compensations are part of the transaction, they’d get logged so that they would be available during recovery. Plus, a developer could also define whether presumed abort, presumed commit or presumed nothing were the best approaches for the individual transaction to use (it has an affect on recovery and failure scenarios).

Reading this fed more fuel to my idea that compensations really belonged as part of a process model. Transactions that have heuristic outcomes many times require human intervention as well as action by the framework itself. Yes, there’s a lot a transaction manager could do to make it easier to record business activities, but you’re still going to have interact with a real person to resolve many issues. jBPM is great at modeling work that requires a human.

But what about reliability?

Great, so we’re going to use something like jBPM as both our coordination and compensation engine and REST to expose our services on the network. Can we trust these things to provide us with the reliability and consistency that we expect out of something like WS and WS-BA? If you’re modelling compensations in jBPM, you’re gonna need the ability to reliably perform these compensations. Luckily, I’ve already answered how you can guarantee execution and failure handling in jBPM. But what about REST, what does it add? If you have truly written RESTful services, you have a lot of reliability and consistency built into your application. HTTP GET, PUT, and DELETE are supposed to be idempotent, meaning that it doesn’t matter how many times you’ve invoked them, the result is the same. While jBPM can guarantee you execution and failure handling, it can’t guarantee that these events won’t be duplicated. Since your RESTful services should be idempotent, you won’t have to worry about it! (I know, easier said then done.)

Where does that leave WS-BA?

I did discuss my ideas a bit with Mark. One of his comments that struck home was “Imagine if you had to explicitly tell all of your XA resource managers to prepare and commit?” I’ll admit that in a WS-BA world, writing a client that integrates a bunch of distributed services into one business activity would be uber simple and clean. The work being done with the JBoss Business Activity Framework simplifies the server side greatly as well. A compensation framework like JBoss Business Activity, might work really really well in the controlled environment of a web application. Until I actually write one of these applications or talk to a bunch of users and customers that have used WS-BA to model their integration project, I will remain skeptical. I just hope that I can get the feedback I need to further formulate my thoughts on this subject. Your thoughts and experiences??????