Buggy/Broken Tomcat 6 Comet NIO APIs
Posted by billburke on November 12, 2008
I’ve had some bad experiences lately with the Tomcat 6 Comet APIs with the NIO Connector. I have a simple COMET servlet that spawns a thread and waits 5 seconds to respond to a Firefox browser. I also call event.setTimeout(10000) (a timeout on the event of 10 seconds). After the response is finished writing, I call event.close(). When using the Tomcat NIO connector, I don’t see a response until the event.setTimeout() period has elapsed. So, if I set the timeout to 100 seconds, I don’t see a response in the browser for 100 seconds.
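The timing I expected can be modelled without any servlet container. The sketch below is only an illustration in plain Java, not Tomcat code: the class and method names are made up, and the durations are scaled down to a tenth of the ones above. The point is that a response produced by the worker should become visible when the worker finishes, not when the timeout expires.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical model of the expected behaviour; none of these names come from Tomcat.
class DelayedResponseModel {

    // Starts a worker that "writes the response" after workMillis, then waits for it
    // for at most timeoutMillis. Returns the elapsed time in milliseconds, or -1 if
    // the response did not arrive before the timeout.
    static long elapsedUntilResponse(long workMillis, long timeoutMillis) {
        long start = System.nanoTime();
        CompletableFuture<String> response = CompletableFuture.supplyAsync(() -> {
            try {
                Thread.sleep(workMillis); // stands in for the 5 second wait
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return "response body";
        });
        try {
            response.get(timeoutMillis, TimeUnit.MILLISECONDS); // stands in for the event timeout
        } catch (Exception e) {
            return -1; // timed out, interrupted, or the worker failed
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }
}

class DelayedResponseDemo {
    public static void main(String[] args) {
        // Scaled down: 500 ms of work, 1000 ms timeout.
        long elapsed = DelayedResponseModel.elapsedUntilResponse(500, 1000);
        System.out.println("response visible after about " + elapsed + " ms");
    }
}
```

In this model the elapsed time tracks the work duration. What I saw with the NIO connector was the opposite: the elapsed time tracked the timeout.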
Maybe I’m not adding some undocumented secret sauce, but when I test this out with JBoss Web and our APR connector, it works as expected: I see a response in the browser after 5 seconds. I also had a go with Servlet 3.0 through Jetty 7.0 pre3. Works as expected there as well.
This isn’t the only problem I’ve had. I also got a ClassCastException when I added a servlet filter in front of a COMET Servlet. The APIs assume that all your filters implement CometFilter. Granted, I experienced this particular problem in the version of Tomcat 6 that ships with JBoss 4.2.3.GA, but after trying out the timeout problem described above with Tomcat 6 binaries from apache.org, I didn’t bother going any further to see if this was a JBoss-specific problem or not. I did test to see if JBossWeb has the same problem… As expected, it DID NOT.
So what’s the deal here?
- I’m just stupid?
- Nobody uses Tomcat 6 COMET + NIO APIs?
- Everybody who is doing asynchronous HTTP is using Jetty?
- Asynchronous HTTP is just hype generated by Greg Wilkins to breathe new life into his increasingly irrelevant company?
My personal guess is that 2-3 are the main problems, with #1 a high possibility, with a sprinkle of 4. Anybody out there have any experiences with Tomcat 6 + COMET + NIO?
As a result of this, I’m not going to support COMET + Tomcat NIO with RESTEasy’s Async HTTP APIs. Only Jetty and JBossWeb.

Greg Wilkins said
Bill,
Well if Async HTTP is just hype, then thanks for the puff of hot air for my irrelevant company.
But async is hard and only of benefit to a few specific use-cases. Most of the first attempts at APIs have not been the best. Jetty-6 Continuations definitely have their problems, but Tomcat CometProcessors almost completely missed the point. Even the current draft of Servlet 3.0 leaves a bit to be desired. But I’m hopeful the next draft will address the few remaining issues.
Eventually Framework developers should be able to hide the complexities from the average developer. DWR, JSR, Cometd and BlazeDS have all been ported to the draft servlet 3.0 API (and to Jetty-6 Continuations), and hopefully that trend will continue. Developers should be able to use messaging APIs and not have to deal with the details of the transport, so that eventually things like WebSocket will improve the infrastructure below applications.
billburke said
From the little I’ve dealt with it over the past few weeks, I think the difficulties of async HTTP in Java are self-inflicted. Jetty 6 continuations were just so bizarrely implemented (using exceptions as control flow?) that I refused to integrate them with Resteasy. The Tomcat COMET APIs are at least functional and straightforward; it’s too bad the documentation (and implementation, well, except for Remy’s of course) is so horrible. Servlet 3.0 is much better, but all the retry stuff is just bizarre and I don’t see why somebody would need or want the complication. I’m glad at least you guys have a pure Servlet 3.0 implementation or I’d be forced to use JBossWeb with a native plugin to use async HTTP at all. (Remy refuses to write a NIO transport).
As far as COMET itself goes, I also don’t like it as it is tunneling a proprietary protocol over HTTP, which is an architectural no-no in my book. You can still get 90% of the performance benefits using pure HTTP over the async APIs. WebSocket will probably take forever to take hold and will meet strong resistance from IT.
Still, I believe the performance benefits of async HTTP are real. The few specific use-cases you talk about will crop up more and more as AJAX applications come online. As REST becomes more popular, you’ll also see HTTP being used as a messaging protocol.
Greg Wilkins said
Bill,
The retry request stuff is purely for applications that wish to use filters, servlets and JEE style stuff to generate responses, i.e. asynchronously wait for a webservice response and then generate a JSF page more or less normally.
There is definitely another style of async HTTP that does not want or need to deal with filters, servlets and JEE stuff. For them, servlets are just a PITA and retry even more so. I certainly encourage new HTTP APIs to be developed that are more suitable for async. But for developers that are invested in servlet technology, the retry style of async makes many of the benefits available without the need to switch to an entirely new way of generating content.
Rajiv Mordani said
Bill,
I would definitely like your feedback on the Servlet 3.0 async API. I am working on the Public Review of the spec (don’t look at the EDR; that API is no longer valid). You can get access to the current proposal from Remy or wait for a few days and hopefully it will be public. I also agree that the retry is bizarre, but that is a use case that Greg has been bringing up over and over again. I would like to get more feedback on the whole API.
billburke said
Rajiv, unrelated, but something I’d *really* like to see is JAX-RS expressions available for security, servlet, and filter mappings. I’ll look at Remy’s proposal.
Rajiv Mordani said
Bill, are you referring to the URI templating stuff? Send me email about what you are thinking.
Rajiv Mordani said
Also get the proposal from Remy. It isn’t Remy’s proposal.
Franck Wolff said
GraniteDS (Flex/JEE alternative to BlazeDS/LCDS) has Comet support for Tomcat 6 (CometProcessor/APR) and Jetty 6 (Continuation).
The Tomcat 6 implementation (with APR) has been hell: unexpected invalid CometEvents, very complicated race conditions, no relevant support in Tomcat forums, etc. On the other hand, Jetty 6 was much easier, mainly because it provides *true* samples. The Tomcat sample doesn’t even compile and, when the compilation problems are fixed, doesn’t work (not to mention that it is so basic that nobody can really start from it for a real-world application)…
So, after 2/3 weeks of nightmare, I was also thinking that #1 was highly probable, but, because it’s (seems to be?) working now, I would say that #2 is the true problem (Tomcat developers included)…
billburke said
I talked to Remy about this earlier. When he refactored “Comet” within JBoss Web, he found a number of bugs, some of which I mentioned in this blog, and fixed them in JBoss Web.
jmarranz said
I agree with you, Comet using servlets seems to be immature.
The good news is that Comet (long polling) is being standardized in Servlet 3.0, and project Atmosphere promises to bring us a unified API on top of the different servlet-based asynchronous approaches for Comet.
Meanwhile we can do long polling using a synchronous approach (by holding the thread). This works and seems to scale too.
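A minimal sketch of the thread-holding idea in plain Java, assuming a hypothetical LongPollMailbox class (not part of any servlet API): the request thread blocks on a queue until a message arrives or a timeout passes, and only then writes the response.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: one mailbox per client, shared between the thread that
// produces messages and the request thread that is being held.
class LongPollMailbox {
    private final BlockingQueue<String> messages = new LinkedBlockingQueue<>();

    // Called by whatever produces events for this client.
    void publish(String message) {
        messages.add(message);
    }

    // Called on the request thread. Holds the thread until a message is available
    // or the timeout elapses; returns null on timeout so the client can re-poll.
    String awaitMessage(long timeoutMillis) {
        try {
            return messages.poll(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }
}

class LongPollDemo {
    public static void main(String[] args) {
        LongPollMailbox mailbox = new LongPollMailbox();
        // Simulate an event arriving 100 ms after the "request" started waiting.
        new Thread(() -> {
            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            mailbox.publish("hello");
        }).start();
        System.out.println(mailbox.awaitMessage(2000));
    }
}
```

In a servlet, awaitMessage would be called from doGet before writing the response. The cost is one blocked container thread per waiting client, which is the trade-off the asynchronous APIs try to avoid.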
Servlet 3.0 Public Review Sparks a Debate | Xingyu.zhang's Technology Circle said
[...] Rajiv also refers to a blog post that Bill Burke from RedHat had made, where he criticized the implementation of async servlets in Jetty 6 continuations. [...]
Yakov Fain said
We’ve had great performance results with Jetty/BlazeDS/Servlet3.0. You can read more at http://flexblog.faratasystems.com/?p=368
Paul Eftis said
A possible answer to the problem (i.e. secret sauce). I am working on implementing a comet/server-push based solution, and use Tomcat and JBoss, so am trying to get Tomcat 6 working. I am wondering if, after you finish writing the response, you call writer.flush()? If not, this may be the answer… On timeout I am guessing the response is automatically flushed, which would explain why the response is not received until timeout.
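To illustrate why a missing flush could produce this symptom, here is a small sketch in plain Java. It uses an in-memory stream in place of the socket, so it says nothing about what Tomcat itself does; FlushDemo and its method are made-up names.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintWriter;

// Hypothetical demo: shows that a buffered writer holds data back until flush().
class FlushDemo {

    // Writes body through a PrintWriter and reports how many bytes have reached
    // the underlying stream before and after flush() is called.
    static int[] bytesBeforeAndAfterFlush(String body) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream(); // stands in for the socket
        PrintWriter writer = new PrintWriter(sink);               // buffered, no auto-flush
        writer.write(body);
        int before = sink.size(); // still sitting in the writer's buffer
        writer.flush();
        int after = sink.size();  // now pushed to the underlying stream
        return new int[] { before, after };
    }

    public static void main(String[] args) {
        int[] sizes = bytesBeforeAndAfterFlush("hello");
        System.out.println("before flush: " + sizes[0] + " bytes, after flush: " + sizes[1] + " bytes");
    }
}
```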
Thought I would mention one other thing, as I find all this comet stuff to be an amazing deja vu… We implemented servlet-push ten years ago for an HTTP-based chat-presence system we built, using Java 1.1 and an applet for the client interface, where we implemented a long-response approach to deliver sent messages and presence updates, and then the applet would reconnect. We also implemented NIO (yes, with JDK 1.1… we modified src.jar/rt.jar and modified Java base classes to use JNI and native socks/winsock library to implement socket select). It tickles me to read this stuff now as it appears we weren’t so crazy back then, eh?
billburke said
Paul,
Yes, I flushed the toilet.
Vlad said
I have been trying to debug comet problems in Tomcat 6 (6.0.16).
It works OK most of the time but there are race conditions in both APR and NIO connectors which lead to Tomcat closing down sockets when it shouldn’t. If you set up clients and the Tomcat config to use HTTP keepalives you will see that occasionally a client request will fail because of a TCP RST. If you use the APR connector with HTTPS this will also show up as a SEGV in openssl function: ssl3_write_pending.
If you look at the Tomcat code in org.apache.coyote.http11.Http11AprProcessor#process(), for example, you can see that the SocketState to return can depend on the ‘comet’ boolean value. This value can be set from within process() or event() or in the action() callback. It is set in the action() callback from Response.finish() when closing a writer / stream.
The comet flag is not protected by locking but can be set from multiple threads if the response is sent on a separate thread. It is not thread safe. Code in the process() and event() functions appears to assume that the comet flag will not be set by another thread, but that is possible. Also, because it is not protected by locking, changes made in one thread may not be visible to another thread, or may only become visible after an indeterminate interval (esp. on multi-processor machines).
Implementing async API – which is inherently multi-threaded – in a non-threadsafe way is crazy.
This one problem can be worked around by making the Http11AprProcessor methods process(), event() and action() synchronized. This locking protects the ‘comet’ flag. A similar fix may be possible for the NIO connector.
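A minimal sketch of the locking pattern described above. This is not Tomcat’s code: GuardedCometProcessor is a hypothetical stand-in, and the string states only mimic the idea of a SocketState. It shows a flag that is read and written only inside synchronized methods, so an update made on the response thread is visible to the container thread.

```java
// Hypothetical stand-in for a connector processor; not the real Http11AprProcessor.
class GuardedCometProcessor {
    // Guarded by the instance lock: only touched inside synchronized methods.
    private boolean comet;

    // Container thread: a Comet request begins, keep the socket in long-poll state.
    synchronized String process() {
        comet = true;
        return socketState();
    }

    // Container thread: a later event on the same connection.
    synchronized String event() {
        return socketState();
    }

    // Possibly an application thread: the response was finished, Comet mode ends.
    synchronized void action() {
        comet = false;
    }

    // Callers hold the lock, so the flag value read here is the latest one written.
    private String socketState() {
        return comet ? "LONG" : "CLOSED";
    }
}

class GuardedCometDemo {
    public static void main(String[] args) throws InterruptedException {
        GuardedCometProcessor processor = new GuardedCometProcessor();
        System.out.println(processor.process());      // LONG
        Thread responder = new Thread(processor::action);
        responder.start();
        responder.join();
        System.out.println(processor.event());        // CLOSED
    }
}
```

Because all three methods share one lock, a call to action() from another thread also waits until a running process() or event() call has returned, which is what removes the race rather than just the visibility problem.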
billburke said
@Vlad
Try out JBossWeb. Remy has done a lot of work to stabilize this stuff. It would be interesting to know whether or not this is true.
nisse said
But Remy is one of the most incompetent programmers around.
The code he produces shows very silly beginner mistakes, and a complete lack of understanding of any more advanced concepts, such as basic thread safety.
He should be dragged outside and shot in the head.
billburke said
@Nisse:
I guess the whole Tomcat community is screwed then. The Tomcat Comet guys don’t seem to know what they are doing or how to fix the problem. And Remy is an idiot.
Not my experience with him, but to each his own.