Recently, a RESTEasy user needed to transfer a lot of data as efficiently as possible and found Google Protocol Buffers. He came across these blogs questioning the viability of Google Protocol Buffers within a RESTful architecture: Ryan’s, Vinoski’s, and Subbu’s. Although I am very late to the discussion, I thought I’d blog my opinion on the subject to give my RESTEasy user some form of answer.
Who cares if it’s RESTful or not?
Firstly, who cares if it is RESTful or not? Does PB fill a need? If so, don’t worry about the rants of some academic or some architect who hasn’t written a line of code in years (just to be clear, neither Steve, Subbu, nor Ryan falls into this academic/non-coding-architect category!!!). REST is described as an architectural style, a set of guidelines, a set of attributes on what makes the Web unique. The key words are style and guidelines, NOT laws! Whether you’re borrowing from object-oriented, aspect-oriented, or RESTful principles and guidelines, there are always going to be tradeoffs you have to make. It is always much more important to deliver a working, maintainable, on-time, on-budget system than to satisfy some academic checklist.
Treat Protocol Buffers as a Representation Format
In Steve’s blog, he rants that PB is just a starter drug that puts you on the path to RPC crack-cocaine. Steve says:
In fact, if Google had stopped there [as a definition of a message format], I think Protocol Buffers could be a superb little package.
I agree with this statement. Treat Protocol Buffers as a representation format only. Follow REST principles when designing your interface. Don’t use the RPC junk available in the PB specification.
Not Self-Describing
Ryan makes a good point that PB is not self-describing. IMO, this is a weak argument. Unless your clients are going to be rendering agents, self-description is pretty much a pedantic exercise. Code-based clients generally can’t make many dynamic decisions, so self-description information is pretty much useless to them. They have to understand interactions and formats beforehand, or they just can’t work.
Subbu (in the comments section of his blog) and Ryan suggest that custom media types are going to have to be defined to satisfy self-description. Because PB is very brittle (I’ll get into this later), you’ll need to define custom (and versioned) media types to support both older and newer clients. Something like:
application/customer+x-protobuf
and/or even embed a URL pointing to a .proto file:
application/x-protobuf;format=http://.../customer.proto
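To make this concrete, here is a minimal sketch (in Python, with invented media-type constants and function names) of how a server might negotiate between such a custom protobuf media type and a JSON fallback based on the client’s Accept header; a real server would of course also honor quality values and wildcards:

```python
# Hypothetical media types mirroring the examples above.
PROTOBUF_V1 = "application/customer+x-protobuf"
JSON_FALLBACK = "application/json"

def pick_representation(accept_header):
    """Return the media type to serve, preferring the custom protobuf
    type when the client explicitly accepts it, else falling back to JSON."""
    # Strip parameters (e.g. ";q=0.9") and whitespace from each entry.
    accepted = [part.split(";")[0].strip() for part in accept_header.split(",")]
    if PROTOBUF_V1 in accepted:
        return PROTOBUF_V1
    return JSON_FALLBACK
```

Older clients that never send the custom type keep getting JSON, while newer ones opt in to the protobuf representation.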
Not Hypermedia Driven?
Ryan states that Protocol Buffers is not hypermedia driven because:
Protocol Buffers do not have a means to describe links to external messages.
This is untrue. If you’re exchanging PB representations over HTTP, there’s no reason you can’t embed a URL within a PB message body. i.e.
message BookOrder {
  ...
  repeated Link links = ...;

  message Link {
    required string url = 1;
    optional string type = 2;
  }
}
You have to declare these same kinds of “types” within JSON and XML as well, so I don’t see an issue here.
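As a sketch of what consuming such links might look like on the client side, here is a small Python example. The decoded message is simulated with plain dicts and the URLs are invented; a real client would use the classes generated from the .proto definition, but the navigation logic is the same:

```python
def find_link(message, rel_type):
    """Return the URL of the first link whose type matches, or None."""
    for link in message.get("links", []):
        if link.get("type") == rel_type:
            return link["url"]
    return None

# Simulated decoded BookOrder with embedded hypermedia links.
order = {
    "links": [
        {"url": "http://example.com/orders/42/payment", "type": "payment"},
        {"url": "http://example.com/orders/42", "type": "self"},
    ],
}
```

A client that understands the "payment" relation can follow that URL without any out-of-band knowledge of the server’s URL structure, which is exactly what hypermedia-driven means.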
Stubs Mean It’s UnRESTful?
I have to disagree with this point as well. Stubs are just a means to productively interact with the data format. You have the same issue with XML and strongly typed languages. Is using JAXB classes generated from an XML schema in Java any different here? IMO, no.
Protocol Buffers is a Very Brittle Format
IMO, perhaps the most compelling reason not to use Protocol Buffers is that it is a very, very brittle format. You need access to the .proto file metadata to parse a PB message body. Because PB defines a very strict message definition, you’re going to have a hard time having older and newer clients co-exist as you add more information to each message format. XML and JSON are much more forgiving formats, as you can generally ignore extra information. I’m not sure this is the case with PB.
Edited 10/26: I was wrong, PB is not so brittle. Read Bjorg’s comments below. Apologies for only scanning the PB documentation. While the stub requirement does make it a little brittle, it does seem you can design your messages to be backward compatible.
As Ryan states in his blog, this brittleness may violate RESTful principles. IMO though, it should always be a measure of less RESTful vs. more RESTful, rather than the black and white approach of is RESTful vs. is not RESTful. This is because, again, there’s always going to be tradeoffs you have to make when implementing your applications. If you’re following RESTful constraints when applying Protocol Buffers in your implementation, it should be fairly easy to move to less-brittle types like JSON or XML if you no longer find the need to use an ultra-efficient message format like Protocol Buffers.
Conclusion
Protocol Buffers can be used RESTfully if you treat it solely as a representation format. You can still embed URLs within message definitions to make your representations hypermedia driven. PB is a more brittle format than what we’re used to, and you may have versioning issues as your interfaces evolve. Unless PB radically improves the performance of your system, you should probably stick to formats like XML or JSON, as it’s probably going to be easier to support them across the variety of languages now used within the industry.
Oct 26, 2010 @ 04:36:28
A Protocol Buffers package can contain fields not captured in the .proto file; they will simply be ignored. Google, being masters of distributed computing, would probably not define a format that required all servers and clients to be upgraded simultaneously.
Oct 26, 2010 @ 13:46:34
But don’t you still have to worry about field ordering? i.e. if you added additional fields to an embedded message wouldn’t this screw up older clients?
Oct 26, 2010 @ 13:47:20
I should say if you added fields to a nested message.
Nov 06, 2010 @ 20:13:15
Bill, yes you do have to worry about ordering, but generally this isn’t a problem. Ideally, a new version of a protobuf definition would not redefine an existing tag with a new value. Assuming that version 1 is something like so:
message Person {
required int32 id = 1;
required string name = 2;
optional Email email = 3;
}
message Email {
required string label = 1;
required string emailAddres = 2;
}
If we wanted to add a prefered property to the Email message, we should not reuse an existing tag, for obvious reasons. The correct way is to append the new field in version 2 as follows:
message Person {
required int32 id = 1;
required string name = 2;
optional Email email = 3;
}
message Email {
required string label = 1;
required string emailAddres = 2;
required bool prefered = 3 [default = false];
}
If a client parsed a message that was produced by version 2, the prefered field would simply be ignored. Google has done a pretty decent job of addressing a lot of the versioning issues.
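To see why the extra field is harmless on the wire, here is a toy Python decoder (a sketch, not the real protobuf library) covering just the varint and length-delimited wire types. Each field on the wire is keyed by its number, so a reader that only knows fields 1 and 2 of the Email message above simply skips field 3:

```python
def encode_varint(n):
    """Encode a non-negative int as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        b = n & 0x7F
        n >>= 7
        if n:
            out.append(b | 0x80)
        else:
            out.append(b)
            return bytes(out)

def decode_varint(buf, pos):
    """Decode a varint from buf starting at pos; return (value, new_pos)."""
    result = shift = 0
    while True:
        b = buf[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        if not b & 0x80:
            return result, pos
        shift += 7

def field_bytes(num, wire_type, payload):
    """Prefix a payload with its (field_number << 3 | wire_type) key."""
    return encode_varint((num << 3) | wire_type) + payload

def parse(buf, known_fields):
    """Decode the fields we know about; silently skip unknown field numbers."""
    pos, out = 0, {}
    while pos < len(buf):
        key, pos = decode_varint(buf, pos)
        num, wire_type = key >> 3, key & 7
        if wire_type == 0:                       # varint (e.g. bool, int32)
            val, pos = decode_varint(buf, pos)
        elif wire_type == 2:                     # length-delimited (e.g. string)
            length, pos = decode_varint(buf, pos)
            val = buf[pos:pos + length]
            pos += length
        else:
            raise ValueError("wire type %d not handled in this sketch" % wire_type)
        if num in known_fields:                  # unknown numbers are dropped
            out[known_fields[num]] = val
    return out

# A version-2 Email: label = "work", emailAddres = "a@b.com", prefered = true
v2_message = (field_bytes(1, 2, encode_varint(4) + b"work")
              + field_bytes(2, 2, encode_varint(7) + b"a@b.com")
              + field_bytes(3, 0, encode_varint(1)))

# A version-1 client only knows fields 1 and 2; field 3 is skipped cleanly.
v1_view = parse(v2_message, {1: "label", 2: "emailAddres"})
```

Because the wire format is self-delimiting per field, the version-1 reader never even notices the new prefered field, which is what makes the append-only evolution above safe.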
As for the stubs argument: yes, you can generate Java classes from an XML schema or DTD, but you don’t have to. You could process the XML message dynamically. I can process a JSON object with or without bound classes. The difference with Protobuf is that you ALWAYS have to generate classes. The message is useless without the generated code.
More on stubs and self-description later 😉
Apr 28, 2011 @ 21:36:34
@Ryan J. McDonough
I’m a little late to the discussion here, but I thought I’d note that you do NOT need to generate code to work with a protocol buffer message. All you need is access to the FileDescriptorProto – the definition file – at run time. Protocol Buffers provides a mechanism to read FileDescriptors (the proto file) as a protocol buffer message and then dynamically read and construct messages based on that descriptor. I do this all the time through the Java API… no compile-time awareness of the message structure at all. You do, however, need the protobuf libraries to accomplish this, and not all languages that support protobuf support dynamic messages.
Oct 26, 2010 @ 14:23:18
If you look at the Protocol Buffer definition you’ll see that nested messages follow the same structure. Protocol Buffers use numbers instead of tag names for fields. Each nesting level has its own numbering scope, making it possible to add fields at any level of the structure. You can even declare ranges of numbers to be available to 3rd-party extensions, so that the final structure of a Protocol Buffer message is a composition of multiple definitions.
In short, I would say that because servers and clients can evolve independently, and the format allows for 3rd-party extensibility, it is rather robust.
Oct 26, 2010 @ 14:35:23
Thanks Bjorg. I edited the blog to reflect your comments.
BTW, I’m wondering if CORBA’s CDR might be interesting to re-use. It has very good message description facilities (i.e. the Interface Repository), and, well, it’s a standard. IIRC, the format is brittle and you have to worry about versioning, but hey, it’s a standard 😉 and, well, I used to work for Iona… ;p
Jan 27, 2011 @ 11:59:53
Then do not call it REST. It is that simple. And saying that some academic has not written a line of code detracts from what you are saying. REST has a clear enough definition. PB is not RESTful, so don’t call it so!!
Jan 27, 2011 @ 14:56:22
Given your strict definition and usage, there are very few things on the internet (that aren’t just static web pages) that can be called RESTful or REST. So, I therefore ask fundamentalist RESTafarians to stop using the Web as their prime example of a RESTful system.
In Roy’s PhD, he describes REST as a style and set of architectural guidelines, not laws. If he, and others like yourself, continue to treat his PhD as a set of laws (or even worse, to selectively bless something as REST or not), then, IMO, Roy should make a REST 2.0 PhD, declare them laws, and officially trademark REST so that idiots like myself don’t pollute your sacred cow.
Jan 27, 2011 @ 15:54:27
PB is just as RESTful as JSON/XML/HTML. As a matter of fact, the structure is virtually irrelevant. One could build a RESTful system with CSV. It’s really that simple. The format doesn’t matter, only its descriptive capabilities, which are, by definition, always captured out-of-band.
May 30, 2011 @ 04:45:01
Thank you, very helpful – and all the comments too. I am about to start new work doing ETL in Python with a PB data source. I’ve downloaded a proto compiler and read Google’s tutorials and some of the other docs and blogs surrounding the subject.
What I’d like to know, now that PB has been around for a while, is how robust and easy to work with (specifically in Python) it really is. Are there commonly known limitations of the Windows or Linux implementations of the def generators, methods that don’t work as advertised, etc.?
The only other such generators I’ve worked with lately were for JSON, and they were very problematic – not so much because code didn’t run as it should, but because the data could easily fall foul of the standard as parsed by the available Python JSON libraries. It seems PB is much more straightforward in that way. Being binary, I would think that it either parses correctly or fails totally, without the partial success that seems to characterize my JSON experience. Does this make sense? Is my instinct correct? 🙂
thanks.