WebHooks, Syndication and the Programmable Web

By Kurt Cagle
February 19, 2009 | Comments: 6


A friend of mine, NASA systems analyst Joshua McKenty, recently dropped a note in my Twitter feed about WebHooks, and why they're superior to syndication as a mechanism for building cross-server applications. I have run into webhooks periodically in the last couple of years and been intrigued by them, but Josh's comments made me go back and really think about them again. While I think that there are still a number of issues to be resolved, overall, I'm beginning to think that he may be right.

The concept of webhooks should be familiar to anyone who's ever had to write a callback function for an event handler. In an asynchronous situation, there are theoretically two ways to get a function called when an event occurs. The first is to periodically check the event queue to see if there are any new messages of a given type, and if there are, then invoke the function to process that particular message. This is typically how syndication works - the client makes a call to the server to get an RSS or Atom feed, and if the feed is different from what it was last time, then the client processes any new messages on the feed.

The problem with this approach is that the client has to sit and poll the server periodically, which puts a small, but definite, load on the server even if nothing actually happens. You can cache content to get around some of this, but there will always be some processing involved.
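To make the polling cost concrete, here is a minimal sketch in Python. It assumes a hypothetical fetch callable that returns the raw feed body (standing in for an HTTP GET of the RSS or Atom feed); the class and method names are mine, not part of any syndication library. Note that every call to poll() costs the server a request, whether or not anything changed.

```python
import hashlib
from typing import Callable, Optional


class FeedPoller:
    """Polls a feed and reports whether its content changed since the last poll."""

    def __init__(self, fetch: Callable[[], bytes]) -> None:
        # `fetch` is a hypothetical stand-in for an HTTP GET of the feed.
        self._fetch = fetch
        self._last_digest: Optional[str] = None
        self.requests_made = 0  # every poll hits the server, changed or not

    def poll(self) -> Optional[bytes]:
        """Fetch the feed once; return the body if it changed, else None."""
        self.requests_made += 1
        body = self._fetch()
        digest = hashlib.sha256(body).hexdigest()
        if digest == self._last_digest:
            return None  # nothing new, but the server still did the work
        self._last_digest = digest
        return body
```

In practice a client would call poll() on a timer (cron, for instance) and process the feed only when the return value is not None.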

The second approach, and the one far more widely used, is the callback. In this case, the message queue includes a hook that a programmer can then assign a callback function to. When the message pops up, the queue application calls each callback function, passing the event state as parameters. The asynchronous mode of the XMLHttpRequest object is a perfect example of such a callback mechanism - you assign a function to the onreadystatechange event hook, then ignore it until the content from the server starts coming back. This callback may be called several times, depending upon the state of the download.
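The callback pattern described above can be sketched in a few lines of Python. This is an illustration only: the EventQueue class and its on/emit methods are names I've made up for the example, not an existing API.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List

Callback = Callable[[Any], None]


class EventQueue:
    """A toy message queue that exposes named hooks for callbacks."""

    def __init__(self) -> None:
        self._hooks: DefaultDict[str, List[Callback]] = defaultdict(list)

    def on(self, event_type: str, callback: Callback) -> None:
        """Assign a callback function to the hook for one event type."""
        self._hooks[event_type].append(callback)

    def emit(self, event_type: str, state: Any) -> int:
        """Call each registered callback with the event state.

        Returns the number of callbacks that were invoked.
        """
        callbacks = list(self._hooks.get(event_type, []))
        for callback in callbacks:
            callback(state)
        return len(callbacks)
```

The caller registers once with on() and then does nothing until emit() fires; no polling is involved.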

Suppose that you applied this same pattern to web services. For instance, suppose that every time a picture came up on Flickr for a given keyword (such as "cow"), you wanted that picture posted to a directory on a different server, then wanted a Twitter message sent to your Twitter feed and an email message sent to your email inbox. You could petition Flickr to set up an image posting service and add Twitter support, but chances are pretty good that not much will happen there. On the other hand, suppose that instead you asked Flickr to set up a hook in their syndication, so that every time a given feed was updated, the RSS or Atom feed would be posted to a URL you specify.
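From the provider's side, such a hook might look something like the following sketch. It is not how Flickr or anyone else actually implements it; HookRegistry, subscribe and feed_updated are hypothetical names, and the only real library call is urllib.request from the Python standard library. The sender is injectable so the registry can be exercised without a network.

```python
import urllib.request
from typing import Callable, Dict, List

Sender = Callable[[str, bytes], None]


def http_post(url: str, body: bytes) -> None:
    """POST the feed body to a subscriber URL as Atom XML."""
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/atom+xml"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        response.read()


class HookRegistry:
    """Hypothetical provider-side registry of callback URLs per feed."""

    def __init__(self, send: Sender = http_post) -> None:
        self._send = send
        self._subscribers: Dict[str, List[str]] = {}

    def subscribe(self, feed: str, callback_url: str) -> None:
        """Register a URL to be POSTed to whenever `feed` is updated."""
        urls = self._subscribers.setdefault(feed, [])
        if callback_url not in urls:
            urls.append(callback_url)

    def feed_updated(self, feed: str, body: bytes) -> List[str]:
        """Deliver the feed to every subscriber; return the URLs that failed."""
        failed = []
        for url in self._subscribers.get(feed, []):
            try:
                self._send(url, body)
            except OSError:  # urllib's URLError and HTTPError derive from OSError
                failed.append(url)
        return failed
```

A real provider would also want retries and some way to verify that the subscriber owns the URL; both are left out here.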

I'll get into why they may be more amenable to this idea in a moment, but for now it's worth concentrating on how this technique would work. The feed gets updated, the feed gets posted to the URL. At this stage, you can process the feed to get the URL of the picture being posted, then download that to the inbound directory of "cows". At the same time, you can parse this same feed to create a Twitter message, then send that out, and finally do the same with your email, perhaps even embedding the newly downloaded cow into the mail as MIME content.
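On the receiving end, the processing step could be sketched as below. I'm assuming the posted body is an Atom feed whose entries carry the picture as a rel="enclosure" link; a real Flickr feed may be shaped differently. The function names and the 140-character limit are illustrative, and downloading the image and sending the message are left out.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

ATOM = "{http://www.w3.org/2005/Atom}"


def extract_pictures(feed_xml: bytes) -> List[Dict[str, str]]:
    """Return the title and enclosure URL of each picture in an Atom feed.

    Assumes pictures are attached as <link rel="enclosure" href="...">.
    """
    root = ET.fromstring(feed_xml)
    pictures = []
    for entry in root.iter(ATOM + "entry"):
        title = entry.findtext(ATOM + "title", default="")
        for link in entry.findall(ATOM + "link"):
            href = link.get("href")
            if link.get("rel") == "enclosure" and href:
                pictures.append({"title": title, "url": href})
    return pictures


def to_status(picture: Dict[str, str], limit: int = 140) -> str:
    """Build a short status message, trimming the title to keep the URL whole."""
    url = picture["url"]
    room = max(limit - len(url) - 1, 0)
    return picture["title"][:room] + " " + url
```

Because the feed arrives from an outside server, a production handler should treat it as untrusted input and use a hardened XML parser.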

Now here's where things get interesting. Suppose that you could get Twitter to do the same thing. In this case, the event hook would call back to your server whenever you got a response to your cow post, and the incoming message would then get converted into an XML block and stored in a list of favorable comments on your server, perhaps even filtering out the unfavorable comments by looking for specific terms.
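A rough sketch of that filtering step follows. The list of unfavorable terms, the element names and the (author, text) input shape are all assumptions made for the example; a real hook payload from Twitter would need to be mapped into this form first.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Set, Tuple

# Hypothetical list of terms that mark a reply as unfavorable.
UNFAVORABLE_TERMS = {"ugly", "boring", "spam"}


def is_favorable(text: str, blocked: Set[str] = UNFAVORABLE_TERMS) -> bool:
    """Return True if none of the blocked terms appear as words in the text."""
    words = {word.strip(".,!?;:\"'").lower() for word in text.split()}
    return not (words & set(blocked))


def comments_to_xml(replies: Iterable[Tuple[str, str]]) -> str:
    """Convert favorable (author, text) replies into an XML block."""
    root = ET.Element("comments")
    for author, text in replies:
        if is_favorable(text):
            comment = ET.SubElement(root, "comment", {"author": author})
            comment.text = text
    return ET.tostring(root, encoding="unicode")
```

Simple term matching like this is crude, of course, but it shows where such a filter would sit in the flow.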

What you've done here is create an application that spans four distinct servers. Your server acts as the orchestrator of these services, but you're gaining the benefits of Flickr and Twitter with comparatively little investment on their end. Put another way, once you enable such hooks (which have, thanks to the efforts of Jeff Lindsay at WebHooks.org, gained the moniker "Web Hooks") you can create server-side mashups in a manner analogous to the way that you create client-side mashups.

Lindsay has become an evangelist for the use of web hooks, including putting together an engaging slide presentation entitled Web Hooks and the Programming World of Tomorrow. In it, he points out that the web hook paradigm keeps getting reinvented, to the extent that many social networking services provide some form of web hook mechanism: Facebook, PayPal, Digg and others all use the concept of web hooks in some form, though currently there is no consistent implementation.

One immediate corollary of web hooks is that if you have some way to map organizations' web hook inputs and outputs, then it becomes possible to abstract these out as pipes, then string these pipes together as pipelines. There are two approaches that can be taken here. The first is that the POST URL of the first item in the pipeline could send the feed directly to the second. Given the sheer variety of different hook implementations, this generally doesn't work. The second alternative is that a controlling server acts as the orchestrator for these pipes, accepting the output of the first pipe, then mapping it to an internal format that can then mesh with the input of the second pipe.
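The second, orchestrated approach might be sketched like this. The Orchestrator class, the internal record format (a plain dictionary) and the two adapter functions are all hypothetical; the payload keys photo_title and photo_url are invented to stand in for whatever one provider's hook actually sends.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]
Stage = Callable[[Record], Record]


class Orchestrator:
    """Chains pipes together, passing an internal record between them."""

    def __init__(self) -> None:
        self._stages: List[Stage] = []

    def pipe(self, stage: Stage) -> "Orchestrator":
        """Append a stage and return self so calls can be chained."""
        self._stages.append(stage)
        return self

    def run(self, record: Record) -> Record:
        """Feed the record through every stage in order."""
        for stage in self._stages:
            record = stage(record)
        return record


def from_photo_hook(payload: Record) -> Record:
    """Map one provider's (hypothetical) hook payload to the internal format."""
    return {"title": payload["photo_title"], "link": payload["photo_url"]}


def to_status_update(record: Record) -> Record:
    """Map the internal format to the input another service expects."""
    return {"status": "{title}: {link}".format(**record)}
```

Usage would be along the lines of Orchestrator().pipe(from_photo_hook).pipe(to_status_update).run(payload). The point of the internal format is that each provider needs only one adapter, rather than one per pairing.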

There are a number of advantages to this approach, not least of which being that you can temporarily cache content in more complex pipes until either results or error codes return from requests. This also means that you can turn complex code applications into easy drag and drop GUIs, something that Yahoo Pipes first started exploring a couple of years ago and that more recently Tarpipe.com has tapped, again almost certainly using a form of web hooks to create orchestrated applications.

Will web hooks replace syndication? I'm not all that sure of that, and I'm more inclined to suspect that web hooks and syndication will end up forming a symbiotic relationship with one another. Web hooks proponents for the moment tend to think of syndication servers as being processors that do nothing but wait for incoming requests, but from a systems management standpoint, a request for an RSS feed is not that much different (identical, for that matter) from a request for a web page. Most RSS is cached, sometimes several levels deep, which means that in the case of static content, the overhead for retrieving RSS is probably an order of magnitude lower than that of invoking a live service feed, even with the use of cron or a similar time management process.

Where things get more interesting is in the case where data changes frequently, thus placing a fairly significant overhead on the generation of news feeds. An example of something like this would be Twitter, or possibly updating a feed from a high frequency GPS transmitter. In this case, disabling caching and syndication overall and going with a web hooks alternative makes a great deal more sense. As with most emergent web technologies, it is likely that it will take some time for best practices in this regard to determine which regimes work best where.

Web hooks should also be seen as part of an overall continuum. In some respects, web hooks handle the same types of problems that the XMPP/Jabber protocol does, but in general XMPP is far more performant, albeit at a considerably higher level of complexity than web hooks or syndication. This means that while XMPP makes sense when dealing with high bandwidth, large data repositories working along dedicated channels, web hooks in general may be superior for orchestration along HTTP.

It is interesting to note as well that the preferred form of web hooks seems to be very RESTful in nature; a webhook URL can be seen as a virtual collection or pipeline (something I will be covering shortly in an upcoming series on RESTful services) that respects the constraints of the GET, POST and DELETE verbs.

For this reason, I also think that the W3C (and the IETF Atom working group) should pay particular attention to web hooks development. Indeed, it is not hard to envision a scenario where XProc, the XML Pipelining Language, is used as the abstraction mechanism for handling pipelining within certain scenarios (I see this especially in the domain of XQuery and XRX applications, where an XML pipelining architecture makes most sense and the tools exist within an XQuery extension API to handle the requisite processing).

The primary challenge then comes in establishing a standards-based mechanism for invoking such web hooks. While I can definitely see the influence of the AJAX crowd in the development of web hooks, the AJAX community in general isn't standards oriented, and I suspect that this is one area where getting a consensus of opinion on APIs could do wonders to promote the concepts and technologies.

Web hooks are an intriguing answer to a very real problem - how to effectively orchestrate web services among disparate, REST-centric data providers and consumers. As with a number of other community-driven Internet efforts (including RESTful services, syndication, AJAX and microformats), web hooks seem to be emerging as a simpler, HTTP-friendly alternative to the heavily engineered WS-* and ebXML initiatives, geared towards the bulk of web users and providers who don't need the specialized capabilities that the SOAP stack provides. It should be interesting to see how this plays out over the next several years.

Kurt Cagle is Online Editor for O'Reilly Media, and independently runs XML Today. You can subscribe to his O'Reilly posts at http://broadcast.oreilly.com/kurt-cagle.atom.xml or follow him on Twitter.


6 Comments

Lindsay's presentation is on the "programmable" world of tomorrow, not the "programming" world of tomorrow as you typoed, FYI :)

Kurt,

Two comments, a nit, and a thank you. First, the comments:

1. Where Webhooks particularly shine, in my view, is in the more refined types of requests - especially those that are likely to be infrequently updated, such as (on Flickr) the comments on an image, or a search for an image tag by a specific user. Why would Flickr want to be polled every 60 minutes for content that will likely change once a week? Similarly, why would they want to maintain an XMPP connection (and the attendant fountain of bandwidth) for such a request?

XMPP is great for large volume, low-specificity connections - it's not ideal for high specificity, low volume ones.

2. Timothy Fitz, another of the now-growing legion of "Web Hook"ists, has a great rant about why HTTP should be used in almost ALL cases (http://timothyfitz.wordpress.com/2009/02/12/why-http/) - and he specifically compares it to XMPP. I'll let you read his arguments, which leads me to the nit:

NIT: XMPP is *not* performant. The reason *everyone* thinks it is, is because of Erlang. The DEFAULT XMPP server is ejabberd - it's written in Erlang, and derives all of its wonderful, massively concurrent behaviours from the language upon which it was built. There's nothing magical about a continuous stream of XMPP between two points that's somehow better than a call-and-response of XML (or JSON, UUENCODED binary, etc).

Now the thank you (and a disclaimer) - I've poached Jeff Lindsay for my team at NASA, so it was partly on his behalf that I pinged you about syndication. But thanks anyway for the shout-out.

PS - Mike, give Kurt a break. Or at least throw in some substantive criticism; engage the debate. It wasn't that serious a typo.

I have to get used to the fact that not all blogging platforms autolink (although all of them SHOULD). Timothy's blog post should have read:

http://timothyfitz.wordpress.com/2009/02/12/why-http/

Again, thanks for spreading the word, Kurt! You totally get it. I wrote up a response at the Web Hooks blog. :)

What about recursion? Don't you think it's a concern, given hooks' low latency? I see it as an enormous risk for web traffic, not only because of direct recursion (A triggers B triggers A) but indirect recursion (A triggers B triggers C triggers ... triggers A) as well. You know how creative and not so careful programmers are!

Josh,

Thanks for the initial pointer - I've been moving that way myself on some of the XQuery work, but I hadn't really realized the extent to which it is becoming a meme.

Also, I understand on XMPP. We've developed a number of ad hoc solutions for creating connected sessions over the years, and while I understand that most of the processing performance behind XMPP comes from Erlang, the protocol does seem to be taking off as a way of dealing with transactional systems in a consistent, standardized manner, and it's definitely more RESTful overall than the WS-* stack in that regard.

Jeff,

I'm glad that Josh picked you up for his team (as you've probably discovered, he's one of the good guys), and I look forward to more ideas coming from your direction.

Ion,

Most web hook servers keep track of the number of pings against their server from a given source, and will throttle it down if it becomes excessive. In honesty, I don't doubt that deliberate loops will be formed via web hooks - indeed, this may be the only solution for situations where you want a continuous monitoring loop, for instance, without having to maintain a cron at some point in the circuit.