Spawn: reliable background processing for Rails

Since I've got time to spare and free WiFi (!) here at the new Bangalore International Airport, I figure I might as well catch up on my blogging. At the top of my list is Spawn, the latest bit of Rails plugin goodness my colleagues introduced me to.

Rails, being single-threaded, has always been plagued by long-running tasks that lock up a Mongrel until they're done. Need to notify a dozen users by e-mail when some event occurs in your app? Kiss a Mongrel instance good-bye for the thirty or so seconds it takes to send those e-mails. This hurts server response times, making applications seem sluggish even when they aren't serving very many users.

You hardly need something as complicated as BackgrounDRb to solve this problem, but sadly, the Rails world lacked anything simpler - well, at least until Spawn came along.

Spawn has been around since September 2007 and is now on version 0.9. It's quite simple to use - pick any block of code you wish to run as a background process, pass it to Spawn, and it will fork a separate process for that block to run in (it also takes care of details like creating a new ActiveRecord connection for that process). This of course assumes you're running a *nix system; Windows and JRuby users can follow the less reliable path of creating threads instead, and would have to set config.active_record.allow_concurrency = true. Forking also lets you use a multicore machine more effectively - the OS takes care of scheduling forked processes across cores, while Ruby's green threads are invisible to it and so only ever use a single core. See the Spawn README for a more detailed explanation of the relative merits and demerits of fork vs. thread.
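To make the forking idea concrete, here is a minimal plain-Ruby sketch of running a block in a child process while the parent carries on. It is not Spawn's actual code: run_in_background is a hypothetical helper, the pipe is only there so the example can show the child did its work, and the database reconnection Spawn handles for you is left out entirely. Like Spawn's fork mode, it assumes a *nix system.

```ruby
# Hypothetical helper, NOT Spawn's implementation: run a block in a forked
# child process and return a thread that reaps the child when it exits.
def run_in_background(&block)
  pid = Process.fork do
    block.call
    exit!(0) # leave without running at_exit handlers inherited from the parent
  end
  Process.detach(pid) # avoids leaving a zombie process behind
end

# A pipe stands in for the "real" work so the parent can see the result.
reader, writer = IO.pipe

watcher = run_in_background do
  reader.close
  writer.write("notified 12 users") # stand-in for the slow e-mail delivery
  writer.close                      # flush before exit! skips normal cleanup
end

writer.close                        # the parent keeps only the read end
puts "parent is free to handle the next request"

message = reader.read               # returns once the child closes its end
watcher.join
puts "child reported: #{message}"
```

In a real Rails process the parent would simply return the response and never wait on the child; the read and join here exist only to make the demonstration observable.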

One obvious limitation of Spawn is that it only works within a single machine. If you're working with a cluster and want to distribute the load across different machines, then you need some sort of message queue (Starling) or tuple space (Rinda).

Finally, a word of caution - imagine the e-mail scenario described above scaled to five hundred users needing notification. Assuming you've wrapped each call to ActionMailer in its own spawn block, you'll have five hundred new processes forked, each with its own AR connection, each hogging upwards of twenty megs of memory. In such a situation, starvation and thrashing on the server are all but guaranteed, so be sure you've examined very carefully every situation where spawn will be used.
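One way to keep the process count in check is to split the recipients into a few batches and fork once per batch rather than once per e-mail. The sketch below uses plain Process.fork instead of Spawn; in_bounded_forks is a hypothetical helper, the example.com addresses are made up, and the block body is a stand-in for the real mailer call.

```ruby
# Hypothetical helper: process items in at most max_children forked workers,
# instead of forking one process per item. Returns the child PIDs.
def in_bounded_forks(items, max_children, &block)
  return [] if items.empty?

  # Round up so the number of batches never exceeds max_children.
  slice_size = (items.size.to_f / max_children).ceil

  items.each_slice(slice_size).map do |batch|
    Process.fork do
      batch.each { |item| block.call(item) }
      exit!(0) # skip at_exit handlers inherited from the parent
    end
  end
end

recipients = (1..500).map { |i| "user#{i}@example.com" }

pids = in_bounded_forks(recipients, 4) do |address|
  address.downcase # stand-in for delivering one e-mail
end

# Waiting is only for the demonstration; a web process would not block here.
statuses = pids.map { |pid| Process.wait2(pid).last }
puts "forked #{pids.size} workers for #{recipients.size} recipients"
```

Five hundred notifications now cost four extra processes rather than five hundred, at the price of each batch being sent sequentially within its worker.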

PS: All advice is based on actual experiences with Spawn.