Spawn: reliable background processing for Rails

Since I've got time to spare and free WiFi (!) here at the new Bangalore International Airport, I figure I might as well catch up on my blogging. At the top of my list is Spawn, the latest bit of Rails plugin goodness my colleagues introduced me to.

Rails, being single-threaded, has always been plagued by long-running tasks locking up a Mongrel until they're done. Need to notify a dozen users by e-mail when some event occurs in your app? Kiss a Mongrel instance goodbye for the thirty or so seconds it takes to send those e-mails. This hurts server response times, making applications seem sluggish even when they aren't serving very many users.

You hardly need something complicated like backgroundrb or some such to solve this problem, but sadly, the Rails world lacked anything simpler - well, at least until Spawn came along.

Spawn has been around since September 2007 and is now on version 0.9. It's quite simple to use - pick any block of code you wish to run as a background process, pass it to Spawn, and it will fork a separate process for the block to run in (it also takes care of details like creating a new ActiveRecord connection for that process). This of course assumes you're running a *nix system; Windows and JRuby users can choose to follow the less reliable path of creating threads instead, and would have to set config.active_record.allow_concurrency = true. Forking also has the advantage of letting you use a multicore machine more effectively - the OS takes care of scheduling forked processes across cores, while Ruby's green threads are invisible to it and so only ever use a single core. See the Spawn README for a more detailed explanation of the relative merits and demerits of fork vs. thread.
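To make the forking idea concrete, here is a minimal sketch in plain Ruby of what "run this block in a separate process" looks like. It is not Spawn's actual code: run_in_background is a hypothetical helper, the temp file merely stands in for a slow task such as sending mail, and the ActiveRecord reconnection that Spawn handles is left out entirely.

```ruby
require "tempfile"

# Hypothetical helper, NOT Spawn's implementation: runs the given block in a
# forked child process and returns the child's PID to the caller right away.
def run_in_background
  fork do
    yield
    # Leave the child without running at_exit handlers inherited from the parent.
    exit!(0)
  end
end

# Stand-in for a slow task such as sending e-mail. The child writes to a file
# so that the parent can see the work really happened in another process.
log = Tempfile.new("background")
pid = run_in_background do
  sleep 0.1
  File.write(log.path, "sent by #{Process.pid}")
end

# The parent reaches this line immediately, before the child has finished.
puts "parent #{Process.pid} carries on while child #{pid} works"

# Reap the child so it does not linger as a zombie. A server process would
# more likely call Process.detach(pid) than block here.
Process.wait(pid)
puts File.read(log.path)
```

Since this relies on fork, it only runs on a *nix system, which is the same restriction mentioned above.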

One obvious limitation of Spawn is that it only works within a single machine. If you're running a cluster and want to distribute the load across different machines, then you need some sort of message queue (Starling) or tuple space (Rinda).

Finally, a word of caution - imagine the e-mail scenario described above scaled to five hundred users needing notification. Assuming you've wrapped each call to ActionMailer in a spawn block, you'll have five hundred new processes forked, each with its own AR connection, each hogging upwards of twenty megs of memory. In such a situation, starvation and thrashing on the server are guaranteed, so be sure you've examined very carefully the situations where spawn will be used.
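One way to sidestep that scenario, sketched here in plain Ruby rather than with Spawn itself, is to fork once and let that single child work through the whole recipient list. The helper name notify_all_in_one_child, the example.com addresses and the temp-file "delivery" are all assumptions made for illustration; in a real app the block would call your mailer.

```ruby
require "tempfile"

# Hypothetical helper: fork ONE child that handles every recipient in turn,
# instead of forking a new process per recipient.
def notify_all_in_one_child(recipients)
  fork do
    recipients.each { |address| yield address }
    # Skip at_exit handlers inherited from the parent process.
    exit!(0)
  end
end

log = Tempfile.new("notifications")
recipients = (1..500).map { |i| "user#{i}@example.com" }

# Stand-in for mail delivery: record which process handled which address.
pid = notify_all_in_one_child(recipients) do |address|
  File.open(log.path, "a") { |f| f.puts "#{Process.pid} #{address}" }
end
Process.wait(pid)

lines = File.readlines(log.path)
worker_pids = lines.map { |line| line.split.first }.uniq
puts "#{lines.size} notifications sent by #{worker_pids.size} child process(es)"
```

The trade-off is that the notifications go out one after another, so the last user waits longer, but memory use stays at one extra process no matter how long the list grows.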

PS: All advice is based on actual experiences with Spawn.

7 comments:

Wes Maldonado said...

I've been using async_observer http://async-observer.rubyforge.org/ which uses beanstalkd to do this, and I like that I can run lightweight workers to do the processing if I don't need the complete Rails environment. I'll try to post more on my blog and will ping you when I do. Gotta jump on a plane now, though.

Unknown said...

Thanks for the tip, Wes. I'll be sure to check it out.
I've also just tried out Skynet, and man, initial impressions are very very good. I'll have more concrete info shortly.

Anonymous said...

An alternative to spawning another process is to serialize the code block and send it to a background process for execution. This has the advantage that the background processing is not limited to one machine.

The background plugin (http://devblog.imedo.de/admin/articles/show/23) allows exactly that. It does not implement a communication protocol of its own, but builds on top of existing protocols to send the code to the background process for execution. Another important thing to mention is that it is failsafe: if for some reason the background process does not respond, it is possible to execute the code block in-process or to write it to disk for later replay.

Anonymous said...

This comment has been removed by a blog administrator.

Anonymous said...

How do you test the code in the spawn block?

Unknown said...

Simple answer: You can't test across the spawn process boundary.

Workaround: Of course, first you write tests for the unit you're running in a forked process separately. Then you stub the spawn helper call to not do anything at all. This disables the forking of processes for that test. That way you can do some integration testing.

Ideally, the stuff in the spawn block should be completely independent and shouldn't require an integration test across the process boundary.

Unknown said...

PS: You can make the spawn helper skip the fork and simply execute the code block inline by setting Spawn::method :yield in your test.rb config.
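For anyone wondering what that switch buys you, here's a rough sketch of the idea using a stand-in module. BackgroundRunner and its mode setting are made up for this example and are not Spawn's internals; the point is only that in :yield mode the block runs in the test's own process, so its effects are visible to assertions.

```ruby
# Hypothetical stand-in for a spawn-style helper; Spawn's real code may differ.
module BackgroundRunner
  class << self
    # :fork for normal use, :yield to run blocks inline (e.g. in tests).
    attr_accessor :mode
  end
  self.mode = :fork

  def self.run
    if mode == :yield
      # Run the block in the current process; nothing is forked.
      yield
      nil
    else
      pid = fork do
        yield
        # Skip at_exit handlers inherited from the parent process.
        exit!(0)
      end
      # Let Ruby reap the child in the background so no zombie is left behind.
      Process.detach(pid)
      pid
    end
  end
end

# What a test setup would do: switch to inline execution.
BackgroundRunner.mode = :yield

delivered = []
BackgroundRunner.run { delivered << "welcome e-mail" }
puts delivered.inspect   # prints ["welcome e-mail"]
```

In :fork mode the same block would append to the child's copy of the array, and the parent's array would stay empty, which is exactly why you can't assert across the process boundary.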