Ruby blocks gotchas

New to blocks in Ruby? RubyMonk has chapters covering both introductory topics and more detailed lessons on blocks. Do try them out!
There's this thing they say about Ruby - everything is an object. It's true, with very few exceptions, one of them being the block. Well guess what, this little gem of an inconsistency came back to bite me when I was trying to do something involving dynamic redefinition of methods.

The context: I recently wrote a little method decorator to help me figure out the execution times of the methods in a class. Nothing complicated - for a given class, alias each method, then redefine it; the new method invokes the original method while measuring the execution time. Here's a pseudocode-ish example to clarify:
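# Assumes klass, method and aliased_original_method are already in scope.
define_method method do |*args|
  started = Time.now

  # Call the aliased original; a block passed by the caller is not forwarded.
  result = send(aliased_original_method, *args)

  diff = Time.now - started
  puts "#{klass}##{method} took #{diff} s" if diff > 0
  result
end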
You may have already noticed that the pseudocode above doesn't handle methods which accept blocks - if I tried to decorate Array, then Array#each would fail to execute. My actual solution did handle this, and I'll publish it in another post; maybe others will find it useful.
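For illustration only, here is a minimal runnable sketch of the same idea with block forwarding added. It is not the solution promised above. It assumes Ruby 1.9 or later, where a define_method block can declare its own &blk parameter, and the names Timed, time_methods and the __untimed_ prefix are invented for this example.

```ruby
# Hypothetical helper, not the author's actual decorator.
# Assumes Ruby 1.9+, where a block can declare its own &blk parameter.
# Keyword arguments are not handled in this sketch.
module Timed
  # Wraps every public instance method defined directly on klass.
  def self.time_methods(klass, out = $stdout)
    klass.public_instance_methods(false).each do |name|
      original = "__untimed_#{name}"
      klass.send(:alias_method, original, name)

      klass.send(:define_method, name) do |*args, &blk|
        started = Time.now
        # Forward the caller's block explicitly; yield would not reach it here.
        result = send(original, *args, &blk)
        out.puts "#{klass}##{name} took #{Time.now - started} s"
        result
      end
    end
  end
end

class Greeter
  def greet(*names)
    yield(names.join(' '))
  end
end

Timed.time_methods(Greeter)
Greeter.new.greet('Sidu', 'Ponnappa') { |name| "Hello #{name}" }
# prints a timing line, then returns "Hello Sidu Ponnappa"
```

The second argument to time_methods is only there so the timing lines can be sent somewhere other than standard output.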

Anyways, this didn't take long, but once I was done, I was intrigued by the notion of a generic method decorator. It would be pretty cool if I could include my Decorator module into a class, pass it an arbitrary block to do any of those AOP-ish things like logging or, as I said, measuring execution times, and have all the methods decorated by that block. All the decorator block should have to do to execute the original method would be to yield.
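A rough sketch of what such a generic decorator could look like, under some assumptions: it targets Ruby 1.9 or later, it uses extend rather than include, and instead of yield the decorator block is handed a lambda that runs the original method. Decorator, decorate_methods and Account are hypothetical names made up for the example.

```ruby
# Hypothetical sketch of a generic method decorator (Ruby 1.9+).
module Decorator
  # The decorator block receives the method name and a lambda that runs
  # the original method; it calls that lambda where the text says "yield".
  def decorate_methods(&decorator)
    public_instance_methods(false).each do |name|
      original = instance_method(name)

      define_method(name) do |*args, &blk|
        run_original = lambda { original.bind(self).call(*args, &blk) }
        decorator.call(name, run_original)
      end
    end
  end
end

class Account
  extend Decorator
  LOG = []

  def deposit(amount)
    "deposited #{amount}"
  end

  # Log every call, then run the original method.
  decorate_methods do |name, run_original|
    LOG << "calling #{name}"
    run_original.call
  end
end

Account.new.deposit(50) # returns "deposited 50"; Account::LOG now holds "calling deposit"
```

Note that decorate_methods only sees methods defined before it is called, so it goes at the end of the class body.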

So this led me to try to figure out the whole deal with blocks. Simply put, there are two ways to handle blocks as parameters - implicitly and explicitly.

Implicitly passing and invoking blocks

This is the usual way in which blocks are passed to methods. Here's what it looks like:
def foo(*args)
 yield(args.join(' '))
end
foo('Sidu', 'Ponnappa'){|name| puts "Hello #{name}"} # => "Hello Sidu Ponnappa"
*args allows us to handle an arbitrary number of parameters - they're made available inside the method as an array, where we join them and pass them to our block via yield.
The block is passed to the method by enclosing it in curly braces and placing it after the method invocation. Only one block can be passed to a method in this manner.
Most importantly, the block is never bound, and so is not available as an object. It is implicitly invoked by calling yield within the method.
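A short sketch of what happens when a method that yields is called without a block, and how block_given? guards against it. The method names greet and strict_greet are made up for this example.

```ruby
# Hypothetical method names; both rely on the implicit block.
def greet(*names)
  # block_given? is true only when the caller attached a block.
  return "Hello #{names.join(' ')}" unless block_given?

  yield(names.join(' '))
end

def strict_greet(*names)
  yield(names.join(' ')) # raises LocalJumpError when there is no block
end

greet('Sidu', 'Ponnappa')                        # returns "Hello Sidu Ponnappa"
greet('Sidu', 'Ponnappa') { |name| name.upcase } # returns "SIDU PONNAPPA"

begin
  strict_greet('Sidu')
rescue LocalJumpError
  # yield had no block to invoke
end
```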

Explicitly passing, binding and invoking blocks

We go this route if we want a handle to the block. Here's a code example - it's similar to the one above, but we bind to the block and then invoke it explicitly.
def foo(*args, &blk)
 blk.call(args.join(' '))
end
foo('Sidu', 'Ponnappa'){|name| puts "Hello #{name}"} # => "Hello Sidu Ponnappa"
The & binds the block to the variable blk making it available as a Proc object.
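To make the "handle" concrete, here is a small sketch showing that the & parameter really is a Proc (or nil when no block was passed) and that it can be handed on to another method. The method names capture and each_greeting are assumptions for illustration.

```ruby
# Hypothetical method names; shows what the & parameter gives you.
def capture(&blk)
  blk # the block as a Proc, or nil when no block was passed
end

def each_greeting(names, &blk)
  names.map(&blk) # hand the captured block on to another method
end

handle = capture { |name| "Hello #{name}" }
handle.class        # Proc
handle.call('Sidu') # returns "Hello Sidu"
capture             # returns nil

each_greeting(%w[Sidu Ponnappa]) { |n| "Hello #{n}" }
# returns ["Hello Sidu", "Hello Ponnappa"]
```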

An even more explicit style involves first binding a variable to the block and then passing it to the method as an argument (as opposed to using & and having Ruby do it automagically). This style is often used when doing functional programming - Reg Braithwaite has a beautiful article covering this style of programming in Ruby.

Anyways, here's the example:
def foo(*args)
 blk = args.delete_at(-1) # We know that the last argument 
                          # is the bound block
 blk.call(args.join(' '))  
end

the_block = lambda {|name| puts "Hello #{name}"}
foo('Sidu', 'Ponnappa', the_block) # => "Hello Sidu Ponnappa" 
As you can see, we bind the block to the_block using the built-in Ruby method lambda and pass it as a regular argument. No magic like the previous examples - the block (now a Proc object) is treated like any other object would be. This, to my eyes, is the most consistent way to use blocks (everything should be an object). It has a significant disadvantage, however, as we'll see in the next section.
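One thing this style allows that the implicit style does not is passing more than one block to the same method. A minimal sketch, with greet_all and its parameter names invented for the example:

```ruby
# Hypothetical example: two callables passed as ordinary arguments.
def greet_all(names, formatter, on_empty)
  return on_empty.call if names.empty?

  names.map { |name| formatter.call(name) }
end

hello  = lambda { |name| "Hello #{name}" }
nobody = lambda { "Nobody to greet" }

greet_all(%w[Sidu Ponnappa], hello, nobody) # returns ["Hello Sidu", "Hello Ponnappa"]
greet_all([], hello, nobody)                # returns "Nobody to greet"
```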

The difference - implicit invocation is much faster

The reason why there are two approaches is simple - performance. Binding a block takes time, so we try to avoid it by going the implicit invocation route. Let's get a handle on the actual differences in performance, though, by benchmarking the examples above (modified slightly to avoid 100000 'puts'). I've renamed the three different example methods to foo, bar and ooga respectively.
require 'benchmark'

# Implicit
def foo(*args)
 yield(args.join(' '))
end
puts foo('Sidu', 'Ponnappa'){|name| "Hello #{name}"} # => "Hello Sidu Ponnappa"

# Explicitly binds block when passed
def bar(*args, &block)
 block.call(args.join(' '))
end
puts bar('Sidu', 'Ponnappa'){|name| "Hello #{name}"} # => "Hello Sidu Ponnappa"

# Explicitly binds block before passing
def ooga(*args)
 blk = args.delete_at(-1)
 blk.call(args.join(' '))  
end

the_block = lambda {|name| "Hello #{name}"}
puts ooga('Sidu', 'Ponnappa', the_block) # => "Hello Sidu Ponnappa" 

puts "Starting benchmark"

n = 100000

Benchmark.bmbm(10) do |rpt|
 rpt.report("foo") do
  n.times {foo('Sidu', 'Ponnappa'){|name| "Hello #{name}"}}
 end

 rpt.report("bar") do
  n.times {bar('Sidu', 'Ponnappa'){|name| "Hello #{name}"}}
 end

 rpt.report("ooga") do
  n.times {
    the_block = lambda {|name| "Hello #{name}"}
    ooga('Sidu', 'Ponnappa', the_block)
  }
 end
end
Output:
Hello Sidu Ponnappa
Hello Sidu Ponnappa
Hello Sidu Ponnappa

Starting benchmark

Rehearsal ---------------------------------------------
foo         0.781000   0.000000   0.781000 (  0.782000)
bar         1.406000   0.000000   1.406000 (  1.406000)
ooga        1.438000   0.016000   1.454000 (  1.453000)
------------------------------------ total: 3.641000sec

                user     system      total        real
foo         0.782000   0.000000   0.782000 (  0.781000)
bar         1.375000   0.015000   1.390000 (  1.406000)
ooga        1.453000   0.032000   1.485000 (  1.485000)


As you can see, bar, which uses explicit invocation, takes roughly 78% longer than foo (1.39 s against 0.78 s in total). ooga, where the block is bound right at the beginning and passed as a parameter, is the slowest at about 1.49 s. TANSTAAFL, I guess.

This trick of benchmarking is borrowed from Joel VanderWerf, who posted a similar benchmark involving all permutations of implicit and explicit invocations over at the Ruby forum.

The catch - implicitly invoking a block from within another block does not work

As a direct consequence of this performance benefit, most of the Ruby code I've seen takes the implicit route. Unfortunately, it is not possible to dynamically redefine a method which expects a block as an implicit parameter and have it continue to behave as before. I know that sounds weird, but read on to the example and all shall be made clear. Hah, always wanted to say that. Ahem.

Getting back to the point, if you dynamically define a method using define_method, the method body is passed to it as a block. You cannot pass a block to this dynamically defined method implicitly - at least not that I could find. If there is a way, please let me know - it would help me get a lot of stuff done neatly. In the meanwhile, here's an example demonstrating this inconsistent behaviour.
class SandBox
  def abc(*args)
    yield(*args)
  end

  define_method :xyz do |*args|
   yield(*args)
  end
end

SandBox.new.abc(1,2,3){|*args| p args}  # => [1, 2, 3]
SandBox.new.xyz(4,5,6){|*args| p args}  # => no block given (LocalJumpError)

SandBox.new.method(:abc).call(1,2,3){|*args| p args} # => [1, 2, 3]
SandBox.new.method(:xyz).call(4,5,6){|*args| p args} # => no block given (LocalJumpError)
The calls to abc succeed, but those to xyz raise a LocalJumpError. There seems to be some fundamental difference in the methods created by def and define_method, with the latter being unable to handle implicitly passed blocks. Here's something else I tried, which didn't work either:
lmbda = lambda{|*args| yield(*args)}
prc = Proc.new{|*args| yield(*args)}

lmbda.call(7, 8, 9){|*args| p args}  # => no block given (LocalJumpError)
prc.call(10,11,12){|*args| p args}  # => no block given (LocalJumpError)
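Note that while lambda and Proc.new both bind a block, creating a Proc object, lambda causes the bound block to behave more like a method: it checks the number of arguments it is called with, and a return inside it exits only the lambda rather than the enclosing method. As far as I can tell, it is Kernel#proc (a synonym for lambda in Ruby 1.8), not Proc.new, that is described as mildly deprecated in favour of lambda.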
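A small sketch of the two best-known differences between lambda and Proc.new, written for Ruby 1.9 or later. The method names return_from_lambda and return_from_proc are made up for the example.

```ruby
# Sketch for Ruby 1.9+: argument checking and return behaviour.
strict  = lambda   { |a, b| [a, b] }
lenient = Proc.new { |a, b| [a, b] }

lenient.call(1)       # returns [1, nil]; missing arguments become nil
lenient.call(1, 2, 3) # returns [1, 2]; extra arguments are dropped
begin
  strict.call(1)
rescue ArgumentError
  # a lambda checks its argument count, like a method does
end

def return_from_lambda
  lambda { return :from_lambda }.call
  :after_lambda # reached: return only left the lambda
end

def return_from_proc
  Proc.new { return :from_proc }.call
  :after_proc # never reached: return left the enclosing method
end

return_from_lambda # returns :after_lambda
return_from_proc   # returns :from_proc
```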

To Summarise
  • Blocks violate the 'everything is an object' rule in Ruby for performance reasons. They only become objects when bound to a variable.

  • Implicit invocation of a block using yield is much faster than alternatives involving binding the block to a variable.

  • Most Ruby code uses implicit block passing to avoid binding blocks.

  • Blocks cannot themselves accept a block as an implicit parameter (rather, I couldn't find any way to do this - suggestions welcome).

  • If you define a method using define_method, the method body is passed in as a block. This new method cannot itself make use of yield to invoke an unbound block passed to it implicitly.

  • This is inconsistent behaviour, which, if I haven't missed something, kinda sucks.

While searching for a solution to my problem, I came across Paul Cantrell's exhaustive documentation of the different flavours of blocks/closures in Ruby, as well as their little eccentricities. It's well worth a read.

Update 2007-11-27:
As an anonymous commenter pointed out, Ruby 1.9 will indeed fix this inconsistency. The details can be found here.
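A minimal sketch of what this looks like in practice, assuming Ruby 1.9 or later: a block's parameter list can declare &blk, so the block passed to the dynamically defined method is received explicitly and called. yield inside the block body still does not reach it. SandBox and xyz mirror the earlier example; relay is a name invented here.

```ruby
# Assumes Ruby 1.9 or later.
class SandBox
  # The block is declared as an explicit parameter and called directly.
  define_method :xyz do |*args, &blk|
    blk.call(*args)
  end
end

SandBox.new.xyz(4, 5, 6) { |*args| args } # returns [4, 5, 6]

# The same parameter syntax works for a lambda.
relay = lambda { |*args, &blk| blk.call(*args) }
relay.call(7, 8, 9) { |*args| args }      # returns [7, 8, 9]
```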

You may also want to read: Ruby blocks redux: Ruby 1.9.0, Ruby 1.8.6 and JRuby 1.0.3, which was posted after the release of 1.9.0






16 comments:

Anonymous said...

def foo(*args)
  blk = args.delete_at(-1) # We know that the last argument
                           # is the bound block
  blk.call(args.join(' '))
end

looks unnecessarily clumsy. Why not:

def foo(blk, *args)
  blk.call(args.join(' '))
end

Anonymous said...

Ruby 1.9 block will accept a block as a parameter.

Anonymous said...

I really don't think anyone ever wrote anything about blocks being objects. Where did you read that?

Unknown said...

@anonymous #1 - you're right, it is clearer your way. Thanks!

@anonymous #2 - thanks for the tip, that's worthy of an update to the post.

@anonymous #3 - I didn't read it anywhere. What I did read was 'pure OO' and 'everything is an object'. Nothing mentioned exceptions to these rules.

Unknown said...

re: Blocks violate the 'everything is an object' rule in Ruby for performance reasons. They only become objects when bound to a variable.

I don't think this is correct. Two reasons: 1 - I have read that ruby blocks are not first class objects due to not wanting to expose the implementation to allow for future change. 2 - I'm an old Smalltalk'er. Blocks are first class objects in Smalltalk and a Smalltalk VM of 15 years ago in most cases outperforms a Ruby interpreter of today; including dynamic block dispatches. enjoy!!!

AkitaOnRails said...

You're right, I've been investigating something similar and I stumbled upon the same caveats that you did. If someone knows some little-known black magic here, that would be great.

Gregory said...

I see four problems with blocks in Ruby 1.8:

1. Can't pass a block to a block, thus causing problems with define_method and similar.

2. The differences between Proc.new, proc, lambda, and implicit blocks. I have a hard time keeping the differences straight, and try to use only lambda and implicit blocks.

3. The performance difference between explicit and implicit block-passing.

4. The only way to pass more than one block to a method is to pass (all but one of) them as explicit procs/lambdas.

Of these, the first one is definitely fixed in Ruby 1.9, and I think the second one is as well. The third is a quirk of MRI, and it would be worth running some benchmarks in other Ruby implementations (JRuby, Rubinius, IronRuby...). The fourth comes up rarely enough that I have a hard time suggesting a more convenient syntax, even though it annoys me on occasion.

In addition, an implicit block can be captured into a Proc object:

def foo
  Proc.new
end

bar = foo { |arg| puts "You called me with #{arg}" }
bar.call(22)
bar.call(99)

I might argue that the implicit block and yield construct in Ruby is a mistake to begin with, but it improves the readability of iteration methods. That one case may or may not justify the complication of two parallel block-passing constructs, however.

Sudhindra Rao said...

Looks like this post is still alive and people read it.. A post that clarifies a lot more things about blocks, closures, procs in ruby is here http://innig.net/software/ruby/closures-in-ruby.rb

Anonymous said...

Why are you handrolling your own dubious benchmarking class?

Anonymous said...

Kudos on tackling a tricky subject with this post. I'd like to offer a couple of "clarifications" which hopefully will help people.

The reason the implicit version is faster is because you are repeatedly constructing the Proc every time in the explicit version.

If you never access the Proc directly, why add the overhead of constructing an object? This way, you only pay for what you need.

I don't really see how this violates the "everything is an object" rule, since as soon as you try to access it, the block becomes an object. :)

Also: the problem with passing blocks to blocks is not due to define_method, just blocks. A similar difference exists with default arguments. Method arguments and block arguments will operate the same way (I think) in Ruby 1.9.

IMHO, the biggest inconsistency with blocks, especially post 1.9, is one you haven't alluded to here, which is that return behaves differently between Proc.new and lambda.

See:
http://tinyurl.com/29a28d

When returning from blocks, passed explicitly or implicitly, return takes you all the way out of the calling method. That is, they behave like Proc.new (which makes sense), not lambda.

This is arguably necessary to support the POLS when dealing with inline blocks (say, with #each). You don't expect return to just pop you out of the block, you expect to return from the method. Not sure if that changes in 1.9?

Anyway, thanks again for the interesting article!

Anonymous said...

I really don't think it can get any simpler than this:

x = lambda do
  puts 'hello'
  yield
end

x.call { puts 'world' }

And the above example still doesn't work with ruby 1.9.0 (2008-07-31 revision 18282) [x86_64-linux].

hello
test.rb:3:in `block in <main>': no block given (yield) (LocalJumpError)
from test.rb:6:in `call'
from test.rb:6:in `<main>'

So as far as I can see, no, Ruby 1.9 most definitely does not (yet) allow me to IMPLICITLY pass a block to another block, unless we are going to argue that lambdas are some kind of special case.

A new thing Ruby 1.9 can do is allow me to EXPLICITLY pass a block. This works in Ruby 1.9 (but didn't work in 1.8)

x = lambda do |&block|
  puts 'hello'
  block.call
end

x.call { puts 'world' }

Functionally, this is good enough. But the whole business with |&block| is ugly and inconsistent with the use of yield inside methods. Wouldn't allowing the use of yield inside a lambda be nicer? Or am I missing something here?

Anonymous said...

This is 3 years after your original post hehe...anyway..

I don't believe this behaviour is an inconsistency. It is a result of blocks being captured by closures and so the 'yield' instead invokes any extant block in the enclosing context.

See here: http://banisterfiend.wordpress.com/2010/11/06/behavior-of-yield-in-define_method/

Gerry said...

Still some useful info in this post and comments. Was fighting a similar problem and this helped.

In 1.8.7, this was working:

... do |*a, &blk|
...
send meth, *a, &blk

But 1.8.6 didn't like this syntax.
After much playing around, I found I needed a "passthrough block" :

... do |*a| ...
send meth, *a { yield }

Hope this comment saves someone else some time.

Anonymous said...

Thanks for the post! I've also written an article about Ruby blocks and closures with code examples.

Saager Mhatre said...

I know it's been a while since this post was first published, but since people are still commenting on it and no one's suggested possibly the most awesome solution to this, here goes.

I know that the point of the post wasn't to implement a profiling/benchmarking framework, but what you're trying to achieve could be implemented using a trivial DTrace script => rb_functime.d. Since you're on OSX, you should already have libdtrace (as well as its dtrace frontend utility), but if you insist on going ruby all the way, there's always ruby-dtrace.

On a more interesting note, DTrace probes for Ruby were released just a few months before your blog post! :P