Ruby blocks gotchas

New to blocks in Ruby? RubyMonk has chapters covering both introductory topics as well as more detailed lessons on blocks. Do try them out!
There's this thing they say about Ruby - everything is an object. It's true, with very few exceptions, one of them being the block. Well guess what, this little gem of an inconsistency came back to bite me when I was trying to do something involving dynamic redefinition of methods.

The context: I recently wrote a little method decorator to help me figure out the execution times of the methods in a class. Nothing complicated - for a given class, alias each method, then redefine it; the new method invokes the original method while measuring the execution time. Here's a pseudocode-ish example to clarify:
define_method method do |*args|
 t = Time.now

 result = self.send(aliased_original_method, *args)

 diff =  Time.now-t
 puts "#{klass}##{method} took #{diff} s" if diff > 0
 return result
end
You may have already noticed that the psedocode above doesn't handle methods which accept blocks - if I tried to decorate Array, then Array#each would fail to execute. My actual solution did handle this, and I'll publish that in another post, maybe others will find it useful.

Anyways, this didn't take long, but once I was done, I was intrigued by the notion of a generic method decorator. It would be pretty cool if I could include my Decorator module into a class, pass it an arbitrary block to do any of those AOP-ish things like logging or, as I said, measuring execution times, and have all the methods decorated by that block. All the decorator block should have to do to execute the original method would be to yield.

So this led me to try to figure out the whole deal with blocks. Simply put, there are two ways to handle blocks as parameters - implicitly and explicitly.

Implicitly passing and invoking blocks

This is the usual way in which blocks are passed to methods. Here's what it looks like:
def foo(*args)
 yield(args.join(' '))
end
foo('Sidu', 'Ponnappa'){|name| puts "Hello #{name}"} # => "Hello Sidu Ponnappa"
*args allows us to handle an arbitrary number of parameters - they're made available inside the method as an array, where we join them and pass them to our block via yield.
The block is passed to the method by enclosing it in curly braces and placing it after the method invocation. Only one block can be passed to a method in this manner.
Most importantly, the block is never bound, and so is not available as an object. It is implicitly invoked by calling yield within the method.

Explicitly passing, binding and invoking blocks

We go this route if we want a handle to the block. Here's a code example - it's similar to the one above, but we bind to the block and then invoke it explicitly.
def foo(*args, &blk)
 blk.call(args.join(' '))
end
foo('Sidu', 'Ponnappa'){|name| puts "Hello #{name}"} # => "Hello Sidu Ponnappa"
The & binds the block to the variable blk making it available as a Proc object.

An even more explicit style involves first binding a variable to the block and then passing it to the method as an argument (as opposed to using & and having Ruby do it automagically). This style is often used when doing functional programming - Reg Braithwaite has a beautiful article covering this style of programming in Ruby.

Anyways, here's the example:
def foo(*args)
 blk = args.delete_at(-1) # We know that the last argument 
                          # is the bound block
 blk.call(args.join(' '))  
end

the_block = lambda {|name| puts "Hello #{name}"}
foo('Sidu', 'Ponnappa', the_block) # => "Hello Sidu Ponnappa" 
As you can see, we bind the block to the_block using the built in Ruby method lambda and pass it as a regular argument. No magic like the previous examples - the block (now a Proc object) is treated like any other object would be. This, to my eyes, is the most consistent way to use blocks (everything should be an object). It has a significant disadvantage, however, as we'll see in the next section.

The difference - implicit invocation is much faster

The reason why there are two approaches is simple - performance. Binding a block takes time, so we try to avoid it by going the implicit invocation route. Let's get a handle on the actual differences in performance, though, by benchmarking the examples above (modified slightly to avoid 100000 'puts'). I've renamed the three different example methods to foo, bar and ooga respectively.
require 'benchmark'

# Implicit
def foo(*args)
 yield(args.join(' '))
end
puts foo('Sidu', 'Ponnappa'){|name| "Hello #{name}"} # => "Hello Sidu Ponnappa"

# Explicitly binds block when passed
def bar(*args, &block)
 block.call(args.join(' '))
end
puts bar('Sidu', 'Ponnappa'){|name| "Hello #{name}"} # => "Hello Sidu Ponnappa"

# Explicitly binds block before passing
def ooga(*args)
 blk = args.delete_at(-1)
 blk.call(args.join(' '))  
end

the_block = lambda {|name| "Hello #{name}"}
puts ooga('Sidu', 'Ponnappa', the_block) # => "Hello Sidu Ponnappa" 

puts "Starting benchmark"

n = 100000

Benchmark.bmbm(10) do |rpt|
 rpt.report("foo") do
  n.times {foo('Sidu', 'Ponnappa'){|name| "Hello #{name}"}}
 end

 rpt.report("bar") do
  n.times {bar('Sidu', 'Ponnappa'){|name| "Hello #{name}"}}
 end

 rpt.report("ooga") do
  n.times {
    the_block = lambda {|name| "Hello #{name}"}
    ooga('Sidu', 'Ponnappa', the_block)
  }
 end
end
Output:
Hello Sidu Ponnappa
Hello Sidu Ponnappa
Hello Sidu Ponnappa

Starting benchmark

Rehearsal ---------------------------------------------
foo 0.781000 0.000000 0.781000 ( 0.782000)
bar 1.406000 0.000000 1.406000 ( 1.406000)
ooga 1.438000 0.016000 1.454000 ( 1.453000)
------------------------------------ total: 3.641000sec

user system total real
foo 0.782000 0.000000 0.782000 ( 0.781000)
bar 1.375000 0.015000 1.390000 ( 1.406000)
ooga 1.453000 0.032000 1.485000 ( 1.485000)


As you can see, bar, which uses an explicit invocation is approximately 75% slower than foo. ooga, where the block is bound right at the beginning and passed as a parameter is the slowest. TANSTAAFL, I guess.

This trick of benchmarking is borrowed from Joel VanderWerf, who posted a similar benchmark involving all permutations of implicit and explicit invocations over at the Ruby forum.

The catch - implicitly invoking a block from within another block does not work

As a direct consequence of this performance benefit, most of the Ruby code I've seen takes the implicit route. Unfortunately, it is not possible to dynamically redefine methods which expect blocks as implicit parameters - not, and have them continue to behave as before. I know that sounds weird, but read on to the example and all shall be made clear. Hah, always wanted to say that. Ahem.

Getting back to the point, if you dynamically define a method using define_method, the method body is passed to it as a block. You cannot pass a block to this dynamically defined method implicitly - at least not that I could find. If there is a way, please let me know - it would help me get a lot of stuff done neatly. In the meanwhile, here's an example demonstrating this inconsistent behaviour.
class SandBox
  def abc(*args)
    yield(*args)
  end

  define_method :xyz do |*args|
   yield(*args)
  end
end

SandBox.new.abc(1,2,3){|*args| p args}  # => [1, 2, 3]
SandBox.new.xyz(4,5,6){|*args| p args}  # => no block given (LocalJumpError)

SandBox.new.method(:abc).call(1,2,3){|*args| p args} # => [1, 2, 3]
SandBox.new.method(:xyz).call(4,5,6){|*args| p args} # => no block given (LocalJumpError)
The calls to abc succeed, but those to xyz throw a LocalJumpError. There seems to be some fundamental difference in the methods created by def and define_method, with the latter being unable to handle implicitly passed blocks. Here's something else which I tried, which didn't work either:
lmbda = lambda{|*args| yield(*args)}
prc = Proc.new{|*args| yield(*args)}

lmbda.call(7, 8, 9){|*args| p args}  # => no block given (LocalJumpError)
prc.call(10,11,12){|*args| p args}  # => no block given (LocalJumpError)
Note that while lambda and Proc.new both bind a block creating a Proc object, lamda causes the bound block to behave more like a method. It also has some differences in the scope available to the bound block. Proc.new is mildly deprecated in favour of lambda.

To Summarise
  • Blocks violate the 'everything is an object' rule in Ruby for performance reasons. They only become objects when bound to a variable.

  • Implicit invocation of a block using yield is much faster than alternatives involving binding the block to a variable.

  • Most Ruby code uses implicit block passing to avoid binding blocks.

  • Blocks cannot themselves accept a block as an implicit parameter (rather, I couldn't find any way to do this - suggestions welcome).

  • If you define a method using define_method, the method body is passed in as a block. This new method cannot itself make use of yield to invoke an unbound block passed to it implicitly.

  • This is inconsistent behaviour, which, if I haven't missed something, kinda sucks.

While searching for a solution to my problem, I came across Paul Cantrell's exhaustive documentation of the different flavours of blocks/closures in Ruby, as well as their little eccentricities. It's well worth a read.

Update 2007-11-27:
As an anonymous commenter pointed out, Ruby 1.9 will indeed fix this inconsistency. The details can be found here.

You may also want to read: Ruby blocks redux: Ruby 1.9.0, Ruby 1.8.6 and JRuby 1.0.3, which was posted after the release of 1.9.0





Looking for help with your Ruby/Rails project? Hire us!




If you liked this post, you could

subscribe to the feed

or simply comment on this post


Post a Comment