Ruby blocks redux: Ruby 1.9.0, Ruby 1.8.6 and JRuby 1.0.3

I was stuck at home for the better part of a week (food poisoning, oh joy) and I figured I might as well take another look at the behaviour of blocks and see what's changed in Ruby 1.9.0. My original post on the topic, Ruby blocks gotchas, covered the different ways in which blocks could be invoked, their performance implications and a crucial limitation: you cannot pass one block to another except as an ordinary parameter; no &block and definitely no yield.

In this post we take another look at block invocations, comparing Ruby 1.8.6 and 1.9.0. Following Gregory's suggestion, I'm also throwing JRuby 1.0.3 into the mix. Expect updates for IronRuby too, shortly (just as soon as I get it to build on my laptop).

Invoking a block from within another block
Here's a piece of code from my previous post demonstrating the problem:
class SandBox
  def abc(*args)
    yield(*args)
  end

  define_method :xyz do |*args|
    yield(*args)
  end
end

SandBox.new.abc(1,2,3){|*args| p args} # => [1, 2, 3]
SandBox.new.xyz(4,5,6){|*args| p args} # => no block given (LocalJumpError)

SandBox.new.method(:abc).call(1,2,3){|*args| p args} # => [1, 2, 3]
SandBox.new.method(:xyz).call(4,5,6){|*args| p args} # => no block given (LocalJumpError)
It's essentially impossible to create a method using define_method which can itself accept and invoke a block in Ruby 1.8.x. Ruby 1.9 fixes this, but only for explicit invocation: a bare yield inside the define_method block is still not supported, so the sample given above does not work in 1.9.0 either (it blows up with the same errors). Let's see what does.

Explicitly invoking one block from another in Ruby 1.9.0
This method was something I didn't even cover in my previous post, because the 1.8 parser would simply blow up when parsing |*args, &block|. Here's what it looks like.
puts "Ruby #{RUBY_VERSION}, #{RUBY_RELEASE_DATE}, #{RUBY_PLATFORM}"

class SandBox
  def abc(*args)
    yield(*args)
  end

  define_method :xyz do |*args, &block|
    block.call(*args)
  end
end

SandBox.new.abc(1,2,3){|*args| p args} # => [1, 2, 3]
SandBox.new.xyz(4,5,6){|*args| p args} # => [4, 5, 6]

SandBox.new.method(:abc).call(1,2,3){|*args| p args} # => [1, 2, 3]
SandBox.new.method(:xyz).call(4,5,6){|*args| p args} # => [4, 5, 6]
This method involves binding the block as an argument and invoking it with block.call() rather than yield. Let's try it out across JRuby 1.0.3, Ruby 1.8.6, and Ruby 1.9.0.

JRuby 1.0.3 (equivalent to Ruby 1.8.5)
>jruby benchmark3.rb
:1: benchmark3.rb:7: syntax error, expecting tPIPE but found ',' instead (SyntaxError)
Ruby 1.8.6
>ruby benchmark3.rb
benchmark3.rb:8: syntax error, unexpected ',', expecting '|'
define_method :xyz do |*args, &block|
^
benchmark3.rb:11: syntax error, unexpected kEND, expecting $end
Ruby 1.9.0
>ruby benchmark3.rb
Ruby 1.9.0, 2007-12-25, i386-mswin32
[1, 2, 3]
[4, 5, 6]
[1, 2, 3]
[4, 5, 6]
So cool, that works now.
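For code that still has to run on Ruby 1.8.6 or JRuby 1.0.3, the workaround mentioned at the top of this post (passing the block as an ordinary parameter) can be sketched roughly like this. The class name LegacySandBox and the choice to put the block first in the parameter list are my own assumptions for illustration; they are not part of benchmark3.rb.

```ruby
# Sketch: a 1.8-friendly alternative to |*args, &block|.
# LegacySandBox is a hypothetical name used only for this example.
class LegacySandBox
  # The block arrives as a plain first argument (a lambda), so the
  # parser never sees an & inside the block parameter list.
  define_method :xyz do |blk, *args|
    blk.call(*args)
  end
end

collector = lambda { |*args| args }
result = LegacySandBox.new.xyz(collector, 4, 5, 6)
p result
```

The cost shows up at the call site: callers have to write lambda { ... } and pass it explicitly instead of attaching a trailing block.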

A fresh look at blocks performance

If you've seen the benchmarking code in my previous post you'll find this one slightly different. I've modified ooga based on the suggestions of an anonymous commenter so that delete_at() is no longer used, making it more similar to the other two methods.
require 'benchmark'

puts "Ruby #{RUBY_VERSION}, #{RUBY_RELEASE_DATE}, #{RUBY_PLATFORM}"

# Implicit
def foo(*args)
  yield(args.join(' '))
end
puts foo('Sidu', 'Ponnappa'){|name| "Hello #{name}"} # => "Hello Sidu Ponnappa"

# Explicitly binds block when passed
def bar(*args, &block)
  block.call(args.join(' '))
end
puts bar('Sidu', 'Ponnappa'){|name| "Hello #{name}"} # => "Hello Sidu Ponnappa"

# Explicitly binds block before passing
def ooga(blk, *args)
  blk.call(args.join(' '))
end

the_block = lambda {|name| "Hello #{name}"}
puts ooga(the_block, 'Sidu', 'Ponnappa') # => "Hello Sidu Ponnappa"

puts "\nStarting benchmark"

n = 1000000

puts "\n#{n} iterations\n"

Benchmark.bmbm(10) do |rpt|
  rpt.report("foo") do
    n.times {foo('Sidu', 'Ponnappa'){|name| "Hello #{name}"}}
  end

  rpt.report("bar") do
    n.times {bar('Sidu', 'Ponnappa'){|name| "Hello #{name}"}}
  end

  rpt.report("ooga") do
    n.times {
      the_block = lambda {|name| "Hello #{name}"}
      ooga(the_block, 'Sidu', 'Ponnappa')
    }
  end
end
And the results:

Ruby 1.8.6
Ruby 1.8.6, 2007-03-13, i386-mswin32
Hello Sidu Ponnappa
Hello Sidu Ponnappa
Hello Sidu Ponnappa

Starting benchmark

1000000 iterations
Rehearsal ---------------------------------------------
foo 5.953000 0.000000 5.953000 ( 5.969000)
bar 11.484000 0.157000 11.641000 ( 11.672000)
ooga 11.547000 0.234000 11.781000 ( 11.781000)
----------------------------------- total: 29.375000sec

user system total real
foo 5.969000 0.000000 5.969000 ( 5.969000)
bar 11.406000 0.203000 11.609000 ( 11.609000)
ooga 11.563000 0.141000 11.704000 ( 11.782000)


JRuby 1.0.3 on unoptimised Sun JRE 1.6
Ruby 1.8.5, 2007-12-15, java
Hello Sidu Ponnappa
Hello Sidu Ponnappa
Hello Sidu Ponnappa

Starting benchmark

1000000 iterations
Rehearsal ---------------------------------------------
foo 27.156000 0.000000 27.156000 ( 27.156000)
bar 38.000000 0.000000 38.000000 ( 38.000000)
ooga 39.375000 0.000000 39.375000 ( 39.375000)
---------------------------------- total: 104.531000sec

user system total real
foo 26.844000 0.000000 26.844000 ( 26.844000)
bar 37.984000 0.000000 37.984000 ( 37.984000)
ooga 39.406000 0.000000 39.406000 ( 39.406000)


Ruby 1.9.0
Ruby 1.9.0, 2007-12-25, i386-mswin32
Hello Sidu Ponnappa
Hello Sidu Ponnappa
Hello Sidu Ponnappa

Starting benchmark

1000000 iterations
Rehearsal ---------------------------------------------
foo 5.140000 0.062000 5.202000 ( 5.203000)
bar 7.157000 0.078000 7.235000 ( 7.250000)
ooga 7.453000 0.078000 7.531000 ( 7.531000)
----------------------------------- total: 19.968000sec

user system total real
foo 5.015000 0.047000 5.062000 ( 5.094000)
bar 7.172000 0.047000 7.219000 ( 7.234000)
ooga 7.391000 0.063000 7.454000 ( 7.500000)


The performance of the different methods relative to one another does not change across Ruby runtimes. foo (implicit invocation using yield) is always substantially faster than bar (explicit invocation with auto-magical binding of the block), which in turn is just marginally faster than ooga (binding the block in advance and explicitly passing it as a parameter). However, the gap between implicit and explicit has dropped dramatically in Ruby 1.9.0: bar takes roughly 1.9x as long as foo on Ruby 1.8.6, but only about 1.4x as long on Ruby 1.9.0.
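To put numbers on that gap, here is a small sketch that divides the explicit timings by the yield timing for each runtime. The figures are the "total" column of the second (non-rehearsal) run, copied from the results above; the hash layout and variable names are just my choice for this illustration.

```ruby
# "total" column of the second run for each runtime, from the results above.
totals = {
  "Ruby 1.8.6"  => { :foo => 5.969,  :bar => 11.609, :ooga => 11.704 },
  "JRuby 1.0.3" => { :foo => 26.844, :bar => 37.984, :ooga => 39.406 },
  "Ruby 1.9.0"  => { :foo => 5.062,  :bar => 7.219,  :ooga => 7.454 }
}

# How many times slower each explicit style is than plain yield (foo).
slowdown = {}
totals.each do |runtime, t|
  slowdown[runtime] = {
    :bar  => t[:bar]  / t[:foo],
    :ooga => t[:ooga] / t[:foo]
  }
end

slowdown.each do |runtime, s|
  puts format("%-12s bar/foo = %.2f  ooga/foo = %.2f", runtime, s[:bar], s[:ooga])
end
```

On these numbers bar costs a little under 2x foo on Ruby 1.8.6 and roughly 1.4x on Ruby 1.9.0; JRuby 1.0.3 also comes out near 1.4x, although its absolute times are far higher.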

You may be wondering why I haven't mentioned the clearly large differences in the performance of the different runtimes. That's because I'm getting wildly different results in this other benchmark I have and I'm not sure how I should interpret the results. If you can, please do chime in with an explanation because this has me stumped.

This 'purer' benchmark posted on the Ruby forums by Joel VanderWerf tells a very different story of the relative performance of JRuby, Ruby 1.8 and Ruby 1.9. This benchmark uses different permutations of the ways in which blocks can be invoked to demonstrate performance differences. Since my previous post simply described performance differences within a single runtime, it didn't matter so much, but now it does. Check it out.
puts "Ruby #{RUBY_VERSION}, #{RUBY_RELEASE_DATE}, #{RUBY_PLATFORM}"
require 'benchmark'

def outer11(&bl)
  inner1(&bl)
end

def outer12(&bl)
  inner2(&bl)
end

def outer21
  inner1 {yield}
end

def outer22
  inner2 {yield}
end

def inner1(&bl)
  bl.call
end

def inner2
  yield
end

n = 1000000

Benchmark.bmbm(10) do |rpt|
  rpt.report("outer11") do
    n.times {outer11{}}
  end

  rpt.report("outer12") do
    n.times {outer12{}}
  end

  rpt.report("outer21") do
    n.times {outer21{}}
  end

  rpt.report("outer22") do
    n.times {outer22{}}
  end
end
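
Before looking at the timings, it may help to confirm that the four permutations are interchangeable from the caller's point of view, so that only the invocation mechanics differ. The sketch below repeats the method definitions so that it runs on its own; the block returning 42 and the results hash are my additions and are not part of Joel's benchmark.

```ruby
# Same six methods as in the benchmark above.
def inner1(&bl)
  bl.call
end

def inner2
  yield
end

def outer11(&bl)
  inner1(&bl)      # explicit outer, explicit inner
end

def outer12(&bl)
  inner2(&bl)      # explicit outer, yield inner
end

def outer21
  inner1 { yield } # yield outer (wrapped in a new block), explicit inner
end

def outer22
  inner2 { yield } # yield all the way down
end

# Each permutation should hand back whatever the caller's block returns.
results = {}
results[:outer11] = outer11 { 42 }
results[:outer12] = outer12 { 42 }
results[:outer21] = outer21 { 42 }
results[:outer22] = outer22 { 42 }
p results
```

Since all four return the same value, any difference in the timings comes from how the block travels through the two method calls, not from what the block does.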

Ruby 1.8.6
Ruby 1.8.6, 2007-03-13, i386-mswin32
Rehearsal ---------------------------------------------
outer11 7.578000 0.250000 7.828000 ( 7.890000)
outer12 6.047000 0.203000 6.250000 ( 6.282000)
outer21 11.625000 0.344000 11.969000 ( 12.000000)
outer22 1.765000 0.000000 1.765000 ( 1.765000)
----------------------------------- total: 27.812000sec

user system total real
outer11 7.688000 0.141000 7.829000 ( 7.828000)
outer12 6.047000 0.140000 6.187000 ( 6.187000)
outer21 11.547000 0.344000 11.891000 ( 11.891000)
outer22 1.750000 0.000000 1.750000 ( 1.750000)


JRuby 1.0.3
Ruby 1.8.5, 2007-12-15, java
Rehearsal ---------------------------------------------
outer11 10.172000 0.000000 10.172000 ( 10.172000)
outer12 5.359000 0.000000 5.359000 ( 5.359000)
outer21 9.375000 0.000000 9.375000 ( 9.375000)
outer22 2.219000 0.000000 2.219000 ( 2.219000)
----------------------------------- total: 27.125000sec

user system total real
outer11 9.438000 0.000000 9.438000 ( 9.438000)
outer12 4.875000 0.000000 4.875000 ( 4.875000)
outer21 8.953000 0.000000 8.953000 ( 8.953000)
outer22 2.187000 0.000000 2.187000 ( 2.187000)


Ruby 1.9.0
Ruby 1.9.0, 2007-12-25, i386-mswin32
Rehearsal ---------------------------------------------
outer11 2.844000 0.047000 2.891000 ( 2.890000)
outer12 2.453000 0.062000 2.515000 ( 2.516000)
outer21 4.625000 0.063000 4.688000 ( 4.687000)
outer22 0.922000 0.000000 0.922000 ( 0.922000)
----------------------------------- total: 11.016000sec

user system total real
outer11 2.812000 0.062000 2.874000 ( 2.875000)
outer12 2.469000 0.047000 2.516000 ( 2.516000)
outer21 4.625000 0.078000 4.703000 ( 4.703000)
outer22 0.922000 0.000000 0.922000 ( 0.922000)


Overall execution time for JRuby goes from roughly 3.5x to 4.5x that of Ruby 1.8.6 in the first benchmark to about 1x in this one. But even though the overall time is now much the same, the execution times of the individual permutations differ a great deal. And Ruby 1.9.0 turns out to be about 2.5x faster than both JRuby and Ruby 1.8.6. Here's a graph of the results.
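The comparison can be worked through from the tables with a few lines of Ruby. This is only arithmetic on the "total" column of the second run for each runtime; the variable names and the total_of helper are my own, made up for this sketch.

```ruby
# "total" column of the second run of the permutation benchmark.
mri_186   = { :outer11 => 7.829, :outer12 => 6.187, :outer21 => 11.891, :outer22 => 1.750 }
jruby_103 = { :outer11 => 9.438, :outer12 => 4.875, :outer21 => 8.953,  :outer22 => 2.187 }
mri_190   = { :outer11 => 2.874, :outer12 => 2.516, :outer21 => 4.703,  :outer22 => 0.922 }

# Hypothetical helper: sum of all four timings for one runtime.
def total_of(timings)
  timings.values.inject(0.0) { |acc, v| acc + v }
end

overall = {
  :jruby_vs_186 => total_of(jruby_103) / total_of(mri_186),
  :r186_vs_190  => total_of(mri_186)   / total_of(mri_190),
  :jruby_vs_190 => total_of(jruby_103) / total_of(mri_190)
}

# JRuby time divided by Ruby 1.8.6 time, one permutation at a time.
per_case = {}
mri_186.each_key { |name| per_case[name] = jruby_103[name] / mri_186[name] }

overall.each  { |name, ratio| puts format("%-14s %.2f", name, ratio) }
per_case.each { |name, ratio| puts format("%-14s %.2f", name, ratio) }
```

Overall, JRuby lands within about 10% of Ruby 1.8.6, but per permutation it takes roughly 20 to 25% more time on outer11 and outer22 and roughly 20 to 25% less time on outer12 and outer21.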


Summary
  • We find that implicit invocation of a block using yield is always substantially faster than explicit invocation with auto-magical binding of the block, which in turn is just marginally faster than binding the block in advance and passing it to the method as a parameter. This is consistent across JRuby 1.0.3, Ruby 1.8.6 and Ruby 1.9.0.
  • The performance of block invocations when compared across different runtimes is beyond the capability of the author to explain and is left to better heads than his.
  • It is clear that in this context, Ruby 1.9.0 is substantially faster than Ruby 1.8.6 and JRuby 1.0.3.
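As a reminder of what the slower explicit styles buy you, here is a small sketch: once a block is bound with &block it is an ordinary Proc object that can be stored and called later, which yield alone cannot do. The Callbacks class and its method names are hypothetical, made up purely for this illustration.

```ruby
# Hypothetical class contrasting the two invocation styles.
class Callbacks
  def initialize
    @handlers = []
  end

  # Explicit binding: &block hands us a Proc we can keep around.
  def on_event(&block)
    @handlers << block
    self
  end

  # Implicit invocation: fine for immediate use, but nothing to store.
  def with_event(name)
    yield name
  end

  # Call every stored handler and collect the return values.
  def fire(name)
    @handlers.map { |handler| handler.call(name) }
  end
end

cb = Callbacks.new
cb.on_event { |name| "first saw #{name}" }
cb.on_event { |name| "second saw #{name}" }

stored    = cb.fire("save")
immediate = cb.with_event("save") { |name| "yielded #{name}" }
p stored
p immediate
```

So a reasonable rule of thumb, going by the numbers in this post, is to use yield when the block is invoked on the spot and to reach for &block only when the block has to be kept or handed on.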
You may also want to read: Ruby blocks gotchas

2 comments:

Ola Bini said...

Interesting results. You have to realize that the 1.0-branch of JRuby is not focused on performance. Basically none of the performance improvements we have applied to the 1.1-branch have been backported to 1.0.

So, as a typical example, MRI 1.8.6 does like this on your example:
user system total real
foo 2.840000 0.010000 2.850000 ( 2.860675)
bar 5.580000 0.030000 5.610000 ( 5.629368)
ooga 5.590000 0.020000 5.610000 ( 5.660512)

And JRuby current trunk (r5500) running with -J-server:
user system total real
foo 1.615000 0.000000 1.615000 ( 1.615000)
bar 2.303000 0.000000 2.303000 ( 2.303000)
ooga 2.206000 0.000000 2.206000 ( 2.206000)

So you can see that JRuby is actually substantially faster for this benchmark. I can't compare with latest 1.9 since I don't have it at the moment, though.

Unknown said...

That's really cool.

In case anyone's wondering what difference -J-server makes, here's a run using exactly the same setup as in my post (JRuby 1.0.3), but with -J-server (I'm an idiot, really, should have done this in the first place).
user system total real
foo 8.562000 0.000000 8.562000 (8.562000)
bar 14.641000 0.000000 14.641000 (14.641000)
ooga 15.906000 0.000000 15.906000 (15.906000)

JRuby 1.1 is 5x to 7x faster than 1.0.3! Awesome!