Electric Sheep Blog: metaprogramming


Dynamic languages, Twitter, kind_of? and some statistics

Some time ago I'd written about how Twitter's descriptions of their own codebase made my hackles rise.

I wasn't alone in this, and one discussion thread on the internal ThoughtWorks dev list later we had some hard numbers extracted from all the Ruby work we've done or are doing.

Martin's put them up on his bliki - take a look. There are the numbers to back the talk - you don't need type checks all over your codebase in Ruby (or any other dynamic language), Alex Payne's opinion notwithstanding.

A simple define_method example

Someone on #ruby-lang wanted to know how to define a method on a class when the method is first called. Here's a quick example.

require 'rubygems'
require 'spec'

module DynamicGetter
  def method_missing(name, *args)
    if(@attributes.has_key? name)
      self.class.class_eval do
       define_method(name){ @attributes[name] }
      end
      self.send(name, *args)
    else
      super
    end
  end
end

describe DynamicGetter do
  before(:each) do
    # Reference the class used in the specs
    # as an ordinary variable rather than a
    # named constant (@ooga_klass instead of Ooga)
    # so that it can be created afresh for
    # each spec.
    # If we did class Ooga, the second spec
    # would fail because the first spec already
    # created the method #woot.

    @ooga_klass = Class.new(Object)
    @ooga_klass.class_eval do
      include DynamicGetter

      def initialize(attributes = {})
        @attributes = attributes
      end
    end
  end

  it "should know how to add a method to a class on first call" do
    o = @ooga_klass.new(:woot => 5)
    o.should_not respond_to(:woot)
    o.woot.should == 5
    o.should respond_to(:woot)
  end

  it "should raise a method not found exception if the attribute isn't present" do
    lambda{ @ooga_klass.new.woot }.should raise_error(NoMethodError)
  end
end

Another example is the Rails find_by_* methods, which are defined the first time you call them.
If, for some reason, you want to apply the example I've given on a per-instance basis, look at this post.
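
For readers without RSpec to hand, the same idea can be exercised as a plain script. This is a minimal sketch rather than the code from the post: LazyGetter and Settings are hypothetical names, and it assumes Ruby 1.9 or later.

```ruby
# LazyGetter and Settings are hypothetical names used only for this sketch.
module LazyGetter
  def method_missing(name, *args)
    if @attributes.key?(name)
      # Define a real method on the class so later calls skip method_missing.
      self.class.send(:define_method, name) { @attributes[name] }
      send(name, *args)
    else
      super
    end
  end
end

class Settings
  include LazyGetter

  def initialize(attributes = {})
    @attributes = attributes
  end
end

s = Settings.new(:woot => 5)
before = s.respond_to?(:woot)   # no getter exists yet
value  = s.woot                 # first call goes through method_missing
after  = s.respond_to?(:woot)   # the getter is now defined on Settings
puts [before, value, after].inspect
```

Note that the getter lands on the class, so every other Settings instance gains #woot as well, which is exactly why the specs above build a fresh class each time.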


Consistent interfaces, contrived examples and define_method for instances

Whenever I see code which does two (or more) things in sequence in order to achieve one objective, I feel obliged to try to figure out a better way. It has to do with the fact that I'm pretty compulsive - I like to have to use just one interface to achieve one logical objective. I also check the lock on my front door several times every night before admitting that it is, in fact, locked.

I came across one such example in the Pickaxe when I was reading up on threads in Ruby, specifically the one that talks about synchronising a method on a single instance using MonitorMixin (p. 137). Needless to say, my compulsions asserted themselves, threading went out of the window and I started looking for a cleaner way to implement the example. I'm duplicating it below to save you the trouble of looking it up:

require 'monitor'
class Counter
  attr_reader :count
  def initialize
    @count = 0
  end
  def tick
    @count += 1
  end
end
c = Counter.new
c.extend(MonitorMixin)
t1 = Thread.new { 10000.times { c.synchronize { c.tick } } }
t2 = Thread.new { 10000.times { c.synchronize { c.tick } } }
t1.join; t2.join

The authors conclude the example with this observation:

Here, because class Counter doesn’t know it is a monitor at the time it’s defined, we have to perform the synchronization externally (in this case by wrapping the calls to c.tick). This is clearly a tad dangerous: if some other code calls tick but doesn’t realize that synchronization is required, we’re back in the same mess we started with.

We'd like to fix this problem and have our code read like this:

c = Counter.new
c.extend(ThreadSafeInstance).make_safe(:tick)
t1 = Thread.new { 10000.times { c.tick }}

Before we get started trying to sort this out, I should point out that situations where you need to extend individual instances and then decorate methods on them occur rarely. You're likely to find an easier solution if you re-examine your architecture and try to identify precisely why you need to mess with instances. The example above is clearly contrived and is almost certainly a code smell. What we're going to do to fix this contrived example is what a friend of mine (Srushti, the chap behind XStream.net) called a 'freaky cool' solution - it's cool, but if you need it in real life then you probably have some design issues in your architecture. We can consider it an exercise in metaprogramming, but not much more.

This problem calls for Rails' alias_method_chain style method decoration - a means by which we can decorate tick() so that we can have two new methods, tick_without_synchronization() and tick_with_synchronization(). In Java or C#, you could achieve a similar effect by using dynamic proxies. The catch is, in this case we only want to decorate a single method on a single instance whereas alias_method_chain and dynamic proxies work on classes and consequently modify the perceived behaviour of all instances of a class - something we don't want. Instead, we need to implement alias_method_chain ourselves, but in a way which allows us to modify just a single instance.

This shouldn't be a big deal - we can use Ruby's alias_method, right? No such luck. alias_method works at a module level; since Class is a subclass of Module in Ruby, this method modifies all instances of a class that it acts on. The same holds for define_method. This seems problematic - if we had define_method for instances, we could implement our own alias_method and in turn alias_method_chain.

The only way I could find which allowed me to programmatically add methods to an instance was by means of extending it with a module. That's what I ultimately did - I created an anonymous module, defined the method within it and then extended my instance with that module. Here's the RSpec specification for define_method followed by the code.

describe 'An instance of', Object do
 before(:each) do
  @o = Object.new
 end
 
 it 'should know how to define a new method' do
  @o.should_not respond_to(:ooga)
  @o.define_method(:ooga){}
  @o.should respond_to(:ooga)
 end
 
 it 'should not affect other instances when defining a new method' do
  @o.define_method(:ooga){}
  Object.new.should_not respond_to(:ooga)
 end
 
 it 'should be able to define a new method which accepts parameters' do
  @o.define_method(:echo){|*args| args}
  @o.echo(1,2,3,4,5).should == [1,2,3,4,5]
 end
end

class Object 
 def define_method(method_name, &block)
  self.extend(
    Module.new{
     define_method(method_name, block)
    }
   )
 end
end
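
On Ruby 1.9 and later the same effect is available without reopening Object. The sketch below shows the anonymous-module technique as a standalone helper (define_on_instance is a hypothetical name, not part of Ruby) next to the built-in Object#define_singleton_method.

```ruby
# define_on_instance is a hypothetical helper name, not part of Ruby.
def define_on_instance(obj, name, &body)
  # Build an anonymous module holding the method, then mix it into obj only.
  obj.extend(Module.new { define_method(name, &body) })
end

a = Object.new
define_on_instance(a, :echo) { |*args| args }

b = Object.new
b.define_singleton_method(:echo) { |*args| args }  # built in since Ruby 1.9

puts a.echo(1, 2, 3).inspect
puts b.echo(1, 2, 3).inspect
puts Object.new.respond_to?(:echo)
```

Either way, only the one instance gains the method; a fresh Object does not respond to it.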

Now for alias_method - spec, followed by implementation

describe 'An instance of', Object do 
 before(:each) do
  @o = Object.new
 end
 
 it 'should be able to alias a method' do
  @o.alias_method(:new_to_s, :to_s)
  @o.should respond_to(:new_to_s)
 end
 
 it 'should have aliased methods respond like the originals' do
  @o.alias_method(:new_to_s, :to_s)
  @o.new_to_s.should == @o.to_s
 end
 
 it 'should ensure that aliased methods are copies of, not references to the originals' do
  @o.to_s.should_not be_nil
  @o.alias_method(:new_to_s, :to_s)
  
  @o.define_method(:to_s){}
  @o.to_s.should be_nil 
  
  @o.new_to_s.should_not be_nil
 end
end

class Object
 def alias_method(new_id, original_id)
  original = self.method(original_id).to_proc
  define_method(new_id){|*args| original.call(*args)}
 end
end
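
The "copy, not reference" behaviour in the last spec comes from capturing a bound Method object before the redefinition. A minimal sketch of just that step, assuming Ruby 1.9+ for define_singleton_method:

```ruby
o = Object.new

# Capture the current to_s as a bound Method object before redefining it.
original_to_s = o.method(:to_s)
o.define_singleton_method(:new_to_s) { |*args| original_to_s.call(*args) }

# Redefine to_s on this instance only; the captured Method is unaffected.
o.define_singleton_method(:to_s) { nil }

puts o.to_s.inspect
puts o.new_to_s
```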

Let's apply these two new methods to the original problem:

require 'monitor'

class Counter
  attr_reader :count

  def initialize
    @count = 0
  end
  def tick
    @count += 1
  end
end

module ThreadSafeInstance
 def self.extended(instance)
  instance.extend(MonitorMixin)
 end
 
 def make_safe(method_symbol)
  original_method = "_unsafe_#{method_symbol}_"
  alias_method(original_method, method_symbol)
  define_method(method_symbol){|*args|
   self.synchronize{self.send(original_method, *args)}
  }
  self
 end
end

c = Counter.new
c.extend(ThreadSafeInstance).make_safe(:tick)
t1 = Thread.new { 10000.times { c.tick }}
t2 = Thread.new { 10000.times { c.tick }}
t1.join; t2.join

There! The interface is much cleaner now - client code which consumes instance c needn't be aware of the fact that tick() needs to be synchronised. It need only be concerned with the single objective of tick(): incrementing the Counter.

Incidentally, we cannot define methods which accept blocks using these methods we just created, a limitation of the original define_method which our implementation uses.
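
On Ruby 1.9 and later, blocks passed to define_method (and define_singleton_method) may declare a &block parameter, so the restriction above no longer applies there. The following is a sketch of the same per-instance synchronisation under that assumption; ThreadSafety.make_safe is a hypothetical helper, and it uses a private Monitor rather than MonitorMixin.

```ruby
require 'monitor'

class Counter
  attr_reader :count

  def initialize
    @count = 0
  end

  def tick
    @count += 1
  end
end

# ThreadSafety.make_safe is a hypothetical helper for this sketch.
module ThreadSafety
  def self.make_safe(obj, method_name)
    lock = Monitor.new
    unsafe = obj.method(method_name)   # bound Method for the original
    obj.define_singleton_method(method_name) do |*args, &block|
      lock.synchronize { unsafe.call(*args, &block) }
    end
    obj
  end
end

c = ThreadSafety.make_safe(Counter.new, :tick)
threads = Array.new(2) { Thread.new { 10_000.times { c.tick } } }
threads.each(&:join)
puts c.count
```

Only c is affected; other Counter instances keep the unsynchronised tick.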

Loading classes from strings in Ruby

When working with Ruby, every once in a while you'll find yourself messing with a bunch of strings which are the names of classes. Given these strings, you'll need to go and instantiate the appropriate classes - something like taking "Array", "File" or "Booga" and figuring out how to call Array.new, File.new or... well, you get the picture. If you're wondering when you'd ever need this, just try instantiating controllers at runtime based on the urls being requested.

The usual suspect - Module#const_get

The first time I needed to do this, I did a Google search without actually thinking about it too much. The number one result was Module.const_get which does pretty much what is needed.

irb(main):005:0> Module.const_get('Array').new
=> []

As the docs would tell you, this returns the named constant which matches the string, which is pretty much what classes are - named constants which are instances of the Class class. Well, it worked, and beyond reading that this wouldn't work for classes which are nested in modules (const_get doesn't know or care about parsing stuff like the :: in Net::HTTP), I didn't bother too much. I did, however, come across something called Kernel#qualified_const_get which gets around this limitation, but more on that later.

But Module#const_get is a hack

Using const_get is effectively a hack - it uses the fact that class names are also constants to allow you to get hold of them.
Why do I say it's a hack? Well, if for some obscure reason you decided to create a class which was not a named constant, then you can't get hold of it with const_get. Here's an example:

ooga = Class.new   # Create a class the hard way
ooga.class_eval do # Add a method to it the hard way
  def hello
    "Hello!"
  end
end

p ooga.new.hello   # Prove that ooga can be instantiated
p Module.const_get('Array').new # As can 'Array', a named constant, using const_get
p Module.const_get('ooga').new.hello # But not 'ooga', which isn't

Output:

"Hello!"
[]
temp05.rb:10:in `const_get': wrong constant name ooga (NameError)
 from temp05.rb:10
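
The flip side is that an anonymous class becomes reachable through const_get as soon as it is bound to a constant. A small sketch, assuming Ruby 1.9+ (where an anonymous class reports nil as its name); Greeting is a hypothetical constant name.

```ruby
klass = Class.new do
  def hello
    "Hello!"
  end
end

name_before = klass.name            # an anonymous class has no name yet
Object.const_set(:Greeting, klass)  # Greeting is a hypothetical constant name
name_after = klass.name

puts name_before.inspect
puts name_after
puts Object.const_get('Greeting').new.hello
```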

But what about eval?

Today, Srihari asked me if he could load up classes using eval. I said, 'Just use Module.const_get', but of course we were curious so we tried using eval and it worked. Obviously, given that eval effectively allows you to interpret code at runtime, it also handles nested classes and/or modules. Here's a code sample showing a regular invocation, an invocation using eval and an invocation using const_get (which fails) of a nested class:

puts "Ruby #{RUBY_VERSION}, #{RUBY_RELEASE_DATE}, #{RUBY_PLATFORM}"

module Ooga
  class Booga
    def hello
      "hello!"
    end
  end
end

puts Ooga::Booga.new

puts eval('Ooga::Booga').new

puts Module.const_get('Ooga::Booga').new

Output:

temp04.rb:15:in `const_get': wrong constant name Ooga::Booga (NameError)
 from temp04.rb:15
Ruby 1.8.6, 2007-06-07, i486-linux
#<Ooga::Booga:0xb7c84774>
#<Ooga::Booga:0xb7c8465c>

Here Booga is a class nested in the module Ooga and, as you can see, on Ruby 1.8.6 const_get fails to fetch it because 'Ooga::Booga' is a qualified path rather than a single constant name (heck, it isn't even the right syntax for a constant name).
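
If eval feels too heavy-handed, the nested lookup can also be done by walking the path one segment at a time. This is a sketch, with resolve_constant as a hypothetical helper name. From Ruby 2.0 onwards const_get itself accepts qualified names, so the helper is only needed on older interpreters.

```ruby
module Ooga
  class Booga
    def hello
      "hello!"
    end
  end
end

# resolve_constant is a hypothetical helper name.
def resolve_constant(path)
  # Start at Object and descend through each namespace in turn.
  path.split('::').reject(&:empty?).inject(Object) do |namespace, name|
    namespace.const_get(name)
  end
end

puts resolve_constant('Ooga::Booga').new.hello
puts Object.const_get('Ooga::Booga').new.hello  # works directly on Ruby 2.0+
```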

Performance of const_get and eval

Let's take a look at the performance of const_get versus eval, an important factor if you're doing this inside a loop or some such.

puts "Ruby #{RUBY_VERSION}, #{RUBY_RELEASE_DATE}, #{RUBY_PLATFORM}"
require 'benchmark'

n = 1000000

Benchmark.bmbm(10) do |rpt|
rpt.report("simple invocation") do
  n.times {Array.new}
end

rpt.report("const_get invocation") do
  n.times {Kernel.const_get('Array').new}
end


rpt.report("eval invocation") do
  n.times {eval('Array').new}
end
end

Output:

Ruby 1.8.6, 2007-06-07, i486-linux
Rehearsal --------------------------------------------------------
simple invocation      0.910000   0.090000   1.000000 (  1.013326)
const_get invocation   1.400000   0.110000   1.510000 (  1.513216)
eval invocation        3.480000   0.220000   3.700000 (  3.692692)
----------------------------------------------- total: 6.210000sec

                         user     system      total        real
simple invocation      0.890000   0.100000   0.990000 (  1.000915)
const_get invocation   1.440000   0.080000   1.520000 (  1.514948)
eval invocation        3.490000   0.200000   3.690000 (  3.689998)

const_get takes about 1.5 times as long as a simple invocation, while eval takes more than 3 times as long.

The Kernel#qualified_const_get alternative

Now, back to Kernel#qualified_const_get, which was created by Gregory in this blog post a couple of years ago. It looks a lot like const_get but is capable of figuring out nested classes too, and it works just fine. However, it's very slow (it isn't native C code) and should probably be named something else because it has nothing to do with fetching constants any more. Kernel#fetch_class perhaps? But some numbers first:

puts "Ruby #{RUBY_VERSION}, #{RUBY_RELEASE_DATE}, #{RUBY_PLATFORM}"
require 'benchmark'
require 'qualified_const_get'

module Ooga
  class Booga
    def hello
      "hello!"
    end
  end
end

n = 1000000

Benchmark.bmbm(10) do |rpt|
rpt.report("simple invocation") do
  n.times {Ooga::Booga.new}
end

rpt.report("qualified const_get invocation") do
  n.times {Kernel.qualified_const_get('Ooga::Booga').new}
end

rpt.report("eval invocation") do
  n.times {eval('Ooga::Booga').new}
end
end

Output:

Ruby 1.8.6, 2007-06-07, i486-linux
Rehearsal -------------------------------------------------------
simple invocation     0.860000   0.120000   0.980000 (  0.991182)
qualified_const_get  16.620000   2.330000  18.950000 ( 19.052966)
eval invocation       4.500000   0.230000   4.730000 (  4.741436)
--------------------------------------------- total: 24.660000sec

                        user     system      total        real
simple invocation     0.990000   0.100000   1.090000 (  1.090303)
qualified_const_get  17.290000   2.420000  19.710000 ( 19.735733)
eval invocation       4.250000   0.170000   4.420000 (  4.429670)

See what I mean about qualified_const_get being slow because it isn't native?

Here's the implementation of Kernel#qualified_const_get quoted from his blog:

# http://redcorundum.blogspot.com/2006/05/kernelqualifiedconstget.html
module Kernel
  def qualified_const_get(str)
    path = str.to_s.split('::')
    from_root = path[0].empty?
    if from_root
      from_root = []
      path = path[1..-1]
    else
      start_ns = ((Class === self)||(Module === self)) ? self : self.class
      from_root = start_ns.to_s.split('::')
    end
    until from_root.empty?
      begin
        return (from_root+path).inject(Object) { |ns,name| ns.const_get(name) }
      rescue NameError
        from_root.delete_at(-1)
      end
    end
    path.inject(Object) { |ns,name| ns.const_get(name) }
  end
end

Summary

There are three choices when trying to convert a string to the corresponding class: Module#const_get, eval and Kernel#qualified_const_get.
Module#const_get is the fastest; it doesn't handle nested classes, but works for all but the weirdest of scenarios (the class you're trying to get hold of isn't a named constant).
eval is significantly slower, but it works in any situation.
Kernel#qualified_const_get is abysmally slow, but handles nested classes. However, until there is a native implementation, it loses to eval on every front.


Ruby's new as a factory

Ruby's new is often described as being the perfect implementation of a factory method. In C++/Java/C#, you're forced to do something like User.build() or User.create() instead of new User() because there's no way you can change the way the new keyword behaves. In Ruby on the other hand, new is simply a class method on User and can be arbitrarily overwritten. Note that I'm saying overwritten, not overridden - I don't mean override in a sub-class but actually overwrite - replace - a method. You can overwrite User.new to do just about anything - typical uses would be to implement object pooling, the Singleton pattern (overwrite new to return the same instance every time), stuff like that. What's just as important is that you're sticking to the convention (new() instead of an arbitrary choice like create()) which makes life easier for everyone because it's natural and transparent to the consumers of your classes.
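
As a quick illustration of the Singleton case mentioned above, here is a minimal sketch. AppConfig is a hypothetical class, and the memoisation shown is not thread-safe.

```ruby
# AppConfig is a hypothetical class used only to illustrate the idea.
class AppConfig
  def self.new(*args)
    # Build the instance once via the stock Class#new, then keep returning it.
    @instance ||= super
  end
end

first  = AppConfig.new
second = AppConfig.new
puts first.equal?(second)
```

Callers keep writing AppConfig.new and never need to know that the same object comes back each time.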

Ola coincidentally happened to cover the same topic a couple of days ago when talking about Steve Yegge's post on code bloat, so I'll just link to his post and skip the introduction. Look to the second half for a description of how to use new as a factory. So, let's move on to the example which got me interested in this. I've constructed a sample problem which has roughly the same structure as what I was working with - if some bits of it look rather contrived, it's because they are ;-).

The problem is this - I have a base Operation class which has some state and some logic. There are two sub-classes of Operation, Add and Multiply. Here is the code for these classes - take a moment to look them over.

class Operation
  def initialize(a, b)
    @a = a
    @b = b
  end
  
  def to_s
    "#{self.class}(#{@a}, #{@b})"
  end
end

class Add < Operation
  def do
    @a+@b
  end
end

class Multiply < Operation
  def do
    @a*@b
  end
end

The state is represented by @a and @b and the logic, such as it is, by to_s().

Input arrives in the form of strings like Add 2 5 and Multiply 3 7. These strings need to be parsed and the appropriate sub-class of Operation constructed with the numbers as its state. Operation itself, however, is never instantiated because it doesn't make sense to do so - a perfect candidate for an abstract class if such a thing existed in Ruby. The sub-classes of Operation expose a standard interface in the form of the do() method which is responsible for returning the result of that operation on the numbers it contains. Yup, you're right, what you're seeing is the command pattern.

There is a controller class (yes, all right, I admit it was a Rails app which spawned this post) which handles the bit which involves receiving commands and constructing command objects from them. It looks something like this:

class Controller
  def execute(commands)
    operations = build_operations(commands)
    operations.each{|operation| puts "#{operation}: #{operation.do()}"}
  end
  
  private
    
  # Iterate over parsed commands and use them to construct
  # appropriately initialised operation objects
  def build_operations(commands)
    parse_operations_and_values(commands).collect{|operation, a, b|
                              Kernel.const_get(operation).new(a.to_i, b.to_i)
                            }
  end
  
  # Iterate over a collection of commands and extract an
  # operation and the values on which it operates from each command
  # ['Add 2 5', 'Multiply 3 7'] when parsed returns
  # [['Add', '2', '5'], ['Multiply', '3', '7']]
  def parse_operations_and_values(commands)
    commands.collect{|command| command.split}
  end
end

If you're wondering about all the Array magic in build_operations(), remember that Ruby automatically decomposes Arrays, so if I do

operation, a, b = ['Add', '2', '5']

Ruby figures out that 'Add' goes into operation, '2' into a and '5' into b. This nifty ability (called destructuring assignment) also allows us to make it look like we're returning more than one value from a method when we're actually returning a collection and having Ruby assign elements from it automatically.
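
A short sketch of both uses, a method that appears to return several values and block parameters that unpack each inner array (parse here is a stand-in for the parsing step in the controller):

```ruby
# Returns a single Array; the caller unpacks it into three variables.
def parse(command)
  command.split
end

operation, a, b = parse('Add 2 5')
puts operation
puts a.to_i + b.to_i

# Block parameters destructure each inner Array the same way.
parsed = [%w[Add 2 5], %w[Multiply 3 7]]
summaries = parsed.map { |op, x, y| "#{op}: #{x} and #{y}" }
puts summaries.inspect
```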

const_get() returns the value of the named constant passed to it. When invoked on Kernel (or Object) it ends up returning the class of that name. So Kernel.const_get('Add') returns the class Add (remember that classes are also objects in Ruby).

Let's try executing the lot like so:

commands = ['Add 2 5', 'Multiply 3 7']
Controller.new.execute(commands)

The output looks like this:


Add(2, 5): 7
Multiply(3, 7): 21

As you've realised, this is an excellent candidate for a factory - most of the code in Controller can be moved into Operation so that the Controller is no longer involved in the details of parsing commands and building Operations. But instead of simply adding a create() method to Operation, what I'd really like to be able to do is something like commands.collect{|command| Operation.new(command)} and get a neat little collection of Adds and Multiplys. The whole thing is completely transparent to the consumer who never really cared about whether the objects were Adds or Multiplys so long as they exposed the do() interface. Let's try to work toward this form of Operation.

If you've read Ola's post then you already know that the default implementation of new() looks something like this:

def self.new(*args, &block)
  obj = self.allocate
  obj.send :initialize, *args, &block
  obj
end

Of course, this doesn't work for us because we don't want to ever instantiate Operation. What we want is for Operation to look like this:

class Operation
  def self.new(command)
    operation, a, b = parse(command)
    operation_class = Kernel.const_get(operation)
    operation_class.new(a.to_i, b.to_i)
  end  
  
  def self.parse(command)
    command.split
  end
    
  def initialize(a, b)
    @a = a
    @b = b
  end
  
  def to_s
    "#{self.class}(#{@a}, #{@b})"
  end
end

The catch with this implementation is that overwriting new() modifies it even for the Add and Multiply sub-classes. Since the signatures of Operation's and Add/Multiply's constructors are different, this piece of code is dead in the water - not something we wanted. We could of course re-implement new() in both sub-classes to get around this, but that's tedious, repetitive and plain ugly.

So the trick here is to alias (or copy) the original new() in Operation before overwriting it. Now that we have a copy of new(), we use the inherited() object life-cycle hook to listen for points in the code where Operation is subclassed. When we detect that some class is inheriting from Operation, we simply replace the modified new() with the original.

See for yourself. This is the completed solution, so you should be able to simply copy it and run it.

class Operation
  class << self
    alias :__new__ :new
    
    def inherited(subclass)
      puts "#{subclass} has inherited #{self}"
      class << subclass
        alias :new :__new__
      end
    end
  end

  def self.new(command)
    operation, a, b = parse(command)
    operation_class = Kernel.const_get(operation)
    operation_class.new(a.to_i, b.to_i)
  end  
  
  def self.parse(command)
    command.split
  end
    
  def initialize(a, b)
    @a = a
    @b = b
  end
  
  def to_s
    "#{self.class}(#{@a}, #{@b})"
  end
end

class Add < Operation
  def do
    @a+@b
  end
end

class Multiply < Operation
  def do
    @a*@b
  end
end


class Controller
  def execute(commands)
    commands.collect{|command| 
                         Operation.new(command)
                       }.each{|operation| 
                         puts "#{operation}: #{operation.do()}"
                       }
  end
end


commands = ['Add 2 5', 'Multiply 3 7']
Controller.new.execute(commands)

The extra magic can be seen right at the beginning of Operation where we alias/copy the new method into the __new__ method. When we detect an inherited() event, we simply reverse the aliasing.

Running this produces the following output:


Add has inherited Operation
Multiply has inherited Operation
Add(2, 5): 7
Multiply(3, 7): 21
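
For comparison, here is a different technique from the alias/inherited approach above: keep a single overwritten new() and let subclasses fall through to the stock Class#new via super. This is a sketch under the same assumptions as the post (trusted input, class names matching the command words), not a drop-in replacement.

```ruby
class Operation
  def self.new(*args)
    # Subclasses fall straight through to the stock Class#new.
    return super unless equal?(Operation)

    name, a, b = args.first.split
    Object.const_get(name).new(a.to_i, b.to_i)
  end

  def initialize(a, b)
    @a = a
    @b = b
  end

  def to_s
    "#{self.class}(#{@a}, #{@b})"
  end
end

class Add < Operation
  def do
    @a + @b
  end
end

class Multiply < Operation
  def do
    @a * @b
  end
end

['Add 2 5', 'Multiply 3 7'].each do |command|
  operation = Operation.new(command)
  puts "#{operation}: #{operation.do}"
end
```

The trade-off is an extra identity check on every call to new(), in exchange for not needing the inherited() hook.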

Controllers should always act as routers between the UI and the domain layer and contain as little as possible of the business logic. As you can see, the changes we have made have slimmed Controller down considerably, so that's one benefit right away. Also, consumers of Operation now deal with a much simpler (and non-arbitrary) interface when building Operations from commands.

Ruby's method_added object lifecycle hook

What started out as a small exercise in Ruby to allow me to measure the execution times of methods has led me down several interesting paths. I've already written about my experiences with Ruby's blocks and closures here. The next thing I ran into was the method_added object life-cycle hook. Using method_added is no big deal for the experienced Rubyist, but there's a fair amount of stuff in there which isn't obvious and which took some effort on my part to understand. I also ended up with a clearer understanding of the concepts which underpin Ruby's class structure.

If you are unfamiliar with Ruby terminology, a class method behaves in a manner similar to a static method in Java or C#. I also use the words metaclass and singleton class interchangeably, but either way I'm referring to the anonymous class associated with every object in Ruby, classes included.

Modifying class Class to detect the addition of methods

method_added falls into the category of introspective methods provided by Ruby itself, and is invoked whenever a method is added to a class. Here's an example - remember that in Ruby, a class is itself an object.

class Class
  def one
    return 1
  end  
  
  def method_added(method_name)
    puts "#{method_name} added to #{self}"
  end
  
  def two
    return 2
  end  
end

class Hello
  def say_it
    return "Hello!"
  end
end

Output:
method_added added to Class
two added to Class
say_it added to Hello

As you can see, the addition of method one isn't detected because method_added hasn't been defined. Once it has been defined, all other method additions are detected (including the addition of method_added itself!).

Modifying specific classes to detect the addition of methods

You'd rarely need to detect the addition of methods to every single class in the ObjectSpace. Repeating what we did in the previous example just for a single class demonstrates some of that non-obviousness I was talking about. Let's try to detect all method additions to class Hello; the obvious solution (given below) unfortunately doesn't work.

class Hello
  def method_added(method_name)
    puts "#{method_name} added to #{self}"
  end
  
  def say_it
    return "Hello!"
  end
end

puts Hello.new.say_it

Output:
Hello!

We would expect to see say_it was added to Hello, but we don't. Let's take this step by step and figure out what's going on.

As I mentioned earlier, Hello is an instance of Class. We can create the class Hello by simply saying Hello = Class.new. Let's prove this:

def Hello.some_class_method  # => uninitialized constant Hello (NameError)
  return "It worked!"
end  

puts Hello.some_class_method

Now let's try again after defining Hello


Hello = Class.new

def Hello.some_class_method
  return "It worked!"
end  

puts Hello.some_class_method   # => It worked!

So far so good. Now, we also know that adding method_added to Class allowed us to detect methods added to Hello. Obviously, Hello inherited method_added in some manner, but not as an instance method or our example above would have worked too. Let's go back to the first example and do some poking around by adding a couple of lines at the end.

class Class
  def method_added(method_name)
    puts "#{method_name} added to #{self}"
  end
end

class  Hello
  def say_it
    return "Hello!"
  end
end

puts Hello.new.say_it
puts (Hello.methods - Hello.instance_methods).grep(/added/)
Hello.method_added("manually_invoking")

Output:
method_added added to Class
say_it added to Hello
Hello!
method_added
manually_invoking added to Hello

The line puts (Hello.methods - Hello.instance_methods).grep(/added/) first gets a collection of all methods belonging to Hello, from which it removes those which are Hello's instance methods. It then searches among what's left for methods with the word 'added' in the name.

Simply put, we find that method_added has surfaced as a static or class method on Hello. The very next (and final) line in the example verifies this by actually invoking method_added and sure enough we were right - method_added is indeed a class method of Hello.

Let's test this theory now with a quick example.

class  Hello
  def self.method_added(method_name)
    puts "#{method_name} added to #{self}"
  end
  
  def say_it
    return "Hello!"
  end
end

puts Hello.new.say_it

Output:
say_it added to Hello
Hello!

So there we go, add method_added as a class method to detect the addition of methods to that class.
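
The same hook can record names instead of printing them, which makes the behaviour easier to check from code. A minimal sketch; Tracked is a hypothetical class name.

```ruby
# Tracked is a hypothetical class name for this sketch.
class Tracked
  @added = []   # instance variable on the class object itself

  class << self
    attr_reader :added

    def method_added(name)
      @added << name
      super
    end
  end

  def say_it
    "Hello!"
  end
end

puts Tracked.added.inspect
```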

The questions now are
a) Why does this work this way?
b) How do we detect the addition of class methods?

Let's tackle them in order.

It's all in the singleton class

The answer lies in the metaclass or singleton class holding Hello's meta-data (which includes its methods). Let's try to define how they relate to each other with a bit of code. I'm creating a singleton_class helper method in Object to get hold of an instance's singleton/meta class.

class Object
  def singleton_class
    class << self;self;end;
  end
end

class  Hello
end

p Hello.singleton_class.class
p Hello.singleton_class
p Hello.new.singleton_class
p Hello.singleton_class.singleton_class

puts Hello.singleton_class.superclass == Hello.new.singleton_class.superclass.superclass

Output:
Class
#<Class:Hello>
#<Class:#<Hello:0x2924904>>
#<Class:#<Class:Hello>>
true

We see that the singleton class is of type Class. We also see that Hello and instances of Hello have their own singletons (I know that's obvious, but I thought I'd mention it anyways).
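
On Ruby 1.9.2 and later, Object#singleton_class is built in, so the helper above is unnecessary there. A small sketch of the relationships on such a Ruby (the superclass chain of singleton classes differs on 1.8):

```ruby
class Hello
end

instance = Hello.new

puts Hello.singleton_class.inspect
# An instance's singleton class sits directly below the instance's class.
puts instance.singleton_class.superclass.equal?(Hello)
# A class's singleton class sits below the singleton class of its superclass.
puts Hello.singleton_class.superclass.equal?(Object.singleton_class)
```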

When you add an instance method to a class, Ruby notifies the class by calling method_added on the class object itself, and methods called on a class object are looked up in its metaclass (and then in Class). Therefore, method_added needs to be in the metaclass, which is precisely what happens when you define a class method. Just to illustrate the point that class methods are simply methods defined on the metaclass, here are the different ways in which you can define a class method:

class Hello
  def Hello.class_method_one
    # ...
  end
  def self.class_method_two
    # ...
  end
  class << self
    def class_method_three
      # ...
    end
  end
end

They all do the same thing - add a method to the metaclass.
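
This can be checked by asking the metaclass for the methods it defines. A sketch, assuming Ruby 1.9.2+ for the built-in singleton_class; Salutation is a hypothetical class name.

```ruby
# Salutation is a hypothetical class name for this sketch.
class Salutation
  def Salutation.one
    1
  end

  def self.two
    2
  end

  class << self
    def three
      3
    end
  end
end

# All three definitions end up as instance methods of the metaclass.
puts Salutation.singleton_class.instance_methods(false).sort.inspect
puts Salutation.singleton_methods(false).sort.inspect
```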

Question (b), 'How do we detect the addition of class methods?', has a simpler answer: singleton_method_added. I got this answer from the internal ThoughtWorks dynamic languages list (specifically Carlos Villela and Ola Bini - thanks guys!). Use it exactly as you would method_added:

class Hello
  class << self
    def method_added(method_name)
      puts "#{method_name} added to #{self}"
    end

    def singleton_method_added(method_name)
      puts "#{method_name} added to #{self}"
    end
  end
end

class Hello
  def instance_method
    "Hey!"
  end

  def self.class_method
    "Dude"
  end
end

Output:
singleton_method_added added to Hello
instance_method added to Hello
class_method added to Hello
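
To see which hook fires for which kind of definition, the two can write to separate lists. A sketch with hypothetical names (Audited, instance_side, class_side):

```ruby
# Audited, instance_side and class_side are hypothetical names.
class Audited
  @instance_side = []
  @class_side = []

  class << self
    attr_reader :instance_side, :class_side

    def method_added(name)
      @instance_side << name
    end

    # Defining this hook is itself a singleton method definition,
    # so it is also reported to the hook.
    def singleton_method_added(name)
      @class_side << name
    end
  end

  def greet
    "Hey!"
  end

  def self.build
    "Dude"
  end
end

puts Audited.instance_side.inspect
puts Audited.class_side.inspect
```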

To Summarise

method_added is an object life-cycle hook invoked whenever a method is added to a class

To listen for the addition of instance methods to a class, method_added must be added to that class' singleton/meta class, or, to put it another way, as a class method of the class.

To listen for addition of class methods, singleton_method_added must be added to the class' singleton/meta class, just like we would with method_added

Ruby blocks gotchas

New to blocks in Ruby? RubyMonk has chapters covering both introductory topics as well as more detailed lessons on blocks. Do try them out!

There's this thing they say about Ruby - everything is an object. It's true, with very few exceptions, one of them being the block. Well guess what, this little gem of an inconsistency came back to bite me when I was trying to do something involving dynamic redefinition of methods.

The context: I recently wrote a little method decorator to help me figure out the execution times of the methods in a class. Nothing complicated - for a given class, alias each method, then redefine it; the new method invokes the original method while measuring the execution time. Here's a pseudocode-ish example to clarify:

define_method method do |*args|
 t = Time.now

 result = self.send(aliased_original_method, *args)

 diff =  Time.now-t
 puts "#{klass}##{method} took #{diff} s" if diff > 0
 return result
end

You may have already noticed that the pseudocode above doesn't handle methods which accept blocks - if I tried to decorate Array, then Array#each would fail to execute. My actual solution did handle this, and I'll publish that in another post; maybe others will find it useful.
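For the curious, here is a minimal sketch of how such a timing decorator could forward blocks. It is not the solution mentioned above. It assumes Ruby 1.9 or later (where a block can declare a &block parameter), swaps the alias-and-send approach for instance_method and bind, and forwards positional arguments only. TimedMethods, time_methods and Greeter are names invented for the illustration.

```ruby
# Hypothetical helper module, invented for this sketch (Ruby 1.9+)
module TimedMethods
  def time_methods(*method_names)
    method_names.each do |method_name|
      # Grab the original implementation before it is replaced
      original = instance_method(method_name)

      define_method(method_name) do |*args, &block|
        started = Time.now
        # Forward the arguments and any block to the original method
        result = original.bind(self).call(*args, &block)
        elapsed = Time.now - started
        puts "#{self.class}##{method_name} took #{elapsed} s"
        result
      end
    end
  end
end

class Greeter
  extend TimedMethods

  # A method that expects an implicitly passed block
  def each_greeting(*people)
    people.each { |person| yield "Hello #{person}" }
    people.size
  end

  time_methods :each_greeting
end

greetings = []
count = Greeter.new.each_greeting('Sidu', 'Ponnappa') { |g| greetings << g }
p greetings
p count
```

The decorated method still yields to the caller's block, because the wrapper captures it as a Proc and hands it on with &block.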

Anyways, this didn't take long, but once I was done, I was intrigued by the notion of a generic method decorator. It would be pretty cool if I could include my Decorator module into a class, pass it an arbitrary block to do any of those AOP-ish things like logging or, as I said, measuring execution times, and have all the methods decorated by that block. All the decorator block should have to do to execute the original method would be to yield.

So this led me to try to figure out the whole deal with blocks. Simply put, there are two ways to handle blocks as parameters - implicitly and explicitly.

Implicitly passing and invoking blocks

This is the usual way in which blocks are passed to methods. Here's what it looks like:

def foo(*args)
 yield(args.join(' '))
end
foo('Sidu', 'Ponnappa'){|name| puts "Hello #{name}"} # prints "Hello Sidu Ponnappa"

*args allows us to handle an arbitrary number of parameters - they're made available inside the method as an array, where we join them and pass them to our block via yield.
The block is passed to the method by enclosing it in curly braces and placing it after the method invocation. Only one block can be passed to a method in this manner.
Most importantly, the block is never bound, and so is not available as an object. It is implicitly invoked by calling yield within the method.

Explicitly passing, binding and invoking blocks

We go this route if we want a handle to the block. Here's a code example - it's similar to the one above, but we bind to the block and then invoke it explicitly.

def foo(*args, &blk)
 blk.call(args.join(' '))
end
foo('Sidu', 'Ponnappa'){|name| puts "Hello #{name}"} # prints "Hello Sidu Ponnappa"

The & binds the block to the variable blk making it available as a Proc object.
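To see the binding in action, here is a small sketch (describe_block is a made-up name) that inspects what ends up in the & parameter, both with and without a block.

```ruby
# Hypothetical method, used only to inspect the bound block
def describe_block(&blk)
  if blk.nil?
    "no block"
  else
    "#{blk.class} with arity #{blk.arity}"
  end
end

without_block = describe_block                 # blk is nil when no block is passed
with_block = describe_block { |name| name }    # blk is a Proc here

puts without_block
puts with_block
```

When no block is passed, the variable is simply nil, so a method can check for it like any other argument.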

An even more explicit style involves first binding a variable to the block and then passing it to the method as an argument (as opposed to using & and having Ruby do it automagically). This style is often used when doing functional programming - Reg Braithwaite has a beautiful article covering this style of programming in Ruby.

Anyways, here's the example:

def foo(*args)
 blk = args.delete_at(-1) # We know that the last argument 
                          # is the bound block
 blk.call(args.join(' '))  
end

the_block = lambda {|name| puts "Hello #{name}"}
foo('Sidu', 'Ponnappa', the_block) # prints "Hello Sidu Ponnappa"

As you can see, we bind the block to the_block using the built-in Ruby method lambda and pass it as a regular argument. No magic like the previous examples - the block (now a Proc object) is treated like any other object would be. This, to my eyes, is the most consistent way to use blocks (everything should be an object). It has a significant disadvantage, however, as we'll see in the next section.

The difference - implicit invocation is much faster

The reason why there are two approaches is simple - performance. Binding a block takes time, so we try to avoid it by going the implicit invocation route. Let's get a handle on the actual differences in performance, though, by benchmarking the examples above (modified slightly to avoid 100000 'puts'). I've renamed the three different example methods to foo, bar and ooga respectively.

require 'benchmark'

# Implicit
def foo(*args)
 yield(args.join(' '))
end
puts foo('Sidu', 'Ponnappa'){|name| "Hello #{name}"} # => "Hello Sidu Ponnappa"

# Explicitly binds block when passed
def bar(*args, &block)
 block.call(args.join(' '))
end
puts bar('Sidu', 'Ponnappa'){|name| "Hello #{name}"} # => "Hello Sidu Ponnappa"

# Explicitly binds block before passing
def ooga(*args)
 blk = args.delete_at(-1)
 blk.call(args.join(' '))  
end

the_block = lambda {|name| "Hello #{name}"}
puts ooga('Sidu', 'Ponnappa', the_block) # => "Hello Sidu Ponnappa" 

puts "Starting benchmark"

n = 100000

Benchmark.bmbm(10) do |rpt|
 rpt.report("foo") do
  n.times {foo('Sidu', 'Ponnappa'){|name| "Hello #{name}"}}
 end

 rpt.report("bar") do
  n.times {bar('Sidu', 'Ponnappa'){|name| "Hello #{name}"}}
 end

 rpt.report("ooga") do
  n.times {
    the_block = lambda {|name| "Hello #{name}"}
    ooga('Sidu', 'Ponnappa', the_block)
  }
 end
end

Output:

Hello Sidu Ponnappa
Hello Sidu Ponnappa
Hello Sidu Ponnappa

Starting benchmark

Rehearsal ---------------------------------------------
foo         0.781000   0.000000   0.781000 (  0.782000)
bar         1.406000   0.000000   1.406000 (  1.406000)
ooga        1.438000   0.016000   1.454000 (  1.453000)
------------------------------------ total: 3.641000sec

                user     system      total        real
foo         0.782000   0.000000   0.782000 (  0.781000)
bar         1.375000   0.015000   1.390000 (  1.406000)
ooga        1.453000   0.032000   1.485000 (  1.485000)

As you can see, bar, which uses explicit invocation, is roughly 75-80% slower than foo (about 1.39 s against 0.78 s in total time). ooga, where the block is bound right at the beginning and passed as a parameter, is the slowest at about 1.49 s. TANSTAAFL, I guess.

This trick of benchmarking is borrowed from Joel VanderWerf, who posted a similar benchmark involving all permutations of implicit and explicit invocations over at the Ruby forum.

The catch - implicitly invoking a block from within another block does not work

As a direct consequence of this performance benefit, most of the Ruby code I've seen takes the implicit route. Unfortunately, it is not possible to dynamically redefine methods which expect blocks as implicit parameters and have them continue to behave as before. I know that sounds weird, but read on to the example and all shall be made clear. Hah, always wanted to say that. Ahem.

Getting back to the point, if you dynamically define a method using define_method, the method body is passed to it as a block. You cannot pass a block to this dynamically defined method implicitly - at least not that I could find. If there is a way, please let me know - it would help me get a lot of stuff done neatly. In the meantime, here's an example demonstrating this inconsistent behaviour.

class SandBox
  def abc(*args)
    yield(*args)
  end

  define_method :xyz do |*args|
   yield(*args)
  end
end

SandBox.new.abc(1,2,3){|*args| p args}  # => [1, 2, 3]
SandBox.new.xyz(4,5,6){|*args| p args}  # => no block given (LocalJumpError)

SandBox.new.method(:abc).call(1,2,3){|*args| p args} # => [1, 2, 3]
SandBox.new.method(:xyz).call(4,5,6){|*args| p args} # => no block given (LocalJumpError)

The calls to abc succeed, but those to xyz throw a LocalJumpError. There seems to be some fundamental difference in the methods created by def and define_method, with the latter being unable to handle implicitly passed blocks. Here's something else which I tried, which didn't work either:

lmbda = lambda{|*args| yield(*args)}
prc = Proc.new{|*args| yield(*args)}

lmbda.call(7, 8, 9){|*args| p args}  # => no block given (LocalJumpError)
prc.call(10,11,12){|*args| p args}  # => no block given (LocalJumpError)

Note that while lambda and Proc.new both bind a block, creating a Proc object, lambda causes the bound block to behave more like a method: it checks the number of arguments it is called with, and a return inside it returns from the lambda alone rather than from the enclosing method. Proc.new is mildly deprecated in favour of lambda.
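The two best-known differences between lambda and Proc.new are argument checking and the behaviour of return. Here is a short sketch of both; the method names return_from_lambda and return_from_proc are made up for the illustration.

```ruby
lam = lambda { |a, b| [a, b] }
prc = Proc.new { |a, b| [a, b] }

# Proc.new is lenient: a missing argument is filled in with nil
lenient = prc.call(1)

# lambda is strict: the wrong number of arguments raises ArgumentError
strict_error = begin
  lam.call(1)
  nil
rescue ArgumentError => e
  e.class
end

def return_from_lambda
  l = lambda { return :from_lambda }
  l.call           # return exits only the lambda
  :after_lambda
end

def return_from_proc
  pr = Proc.new { return :from_proc }
  pr.call          # return exits return_from_proc itself
  :after_proc      # never reached
end

p lenient
p strict_error
p return_from_lambda
p return_from_proc
```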

To Summarise

Blocks violate the 'everything is an object' rule in Ruby for performance reasons. They only become objects when bound to a variable.

Implicit invocation of a block using yield is much faster than alternatives involving binding the block to a variable.

Most Ruby code uses implicit block passing to avoid binding blocks.

Blocks cannot themselves accept a block as an implicit parameter (rather, I couldn't find any way to do this - suggestions welcome).

If you define a method using define_method, the method body is passed in as a block. This new method cannot itself make use of yield to invoke an unbound block passed to it implicitly.

This is inconsistent behaviour, which, if I haven't missed something, kinda sucks.

While searching for a solution to my problem, I came across Paul Cantrell's exhaustive documentation of the different flavours of blocks/closures in Ruby, as well as their little eccentricities. It's well worth a read.

Update 2007-11-27:
As an anonymous commenter pointed out, Ruby 1.9 will indeed fix this inconsistency. The details can be found here.
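As a rough sketch of what this looks like on Ruby 1.9 and later: a block can declare an explicit &block parameter, so a method built with define_method (or a plain lambda) can receive a block and call it. Note that this uses block.call rather than yield.

```ruby
class SandBox
  # The block that forms the method body declares its own &block parameter
  define_method :xyz do |*args, &block|
    block.call(*args)
  end
end

from_method = SandBox.new.xyz(4, 5, 6) { |*args| args }

# The same works for a lambda
lmbda = lambda { |*args, &block| block.call(*args) }
from_lambda = lmbda.call(7, 8, 9) { |*args| args }

p from_method
p from_lambda
```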

You may also want to read: Ruby blocks redux: Ruby 1.9.0, Ruby 1.8.6 and JRuby 1.0.3, which was posted after the release of 1.9.0


Scala: initial impressions

I've gotten the lift web framework to build thanks to David Pollak's suggestion on the lists that I reduce allocated memory to 1024MB. Since then I've been ambling around, poking at things, trying to get a feel for both Scala and lift. Here are a few first impressions.

No powerful meta-programming capabilities
Biggest bummer - Scala doesn't seem to have any serious meta-programming capabilities that I could find. Nothing like Groovy and definitely nothing like Ruby. However, two approaches to extending the language are given on the website:

any method which takes a single argument may be used as an infix or postfix operator
closures are constructed automatically depending on the expected type (target typing)

The former allows us to do something like var result = x or y which is rather more readable than the Java equivalent, boolean result = x.or(y);
As you've probably guessed, or is a method defined on a user defined class of which both x and y are instances.

The second option allows us to pass a block to a method in a syntactically clean way. Here's an example from the Scala website.

object TargetTest1 extends Application {
  def whileLoop(cond: => Boolean)(body: => Unit): Unit =
    if (cond) {
      body
      whileLoop(cond)(body)
    }
  var i = 10
  whileLoop (i > 0) {
    println(i)
    i -= 1
  }
}

What we're interested in is the definition of the whileLoop construct and its usage. Note that (body: => Unit) allows the whileLoop method to be invoked with a parameterless function as the second argument. In the usage, this is the bit that's encased in curly braces right after whileLoop (i > 0). Pretty nifty. If Java supported blocks using curly braces, the method call would look like this: whileLoop((i > 0), {System.out.println(i); i-=1;})

This makes it possible to construct DSLs which are considerably more readable than those written in, say, Java or C#, because we can get rid of most of the comma/semi-colon noise as well as invoke methods in an eminently readable fashion.

There is also a section on the Scala wiki titled 'future:metaprogramming'. At this time, there are eight sub-sections - Definitions, Design Goals, Requirements, Constraints, Proposals, Examples from other languages, Research Papers and Discussion Threads. Only the last three links lead to any content, so I'm figuring that open classes, eval() and such-like can only be expected in the future.

In the course of my research, I found references to a DSL written in Scala here and discussions on achieving meta-programming on lang.scala here and here.

Functioning package repository
Scala has Scala Bazaar (a.k.a sbaz) which seems to function in a manner similar to RubyGems and Perl's CPAN. However, the list of available packages, at 243, is quite small.

Heavy emphasis on Java
Most of the libraries used for development in Scala are Java libraries. I found little written in pure Scala. It seems the two years of JVM-only development has caused some bias - if the .NET Scala compiler had kept up with the JVM version, there'd be a lot more pure Scala stuff available. Something I missed sorely was a pure Scala build tool, something like Ruby's rake. lift, for example, uses Maven. A lot of the thinking in these areas seems to be from primarily Java people - I came across a project to create a build tool in Scala, but it was based on Ant.

That's it from me on Scala thus far - more as it surfaces.

You may also want to read my previous post on why I'm messing around with Scala and lift: Scala, Lift and being cussed

Aren't dynamic inheritance changes possible in Ruby classes?

I was wondering if it was possible to make a class change superclasses dynamically - something which would allow us to make class inheritance behave along the lines of JavaScript's prototype chain, perhaps. To clarify, something along the lines of

class A
  def something
    # ...
  end
end

class B
  def someother
    # ...
  end
end

B.inherits_from(A)

The idea being we insert a superclass into the inheritance chain dynamically.
After digging around a bit, I found this thread at the Ruby archives which says it isn't possible. The last mail on the thread demonstrates how to do it in Python, though :D. Here it is:

class Base(object):
    def meth(self):
        print 'called B.meth'

class Mixin1(object):
    def meth1(self):
        print 'called meth1'

class Mixin2(object):
    def meth2(self):
        print 'called meth2'


class C(Base, Mixin1):
    pass

c = C()

c.meth()
c.meth1()

C.__bases__ = (Base, Mixin2) # change the base classes (ick!)

print [methname for methname in dir(c) if methname.startswith('meth')]

c.meth()
c.meth2()
c.meth1() # this gives an error now

However, while that avenue is closed to us, we are still free to make use of extend and include in combination with class_eval to enhance a class and its instances respectively at runtime by adding modules - but removing the module afterwards isn't possible. A quick example.
First, let's create a module for us to mix in.

module Sneak
  def sneak_me
    return "snitch!"
  end
end

Then let's meddle with some class, say String.

String.class_eval{
  extend Sneak
}

We can now do String.sneak_me - the methods of the module have been added as class methods (or static methods, if you prefer) by executing the code in that block in the context of the class. Sort of like having

class String
  extend Sneak
end

but only at runtime.
Now to change all instances of String.

str = String.new("!")
str.respond_to?("sneak_me") # -> check if str has this method. Returns false.
String.class_eval{
  include Sneak
}
str.respond_to?("sneak_me") # -> Now returns true.
puts str.sneak_me # -> prints "snitch!"

So using include and class_eval we added a module to all instances of String, including str which we'd instantiated before actually including the module.
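To see what include actually changes, we can compare the ancestor chain before and after. This sketch uses a throwaway class (Message, a name made up here) instead of String so that it leaves the core classes alone.

```ruby
module Sneak
  def sneak_me
    "snitch!"
  end
end

# Hypothetical stand-in for String
class Message; end

msg = Message.new
ancestors_before = Message.ancestors.dup

Message.class_eval { include Sneak }

# The module has been inserted into the ancestor chain...
added = Message.ancestors - ancestors_before
# ...but the superclass itself has not changed
superclass_after = Message.superclass

p added
p superclass_after
puts msg.sneak_me   # the instance created before the include responds too
```

So the superclass stays put, and the module slots into the method lookup chain between the class and its superclass.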

You may also want to read: Object Oriented and Functional: Is there a middle ground?