RSpec 2 Matcher Fun

Posted by Nick Sieger Thu, 20 Jan 2011 17:47:21 GMT

I was troubleshooting some JRuby code that transforms Java camelCase method names into Ruby snake_case form. We had a bunch of specs covering this conversion, for example:

describe "Java instance method names" do
  it "should present javabean properties as attribute readers and writers" do
    methods = MethodNames.instance_methods

    methods.should include("getValue2")
    methods.should include("get_value2")
    methods.should include("value2")

    methods.should include("setValue2")
    methods.should include("set_value2")
    methods.should include("value2=")
  end
end

The problem comes when these specs fail. The default error message produced by the #include matcher looks like:

Failures:

  1) Java instance method names should present javabean properties as attribute readers and writers
     Failure/Error: methods.should include("get_value2")
       expected [...full contents of array here...] to include "get_value2"
       Diff:
       @@ -1,2 +1,186 @@
       -get_value2
       +[...all entries, one per line here...]

That’s not a terrible message, but when your array contains over 100 entries (like an array of method names), it could be a lot better. In particular, I kept scanning the failure message’s big list, unable to clearly see why the methods I was expecting weren’t there.

What I wanted to see was how my changes to the regex which splits a Java camelCase name affected the conversion. So, what I needed was a report of which method names were the closest to the ones that were not in the list. Hey, sounds like a good reason to implement a custom matcher, and take a diversion into fuzzy string matching algorithms!

I settled on porting the Wikipedia pseudocode for the Levenshtein distance, which measures how close two strings are by counting the single-character insertions, deletions, and substitutions needed to turn one into the other. I looked around and there are existing Levenshtein ports for Ruby, but they use native code for performance. I don’t need performance because I’m only using the Levenshtein function when there is a failure. Of course, pure Ruby code is more portable too!
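For reference, a pure-Ruby Levenshtein function can be quite short. What follows is a minimal sketch of the standard two-row dynamic-programming version, not necessarily the code used for this post; the method name `levenshtein` is chosen to match the helper the matcher calls later, but the body is an assumption.

```ruby
# Pure-Ruby Levenshtein distance (two-row dynamic programming).
# A sketch of the textbook algorithm, not the post's original source.
def levenshtein(a, b)
  return b.length if a.empty?
  return a.length if b.empty?

  # previous[j] is the distance between the prefix of `a` handled so far
  # and the first j characters of `b`.
  previous = (0..b.length).to_a

  a.each_char.with_index(1) do |char_a, i|
    current = [i] # distance from the first i chars of `a` to the empty string
    b.each_char.with_index(1) do |char_b, j|
      cost = char_a == char_b ? 0 : 1
      current << [
        previous[j] + 1,        # deletion
        current[j - 1] + 1,     # insertion
        previous[j - 1] + cost  # substitution (or match)
      ].min
    end
    previous = current
  end

  previous.last
end

levenshtein("kitten", "sitting")   # => 3
levenshtein("my_value", "myvalue") # => 1
```

Only two rows of the table are kept at a time, so memory stays proportional to the length of the second string. That is plenty for a helper that only runs when a spec fails.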

The other change I made in the specs was to pass all strings in a single matcher rather than one name per expectation, so we can see all names that fail, not just the first.
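The idea of checking every name before reporting can be sketched in plain Ruby, outside of RSpec. This is only an illustration: `split_expected` is a hypothetical helper name, and the sample arrays stand in for the real method list.

```ruby
# Split the expected names into those found in the container and those
# missing from it, so every failure can be reported at once.
# `split_expected` is a hypothetical name used only for this sketch.
def split_expected(expected, container)
  expected.partition { |name| container.include?(name) }
end

# Sample data standing in for MethodNames.instance_methods.
container = ["getValue2", "get_value2", "value2"]
expected  = ["getValue2", "get_value2", "value2", "setValue2"]

included, missing = split_expected(expected, container)
# included => ["getValue2", "get_value2", "value2"]
# missing  => ["setValue2"]
```

`Array#partition` returns both lists in one pass, which is the same bookkeeping a matcher needs in order to list every missing name rather than stopping at the first.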

So now, the new spec looks more like this:

describe "Java instance method names" do
  let(:members) { MethodNames.instance_methods }

  it "should present javabean properties as attribute readers and writers" do
    members.should have_strings("getValue2",
                                "get_value2",
                                "value2",
                                "setValue2",
                                "set_value2",
                                "value2=")
  end
end

The custom RSpec matcher #have_strings is declared like so:

RSpec::Matchers.define :have_strings do |*strings|
  match do |container|
    @included, @missing = [], []
    strings.flatten.each do |s|
      if container.include?(s)
        @included << s
      else
        @missing << s
      end
    end
    @missing.empty?
  end

  failure_message_for_should do |container|
    "expected array of #{container.length} elements to include #{@missing.inspect}.\n" +
      "#{closest_match_message(@missing, container)}"
  end

  failure_message_for_should_not do |container|
    "expected array of #{container.length} elements to not include #{@included.inspect}."
  end

  def closest_match_message(missing, container)
    missing.map do |m|
      groups = container.group_by {|x| levenshtein(m, x) }
      "  closest match for #{m.inspect}: #{groups[groups.keys.min].inspect}"
    end.join("\n")
  end
end

I omitted the #levenshtein function here for brevity. (You can view the full source for details.) Now our failing spec output looks like:

Failures:

  1) Java instance method names should present javabean properties as attribute readers and writers
     Failure/Error: members.should have_strings("getValue2",
       expected array of 185 elements to include ["get_my_value", "my_value", "set_my_value", "my_value="].
         closest match for "get_my_value": ["get_myvalue", "set_myvalue"]
         closest match for "my_value": ["myvalue"]
         closest match for "set_my_value": ["get_myvalue", "set_myvalue"]
         closest match for "my_value=": ["myvalue="]

Now the failure message is giving me exactly the information I need. Much better, don’t you think?
