Ruby and XML not-so-simple?
Posted by Nick Sieger Thu, 02 Nov 2006 02:12:00 GMT
Man, I think I’ve been reading too much Sam Ruby lately (ok, that was a year ago, but not much has changed). You have to admit, though, that XML handling in Ruby is one of those things that just doesn’t feel quite right. REXML is pretty much the standard XML API for Ruby, yet it suffers from two showstoppers in my opinion:
In Ruby 1.8.4 it still has the glaring hole Sam mentioned last year with well-formedness. (No exception raised below!)
irb(main):001:0> require 'rexml/document'
=> true
irb(main):002:0> d = REXML::Document.new '<div>at&t'
=> <UNDEFINED> ... </>
irb(main):003:0> d.root
=> <div> ... </>
irb(main):004:0> d.root.text
=> "at&t"
The REXML::Text#to_s method violates the principle of least surprise. In just about every other XML parser written, when you ask a text node for its contents, it returns the value with entities resolved. Not so with Text#to_s. You have to call Text#value instead. Unfortunately, this would be difficult to reverse in future versions of REXML without breaking existing apps.
irb(main):001:0> require 'rexml/document'
=> true
irb(main):002:0> t = REXML::Text.new('at&t')
=> "at&t"
irb(main):003:0> t.to_s
=> "at&amp;t"
irb(main):004:0> t.value
=> "at&t"
This second problem manifests itself in subtle ways. If you’re calling Element#text (which is probably the most common way), you’re fine, because it implicitly does self.texts.first.value under the hood. But if you want to make sure you’re grabbing all the text content, you might be inclined to write element.texts.join('') to concatenate them together. But join bypasses the value method and calls to_s on each text node instead, leaving you with unresolved entities.
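To make the difference concrete, here is a minimal sketch of the three ways of reading text described above. The sample document and the full_text helper are made up for illustration (neither is part of REXML), and the output shown assumes a REXML that parses text nodes the way current releases do.

```ruby
require 'rexml/document'

# Hypothetical helper (not part of REXML): concatenate every direct text
# child of an element, resolving entities via Text#value.
def full_text(element)
  element.texts.map { |t| t.value }.join
end

# The <br/> splits the content into two text nodes.
doc  = REXML::Document.new('<name>AT&amp;T<br/> &amp; Sons</name>')
root = doc.root

first_only = root.text            # first text node only, entities resolved
joined     = root.texts.join('')  # join calls Text#to_s, entities stay escaped
resolved   = full_text(root)      # all text nodes, entities resolved

puts first_only  # AT&T
puts joined      # AT&amp;T &amp; Sons
puts resolved    # AT&T & Sons
```

Mapping over value before joining is the smallest change that keeps the "grab everything" intent while still resolving entities. Note that it only covers direct text children, same as texts does.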
It turns out this problem is exhibited in the version of XmlSimple now included with Edge Rails as of rev 4453. So if you’re living on the edge using the newly minted ActiveResource fetching XML from remote resources like a champion, you just got benched as soon as you tried to fetch XML that had normalized entities inside.
XmlSimple version 1.0.9 has a partial fix for this issue, but I submitted another patch to Maik Schmidt for review that he subsequently released as 1.0.10. I’ve attached the 1.0.10 version to ticket 6532 in hopes that it will be patched in Rails soon.