REXML a Drag...Again

Posted by Nick Sieger Thu, 17 Jan 2008 04:07:00 GMT

We’ve been here before. So here’s the scenario: You’re feeding medium-to-large chunks of XML out of one Rails app, to be consumed by another via ActiveResource. Maybe those chunks have embedded HTML, or maybe they’re an Atom feed containing several pieces of HTML with all the entities escaped. Maybe they contain entire Wikipedia pages in them. Lots of entities that need expansion when the file is parsed.

So what does ActiveResource do with this? Hash.from_xml. Which uses xml-simple. Which constructs a REXML::Document, and proceeds to navigate the entire DOM, scraping the text nodes out of it so they can be stuffed in a hash to be handed back to ActiveResource. And how does REXML expand all the entities it runs across? With this little lovely:

# Unescapes all possible entities
def Text::unnormalize( string, doctype=nil, filter=nil, illegal=nil )
  rv = string.clone
  rv.gsub!( /\r\n?/, "\n" )
  matches = rv.scan( REFERENCE )
  return rv if matches.size == 0
  rv.gsub!( NUMERICENTITY ) {|m|
    m=$1
    m = "0#{m}" if m[0] == ?x
    [Integer(m)].pack('U*')
  }
  matches.collect!{|x|x[0]}.compact!
  if matches.size > 0
    if doctype
      matches.each do |entity_reference|
        unless filter and filter.include?(entity_reference)
          entity_value = doctype.entity( entity_reference )
          re = /&#{entity_reference};/
          rv.gsub!( re, entity_value ) if entity_value
        end
      end
    else
      matches.each do |entity_reference|
        unless filter and filter.include?(entity_reference)
          entity_value = DocType::DEFAULT_ENTITIES[ entity_reference ]
          re = /&#{entity_reference};/
          rv.gsub!( re, entity_value.value ) if entity_value
        end
      end
    end
    rv.gsub!( /&/, '&' )
  end
  rv
end

Now, when you look at this, your first impression is that it just screams fast, right? Let’s run Hash.from_xml on the file I mentioned above.

# unnormalize.rb
require 'rubygems'
gem 'activesupport'
require 'active_support'

File.open("page.xml") do |f|
  Hash.from_xml(f.read)
end
$ time ruby unnormalize.rb

real    0m16.221s
user    0m14.447s
sys     0m0.346s

Whoa! Knock me over with a feather! It blows chunks! You mean calling #gsub! repeatedly in a loop with dregexps (regexp literals with interpolated strings) doesn’t go fast? It’s doubly worse on JRuby, too:

$ time jruby unnormalize.rb

real    0m33.637s
user    0m32.897s
sys     0m0.573s

All this on a paltry 393K xml file. Makes me wonder how anyone ever does any serious XML processing in Ruby.

I know, this is open source, I should be whipping up a patch for this and submitting it. Well, I did cook up a solution, but it unfortunately is only available for JRuby at the moment. (I also have much more faith in Sam Ruby than myself to get the semantics of a rewritten REXML::Text::unnormalize correct.)

A while back I cooked up JREXML because Regexp processing in JRuby was slow at the time, and the guts of REXML is driven by a Regexp-based parser. JREXML swaps out that regexp parser with a Java pull parser library, and at the time it provided a modest speedup.

So, in the context of JREXML, the solution now becomes simple, by taking advantage of the fact that Java XML parsers typically expand entities for you. The just-released JREXML 0.5.3 circumvents REXML::Text::unnormalize when constructing a document from the Java-based parser. And the results again don’t disappoint:

$ time jruby unnormalize_jrexml.rb

real    0m5.802s
user    0m5.315s
sys     0m0.345s

Update: At Sam’s request, I ran the numbers again with REXML trunk, which condenses entity expansion into a single gsub. Speed is more in line for MRI, but didn’t move much for JRuby (probably more a datapoint for JRuby developers than REXML developers).

$ time ruby -I~/Projects/ruby/rexml/src unnormalize.rb 

real    0m6.592s
user    0m0.845s
sys     0m0.345s

$ time jruby -I~/Projects/ruby/rexml/src unnormalize.rb

real    0m34.353s
user    0m33.023s
sys     0m0.714s

Tags , ,  | 6 comments

JRuby 1.0.3: No Java-based extension library backward compatibility

Posted by Nick Sieger Wed, 19 Dec 2007 04:14:00 GMT

JRuby 1.0.3 just came out a couple of days ago. It was a decent point release; a handful of good bugs fixed. Normally a 1.0.3 release would not be all that exciting, but during this cycle, trunk’s internal API (upon which several JRuby extensions depend) started to diverge. Unfortunately, this forced us to face a decision: either fork and maintain two versions of every extension (one for 1.0.x and one for 1.1 and beyond), or break backwards compatibility.

We ended up choosing the latter, prefering a single schism to parallel version hell. It’s probably going to cause some pain for us (in number of support inquiries), and especially for those who might be looking casually at JRuby and trying it for the first time, for example via NetBeans. NetBeans 6.0 recently shipped with JRuby 1.0.2, which is now incompatible with the latest versions of several high-demand gems. Look for the 6.1 nightly builds to be fixed soon, and hopefully the 6.0.1 update can include the new release as well. (If you’re using NetBeans 6 and have run into this problem, you can download and unpack JRuby 1.0.3 and show NetBeans where it is.)

So when in doubt, grab the most recent JRuby release possible to minimize compatibility issues. To attempt to be as clear as possible about which versions work with what, I’ve included a table below. I’ll fill in with updates as I receive them, and let me know if a piece of software you use isn’t mentioned, but should be.

 JRuby Version
 1.0 - 1.0.2, 1.1b1 1.0.3, 1.1b2
Library 
rubygems<= 0.9.4<= 0.9.4, = 1.0 *
rails<= 1.2.6,
>= 2.0.x †
any
activerecord-jdbc<= 0.6>= 0.7
jruby-openssl<= 0.0.5>= 0.1
goldspike1.31.4
mongrelany ‡1.1.2

* Rubygems 0.9.5 may not be compatible with any JRuby version; we won’t ship it with a release
† requires jruby-openssl (0.0.5 or earlier) to be installed
‡ combination needs testing with JRuby 1.0.2 and Mongrel 1.1.2

Other libraries not mentioned here, such as javasand (JRuby version of freaky freaky sandbox) or jparsetree (JRuby version of ParseTree) will also likely need updating for 1.0.3 and 1.1. For library authors needing a hint for which way to go, here are some pointers to our temporary bridge API.

Lessons learned? An extension API and migration strategy might be normally be a good thing to nail down before a 1.0 release. Hopefully, you’ll forgive us that blunder this one time, and we’ll make sure to get this mess cleaned up in a future 1.x release, and any pains you had to go through with version incompatibilities will be soothed by the continual high-quality releases we’ve been able to craft.

Tags  | 4 comments

ActiveRecord-JDBC 0.6 Released!

Posted by Nick Sieger Tue, 06 Nov 2007 15:00:00 GMT

Just out is ActiveRecord-JDBC 0.6, the post-RubyConf release.

The sparkly new feature is Rails 2.0 support. In the soon-to-be-released Rails 2.0 (edge), Rails will automatically look for and load an adapter gem based on the name of the adapter you specify in database.yml. Example:

development:
  adapter: funkdb
  ...

With this database configuration, Rails will attempt to load the activerecord-funkdb-adapter gem, require the active_record/connection_adapters/funkdb_adapter library, and call the method ActiveRecord::Base.funkdb_connection in order to obtain a connection to the database. (This is the mechanism used to off-load non-core adapters out of the Rails codebase.)

We can leverage this convention to make it easier than ever to get started using JRuby with your Rails application. So, the first thing new in the 0.6 release is the name. You now install activerecord-jdbc-adapter:

jruby -S gem install activerecord-jdbc-adapter

But wait, there’s more! We also have adapters for four open-source databases, including MySQL, PostgreSQL, and two embedded Java databases, Derby and HSQLDB. And, for your convenience, we’ve bundled the JDBC drivers in dependent gems, so you don’t have to go hunting them down if you don’t have them handy.

Check this out. Get a fresh copy of JRuby 1.0.2, unpack it, and add the bin directory to your path. Install the adapter:

$ jruby -S gem install activerecord-jdbcderby-adapter --include-dependencies
Successfully installed activerecord-jdbcderby-adapter-0.6
Successfully installed activerecord-jdbc-adapter-0.6
Successfully installed jdbc-derby-10.2.2.0

In your Rails application, freeze to edge Rails (soon to be Rails 2.0).

rake rails:freeze:edge

Re-run the Rails command, regenerating configuration files.

jruby ./vendor/rails/railties/bin/rails .

Currently, Rails 2.0 uses openssl for the HMAC digest used in the new cookie session store, so we have to install the jruby-openssl gem:

jruby -S gem install jruby-openssl

Now, update your config/database.yml as follows:

development:
  adapter: jdbcderby
  database: db/development

Re-run your migrations, and you should now see a Derby database footprint in the db/development directory.

$ ls -l db/development
total 24
-rw-r--r--    1 nicksieg  nicksieg    38 Nov  6 08:24 db.lck
-rw-r--r--    1 nicksieg  nicksieg     4 Nov  6 08:24 dbex.lck
drwxr-xr-x    5 nicksieg  nicksieg   170 Nov  6 08:24 log/
drwxr-xr-x   65 nicksieg  nicksieg  2210 Nov  6 08:24 seg0/
-rw-r--r--    1 nicksieg  nicksieg   882 Nov  6 08:24 service.properties
drwxr-xr-x    2 nicksieg  nicksieg    68 Nov  6 08:24 tmp/

That’s it! To re-emphasize, to make your application run under JRuby, no longer will you need to a) find and download appropriate JDBC drivers, b) wonder where they should be placed so that JRuby will find them, or c) make custom changes to config/environment.rb. All that’s taken care of you if you use one of the following adapters:

  • activerecord-jdbcmysql-adapter (MySQL)
  • activerecord-jdbcpostgresql-adapter (PostgreSQL)
  • activerecord-jdbcderby-adapter (Derby)
  • activerecord-jdbchsqldb-adapter (HSQLDB)

If you need to connect to a different database, you’ll still need to place your database’s JDBC driver jar file in the appropriate place and use the straight activerecord-jdbc-adapter. Also note that in this case, and for Rails 1.2.x in general, you’ll still need to add that pesky require statement to config/environment.rb.

As always, there are bug fixes too (though we haven’t been tracking exactly which ones are fixed). We’re starting to file ActiveRecord-JDBC bugs in the JRuby JIRA now, and will be putting in future AR-JDBC versions to target soon too. So, please file new bugs in JIRA (and select component “ActiveRecord-JDBC”) rather than in the antiquated Rubyforge tracker.

Tags , ,  | 9 comments

JRuby Performance Tweaks

Posted by Nick Sieger Thu, 25 Oct 2007 16:04:17 GMT

The back-story for my post on JRuby performance is that I was actually doing some benchmarking while applying successive tweaks to the JRuby environment to see how it affected our application performance. Only after running the final set of tweaks for longer numbers of requests did I notice that JRuby was catching up to MRI!

I thought it would be interesting to point out the tweaks themselves, to reiterate the point that JRuby and the JVM give you lots of knobs to tune your application. The notepad of tweaks, numbers and the script used to run them is here.

  1. I started by running the application with the “out of the box” setup: JRuby 1.0.1. I then started to apply tweaks to see how the numbers changed.
  2. The first tweak was to turn off ObjectSpace, which is pure overhead for JRuby.
  3. Next, I enabled JREXML since our application uses ActiveResource.
  4. Next, I tried enabling Charlie’s discovery in the rexml/source. It appears that the benefit was negated by JREXML, so I went back and ran another run with the rexml/source patch and without JREXML as 2a. 2a gave almost as much benefit by itself as JREXML, so that’s an option if you don’t want the additional dependency, but you should measure for yourself since the performance profiles of those two fixes for XML may differ depending on the amount of XML consumed. In our case it’s relatively small, a few kilobytes at most.
  5. Next, I switched to JRuby trunk instead of 1.0.1. Trunk has, among other improvements, a complete compiler, which should allow more of the application to be translated to Java bytecode.
  6. The last tweak for this study was to change to the “server” VM. The server VM is known to take longer to warm up, but is more aggressive in the optimizations it performs.

The beauty of this exercise is that all the changes made provided small performance boosts for the application. Going forward, we hope to make more of this baked-in behavior (ObjectSpace off by default, possibly the rexml/source fix), but it will still help to have knowledge of and play around with the hotspot settings for the JVM.

There are a few more things I’d like to try in the future: JDK 6 is reportedly a lot faster all by itself, and the standalone Glassfish gem may give Mongrel a run for his money. There is still plenty of work left for the impending JRuby 1.1 release, so we should see the performance improvements for Rails applications running on JRuby continue to roll in.

Tags  | no comments

JRuby on Rails: Fast Enough

Posted by Nick Sieger Thu, 25 Oct 2007 03:36:00 GMT

People have been asking for a while how fast JRuby runs Rails. (Of course, “fast” has always been a relative term.) We haven’t been quick to answer the question, because frankly we didn’t know. We hadn’t been building real Rails applications on JRuby ourselves yet, and there was no definitive word from the crowd either.

Recently, several guys from ThoughtWorks have been working on a Rails petstore application and benchmark to get to the heart of the matter. Discussion has been heated on the JRuby mailing list, but results have not been conclusive yet.

In the project I’m working on, we’ve committed to using and deploying on JRuby. Eventually we were going to reach the point where we’d need to find out how well our application runs. So today I began running a simple single request benchmark on a relatively busy page. The numbers turned out to be rather surprising:

Requests

Average

(The raw data is available here.)

Now, MRI (C Ruby) will always run about the same speed no matter how many runs you give it, but it’s well known that the JVM needs time to warm up. And indeed it does; after 250 iterations, Mongrel running on JRuby finally surpasses MRI. The JRuby/Goldspike/Glassfish combo comes close as well.

Some details about the setup:

  • I ran the tests on my MacBook Pro Core 2 Duo 2.4 GHz. I didn’t disable one of the cores for the tests, which means that JRuby has an advantage over MRI because it can use both (native threads at work). However, the test script ran the requests serially, which means that the advantage was minimal.
  • The application is indeed of the “hydra” variety; the setup is nearly identical to the second diagram on that page. So a single request is passing through not one, but two Rails applications in addition to touching the database. It rendered an HTML ERb view with data from an ActiveResource-accessed RESTful service. The applications are based on Rails 1.2.3.
  • MRI version is using Ruby 1.8.6 and Mongrel 1.0.1.
  • JRuby Mongrel is also version 1.0.1 (details on installing it here)
  • JRuby on Glassfish used Glassfish 2 and Goldspike 1.4, deployed in war files via Warbler.
  • The two JRuby setups used JDK 1.5 and were tweaked to disable ObjectSpace and use the “server” VM (-server argument to the JVM).

The main point I wish to make with these numbers is that JRuby performance is there today, and still has room to grow. There’s no longer any doubt in my mind. Yes, this is a simplistic application benchmark run on a developer’s machine, but it’s a real application. The test may not be exacting in precision, but I see enough in the numbers to believe that this will be replicable to production environments. The plot thickens!

Tags , ,  | 1 comment

Warbler, A Little Birdie To Introduce Your Rails App To Java

Posted by Nick Sieger Tue, 04 Sep 2007 02:48:40 GMT

This week I was working on integrating the latest JRuby 1.0.1 and Goldspike 1.3 releases into our environment, when my frustration hit a fever pitch.

See, I had always thought that the .war packaging side of Goldspike was a little clunky and un-ruby-like, but I didn’t see a clear path to fixing it. I had heard little complaints about it here and there: the little configuration DSL didn’t give you enough control or wasn’t documented well enough; the fact that it downloads libraries from the internet during assembly (convenient, but not safe or reproducible for production deployments).

Also, in my own opinion it took the wrong approach to packaging Rails in a .war file. It puts the Rails application directory structure into the root of the .war file where any web server or Java application server might mistakenly serve up your code as static content. The Java .war file spec has this special directory called WEB-INF expressly for the purpose of hiding that stuff away, so why not use it?

And then, suddenly Goldspike was packaging up my entire Rails application directory, .svn directories and everything. So I set out to fix this once and for all.

And so I present Warbler. A little bird who chirpily steps up to the task of assembling your Rails application into a Java Web Archive (.war). Here, get it:

gem install warbler

And then, in the top directory of your Rails application,

warble

Those two steps are all it takes to make a .war file, including your application and recent versions of JRuby and Goldspike, that’s deployable to your favorite Java application server.

There are a number of points about Warbler worth mentioning.

Does one thing, well

Warbler only packages, and doesn’t care about anything else, like how to dispatch servlet requests to Rails. This will allow for more runtime servlet binding mechanisms to take advantage of Warbler in the future.

Fast and lightweight

50% less code than the Goldspike packaging plugin, yet does the job quickly and efficiently.

Sane defaults

Warbler only packages code that you need to run the application, omitting database migrations and tests. If your application is self-sufficient (no external dependencies), then the out-of-the-box configuration will probably work for you. Public HTML/images/javascript/stylesheets go in the root of the webapp, where Java webservers expect them to be.

Documented, flexible configuration

Need to customize your configuration? Run warble config and edit config/warble.rb. All the options are there, commented and documented.

Need to change out the bundled JRuby/Goldspike versions? warble pluginize makes a copy of Warbler in the vendor/plugins area of your application, allowing you to change the .jar files in the vendor/plugins/warbler-0.9/lib directory. Warbler then makes his nest in your project’s list of rake tasks (as rake -T | grep war shows)

rake war            # Create trunk.war
rake war:app        # Copy all application files into the .war
rake war:clean      # Clean up the .war file and the staging area
rake war:gems       # Unpack all gems into WEB-INF/gems
rake war:jar        # Run the jar command to create the .war
rake war:java_libs  # Copy all java libraries into the .war
rake war:public     # Copy all public HTML files to the root of the .war
rake war:webxml     # Generate a web.xml file for the webapp

Warbler even omits himself in the .war file produced when running in plugin mode, since you won’t need him at runtime. It’s the little details that matter.

Give him a try and let me know if it makes your life deploying Rails applications to JRuby on Java appservers easier!

Tags , ,  | 13 comments

ActiveRecord JDBC 0.5

Posted by Nick Sieger Sun, 02 Sep 2007 04:47:26 GMT

This one’s a bit late -- consider it part of my get-caught-up-since-unclogging-the-clogged-blog series.

ActiveRecord JDBC 0.5 is out, so you may have heard (it went out the door a week ago Friday; c.f. Arun and Tom). The major feature you get in this version is a new database.yml configuration style:

development:
  adapter: mysql
  database: blog_development
  username: root
  password:

Ok, ok, so what’s the big deal? This is just Rails’ default database configuration. Well, that’s the point -- you no longer have to know anything about JDBC URLs, driver class names, and all that. We’ve baked it in for you. This should make it easier than ever to try out your Rails application on JRuby, as the only piece of manual configuration left for you is the ugly bit of JRuby-specific code you need to activate ActiveRecord-JDBC lurking right above the Rails::Initializer:

if RUBY_PLATFORM =~ /java/
  require 'rubygems'
  gem 'ActiveRecord-JDBC'
  require 'jdbc_adapter'
end

If we can obliterate the need for this last bit of code, and make it easy to obtain the necessary driver bits, I’ll feel good enough to call this thing a 1.0 product.

Tags ,  | 1 comment

Gig: JRuby and Glassfish Hackfest

Posted by Nick Sieger Thu, 26 Jul 2007 22:49:52 GMT

I’ll be speaking and participating at an upcoming event at the Axis Café in Potrero Hill, San Francisco on August 8, co-hosted by Sun Microsystems (my new employer as of mid-May), and Joyent.

We’ll be talking JRuby, Glassfish, Open Solaris, Joyent Accelerators, and deploying your Rails applications using those technologies. Bring your laptop, outfitted with your favorite Ruby editor/IDE, and get ready to write some code (or, if you wish, take your existing app and deploy it).

Food and drinks will be provided, and space is limited, so go register now. Look forward to seeing you!

Tags , , ,

Post-JRuby-1.0 Bits and Bobs

Posted by Nick Sieger Thu, 12 Jul 2007 08:27:00 GMT

Summer is settling in, and so is the JRuby 1.0 release. Most of the core team seemed to take some time off since the release, as the commits and lists have felt quiet compared to the frenetic lead-up to 1.0. I don’t know if that’s a good thing yet -- I’m not bold enough to suggest that it means that JRuby’s just working for everyone, and that the software is bug-free. There have been calls for a point release (probably 1.0.1) and a better roadmap -- we’re working on those and should have something in the next couple of weeks.

On the other hand, the number and quality of blog posts about JRuby seem to be steadily increasing. The number of compelling applications of JRuby, both in the Ruby/Rails and the Java worlds, are being demonstrated more and more. Here are a few examples.

JMS is turning out to be a great place to sprinkle some JRuby magic. Ola started by implementing direct JMS support for ActiveMessaging, eliminating the need for the separate poller process. Nutrun goes the other way and demonstrates how simple it is to run a broker, publisher and subscriber using ActiveMQ and a short JRuby script.

Jeff Mesnil has started jmx4r, a simple DSL leveraging JRuby to write monitoring scripts for your JMX-enabled applications.

Kyle Maxwell whipped up a compelling plugin for Rails in time for RejectConf using Lucene. Why imitate when you can use the real thing?

Zed Shaw has caught the JRuby bug, and decided to do his own take on scripted Swing applications with Profligacy. Check out the progression of examples as Zed hones in on his final product. As nap points out, Zed’s Layout Expression Language (LEL) is a refreshingly concise take on specifying UI layouts. Cleaner separation of component layout and event handling logic is also a big win. Move over Groovy SwingBuilder and JavaFX!

Last but not least, my own earlier proposal on Java interface integration has been implemented in current JRuby trunk. Along the way I found the opportunity to toss in some extra sugar and implement proc and block coercion to interfaces as well. This means you can pass a proc or block to a Java method and it will be converted to the appropriate Java interface type (e.g., Runnable and Swing/AWT listeners):

button = JButton.new
button.add_action_listener do |event|
  event.source.text = "You pressed me"
end

If closures for Java can’t do this, they will have gotten it wrong. Note that blocks will be converted and used in place of the last argument to the Java method only; if you need to pass behavior to any argument preceding the last, use a proc. Here’s one example that gets progressively better as we switch over to blocks.

Two parting thoughts -- a couple of things that aren’t quite ready, but you should keep your eye on.

Glassfish dev Jean-François Arcand demonstrates parked Rails requests with Grizzly. And you thought you had to do comet-style request handling outside of Rails? The future of scalable Rails servers looks pretty good to me.

Finally, respected object technologist Alan McKean has started looking at object serialization and persistence for JRuby. You thought you had to wait for that Gemstone-thing that Avi Bryant mentioned at RailsConf? Maybe the wait won’t be too long after all...

Tags

JRuby 1.0

Posted by Nick Sieger Sun, 10 Jun 2007 04:03:00 GMT

I’d just like to take a moment to echo what Ola has to say about the JRuby 1.0 release. This one is definitely for all of you out there. It’s been incredibly gratifying to see the growth of the community, and the increased amount of positive feedback and success stories with JRuby, and I’m honored to have been part of the team that made 1.0 happen.

We really feel strongly that we’ve put out a quality piece of software, a tool that will make your work more enjoyable, easier, and allow you to inject some creativity and innovation back into the Java stack.

We’ve got a solid base to start from. Being able to run Rails is no small feat, to be sure, but the best is yet to come. You can expect more performance, a complete compiler, support for more applications, and tighter integration with long-standing Java technologies. In addition, we’d like to push the envelope of what both Ruby and Java are capable of, including implementing (even driving) Ruby 2.0 features, leading the way for dynamic language support in the JVM, eased as well as novel ways of doing application deployment, better debugging and tooling, and experiments with new ways of doing concurrent and parallel computing.

Do join up with us -- it’s never too late to hop in and enjoy the fun!

Tags ,  | 2 comments | no trackbacks

Older posts: 1 2 3 4 5 6