<?xml version="1.0" encoding="UTF-8"?>
<feed xml:lang="en-US" xmlns="http://www.w3.org/2005/Atom">
  <title>Nick Sieger: Next performance fix: Builder::XChar</title>
  <id>tag:blog.nicksieger.com,2005:Typo</id>
  <generator uri="http://www.typosphere.org" version="4.0">Typo</generator>
  <link rel="self" type="application/atom+xml" href="http://blog.nicksieger.com/xml/atom10/article/366/feed.xml"/>
  <link rel="alternate" type="text/html" href="http://blog.nicksieger.com/articles/2008/01/17/next-performance-fix-builder-xchar"/>
  <updated>2008-01-23T18:09:53+00:00</updated>
  <entry>
    <author>
      <name>Mike</name>
    </author>
    <id>urn:uuid:8eaae4d7-0b3a-4aeb-a2ad-3d70b32f5745</id>
    <published>2008-01-23T18:09:53+00:00</published>
    <updated>2008-01-23T18:09:53+00:00</updated>
    <title>Comment on Next performance fix: Builder::XChar by Mike</title>
    <link rel="alternate" type="text/html" href="http://blog.nicksieger.com/articles/2008/01/17/next-performance-fix-builder-xchar#comment-371"/>
    <content type="html">&lt;p&gt;Thank you for this interesting article, which was published here.&lt;/p&gt;</content>
  </entry>
  <entry>
    <author>
      <name>Nick</name>
    </author>
    <id>urn:uuid:57088773-6b36-4b3e-a380-fc662095ce3b</id>
    <published>2008-01-18T01:54:15+00:00</published>
    <updated>2008-01-18T01:54:16+00:00</updated>
    <title>Comment on Next performance fix: Builder::XChar by Nick</title>
    <link rel="alternate" type="text/html" href="http://blog.nicksieger.com/articles/2008/01/17/next-performance-fix-builder-xchar#comment-368"/>
    <content type="html">&lt;p&gt;Indeed, you&amp;#8217;re presciently a couple steps ahead of me, making the work a lot easier. Thanks again!&lt;/p&gt;</content>
  </entry>
  <entry>
    <author>
      <name>Sam Ruby</name>
    </author>
    <id>urn:uuid:a96cabab-c6ec-484d-9c10-fce381f2297d</id>
    <published>2008-01-18T01:02:44+00:00</published>
    <updated>2008-01-18T01:02:44+00:00</updated>
    <title>Comment on Next performance fix: Builder::XChar by Sam Ruby</title>
    <link rel="alternate" type="text/html" href="http://blog.nicksieger.com/articles/2008/01/17/next-performance-fix-builder-xchar#comment-367"/>
    <content type="html">&lt;p&gt;Chuckle.  :-)&lt;/p&gt;

&lt;p&gt;You might take a look at how I completely eliminated to_xs in my patch for Ruby 1.9.  Perhaps similar techniques could be used for JRuby, as the root cause is that 1.8 MRI doesn&amp;#8217;t grok Unicode?&lt;/p&gt;

&lt;p&gt;&lt;a href='http://intertwingly.net/blog/2008/01/04/Builder-on-1-9' rel="nofollow"&gt;http://intertwingly.net/blog/2008/01/04/Builder-on-1-9&lt;/a&gt;&lt;/p&gt;</content>
  </entry>
  <entry>
    <author>
      <name>Nick Sieger</name>
    </author>
    <id>urn:uuid:fe7e8324-82de-49dc-a132-f1e514007cdd</id>
    <published>2008-01-17T23:48:00+00:00</published>
    <updated>2008-01-17T23:49:03+00:00</updated>
    <title>Next performance fix: Builder::XChar</title>
    <link rel="alternate" type="text/html" href="http://blog.nicksieger.com/articles/2008/01/17/next-performance-fix-builder-xchar"/>
    <category term="jruby" scheme="http://blog.nicksieger.com/articles/tag/jruby"/>
    <category term="performance" scheme="http://blog.nicksieger.com/articles/tag/performance"/>
    <category term="ruby" scheme="http://blog.nicksieger.com/articles/tag/ruby"/>
    <category term="rails" scheme="http://blog.nicksieger.com/articles/tag/rails"/>
    <content type="html">&lt;p&gt;Next up in our performance series: &lt;code&gt;Builder::XChar&lt;/code&gt;. (Another fine Sam Ruby production!) While this piece of code in the Builder library strikes me as perfectly fine, it slows down quite a bit on larger documents or long chunks of text.&lt;/p&gt;
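
&lt;p&gt;The cost comes from the pure-Ruby escaping visiting every character individually. Below is a simplified sketch of that per-character shape. The method names &lt;code&gt;sketch_to_xs&lt;/code&gt; and &lt;code&gt;sketch_xchr&lt;/code&gt; are hypothetical, and this is not Builder's actual code, which handles more cases. It unpacks the string into codepoints, sends each one through its own Ruby method call, and joins the pieces again.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Simplified sketch of per-character XML escaping (NOT Builder's real code).
# sketch_xchr and sketch_to_xs are hypothetical names used for illustration.
class Integer
  # Escape a single Unicode codepoint. The \x26, \x3C and \x3E escapes
  # below are the ampersand, less-than and greater-than characters.
  def sketch_xchr
    case self
    when 0x26 then "\x26amp;"                    # ampersand
    when 0x3C then "\x26lt;"                     # less-than sign
    when 0x3E then "\x26gt;"                     # greater-than sign
    when 0x09, 0x0A, 0x0D, 0x20..0x7E then chr   # whitespace and printable ASCII
    when 0x00..0x1F then '*'                     # other control chars are not allowed in XML 1.0
    else "\x26##{self};"                         # everything else as a numeric character reference
    end
  end
end

class String
  # One Ruby-level method call per character, so the work grows
  # with every character of the text being escaped.
  def sketch_to_xs
    unpack('U*').map { |cp| cp.sketch_xchr }.join
  end
end

puts "Caf\u00e9 menu".sketch_to_xs
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Running it prints &lt;code&gt;Caf&amp;amp;#233; menu&lt;/code&gt;.&lt;/p&gt;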

&lt;p&gt;Our path to the bottleneck is as follows: &lt;code&gt;ActiveRecord::Base#to_xml =&amp;gt; Builder::XMLMarkup#text! =&amp;gt; String#to_xs =&amp;gt; Fixnum#xchr&lt;/code&gt;.  Consider:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_ruby "&gt;&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;rubygems&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
&lt;span class="ident"&gt;gem&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;activesupport&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;active_support&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;
&lt;span class="ident"&gt;require&lt;/span&gt; &lt;span class="punct"&gt;'&lt;/span&gt;&lt;span class="string"&gt;benchmark&lt;/span&gt;&lt;span class="punct"&gt;'&lt;/span&gt;

&lt;span class="keyword"&gt;module &lt;/span&gt;&lt;span class="module"&gt;Benchmark&lt;/span&gt;
  &lt;span class="keyword"&gt;class &lt;/span&gt;&lt;span class="punct"&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class="constant"&gt;self&lt;/span&gt;
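    &lt;span class="comment"&gt;# Run the block 10 times, printing each timing, then report the mean and sample standard deviation of the real times.&lt;/span&gt;
    &lt;span class="keyword"&gt;def &lt;/span&gt;&lt;span class="method"&gt;report&lt;/span&gt;&lt;span class="punct"&gt;(&amp;amp;&lt;/span&gt;&lt;span class="ident"&gt;block&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;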
      &lt;span class="ident"&gt;n&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="number"&gt;10&lt;/span&gt;
      &lt;span class="ident"&gt;times&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="number"&gt;1&lt;/span&gt;&lt;span class="punct"&gt;..&lt;/span&gt;&lt;span class="ident"&gt;n&lt;/span&gt;&lt;span class="punct"&gt;).&lt;/span&gt;&lt;span class="ident"&gt;map&lt;/span&gt; &lt;span class="keyword"&gt;do&lt;/span&gt;
        &lt;span class="ident"&gt;bm&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="ident"&gt;measure&lt;/span&gt;&lt;span class="punct"&gt;(&amp;amp;&lt;/span&gt;&lt;span class="ident"&gt;block&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt;
        &lt;span class="ident"&gt;puts&lt;/span&gt; &lt;span class="ident"&gt;bm&lt;/span&gt;
        &lt;span class="ident"&gt;bm&lt;/span&gt;
      &lt;span class="keyword"&gt;end&lt;/span&gt;
      &lt;span class="ident"&gt;sum&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="ident"&gt;times&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;inject&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="number"&gt;0&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt; &lt;span class="punct"&gt;{|&lt;/span&gt;&lt;span class="ident"&gt;s&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt;&lt;span class="ident"&gt;t&lt;/span&gt;&lt;span class="punct"&gt;|&lt;/span&gt; &lt;span class="ident"&gt;s&lt;/span&gt; &lt;span class="punct"&gt;+&lt;/span&gt; &lt;span class="ident"&gt;t&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;real&lt;/span&gt;&lt;span class="punct"&gt;}&lt;/span&gt;
      &lt;span class="ident"&gt;mean&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="ident"&gt;sum&lt;/span&gt; &lt;span class="punct"&gt;/&lt;/span&gt; &lt;span class="ident"&gt;n&lt;/span&gt;
      &lt;span class="ident"&gt;sumsq&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="ident"&gt;times&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;inject&lt;/span&gt;&lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="number"&gt;0&lt;/span&gt;&lt;span class="punct"&gt;)&lt;/span&gt; &lt;span class="punct"&gt;{|&lt;/span&gt;&lt;span class="ident"&gt;s&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt;&lt;span class="ident"&gt;t&lt;/span&gt;&lt;span class="punct"&gt;|&lt;/span&gt; &lt;span class="ident"&gt;s&lt;/span&gt; &lt;span class="punct"&gt;+&lt;/span&gt; &lt;span class="ident"&gt;t&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;real&lt;/span&gt; &lt;span class="punct"&gt;*&lt;/span&gt; &lt;span class="ident"&gt;t&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;real&lt;/span&gt;&lt;span class="punct"&gt;}&lt;/span&gt;
      &lt;span class="ident"&gt;sd&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;Math&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;sqrt&lt;/span&gt;&lt;span class="punct"&gt;((&lt;/span&gt;&lt;span class="ident"&gt;sumsq&lt;/span&gt; &lt;span class="punct"&gt;-&lt;/span&gt; &lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;sum&lt;/span&gt; &lt;span class="punct"&gt;*&lt;/span&gt; &lt;span class="ident"&gt;sum&lt;/span&gt; &lt;span class="punct"&gt;/&lt;/span&gt; &lt;span class="ident"&gt;n&lt;/span&gt;&lt;span class="punct"&gt;))&lt;/span&gt; &lt;span class="punct"&gt;/&lt;/span&gt; &lt;span class="punct"&gt;(&lt;/span&gt;&lt;span class="ident"&gt;n&lt;/span&gt; &lt;span class="punct"&gt;-&lt;/span&gt; &lt;span class="number"&gt;1&lt;/span&gt;&lt;span class="punct"&gt;))&lt;/span&gt;
      &lt;span class="ident"&gt;puts&lt;/span&gt;&lt;span class="punct"&gt;(&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;Mean: %0.6f SDev: %0.6f&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;&lt;/span&gt; &lt;span class="punct"&gt;%&lt;/span&gt; &lt;span class="punct"&gt;[&lt;/span&gt;&lt;span class="ident"&gt;mean&lt;/span&gt;&lt;span class="punct"&gt;,&lt;/span&gt; &lt;span class="ident"&gt;sd&lt;/span&gt;&lt;span class="punct"&gt;])&lt;/span&gt;
    &lt;span class="keyword"&gt;end&lt;/span&gt;
  &lt;span class="keyword"&gt;end&lt;/span&gt;
&lt;span class="keyword"&gt;end&lt;/span&gt;

&lt;span class="comment"&gt;# http://blog.nicksieger.com/files/page.xml&lt;/span&gt;
&lt;span class="ident"&gt;page&lt;/span&gt; &lt;span class="punct"&gt;=&lt;/span&gt; &lt;span class="constant"&gt;File&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;open&lt;/span&gt;&lt;span class="punct"&gt;(&amp;quot;&lt;/span&gt;&lt;span class="string"&gt;page.xml&lt;/span&gt;&lt;span class="punct"&gt;&amp;quot;)&lt;/span&gt; &lt;span class="punct"&gt;{|&lt;/span&gt;&lt;span class="ident"&gt;f&lt;/span&gt;&lt;span class="punct"&gt;|&lt;/span&gt; &lt;span class="ident"&gt;f&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;read&lt;/span&gt; &lt;span class="punct"&gt;}&lt;/span&gt;

&lt;span class="constant"&gt;Benchmark&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;report&lt;/span&gt; &lt;span class="keyword"&gt;do&lt;/span&gt;
  &lt;span class="number"&gt;20&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;times&lt;/span&gt; &lt;span class="punct"&gt;{&lt;/span&gt; &lt;span class="ident"&gt;page&lt;/span&gt;&lt;span class="punct"&gt;.&lt;/span&gt;&lt;span class="ident"&gt;to_xs&lt;/span&gt; &lt;span class="punct"&gt;}&lt;/span&gt;
&lt;span class="keyword"&gt;end&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;On Ruby and JRuby, this produces:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ ruby to_xs.rb 
 21.430000   0.400000  21.830000 ( 22.022769)
 21.530000   0.360000  21.890000 ( 22.005737)
 21.540000   0.370000  21.910000 ( 22.065165)
 21.530000   0.370000  21.900000 ( 22.028591)
 21.500000   0.350000  21.850000 ( 21.990395)
 21.550000   0.370000  21.920000 ( 22.033164)
 21.520000   0.360000  21.880000 ( 21.984129)
 21.550000   0.370000  21.920000 ( 22.116802)
 21.550000   0.370000  21.920000 ( 22.051421)
 21.520000   0.380000  21.900000 ( 22.084736)
Mean: 22.038291 SDev: 0.041985

$ jruby -J-server to_xs.rb
 79.112000   0.000000  79.112000 ( 79.112000)
 81.480000   0.000000  81.480000 ( 81.481000)
 84.745000   0.000000  84.745000 ( 84.745000)
 84.384000   0.000000  84.384000 ( 84.384000)
121.933000   0.000000 121.933000 (121.933000)
 85.533000   0.000000  85.533000 ( 85.532000)
 82.762000   0.000000  82.762000 ( 82.763000)
 82.090000   0.000000  82.090000 ( 82.090000)
 81.298000   0.000000  81.298000 ( 81.299000)
 80.774000   0.000000  80.774000 ( 80.773000)
Mean: 86.411200 SDev: 12.635700
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;(Hmm, I must have accidentally swapped in some large program during the fifth JRuby run, hence the 121-second outlier and the large standard deviation. Such are the perils of benchmarking on a desktop machine. I don&amp;#8217;t claim that these numbers are scientific, just illustrative!)&lt;/p&gt;
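
&lt;p&gt;To put a number on how much that one run skews the summary, here is a small sketch that recomputes the statistics from the JRuby real (wall-clock) times above. It uses the same mean and sample standard deviation formulas as &lt;code&gt;Benchmark.report&lt;/code&gt;. The helper name &lt;code&gt;mean_and_sd&lt;/code&gt; is hypothetical, and dropping the single slowest run is only an illustrative choice, not a rigorous outlier test.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Real (wall-clock) times from the JRuby run above, in seconds.
JRUBY_REAL = [79.112, 81.481, 84.745, 84.384, 121.933,
              85.532, 82.763, 82.090, 81.299, 80.773]

# Same formulas as Benchmark.report: arithmetic mean and the sample
# standard deviation computed from the sum and the sum of squares.
def mean_and_sd(times)
  n = times.size
  sum = times.inject(0.0) { |s, t| s + t }
  sumsq = times.inject(0.0) { |s, t| s + t * t }
  [sum / n, Math.sqrt((sumsq - sum * sum / n) / (n - 1))]
end

all_mean, all_sd = mean_and_sd(JRUBY_REAL)

# Drop the single slowest run (the suspected outlier) and recompute.
slowest = JRUBY_REAL.max
trimmed = JRUBY_REAL.reject { |t| t == slowest }
trim_mean, trim_sd = mean_and_sd(trimmed)

puts 'All 10 runs:  Mean: %0.6f SDev: %0.6f' % [all_mean, all_sd]
puts 'Without max:  Mean: %0.6f SDev: %0.6f' % [trim_mean, trim_sd]
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;With all ten runs it reproduces the reported mean of about 86.41 seconds and standard deviation of about 12.64. Without the slowest run, the mean drops to about 82.46 seconds and the standard deviation to roughly 2.1.&lt;/p&gt;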

&lt;p&gt;Fortunately, the fix is again very simple, and has &lt;a href="http://groups.google.com/group/rubyjam/browse_thread/thread/82a9ddb762019bcc"&gt;previously&lt;/a&gt; &lt;a href="http://dev.rubyonrails.org/changeset/7773"&gt;been acknowledged&lt;/a&gt;. The latest (unreleased?) &lt;a href="http://code.whytheluckystiff.net/hpricot/" title="Hpricot, a fast and delightful HTML parser"&gt;Hpricot&lt;/a&gt; has a new native extension, &lt;code&gt;fast_xs&lt;/code&gt;, which is an almost drop-in replacement for the pure-Ruby &lt;code&gt;String#to_xs&lt;/code&gt;. (Almost, because it defines the method &lt;code&gt;String#fast_xs&lt;/code&gt; instead of &lt;code&gt;String#to_xs&lt;/code&gt;. ActiveSupport 2.0.2 and later &lt;a href="http://dev.rubyonrails.org/browser/trunk/activesupport/lib/active_support/core_ext/string/xchar.rb?rev=7773"&gt;take care of aliasing it for you&lt;/a&gt;.) As luck would have it, I recently ported &lt;code&gt;fast_xs&lt;/code&gt; as part of upgrading JRuby extensions that have Java code in them, without realizing it would come in handy so soon. The patch for that is &lt;a href="http://code.whytheluckystiff.net/hpricot/ticket/131"&gt;here&lt;/a&gt;.&lt;/p&gt;
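
&lt;p&gt;Outside of Rails, the same aliasing can be done by hand. Here is a minimal sketch, assuming the extension can be loaded with &lt;code&gt;require 'fast_xs'&lt;/code&gt;. It is not the ActiveSupport code itself, and the constant &lt;code&gt;XS_BACKEND&lt;/code&gt; is purely illustrative. If the extension is missing, the &lt;code&gt;LoadError&lt;/code&gt; is rescued and &lt;code&gt;String&lt;/code&gt; is left alone.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Sketch: point String#to_xs at the native fast_xs when it can be loaded.
# Assumes the extension is loadable as 'fast_xs'; if it is not, the
# LoadError is rescued and String is left untouched.
require 'rubygems'

begin
  require 'fast_xs'
  class String
    # Keep the pure-Ruby version reachable under another name, if present.
    alias_method :original_xs, :to_xs if method_defined?(:to_xs)
    alias_method :to_xs, :fast_xs
  end
  XS_BACKEND = :fast_xs   # illustrative marker of which escaper is active
rescue LoadError
  XS_BACKEND = :pure_ruby # fast_xs unavailable; String is unchanged
end

puts "String#to_xs backend: #{XS_BACKEND}"
&lt;/code&gt;&lt;/pre&gt;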

&lt;p&gt;I have the latest Hpricot gems on my server, so you can install it yourself (for either Ruby or JRuby):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;gem install hpricot --source http://caldersphere.net
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;or&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;jruby -S gem install hpricot --source http://caldersphere.net
&lt;/code&gt;&lt;/pre&gt;

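&lt;p&gt;With that installed, the script now produces these results, roughly 45 times faster on Ruby and about 100 times faster on JRuby, comparing mean wall-clock times:&lt;/p&gt;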

&lt;pre&gt;&lt;code&gt;$ ruby to_xs.rb
  0.460000   0.080000   0.540000 (  0.537793)
  0.420000   0.070000   0.490000 (  0.501965)
  0.430000   0.070000   0.500000 (  0.501359)
  0.400000   0.070000   0.470000 (  0.484495)
  0.400000   0.070000   0.470000 (  0.479995)
  0.400000   0.070000   0.470000 (  0.469118)
  0.390000   0.070000   0.460000 (  0.468864)
  0.390000   0.070000   0.460000 (  0.465009)
  0.390000   0.060000   0.450000 (  0.452902)
  0.390000   0.070000   0.460000 (  0.466881)
Mean: 0.482838 SDev: 0.024926

$ jruby -J-server to_xs.rb 
  0.882000   0.000000   0.882000 (  0.883000)
  0.832000   0.000000   0.832000 (  0.832000)
  0.851000   0.000000   0.851000 (  0.850000)
  0.837000   0.000000   0.837000 (  0.837000)
  0.846000   0.000000   0.846000 (  0.846000)
  0.843000   0.000000   0.843000 (  0.843000)
  0.835000   0.000000   0.835000 (  0.835000)
  0.825000   0.000000   0.825000 (  0.826000)
  0.830000   0.000000   0.830000 (  0.830000)
  0.834000   0.000000   0.834000 (  0.833000)
Mean: 0.841500 SDev: 0.016379
&lt;/code&gt;&lt;/pre&gt;</content>
  </entry>
</feed>
