Ruby and XML not-so-simple?

Posted by Nick Sieger Thu, 02 Nov 2006 02:12:00 GMT

Update: Koz already fixed the issue in trunk, and the changes are also going into the 1.2 release. Thanks!

Man, I think I’ve been reading too much Sam Ruby lately (ok, that was a year ago, but not much has changed). You have to admit, though, that XML handling in Ruby is one of those things that just doesn’t feel quite right. REXML is pretty much the standard API for Ruby, yet it suffers from two showstoppers in my opinion:

  • In Ruby 1.8.4 it still has the glaring hole Sam mentioned last year with well-formedness. (No exception raised below!)

    irb(main):001:0> require 'rexml/document'
    => true
    irb(main):002:0> d = REXML::Document.new '<div>at&t'
    => <UNDEFINED> ... </>
    irb(main):003:0> d.root
    => <div> ... </>
    irb(main):004:0> d.root.text
    => "at&t"
    
  • The REXML::Text#to_s method violates the principle of least surprise. In just about every other XML parser written, when you ask a text node for its contents, it returns you the value with entities resolved. Not so Text#to_s. You have to call Text#value instead. Unfortunately, this would be difficult to reverse in future versions of REXML without breaking existing apps.

    irb(main):001:0> require 'rexml/document'
    => true
    irb(main):002:0> t = REXML::Text.new('at&t')
    => "at&t"
    irb(main):003:0> t.to_s
    => "at&amp;t"
    irb(main):004:0> t.value
    => "at&t"
    

This second problem manifests itself in subtle ways. If you’re calling Element#text (which is probably the most common way), you’re fine, because it implicitly does self.texts.first.value under the hood. But if you want to make sure you’re grabbing all the text content, you might be inclined to write element.texts.join('') to concatenate them together. But this method bypasses the value method and instead uses to_s, leaving you with unresolved entities.
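A minimal sketch of the difference, assuming a REXML version that behaves as described above. The sample document and the helper name resolved_text are made up for illustration:

```ruby
require 'rexml/document'

# Hypothetical sample: mixed content with escaped ampersands.
doc = REXML::Document.new('<p>at&amp;t<br/> &amp; friends</p>')

# Collect all text children of an element with entities resolved,
# by calling Text#value explicitly instead of relying on to_s.
def resolved_text(element)
  element.texts.map { |t| t.value }.join
end

escaped  = doc.root.texts.join('')   # goes through Text#to_s, keeps &amp;
resolved = resolved_text(doc.root)   # goes through Text#value

puts escaped
puts resolved   # => at&t & friends
```

Mapping over value before joining keeps the "grab all the text" intent while avoiding the escaped form.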

It turns out this problem is exhibited in the version of XmlSimple now included with Edge Rails as of rev 4453. So if you’re living on the edge using the newly minted ActiveResource fetching XML from remote resources like a champion, you just got benched as soon as you tried to fetch XML that had normalized entities inside.

XmlSimple version 1.0.9 has a partial fix for this issue, but I submitted another patch to Maik Schmidt for review that he subsequently released as 1.0.10. I’ve attached the 1.0.10 version to ticket 6532 in hopes that it will be patched in Rails soon.


Visualization of Ruby's Grammar

Posted by Nick Sieger Fri, 27 Oct 2006 16:48:00 GMT

As part of the momentum surrounding the Ruby implementer’s summit, I have decided to take on a pet project to understand Ruby’s grammar better, with the goal of contributing to an implementation-independent specification of the grammar. Matz mentioned during his keynote how parse.y was one of the uglier parts of Ruby, but just how ugly?

Well, judge for yourself. Below is a grammar dependency graph generated using ANTLRWorks and GraphViz. The steps I took are as follows. I took parse.y, stripped all C definitions, code and actions from it to give a bare YACC definition. Next, I did the equivalent of gsub(/[kt]([A-Z]+)/, '\1') (since ANTLR’s convention is to have lexer tokens named starting with a capital letter). I then used the Bison-to-ANTLR converter to generate an ANTLR 2.x grammar, which I hand-modified to produce a v3 grammar. Opening the resulting grammar in ANTLRWorks allows you to generate a DOT file from which GraphViz can then generate a JPEG image. I’ve also included visualizations of the Java 1.5 and Javascript (ECMAScript) grammars for comparison.
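The token-renaming step can be tried on a sample rule. This is only a sketch: the rule text is illustrative rather than copied from parse.y, and strip_token_prefixes is a made-up helper name. The replacement string uses the backreference \1 to keep the captured capital letters:

```ruby
# Drop the leading k/t from yacc token names so that each token
# begins with a capital letter, as ANTLR lexer tokens do.
def strip_token_prefixes(grammar_text)
  grammar_text.gsub(/[kt]([A-Z]+)/, '\1')
end

yacc_rule = "primary : kBEGIN bodystmt kEND | tLPAREN_ARG expr ')'"
puts strip_token_prefixes(yacc_rule)
# => primary : BEGIN bodystmt END | LPAREN_ARG expr ')'
```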

I haven’t even begun to absorb all the meanings from this picture, but one stark difference between Ruby and the other two is the node in the middle of the picture with a high concentration of outgoing edges. That node is called primary in the grammar definition, and it is probably one of the reasons that Ruby syntax is so flexible and forgiving. A primary node’s direct children apparently represent a large portion of the syntax, and explain why in Ruby a single statement can either be a literal, a method invocation (or series of them), a standalone expression (such as a < b), all the way up to larger syntactic groupings such as if ... else ... end and begin ... rescue ... end, among many others.

Ruby

Ruby 1.8.4 grammar dependency graph

Java 1.5

Generated from Java 1.5 grammar on antlr.org

Java 1.5 grammar dependency graph

Javascript

Generated from ECMAScript grammar on antlr.org

ECMAScript (Javascript) grammar dependency graph


RubyConf: Your Ruby in My CLR

Posted by Nick Sieger Mon, 23 Oct 2006 14:16:00 GMT

John Lam wanted to build a photo-flash-card application using Avalon and Indigo and Flickr, but also using Ruby as the implementation language. So along the way he decided to build an interop layer (a bridge) between Ruby and the CLR to do it.

Now that John has joined Microsoft, his new mission (bigger picture) is to further dynamic language implementations on the CLR.

Bridging type systems

  • Dynamic methods in the CLR allow you to do better than simply invoking the reflection API.

    Ruby          |  C               |  CLR
    ============================================
    shadow class  |  dynamic method  |  instance
    
  • Polymorphic inline caching -- caching method dispatches on different call sites based on the assumption that types don’t change that often

  • Generate shadow classes and method stubs using const_missing and method_missing
  • Overload resolution happens in the method shims (a one time cost) to choose, e.g., which constructor to use for System::Collections::ArrayList.new
  • Integration is done to make the CLR feel more Rubyish
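To make the const_missing and method_missing idea concrete, here is a pure-Ruby sketch. It is not RubyCLR's code: FOREIGN_TYPES stands in for the foreign runtime, and Shadow and Bridge are hypothetical names. A shadow class is created the first time a constant is referenced, and a method stub is generated the first time a method is called, so the lookup cost is paid once:

```ruby
# Hypothetical stand-in for the foreign runtime: type name => method table.
FOREIGN_TYPES = {
  :ArrayList => {
    :add   => lambda { |state, item| state << item; state.size - 1 },
    :count => lambda { |state| state.size }
  }
}

# Base class for generated shadow classes.
class Shadow
  class << self
    attr_accessor :foreign_methods
  end

  def initialize
    @state = []   # stands in for the wrapped foreign instance
  end

  # First call to an unknown method: look it up, define a real stub on the
  # shadow class, then dispatch. Later calls hit the stub directly.
  def method_missing(name, *args)
    impl = self.class.foreign_methods[name]
    return super unless impl
    self.class.send(:define_method, name) { |*a| impl.call(@state, *a) }
    send(name, *args)
  end

  def respond_to_missing?(name, include_private = false)
    self.class.foreign_methods.key?(name) || super
  end
end

module Bridge
  # First reference to Bridge::Something: build and cache a shadow class.
  def self.const_missing(name)
    table = FOREIGN_TYPES[name]
    return super unless table
    shadow = Class.new(Shadow)
    shadow.foreign_methods = table
    const_set(name, shadow)
  end
end

list = Bridge::ArrayList.new
list.add("a")
list.add("b")
puts list.count   # => 2
```

A real bridge would do overload resolution inside the generated stub; the sketch only shows where that one-time work would live.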

Implementation

  • This changes identity (proxied object):

    ArrayList.new.as(IEnumerable)
    
  • This is less Rubyish, but identity is preserved:

    IEnumerable.get_enumerator(ArrayList.new)
    

There are trade-offs and warts to a bridge approach to Ruby integration on top of a platform such as the CLR: there is a need to inject artificial type information occasionally to be able to construct CLR objects (e.g., arrays -- Array.of(Int32).new(3)). Generics are evil! Simple stuff doesn’t seem so bad: List.of(Int32).new, but there’s more pain to be had (see John for details). John also built a RubyInline-like implementation for the CLR languages too, to allow for getting things done (even if it’s dirty). Finally, method overloading is a problem, especially when there is no equivalent Ruby type -- this gave way to instance_shim which is a sort of aliasing method that mixes in type metadata for use by the interop layer.

On the other hand, there are many places where Ruby (even in bridged mode) can make the experience of developing on the CLR better. Implementing CLR interfaces is a feature that allows Ruby objects to cross to the CLR side (e.g., adding IEnumerable to Ruby Array). Performance across the CLR boundary (marshalling data) is ~100 times slower than C#, but still fast (3 million calls/second). Huge benefits are gained from using DSLs in Ruby to help with the implementation of the interop layer. Also, RubyCLR allows mixing methods into CLR types, so we can re-skin APIs that feel clunky in Ruby. This is really leveraging the power of Ruby in the best possible way.

My take is that it looks like the RubyCLR project will probably not be seeing much further development, unless John finds a willing maintainer -- but this is speculation, I haven’t confirmed with John. Yet, the problem of impedance matching between type systems is a recurring theme in the dynamic language arena, and so John’s work is valuable in helping us to understand this issue.



RubyConf: YARV on Rails

Posted by Nick Sieger Mon, 23 Oct 2006 14:15:00 GMT

Koichi SASADA

Update: corrected performance numbers -- 20x, not 20%!

  • Got a job developing YARV at the University of Tokyo! He’s now employed at Akihabara, Otaku City.
  • Member of Nihon-Ruby-no-Kai
  • Present at RubyKaigi 2006 (200 tickets sold in 3 hours). RubyKaigi 2007 will be June 9-10 (Saturday and Sunday).
  • Member of Nihon-Perl-no-Kai
  • Co-author of Perl book on Parrot in Japanese

DEMO

  • [Demo creating Rails app]
  • Create app with YARV (rails foobar)
  • Edit config/boot.rb, add a GC.disable line (there’s a bug to be fixed when he gets back to Japan).
  • Start up WEBrick, and it works!

Glossary

  • Rite: code name of Ruby 2.0 (a.k.a. vaporware!)
  • YARV: Yet Another Ruby VM

YARV

  • Supported by funds from IPA (now finished)
  • Simple stack machine, with VM instructions, a compiler and interpreter
  • Optimization techniques to improve performance
  • Open source
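As a rough illustration of what "simple stack machine" means, here is a toy interpreter. The instruction names (:push, :add, :mul) are invented for this sketch and are not YARV's instruction set:

```ruby
# Run a list of instructions on an operand stack and return the top value.
def run_stack_machine(program)
  stack = []
  program.each do |op, arg|
    case op
    when :push
      stack.push(arg)
    when :add
      right = stack.pop
      left = stack.pop
      stack.push(left + right)
    when :mul
      right = stack.pop
      left = stack.pop
      stack.push(left * right)
    else
      raise ArgumentError, "unknown instruction: #{op.inspect}"
    end
  end
  stack.last
end

# (2 + 3) * 4 expressed as stack instructions
program = [[:push, 2], [:push, 3], [:add], [:push, 4], [:mul]]
puts run_stack_machine(program)   # => 20
```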

Optimizations include compile-time optimization, native threading, specialized instructions, instruction unification, inline method cache, and stack caching. YARV can build with configure/make, but doesn’t work with AC 2.6 (maybe you know why?). It passes most of the Ruby tests, but misses a few due to implementation differences.

[Koichi showed a demo controlling iTunes on Windows with Win32/OLE with YARV, and a native-threaded scenario in IRB.]

Myths of YARV: YARV is great! YARV will solve all problems! It makes Ruby programs go 50 times faster! It solves character issues! It finds your girlfriend!

Truths: YARV is for running Ruby programs, fast. It provides up to a 20x speed up for some algorithm benchmarks (Ackermann, Fib), but not for others [graphs shown]. You can assemble and disassemble YARV instruction sequences, or serialize and de-serialize them. They are just Ruby literals, so they can be packed in YAML or some other human-readable format.

    require 'yasm'
    require 'yaml'

    iseq = YASM.toplevel([:a, :b]) {|ib|
        ib.answer
        ib.leave
    }

    p iseq.to_a.to_yaml # => (gave a readable YAML view of the assembly)
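For comparison, Ruby versions that later shipped with YARV expose instruction sequences through RubyVM::InstructionSequence. The sketch below uses that class, which is specific to MRI and is not the YASM API shown in the talk:

```ruby
require 'yaml'

# Compile a snippet to a YARV instruction sequence (MRI only).
iseq = RubyVM::InstructionSequence.compile("1 + 2")

puts iseq.disasm         # human-readable listing of the instructions
puts iseq.to_a.to_yaml   # the same sequence as plain Ruby data, dumped as YAML
puts iseq.eval           # runs the sequence and prints 3
```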

Threading

  • Ruby thread is mapped 1:1 to a native thread
  • Supports POSIX and Win32
  • Many existing Ruby libraries are not synchronized at the C level, so many C libraries need synchronization added to them
  • Thread model 2: 1:1 mapping, with a Giant Lock (GL). Only the thread that has the lock can run. No need for sync, but no parallelism
  • Thread model 3: Ruby threads in parallel, but when thread-unsafe code is executing, GL needs to be obtained
  • Mutex class will become builtin
  • Thread.critical will vanish (not obsolete, but unsupported) [this was a controversial point for some -- the comment was that it’s a near impossibility to keep it with a native threading model though, the two are incompatible]
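The Giant Lock idea, and the move from Thread.critical to Mutex, can be sketched at the Ruby level. This is an analogy written with the standard Thread and Mutex classes, not YARV's internal implementation, and count_with_giant_lock is a made-up helper:

```ruby
# Run `workers` threads that each bump a shared counter `iterations` times.
# Every update happens while holding one global Mutex, so only one thread
# at a time executes the critical section: safe, but with no parallelism there.
def count_with_giant_lock(workers, iterations)
  giant_lock = Mutex.new
  counter = 0
  threads = Array.new(workers) do
    Thread.new do
      iterations.times { giant_lock.synchronize { counter += 1 } }
    end
  end
  threads.each(&:join)
  counter
end

puts count_with_giant_lock(4, 1_000)   # => 4000
```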

Matz: Ruby 1.9.1 is planned for Christmas 2007, but it is also to be merged with YARV, so Koichi hopes to complete the merge by spring or summer 2007. Thread model 2 will need to be used to begin with.

Future

  • set_trace_func hook functions -- what to do here? [It was suggested by Charles Nutter to remove it, to which Matz replied that we could as long as we have a good replacement debugging API.]
  • Catch up with Ruby 1.9
  • JIT/AOT compiler (AOT compiler started but incomplete)
  • Koichi also has a side project: high-performance Ruby, with the goal of making it easy to write performant code.


More developers and testers are welcomed to the project!


RubyConf: I18n, M17n, Unicode, and all that

Posted by Nick Sieger Sun, 22 Oct 2006 00:06:00 GMT

Tim Bray is going to talk about characters and strings. He will gladly talk about text until your arm falls off, and buy half the beer.

Introduction

English is no longer the majority language on the web. It’s nonsensical to ignore i18n issues with new apps.

This is probably a bug:

    /[a-zA-Z]+/

Problems to solve to help us with i18n

  • Identifying characters
  • Byte-character mapping and storage
  • A good string API

References

  • World’s writing systems
  • Character model for the web (W3C)
  • The Unicode 5.0 Standard (forthcoming book) (same as ISO 10646)

Unicode

  • Characters identified by numbers called code points (> 1,000,000)
  • 17 planes, each with 64k code points
  • Original characters (available in any computer anywhere before Unicode was invented) in Basic Multilingual Plane (BMP, first plane)
  • Characters identified as U+ followed by 4 hex digits
  • Unicode character database
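The arithmetic behind these bullets is easy to check in Ruby. In this small sketch the helper names are made up; the plane of a code point is simply its value shifted right by 16 bits:

```ruby
# Size of the code space: 17 planes of 0x10000 (65,536) code points each.
def unicode_code_space
  17 * 0x10000
end

# Describe a single character: its U+ notation and the plane it lives in.
def describe_code_point(char)
  cp = char.ord
  format("U+%04X (plane %d)", cp, cp >> 16)
end

puts unicode_code_space                 # => 1114112
puts describe_code_point("A")           # => U+0041 (plane 0)
puts describe_code_point("\u00e9")      # => U+00E9 (plane 0)
puts describe_code_point("\u{1D11E}")   # => U+1D11E (plane 1)
```

Characters outside the BMP need more than four hex digits, as the last line shows.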

Benefits

  • Repertoire
  • Room for growth -- lots of space left in the middle planes
  • Private use
  • Sane process
  • Character database
  • Ubiquitous standards and tools

Difficulties

  • Combining forms -- need to normalize the characters for comparison (1/2 vs. ½) and this is not something you want to do in your String#== method
  • Awkward historical compromises with other encodings
  • Han unification (note: Tim considers the Wikipedia article to be biased) -- characters that might mean something different were given one codepoint by Asian linguistic specialists.
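The combining-forms problem can be seen with an accented letter. The sketch assumes a Ruby version that provides String#unicode_normalize (2.2 or later), which did not exist at the time of the talk; same_text? is a hypothetical helper:

```ruby
composed   = "caf\u00e9"    # e-acute as one code point (U+00E9)
decomposed = "cafe\u0301"   # e followed by a combining acute accent (U+0301)

# Compare two strings after normalizing both to the same form (NFC here).
def same_text?(a, b)
  a.unicode_normalize(:nfc) == b.unicode_normalize(:nfc)
end

puts composed == decomposed             # => false
puts same_text?(composed, decomposed)   # => true
```

Keeping normalization in an explicit helper, rather than inside String#==, matches the point made in the talk.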

Storage

  • Official: UTF-8, UTF-16, UTF-32
  • Practical: ASCII, EBCDIC, Shift-JIS, Big5, EUC-JP, EUC-KR, MS code pages, ISO-8859-*, etc.
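To get a feel for the storage trade-off, one can compare the byte length of the same text in several encodings. This is a sketch using Ruby's String#encode; byte_sizes is a made-up helper:

```ruby
# Byte length of the same text in several encodings.
def byte_sizes(text, encodings)
  encodings.map { |name| [name, text.encode(name).bytesize] }.to_h
end

sizes = byte_sizes("caf\u00e9", %w[UTF-8 UTF-16LE UTF-32LE ISO-8859-1])
sizes.each { |name, size| puts "#{name}: #{size} bytes" }
# UTF-8: 5 bytes
# UTF-16LE: 8 bytes
# UTF-32LE: 16 bytes
# ISO-8859-1: 4 bytes
```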

But with video the largest bandwidth eater, does text size really matter?

Identification

How to identify what text is coming in over the wire?

  • Guess -- browsers, python lib
  • Charset headers, which are known to be wrong
  • Trust -- two partners agree in advance for a pattern of exchange
  • XML [this last one was quite obvious right]
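The "guess" approach can be sketched in a few lines: try UTF-8 first and fall back to an agreed default. The fallback encoding is an assumption about the sender, not something the bytes can prove, and to_utf8 is a hypothetical helper rather than any library's API:

```ruby
# Return a UTF-8 string for raw bytes. Try UTF-8 first; if the bytes are
# not valid UTF-8, assume the fallback encoding and transcode from it.
def to_utf8(bytes, fallback = "ISO-8859-1")
  candidate = bytes.dup.force_encoding("UTF-8")
  return candidate if candidate.valid_encoding?
  bytes.dup.force_encoding(fallback).encode("UTF-8")
end

latin1_bytes = "caf\xE9".b     # the word as sent by a Latin-1 producer
utf8_bytes   = "caf\u00e9".b   # the same word as UTF-8 bytes

puts to_utf8(latin1_bytes) == to_utf8(utf8_bytes)   # => true
```

A wrong fallback still yields a valid string, just with the wrong characters, which is why an explicit declaration such as XML's beats guessing.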

Language approaches

  • Java -- design flaw; characters are UTF-16, which is unfortunate. Implementation is sound and well tested, but Java is clunky.
  • Perl 5 has excellent support in theory, but practically speaking, it’s difficult to round-trip text through a DB without some breakage.
  • Python has byte arrays and strings, some string-like methods on byte arrays, binary or text data, glosses over issue of platform file encoding

Ruby

  • Some core string methods have i18n problems due to counting, regexp, equality and whitespace concerns.
  • String#each_char seems to be a missing method; string class maybe should be aware of its encoding.
  • Behavior of String#[] seems ok for byte buffers but it probably doesn’t need to be efficient for characters. Most of the use-cases for String iteration should be for characters (exception: Expat)
  • Case-changing methods -- avoid them at all costs in a mixed language environment!
  • Regexps need Unicode properties for safer matching (\p{L} for letters, \p{N} for numbers)
  • Does Ruby need a Character class or a Charset class?
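A short sketch of why an ASCII-only character class is probably a bug, and of the character versus byte distinction. It assumes a Ruby with encoding-aware strings (1.9 or later), which postdates this talk, and letter_runs is a made-up helper:

```ruby
# Split text into runs of letters using a Unicode property class.
def letter_runs(text)
  text.scan(/\p{L}+/)
end

text = "na\u00efve caf\u00e9"   # two words, each with one accented letter

p text.scan(/[a-zA-Z]+/)   # => ["na", "ve", "caf"]  (words broken at accents)
p letter_runs(text) == ["na\u00efve", "caf\u00e9"]   # => true

# Characters and bytes are different counts for the same string.
puts text.length            # => 10
puts text.bytesize          # => 12
puts text.each_char.count   # => 10
```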

What is next for Ruby? [Tune your divining rods toward ruby-talk for the rest of the story!] Matz has m17n; Julik, Manfred and crew now have ActiveSupport::MultiByte in Rails; JRuby is built on a platform that already has a Unicode string, so the discussion is heating up.

Update: slides available here.

Q & A

Q. What if I have a stream of bytes with no knowledge of encoding? Don’t try to impose an encoding lens above the level of a string, programmers want to treat a string as a string with associated methods.

Q. What if I need to change case of text? Get used to the fact that it won’t work reliably. What about characterizing the finite number of languages? Java does that, and it’s still not really possible. Shouldn’t the string class know the encoding? Couldn’t it optimize better? Couldn’t you raise an exception? Just don’t do it! Isn’t there a body of knowledge that could be acquired about case? Hmm, next question that isn’t about case!

Q. Is there a resource for edge cases of processing text in XML? Search for “xml test cases”. The decision is between ignoring the metadata provided, and choosing not to process it.

Q. What is the python library that can guess the charset? It’s in the feedvalidator suite.


RubyConf: Nathaniel Talbott: Open Classes, Open Companies

Posted by Nick Sieger Sat, 21 Oct 2006 18:54:00 GMT

Update: Full text of Nathaniel’s talk.

Nathaniel is honored to be speaking at RubyConf for the sixth year in a row. His basic premise for today is this: the language you’ve chosen has certain characteristics that reflect back on you. In turn we can reflect on those characteristics and how they relate to what we value in business as well.

Dynamism

Ruby’s typing system (contrast to static typing). I’m thinking about my program, not satisfying the type checker. Static typing sounds good in theory but I haven’t found it that helpful in practice.

How about dynamism in business? How about job titles. If you stick a person in a “box”, they pretty much stay in it. It’s easy for HR, but not good for the growth of the business. In Ruby we have duck-typing, but a fixed job title is kind of like static typing. It makes it harder for you to adapt to the situation at hand, or keeps you from growing into new roles you never thought you could do. Nathaniel mentions how he, as a programmer, never thought he could be good at sales, but now he likes it.

Interpretation

The behavior of the program is order-dependent (contextual). Once you embrace that Ruby interprets statements as it encounters them, you find power in it. It allows you to delay decisions until they’re necessary, a feature that supports dynamic business.

Succinctness

A stated design goal for ruby (by Matz) is succinctness, but not just brevity for brevity’s sake (e.g., extremely compact Perl), but in a read/write sense.

A succinct legal agreement is wise use of the limited time a client has to figure out what he’s signing up for. Legal agreements should have one purpose -- to set expectations so that no one is surprised.

Maybe your employment agreement is so long and full of legalese that you’ve inadvertently signed up as a corporate slave. Be wary; ask for a summary; consider how it might affect your ability to contribute to open source projects.

Reflection

Ruby the language permits and even encourages a high degree of introspection. Does your workplace reward introspection, or penalize it? Or is it a “fire and forget” environment?

Open classes

Ruby allows any code to open any class, add/remove code, muck with state. Could we break things everywhere? Yes. But Ruby values implementor power over safety or security through obscurity. This can be scary for some.
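For readers who have not seen it, this is what opening a class looks like. The example is mine, not from Nathaniel's talk, and the method names are arbitrary:

```ruby
# Reopen a core class and add methods; every Integer gains them immediately.
class Integer
  def minutes
    self * 60
  end

  def hours
    self * 3600
  end
end

puts 5.minutes   # => 300

# The same openness lets any code take a method away again.
class Integer
  remove_method :hours
end

puts 2.respond_to?(:hours)   # => false
```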

When people close to you know a lot about you, they can hurt you. You’re vulnerable. But the other side of this is that being open builds trust, which is an essential component of a successful business relationship.

Do you know what your peers are paid? Do you know what deals are in the pipeline? Can you look at the books?

Ruby’s open nature may seem naive, but it’s wonderfully naive.

Discussion

Q. Would these techniques work outside of a technical hub? My business is not driven by locale. I’m able to assign work in a distributed manner.

Q. You assume that your peers are trustworthy and honest. What about in a more cutthroat competitive environment? You either need to change your organization or “change your organization”, i.e., switch jobs or try to effect change to find a place that reflects your values.

Q. Transparency meets the real world. Crashing software and crashing companies are two different things -- perhaps the analogy can only be taken so far? The worst that could happen is that I get nothing for the time I invest other than the learning experience. It’s more a matter of limiting risk. I’m not saying I’m broadcasting every detail to every client.

Q. Do you scope clearly and bid, or do you do ongoing hourly work? I only do time and materials. RFPs make me gag. People spend time on the piece of paper and then give it more credence than my own opinion.

Q. How does testing play into this? One thing that Ruby values is constant feedback, which is also a big tenet of testing. Constant and short feedback cycles, “testing your customer” -- don’t get caught in a long-term agreement.

Q. How do you put that in a statement of work? I make sure the client knows that we’re building a trusting relationship. I also don’t claim to have this all figured out.

Q. Can you scale this (team of trustworthy people) to a larger model? This is in a way like the evergreen question about Ruby; “does it scale?” Partly, I don’t care because it’s working for me now. On the other hand, I’m thinking about it because I like the thought of bringing more people into the fold. [A comment from the audience brought up the subject of lifestyle companies as an example of why it wouldn’t.]

Q. There are two types of contracts -- the 2-page business agreements and the 50-page CYA edge cases. True, I have the luxury of working with clients that are small enough that they are not likely to sue me.

Q. If everyone is a sub-contractor, there is no career path or “carrot”. Part of the goal of my business is to catapult people beyond it by allowing them enough free time to work on their own things.

Q. How do you balance work and life? I’m not always the best at it, but now I’m working out of my house and can be with my family when things come up. Some days I shut the door and get things done. Work/life balance is messy and I work on it on a daily basis.


RubyConf: John Long: Radiant CMS

Posted by Nick Sieger Sat, 21 Oct 2006 02:54:00 GMT

What is Radiant?

  • No-fluff, lightweight CMS for small teams
  • Simplicity over features
  • A little more than a blogging engine
  • Made for designers and programmers (techies)
  • Tag-based template language
  • Total control over output
  • Plugin extension mechanism under development
  • Content sites, not portal software

Getting Started

  • Install: gem install radiant
  • Generate a new application: mkdir demo && cd demo && radiant .
  • Configure database: cp config/database.sqlite.yml config/database.yml

Installation types

  • Instance mode vs. application mode (whether or not you have Rails source present). Instance mode also makes it possible to clone and share customized Radiant applications.
  • Base application includes an admin interface

Pages, Snippets and Layouts

  • Hierarchical page management
  • Create page in several states (draft, reviewed, published) in Textile or Markdown with slug, breadcrumb, and layout
  • Snippets are small chunks of content that can be shared between pages (to DRY up your content)
  • Layouts that can be broken down into components of the layout (sidebar, etc.)

Tags

  • Radius
  • <r:title/>, <r:content part="sidebar" inherit="true"/>, <r:snippet name="footer"/>, <r:if_content part="extended">...</r:if_content>, <r:children:each limit="5" order="desc">...</r:children:each>
  • Tags can be embedded anywhere, not just in the layout
  • Tags are contextual -- e.g., <r:title/> picks the correct title even if it is embedded within a snippet within multiple pages
  • Custom tags possible with “behavior”

Text Filters

  • Textile (RedCloth), Markdown (BlueCloth), SmartyPants
  • Vanilla HTML (no filtering)
  • Custom filters possible. [This is my own example below, not John’s.]
    class MyFilter < TextFilter::Base
      register 'profanity filter'
      def filter(txt)
        txt.gsub(/(damn|ass|shit)/i, '####')
      end
    end
    

Radiant is powering the new ruby-lang.org site. Overall, a slick, well-thought-out, polished, extendable CMS done very much in the philosophy of Ruby and Rails. Check it out!


RubyConf: Graphics with Ruby

Posted by Nick Sieger Fri, 20 Oct 2006 20:18:00 GMT

Geoff Grosenbach is waxing on pagefuls of numbers condensed into a small, tidy graph that increases the amount of information you can communicate on a page.

Topfunky fox
The cartoon fox makes his second appearance of the day in one of Geoff’s sparklines

It is in our hands, the hands of the programmer, to show designers what the range of visual representation capabilities are.

Libraries

  • Scruffy -- SVG graphs, no dependencies (i.e., RMagick)
  • GNUPlot -- the old standby
  • MRPlot -- scientific plots
  • PNG -- line and font drawing in pure ruby from Seattle.rb
  • Gruff -- depends on RMagick
  • Sparklines -- depends on RMagick
  • [Ploticus and RRD were mentioned during the talk as well]
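To show how little is needed for a dependency-free sparkline, here is a pure-Ruby sketch that emits SVG. It does not use or imitate the API of any library listed above; sparkline_svg and its parameters are invented for illustration:

```ruby
# Render a series of numbers as a tiny SVG polyline (a sparkline).
# Values are scaled so the minimum sits at the bottom edge and the
# maximum at the top edge of the image.
def sparkline_svg(values, width: 100, height: 20)
  min, max = values.minmax
  span = (max - min).nonzero? || 1
  step = values.size > 1 ? width.to_f / (values.size - 1) : 0.0
  points = values.each_with_index.map do |value, index|
    x = (index * step).round(1)
    y = (height - (value - min) * height.to_f / span).round(1)
    "#{x},#{y}"
  end
  %(<svg xmlns="http://www.w3.org/2000/svg" width="#{width}" height="#{height}">) +
    %(<polyline fill="none" stroke="black" points="#{points.join(' ')}"/></svg>)
end

puts sparkline_svg([1, 3, 2])
```

Writing the result to a file with a .svg extension gives an image a browser can display.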

Applications

  • Automatically generating image mastheads with a font mask, a gradient and a cloud image showing through the mask
  • UrbanDrinks.com -- plotting bars on a timeline showing happy hours in Portland
  • “Scene graph” -- rendering multiple layers of images/icons on the filesystem into a composite graphics “scene”
  • BillMonk.com -- rendered image with multiple components that allows you to circumvent cross-site or crippled javascript issues
  • In a Rails controller:
    • Generate and cache (with caches_page) an image with text
    • Register a new mime type and use respond_to {|type| type.jpg { ... } }
  • Requisite reference to Edward Tufte

Techniques

  • Comparisons -- show two competing trendlines on a graph
  • Multivariate analysis -- stockhive.com stock chart rendering
  • Content is king -- be judicious


RubyConf: Sydney and Rubinius

Posted by Nick Sieger Fri, 20 Oct 2006 19:07:00 GMT

Update: Evan has posted code and has a page set up for the project.

Evan Phoenix (nee Webb), of Seattle.rb, is presenting on Sydney and Rubinius, an experiment in improving the ruby interpreter. Sydney has died, and Rubinius has risen from its ashes, appropriately.

Why

  • Why would you write a new Ruby interpreter? It’s fun, it’s a good challenge.
  • What’s wrong with the existing interpreter -- are you hating on Matz? Of course not.

Today’s Ruby interpreter is like a big dump truck -- sometimes a little slow, but it works for us. YARV is like the red, shiny fire truck. Both big and complex. Rubinius, by comparison, is like a dune buggy. Fast, light, but you’re going to get sand in your eyes if you drive it a lot.

The project, admittedly, is naive.

  • Simple architecture and implementation.
  • As little background magic as possible
  • No opaque C backend
  • Leverage axiom of simple == powerful
  • Less magic means more introspection
    • More control for the developer
    • Richer introspection: Backtrace, MethodTable objects

What was Sydney?

  • Giant patch to 1.8.2 that included reentrancy and thread-safety
  • Turned out to be a major PITA
  • CRuby uses a large number of C globals, references to which had to be tracked and fixed

Transition to Rubinius

  • Ruby borrowed a lot from Smalltalk, so why not try an implementation based on the same concepts?
  • Prototype A ported the blue-book implementation to Ruby
  • It worked and validated the basic concept and approach
  • Prototype B took ideas from A but implemented a bytecode interpreter and compiler. Used RubyInline to access raw memory operations.
  • At this time the goal emerged to have a translator which could take a prototype and bootstrap itself into C code.
  • Prototype S was a manual translation of Prototype B into C code to make the implementation quicker.
  • Prototype W was created to translate parts of Prototype B so that there is a maintainable core in Ruby code itself.

Questions

Q. Since you were starting over, could you use a platform-independent library to ease the process, such as APR? Yes -- currently using String and PointerArray from glib.

Q. How is performance? Too early to tell -- I hope to know by the end of the conference. Prototype S became runnable and usable on the plane here.

Q. Can you clarify the goal? To create a Ruby interpreter in Ruby that can translate itself out into a C interpreter.

Q. Have you figured out how to link in external libraries in a platform independent way? No. My hope is that the decision will be made to write a common framework for translating to system calls, e.g., SWT.

Q. Have you looked at PyPy? (similar project for Python) Yes, and it’s f-in complicated. It worries me actually.

Q. Could you have it generate backend code in another language/platform (Java bytecode, CLR)? Yes, I certainly hope so, otherwise I’m wasting my time.

Q. How will you add native thread support in a cross-platform way? I hope I won’t have to, by leveraging external tools.

Q. If you’re building a Ruby-to-C translator, why write a Ruby interpreter at all? If I didn’t, what would I translate? You still need some core engine to translate. Would it be a subset of Ruby? Yes.

Q. Looks very similar to Squeak, have you looked at Squeak code and talked to Squeak people? Looked at code a lot, I’ve really stolen all of their ideas. I haven’t talked to the folks yet because I’m afraid they might laugh at me.

Resulting Works

  • SydneyParser: Used parser from Sydney and stole ParseTree’s algorithm for generating a sexp that represents the Ruby code.
  • SegfaultProtection: detects a segfault in an extension, saves the Ruby interpreter, and raises a memory fault exception instead.

The Nitty Gritty (Red Pill)

  • All components separated by APIs for swappability
  • Garbage collector: Baker two-space copying collector, and a train GC
  • Bytecode interpreter: small set of instructions driven by tests and need, so there are no extraneous operations
  • Compiler: written completely in Ruby, using ParseTree and SexpProcessor. Intended to compile itself to be used as a base compiler for Prototype S.
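The compiler idea (s-expression in, bytecode out) can be sketched in a few lines. The node shapes below are a simplified, hypothetical form, not ParseTree's exact output, and the two instructions are invented rather than taken from Rubinius:

```ruby
# Compile a nested s-expression into a flat list of stack instructions.
# Supported nodes: [:lit, value] and [:call, receiver, operator, argument].
def compile_sexp(sexp)
  case sexp.first
  when :lit
    [[:push, sexp[1]]]
  when :call
    _, receiver, op, arg = sexp
    compile_sexp(receiver) + compile_sexp(arg) + [[:send, op]]
  else
    raise ArgumentError, "unknown node: #{sexp.first.inspect}"
  end
end

# Tiny interpreter for the instructions above, used to check the output.
def run_bytecode(code)
  stack = []
  code.each do |insn, operand|
    if insn == :push
      stack.push(operand)
    else
      # :send pops the argument and the receiver, then dispatches the method
      arg = stack.pop
      receiver = stack.pop
      stack.push(receiver.public_send(operand, arg))
    end
  end
  stack.pop
end

sexp = [:call, [:call, [:lit, 2], :+, [:lit, 3]], :*, [:lit, 4]]
code = compile_sexp(sexp)
p code
# => [[:push, 2], [:push, 3], [:send, :+], [:push, 4], [:send, :*]]
puts run_bytecode(code)   # => 20
```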

Future

  • Other backends -- Java, Smalltalk

More questions

Q. Worried about fragmentation? Yes, but I really want to make it as compatible as possible with the current interpreter.

Q. Is Rubinius bytecode compatible with YARV? No, but I hope to be able to write a bridge to YARV in Rubinius.

Q. Have you looked at Valgrind for the C code? Yes, I have. Good possibility for future direction.

Q. Can you demo some code? They’re incredibly boring. “Look I got a MethodTable object, I asked for one.”


RubyConf: History of Ruby

Posted by Nick Sieger Fri, 20 Oct 2006 19:06:00 GMT

Takahashi-san is here to present on the history of Ruby, an apparently thankless task, because none of the other original Rubyists are historians. Takahashi is the co-author of two Japanese books on Ruby, Enjoy Ruby and Ruby Recipe Book. He also has the “Takahashi method” of presentation named after him. His talk presented an informative timeline of Ruby, the details of which were a bit tricky to capture. If I transcribed anything erroneously, please let me know.

Pre-history age

  • Born 24th of February 1993. Without code!
  • Matz and Keiju-san proposed the name first.
  • Thus one of the philosophies of Ruby came to be -- that the name of things matters. Matz: “I guess Ruby is cool”. Keiju: “I also like coral”. Matz: “oops”.

Ancient age

  • Ruby is in public -- release 21 December 1995 -- ruby-0.95.
  • ruby-list ML was launched. First mail: ruby-0.95 test failed. Subsequently 3 versions of Ruby were released in two days.
  • No CVS repository at the time. Anonymous CVS was to come in 1999.
  • 25 December 1996 -- Ruby 1.0 released.
  • 1 July 1997: Matz announces that Netlab hired him to be a full-time Ruby developer.
  • 22 September 1997: an article was published on Ruby -- the first article on the web about Ruby.
  • 15 May 1998: RAA launched, maintained manually by Matz.
  • 7 December 1998: Ruby home page was in English, but very simple.

Middle

  • Ruby is spreading in Japan during this time. The community is growing around Japanese programming language designers and programmers who do not understand English. Finally they have a tool that they can embrace and establish their own opinions and choices.
  • 27 October 1999: Matz and Keiju’s book is published, the first Ruby book
  • More Ruby books would follow in 2001-2002 (~20 books -- a bubble). But the bubble popped in 2003.
  • 4 November 1999: Ruby workshop
  • There were some Perl and Ruby/Perl conferences during this time also.
  • 26 May 2001: YARPC -- Yet Another Ruby and Perl Conference
  • 9 August 2003: Lightweight Language (LL) workshop (LL Saturday). PHP, Perl, Ruby and Python were present. LL Weekend, LL Day and Night, and LL Ring would follow in 2004-2006.
  • LL Ring: 300 attendees talking about LLs in a real boxing ring.

Modern

  • Ruby spreads outside of Japan
  • 16 Feb 2002 -- ruby-talk ML surpasses ruby-list ML.
  • ruby-talk was started in December of 1998, but the first posts are almost all Japanese authors writing in English.
  • SunWorld in February 1999 has an article entitled “New choices for scripting” including Ruby.
  • February 2000: IBM Developerworks article on the “latest open source gem from Japan”.
  • InformIT article by Matz also in 2000.
  • 15 December 2001: Programming Ruby by the Pragprogs (1st edition of the Pickaxe).
  • RubyConf.new(2001)
  • Ruby Kaigi -- the first Japanese Ruby conference didn’t happen until 2006, and it turns out it happened only because Japanese rubyists at a dinner at RubyConf 2005 decided that it would be fun.

Contemporary

  • Rails age -- the killer application for Ruby
  • We all know what happened, so we’ll skip this part.

