RubyConf: Parting Thoughts

Posted by Nick Sieger Mon, 05 Nov 2007 17:57:34 GMT

RubyConf once again was thoroughly enjoyable. If you are a Rubyist on the fence about attending, I highly recommend making it a priority to go next year. Here are some quick, random notes that didn’t quite fit into a full post.

  • For those of you who stopped by expecting to see the blow-by-blow of every minute of the conference like last year, my apologies. I think I set the bar a little too high for myself. It takes a lot of energy to stay focused on the sessions for the whole day. Perhaps it’s appropriate to pass the baton on to James Avery or Eric Mill for their 2007 coverage.
  • Venue (Omni Hotel Charlotte): Generally speaking, thumbs up. There were a couple of annoyances, though. 1. No non-emergency staircase to get to your room, causing huge lines for the elevators at the end of the afternoon. 2. Coffee was removed from the scene before 10 am, raising speculation that it was a conspiracy to drive business to the Starbucks in the mall below. 3. Toasters blew out the sound system on Sunday morning, forcing a PA system to be brought out and throwing a wrench in the rhythm of the morning talks.
  • I have to give props to Dr. Nic for avoiding getting burnt by the toaster incident and handling it really well. To boot, he gave one of the most entertaining talks at the conference, as the RubiGen video is sure to become an instant conference classic much like Adam Keys’ one-man-one-act event from last year.
  • Werewolf: I played one game, miserably. I was a werewolf, and when cornered by another in the game, mustered up the quote “I’m not an aggressive player, I prefer to feed off of other people.” Wow, what a Freudian slip. While I can sympathize with Charlie’s comments about the game (and I do really enjoy late-night hackfests), I also have to agree with Chad and the other commenters that the two are not mutually exclusive, and the Werewolf games are wonderfully inclusive of RubyConf newbies and veterans alike.
  • The two-track approach in the afternoon this year seemed to go well, despite making it impossible to see all the talks. I would have liked to have seen Erik Hatcher’s Solr talk, but instead decided to give moral support to Kyle Maxwell’s JRuby in the Wild talk. I also missed the Saturday afternoon tracks to hang out in Stu’s Refactotum session.
  • Lots of good quotables: check out Nihilist and Twitter for some of the back-channel chatter.

See you next year!


RubyConf Day 3: Behaviour-Driven Development with RSpec

Posted by Nick Sieger Sun, 04 Nov 2007 16:26:00 GMT

David Chelimsky and Dave Astels: RSpec

describe TestDrivenDevelopment do
  it "is an incremental process"
  it "drives the implementation" 
  it "results in an exhaustive test suite"
  # but also...
  it "should focus on design"
  it "should focus on documentation"
  it "should focus on behaviour"
end

class BehaviourDrivenDevelopment < TestDrivenDevelopment
  include FocusOnDesign
  include FocusOnDocumentation
  include FocusOnBehavior
end

When doing test-driven development:

  • Write your intent first: the smallest test you can write that fails.
  • Next, write the implementation. The simplest thing that could possibly work.
  • Even though you may be tempted to think about additional edge cases, multiple requirements, etc., you should try to be disciplined and focus only on the immediate tests. Only after you’ve made one test fail, then pass, can you continue on to other tests.

RSpec history

Initially BDD was just a discussion among Aslak Hellesoy and Dan North in the ThoughtWorks London office. Dave Astels joined the conversation with a blog post stating that he thought these ideas could be easily implemented in Smalltalk or Ruby. Steven Baker jumped in with an initial implementation, and released RSpec 0.1. Later in 2006, maintenance was handed over to David Chelimsky. RSpec has evolved through a dog-fooding phase up to the present 1.0 product.

BDD is no longer just about “should instead of assert”; it’s evolving into a process. Emphasizing central concepts from extreme programming and domain-driven design, it’s moving toward focusing on customer stories and acceptance testing. It’s outside-in, starting from high-level stories rather than from low-level specs and tests like those written with RSpec or Test::Unit.

Story Runner

Story Runner is a new feature intended for RSpec 1.1. Each story is supposed to capture a customer requirement in the following general template:

As a (role) ... I want to (some function) ... so that (some business value).

It uses a “Scenario … Given … When … Then …” format to express the high level stories. Scenarios are a series of given items, steps, and behaviour validations. Once the basic steps are established, they can be re-used. David even demonstrated a preview of an in-browser story runner that would allow the customer to play with the implementation and create new scenarios.

Pending

Pending is a nice way to mark specs as “in-progress”. You can either omit a block for your spec, or use pending inside the block to leave a placeholder to come back to.

describe Pending do
  it "doesn't need a block to be pending"
  it "could also be specified inside the block" do
    pending("TODO")
    this.should_not be_a_failure
  end
  it "could also use a block with pending, and you will be notified when it starts to succeed" do
    pending("TODO") do
      this.should_not be_a_failure
    end
  end
end

Behaviour-Driven Development in Ruby with RSpec is a new book David and Aslak are working on, due out early next year.

Update: David has posted his slides.


RubyConf Day 2: Morning Sessions

Posted by Nick Sieger Sun, 04 Nov 2007 02:12:00 GMT

John Lam: IronRuby

Why IronRuby? John started with RubyCLR, which was a bridge between two languages/environments (.NET CLR and Ruby). Last year he didn’t know he’d be uprooting his family from Toronto and moving to Seattle. Now he finds himself in Microsoft trying to make sense of his new position. He describes a number of higher level goals for himself and IronRuby at Microsoft.

Change or die. Involvement in open source can only go up, right? The challenge is that the company is already doing well, so it’s hard to convince middle management that anything should change.

Open source. To their credit, the IronRuby team appears to be on the leading edge of open source at Microsoft (cf. the Microsoft Public License). They also had planned all along to take external contributions, and have in fact started to receive them.

Rails. One of the key goals is to be true to the language, and that includes being able to run Rails.

Performance. Use IronRuby as a testbed for DLR performance testing.

John is showing the REPL now (running under Mono actually), pointing out that “integer math is now supported” (apparently early on someone pointed out that subtraction didn’t work) and that CLR list types automatically appear like Ruby arrays.

Heavy DLR pitch ahead. Performance history, how the CLR used to be slow for dynamic languages, and how it’s better now.

John is running the Rubinius specs now, and showing only 373 out of 1030 failing. (It looked like he was running the core specs only.) Praise for the Rubinius team!

It’s possible to bind C# types to Ruby using annotations. Lots of C# code being shown, including a mess of generated code.

John also showed a XAML/Silverlight demo that was scripted by Ruby.

Charles Nutter and Thomas Enebo: JRuby

JRuby: “Not Just” JRuby for the JVM. I found it hard to take notes for this talk since I’m so close to it. Fortunately, their slides were pretty verbose and comprehensive, and hopefully will be posted shortly.

Evan Phoenix: Rubinius

Rubinius talk in roller derby mode. Ask questions early and often.

What is the end game of Rubinius (or JRuby, or IronRuby)? Total. World. Domination. For Ruby!

Rubinius is 3 things: form, function, and elbow grease. Ruby::Syntax, Ruby::Behavior, and Google.search("crazy cs papers").

Rapid fire CS Nerd attack mode coming. Generational collection, bytecode execution, stackless, bytecode representation, .rba archives.

Who would rather program C than Ruby? Java? C#? (Only one guy raised his hand that he’d rather code C.)

Hard-hitting portion of the talk. The kernel, broken down.

  • 1.8

    • 84,516 lines of C
    • 0 lines of Ruby
  • 1.9

    • 128,786 lines of C
    • 0 lines of Ruby
  • IronRuby

    • 48,282 lines of C#
    • 0 lines of Ruby
  • JRuby

    • 114,507 lines of Java
    • 0 lines of Ruby*

(*Even though I got heckled for saying it, JRuby does actually have some code written in Ruby that’s not the standard library.)

  • Rubinius
    • 25,398 lines of C
    • 13,946 lines of Ruby

1.8 and 1.9 are really Ruby for C programmers. JRuby is Ruby for Java programmers. IronRuby is Ruby for C# programmers. But Rubinius is Ruby for Ruby programmers.

Dogfooding. Gives feedback, which enables tighter loops, improves the kernel, makes life better for everyone on the platform.

Road, rubber, all that jazz. Evan mentions that Rubinius runs 24 of 31 benchmarks faster than Ruby 1.8, but the numbers are shifting rapidly. Evan wanted a 1.0 for RubyConf, but he has come to realize that several things are more important than a milestone. Design, and the technical challenges, certainly. But more importantly, the community.

Taking a cue from the Perl 6 community, -Ofun. The free-flowing commit bit, where patch submitters whose patches are accepted are immediately entitled to commit rights, has given rise to 57 committers. 17 of these have changed more than 400 lines of code.


RubyConf Day 1: Morning Sessions

Posted by Nick Sieger Fri, 02 Nov 2007 15:35:00 GMT

Marcel Molina: What Makes Code Beautiful?

What is beauty? Marcel explores this topic, starting with posing the question to the audience. “My wife!” Marcel: Why is she beautiful? “Longer answer than you want!”

Marcel comes from a literature/linguistics background and is interested in how meaning is conveyed, not only by the basic words themselves but by context and expressivity as well.

Note: Marcel has given this talk before.

History of beauty

Pythagoras: was out in the street, heard the blacksmith’s clanging hammer, and was drawn to the noise. He recognized, through closer inspection, that the different sounds that came from the different hammers had relationships, and eventually saw similar relationships in other parts of nature, architecture, and so on.

Thomas Aquinas: Three things that define beauty: 1. Proportion. The economy of size and ratio of parts. The smallest thing that works. 2. Integrity. Well-suited for the purpose. 3. Clarity. Clear and simple.

Each of the qualities is necessary, but none is sufficient. For example, proportion (economy) will often clash with clarity. This is especially true in code.

Applied to software

Case study: coercion. Converting XML strings into rich Ruby equivalents. Marcel’s initial solution was a CoercibleString < String, which used a generator to iteratively try to coerce XML attributes to a number of types, and return the results. ~20 lines of code to convert to 4 types. His second version was a simple class method on String with a case statement.
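Marcel’s code isn’t reproduced in these notes, so the following is only a sketch of what a case-statement version might look like. The method name coerce, the four target types, and the patterns are all assumptions made for illustration, and it is written as a standalone method rather than a class method on String.

```ruby
require "date"

# Hypothetical helper: turn an XML attribute string into a richer Ruby value.
# Each branch tests one pattern and falls through to the raw string.
def coerce(text)
  case text
  when /\A-?\d+\z/             then Integer(text, 10)
  when /\A-?\d+\.\d+\z/        then Float(text)
  when /\A(true|false)\z/      then text == "true"
  when /\A\d{4}-\d{2}-\d{2}\z/ then Date.strptime(text, "%Y-%m-%d")
  else text
  end
end

coerce("42")          # Integer
coerce("3.5")         # Float
coerce("true")        # true
coerce("2007-11-05")  # Date
coerce("hello")       # unchanged String
```

The appeal of this shape is that every rule is visible in one place, which speaks to the clarity and proportion criteria above.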

Kent Beck’s book Smalltalk Best Practice Patterns is about writing good software, but in Marcel’s opinion it arrives at a definition of beauty by describing aspects of code that reflect proportion, integrity, and clarity.

Niels Bohr: “An expert is a person who has made all the mistakes that can be made in a very narrow field.” Marcel calls his CoercibleString a mistake, but one that helped him learn more about coding.

Luckily for us, Ruby is optimized for beauty.

Jim Weirich: Advanced Ruby Class Design

Emphasizing “Ruby” more so than “Advanced”, through three examples that illustrate techniques not commonly found in statically-typed OO languages (Java/C++/Eiffel).

Rake::FileList

FileList['lib/**/*.rb']

FileList sports globbing, a specialized to_s, and lazy evaluation. First version: class FileList < Array; end. Good idea, right? Well, with lazy evaluation, resolution of filenames happens only when the list is accessed, not created, so a lot of methods need to be overloaded:

def [](index)
  resolve unless @resolved
  super
end

The problem becomes that FileList too closely mimics Array, and cannot distinguish itself in the case that matters. So it was changed to delegate to array rather than inherit.

Moral: when you want to mimic built-in classes, it might be better to implement #to_ary or #to_str rather than inherit.
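To make the moral concrete, here is a minimal sketch of a lazy list that delegates to an Array and exposes #to_ary instead of inheriting. LazyList and its loader block are hypothetical names for illustration; this is not Rake’s actual FileList code.

```ruby
# Hypothetical lazy list: holds a block and only runs it on first access.
class LazyList
  def initialize(&loader)
    @loader = loader
    @items = nil
  end

  def resolved?
    !@items.nil?
  end

  # Implicit conversion hook: Ruby calls this wherever a real Array is needed.
  def to_ary
    @items ||= @loader.call.to_a
  end
  alias to_a to_ary

  def [](index)
    to_ary[index]
  end
end

list = LazyList.new { %w[lib/a.rb lib/b.rb] }
list.resolved?            # false, nothing has been looked up yet
first, second = list      # multiple assignment calls to_ary
combined = ["x"] + list   # Array#+ also converts through to_ary
```

Only one method has to trigger resolution, rather than every Array method being overridden one by one.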

Builder::XmlMarkup

What’s the problem here?

  b = Builder::XmlMarkup.new
  b.student do
    b.name "Jim"
    b.phone_number "555-1234"
    b.class "Intro to Ruby"
  end

class is already a method on Object. This begat BlankSlate, which removes unnecessary methods from Object. Several techniques were applied to eventually arrive at the latest version:

  • Use undef_method to hide methods that we don’t want. Except, leave methods beginning with double-underscore alone (__id__ and __send__).
  • Catch new methods added via a method_added hook on Kernel, and an append_features hook on Object, to deal with methods defined and modules included after BlankSlate was created.
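The first technique can be sketched in a few lines. This is an illustration under assumptions, not Builder’s actual source: the Recorder subclass and its __calls__ accessor are hypothetical, object_id is kept only because current MRI warns when it is undefined, and the method_added and append_features hooks are left out. Later Ruby versions also ship BasicObject for the same purpose.

```ruby
class BlankSlate
  instance_methods.each do |name|
    next if name.to_s.start_with?("__")  # leave __id__ and __send__ alone
    next if name == :object_id           # assumption: kept to avoid an MRI warning
    undef_method(name)
  end
end

# Hypothetical subclass that records every call it receives.
class Recorder < BlankSlate
  def initialize
    @calls = []
  end

  # Double-underscore name so it cannot clash with a tag name.
  def __calls__
    @calls
  end

  def method_missing(name, *args)
    @calls << [name, args]
    self
  end
end

b = Recorder.new
b.name "Jim"
b.class "Intro to Ruby"  # reaches method_missing instead of Object#class
```

With the inherited methods hidden, class is treated like any other tag name.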

TableNode

Problem: magic conversion of Rails conditions to SQL. An example: User.find(:all).select{|u| u.name == "jim"}. We don’t really want to load the entire database to do this, but we don’t like writing SQL either.

Solution: Record the actions in the select block by yielding a special TableNode object that captures the method calls and translates to SQL on the fly. Now we can write User.select {|u| u.name == "Jim"} and have it still execute SQL.

  • Capture methods called and wrap in a MethodNode to convert to SQL column references
  • Capture operators and wrap in a BinaryOpNode to handle ==, <, etc.

Clever! Will this work? Here are some issues:

  • Small issue – ordering: User.select {|u| "Jim" == u.name} will not work without messing with String#==.
  • Bigger issues: && and || are not override-able in Ruby. What’s worse, ! has pre-defined semantics (in the parser) and cannot be captured.
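A stripped-down sketch of the recording idea, and of the ordering problem, might look like this. The class names come from the talk, but their bodies, the where_clause helper, and the quoting rule are assumptions made for illustration; Jim’s real implementation was not shown in these notes.

```ruby
# Represents "left op right" and renders it as SQL.
class BinaryOpNode
  def initialize(left, op, right)
    @left, @op, @right = left, op, right
  end

  def to_sql
    "#{@left.to_sql} #{@op} #{quote(@right)}"
  end

  private

  # Naive literal quoting, good enough for a sketch only.
  def quote(value)
    value.is_a?(String) ? "'#{value.gsub("'", "''")}'" : value.to_s
  end
end

# Represents a column reference; operators build BinaryOpNodes.
class MethodNode
  def initialize(table, column)
    @table, @column = table, column
  end

  def to_sql
    "#{@table}.#{@column}"
  end

  def ==(other)
    BinaryOpNode.new(self, "=", other)
  end

  def <(other)
    BinaryOpNode.new(self, "<", other)
  end
end

# Yielded to the block; any unknown method becomes a column reference.
class TableNode
  def initialize(table)
    @table = table
  end

  def method_missing(name, *_args)
    MethodNode.new(@table, name)
  end

  def respond_to_missing?(_name, _include_private = false)
    true
  end
end

# Hypothetical helper standing in for User.select.
def where_clause(table)
  yield(TableNode.new(table)).to_sql
end

sql = where_clause("users") { |u| u.name == "Jim" }

# The ordering issue: String#== runs first and simply answers false.
reversed = ("Jim" == TableNode.new("users").name)
```

Here sql holds a WHERE fragment built from the recorded calls, while reversed is plain false, so there is nothing left to translate.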

Lessons learned

  • Don’t be afraid to think beyond prior experiences to come up with new ways of solving problems in code.


Rubyconf Wrap-Up

Posted by Nick Sieger Tue, 24 Oct 2006 22:51:55 GMT

Whew! Back home from my first RubyConf, it’s taken me a couple days to collect some parting thoughts. As you might have noticed, I was pretty busy last weekend.

First of all, what an awesome and welcoming community. It’s going to sound cliché, but there are so many intelligent and motivated people walking around that you can’t help but be inspired to roll up your sleeves and get your hands dirty.

There were definitely some high points for me. The beauty and power of the language, even after using it for almost two years, still amazes me. Pretty much every piece of code I saw, whether in a presentation or looking over someone’s shoulder, had a clear purpose and communicated its intent better than any general-purpose machine language I have seen. The simplicity of Evan’s new Ruby-in-ruby VM, the syntax integration tricks of John’s RubyCLR project, the forthcoming RubyOSA APIs, and Geoffrey’s graphics programs, are all great testaments to Ruby’s power.

There was an implementer’s summit on Friday night, which I attended (see also coverage here and here). There are now at least 8 active implementations of Ruby (Ruby, Yarv, JRuby, Cardinal, Rubinius, MetaRuby, Ruby.NET, IronRuby), and two interop bridges (RubyCLR and RubyCocoa)! The biggest news was that there are plans to revive the Ruby testing project (formerly the Rubicon) and share as many tests as possible among the implementations.

RejectConf was a huge success, due largely to the indefatigable Adam Keys. Kevin Tew has a decent wrap-up of the talks that occurred. Charlie’s demo of NetBeans in-place refactoring feature drew a couple oohs and ahs and even one f-bomb. Heckle, in time, should be an awesome tool as well. Big thanks to zenspider for coordinating it. It’s destined to become an annual tradition. Perhaps the organizers of future RubyConfs could account for it in the budget?

On a lighter note, there were quite a few humorous moments that kept popping up. A summary may read like a list of inside jokes, so here’s some context. THAT GUY is a reference to a disclaimer in Zed’s talk about the know-it-all guy who always pipes up during your talk with skepticism. THAT GUY kept getting called out during the rest of the conference. Ani, the developer evangelist from Microsoft was pretty thick-skinned. She was heckled constantly about MS, Vista, and everything else, and still kept a smile on her face. And of course you already watched Adam’s one-act play, right?

My note-taking streak wasn’t quite perfect; I didn’t take notes on Kevin’s mkmf talk or Rich’s talk about indi, and I slept in and missed Justin’s Streamlined talk. Also, the beer was flowing for RejectConf, and despite the quality summer of code talks, I was spent. Fortunately, you can fill in the blanks by following along with Curt Hibbs and the rest of the blogosphere. Thanks for tuning in, and I hope you got something worthwhile here. See you next year!


RubyConf: Your Ruby in My CLR

Posted by Nick Sieger Mon, 23 Oct 2006 14:16:00 GMT

John Lam wanted to build a photo-flash-card application using Avalon and Indigo and Flickr, but also using Ruby as the implementation language. So along the way he decided to build an interop layer (a bridge) between Ruby and the CLR to do it.

Now that John has joined Microsoft, his new mission (bigger picture) is to further dynamic language implementations on the CLR.

Bridging type systems

  • Dynamic methods in the CLR allow you to do better than simply invoking the reflection API.

    Ruby          |  C               |  CLR
    ============================================
    shadow class  |  dynamic method  |  instance
    
  • Polymorphic inline caching – caching method dispatches on different call sites based on the assumption that types don’t change that often

  • Generate shadow classes and method stubs using const_missing and method_missing
  • Overload resolution happens in the method shims (a one time cost) to choose, e.g., which constructor to use for System::Collections::ArrayList.new
  • Integration is done to make the CLR feel more Rubyish
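The const_missing and method_missing combination can be illustrated in plain Ruby. This is a hypothetical sketch, not RubyCLR code: the Bridge module, the FOREIGN_TYPES table (standing in for CLR type metadata), and the Shadow class are invented for the example, and calls are only recorded rather than dispatched to a real runtime.

```ruby
module Bridge
  # Assumption: a tiny table playing the role of foreign type metadata.
  FOREIGN_TYPES = { ArrayList: %i[add count] }.freeze

  # Base class for generated shadow classes.
  class Shadow
    class << self
      attr_accessor :foreign_methods
    end

    def forwarded
      @forwarded ||= []
    end

    # Stub: record calls to known foreign methods instead of dispatching them.
    def method_missing(name, *args)
      return super unless self.class.foreign_methods.include?(name)
      forwarded << [name, args]
      self
    end

    def respond_to_missing?(name, include_private = false)
      self.class.foreign_methods.include?(name) || super
    end
  end

  # Generate the shadow class the first time its constant is referenced.
  def self.const_missing(name)
    return super unless FOREIGN_TYPES.key?(name)
    shadow = Class.new(Shadow)
    shadow.foreign_methods = FOREIGN_TYPES[name]
    const_set(name, shadow)
  end
end

list = Bridge::ArrayList.new  # triggers const_missing once
list.add(1).add(2)
```

Because const_set stores the generated class, the generation cost is paid only on first reference, in the same spirit as the one-time overload resolution described above.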

Implementation

  • This changes identity (proxied object):

    ArrayList.new.as(IEnumerable)
    
  • This is less Rubyish, but identity is preserved:

    IEnumerable.get_enumerator(ArrayList.new)
    

There are trade-offs and warts to a bridge approach to Ruby integration on top of a platform such as the CLR: there is a need to inject artificial type information occasionally to be able to construct CLR objects (e.g., arrays – Array.of(Int32).new(3)). Generics are evil! Simple stuff doesn’t seem so bad: List.of(Int32).new, but there’s more pain to be had (see John for details). John also built a RubyInline-like implementation for the CLR languages too, to allow for getting things done (even if it’s dirty). Finally, method overloading is a problem, especially when there is no equivalent Ruby type – this gave way to instance_shim which is a sort of aliasing method that mixes in type metadata for use by the interop layer.

On the other hand, there are many places where Ruby (even in bridged mode) can make the experience of developing on the CLR better. Implementing CLR interfaces is a feature that allows Ruby objects to cross to the CLR side, (e.g., adding IEnumerable to Ruby Array). Performance across the CLR boundary (marshalling data) is ~100 times slower than C#, but still fast (3 million calls/second). Huge benefits are gained from using DSLs in Ruby to help with the implementation of the interop layer. Also, RubyCLR allows mixing in methods into CLR types, so we can re-skin APIs that feel clunky in Ruby. This is really leveraging the power of Ruby in the best possible way.

My take is that it looks like the RubyCLR project will probably not be seeing much further development, unless John finds a willing maintainer – but this is speculation, I haven’t confirmed with John. Yet, the problem of impedance matching between type systems is a recurring theme in the dynamic language arena, and so John’s work is valuable in helping us to understand this issue.


RubyConf: YARV on Rails

Posted by Nick Sieger Mon, 23 Oct 2006 14:15:00 GMT

Koichi SASADA

Update: corrected performance numbers – 20x, not 20%!

  • Got a job developing YARV at the University of Tokyo! He’s now employed at Akihabara, Otaku City.
  • Member of Nihon-Ruby-no-Kai
  • Present at RubyKaigi 2006 (200 tickets sold in 3 hours). RubyKaigi 2007 will be June 9-10 (Saturday and Sunday).
  • Member of Nihon-Perl-no-Kai
  • Co-author of Perl book on Parrot in Japanese

DEMO

  • [Demo creating Rails app]
  • Create app with YARV (rails foobar)
  • Edit config/boot.rb, add a GC.disable line (there’s a bug to be fixed when he gets back to Japan).
  • Start up WEBrick, and it works!

Glossary

  • Rite: code name of Ruby 2.0 (a.k.a. vaporware!)
  • YARV: Yet Another Ruby VM

YARV

  • Supported by funds from IPA (now finished)
  • Simple stack machine, with VM instructions, a compiler and interpreter
  • Optimization techniques to improve performance
  • Open source

Optimizations include compile-time optimization, native threading, specialized instructions, instruction unification, inline method cache, and stack caching. YARV can build with configure/make, but doesn’t work with AC 2.6 (maybe you know why?). It passes most of the Ruby tests, but misses a few due to implementation differences.

[Koichi showed a demo controlling iTunes on Windows with Win32/OLE with YARV, and a native-threaded scenario in IRB.]

Myths of YARV: YARV is great! YARV will solve all problems! It makes Ruby programs go 50 times faster! It solves character issues! It finds your girlfriend!

Truths: YARV is for running Ruby programs, fast. It provides up to a 20x speed up for some algorithm benchmarks (Ackermann, Fib), but not for others [graphs shown]. You can assemble and disassemble YARV instruction sequences, or serialize and de-serialize them. They are just Ruby literals, so they can be packed in YAML or some other human-readable format.

require 'yasm'
require 'yaml'

iseq = YASM.toplevel([:a, :b]) {|ib|
    ib.answer
    ib.leave
}

p iseq.to_a.to_yaml # => (gave a readable YAML view of the assembly)
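The YASM assembler above is what Koichi demonstrated. As a rough modern counterpart, YARV later became the VM of mainline Ruby (from 1.9), and MRI exposes compiled instruction sequences through RubyVM::InstructionSequence. The sketch below assumes it is run on MRI, since other implementations do not provide this class.

```ruby
# Compile a snippet to a YARV instruction sequence (MRI only).
iseq = RubyVM::InstructionSequence.compile("1 + 2")

listing = iseq.disasm  # human-readable disassembly, as a String
data    = iseq.to_a    # the same sequence as nested arrays of plain Ruby values
result  = iseq.eval    # run the compiled sequence

puts listing
```

The to_a form is what makes the "just Ruby literals" point visible: it contains only ordinary values such as arrays, strings, symbols, and numbers.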

Threading

  • Ruby thread is mapped 1:1 to a native thread
  • Supports POSIX and Win32
  • Many existing Ruby libraries are not synchronized at the C level, so many C libraries need synchronization added to them
  • Thread model 2: 1:1 mapping, with a Giant Lock (GL). Only the thread that has the lock can run. No need for sync, but no parallelism
  • Thread model 3: Ruby threads in parallel, but when thread-unsafe code is executing, GL needs to be obtained
  • Mutex class will become builtin
  • Thread.critical will vanish (not obsolete, but unsupported) [this was a controversial point for some – the comment was that it’s a near impossibility to keep it with a native threading model though, the two are incompatible]
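Mutex did become a built-in class in later Ruby releases, and it is the usual replacement for Thread.critical. A minimal sketch of protecting shared state with it (the counter and the thread count are arbitrary choices for the example):

```ruby
counter = 0
lock = Mutex.new

threads = 4.times.map do
  Thread.new do
    # Only one thread at a time may run the block passed to synchronize.
    1_000.times { lock.synchronize { counter += 1 } }
  end
end

threads.each(&:join)
puts counter
```

Unlike Thread.critical, which stopped every other thread, a mutex only blocks threads that contend for the same lock, which is why it fits a native threading model.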

Matz: 1.9.1 in 2007 Christmas, but Ruby 1.9.1 is also to be merged with YARV, so Koichi hopes to complete the merge by spring or summer 2007. Thread model 2 will need to be used to begin with.

Future

  • set_trace_func hook functions – what to do here? [It was suggested by Charles Nutter to remove it, to which Matz replied that we could as long as we have a good replacement debugging API.]
  • Catch up with Ruby 1.9
  • JIT/AOT compiler (AOT compiler started but incomplete)
  • Koichi also has a side project: high-performance Ruby, with the goal of making it easy to write performant code.

Links

More developers and testers are welcomed to the project!


RubyConf: Matz Keynote

Posted by Nick Sieger Sun, 22 Oct 2006 03:41:22 GMT

The Return of the Bikeshed

or Nuclear Plant in the Backyard

Ruby is

  • Scripting (some people outside of Japan don’t like this one)
  • Programming (sure, it is)
  • Lightweight (see Takahashi-san’s history – LL is a popular term in Japan)
  • Dynamic

The agile manifesto claims 4 values. It encourages developers to act responsibly. What if these values were applied to programming languages?

  • Individuals and interactions: language design should focus on users (i.e., developers).
  • Working software: language should encourage readability
  • Collaboration over contracts: expressive language, which helps communication.
  • Responding to change: language should embrace change

Thus, Ruby is:

The Agile Language

The Good, Bad, Ugly of Ruby

Good

  • Sweet language
  • RoR
  • Ruby people are nice (Martin Fowler – “because Matz is nice”)

Ugly

  • eval.c
  • parse.y

Bad

  • Ruby 2 vaporware – close to longest in open source (Rite > Perl6)

Bikeshed

What is the bikeshed? (FreeBSD people hate the bikeshed.)

People tend to argue about little things, because those are the things they know enough about to argue. The amount of noise related to a change is inversely proportional to the complexity of the change, so the important things get left behind.

The nuclear plant is too complex, so we leave it up to experts to think about it. As a result, we spend time on unimportant things.

  • Symbol < String
  • #lines
  • removing private and protected

Thus leading to suggest that Ruby is a…

Fragile language

  • Ruby 1.8 is good enough! We’re not in a hurry.
  • Extreme arguing – why not make everything easy enough to be argued by everyone?

Design game

  • Gather wild ideas
  • Try to make it the best language ever? (We feel good about Ruby, but I don’t know why.)
  • Shed light on undefined corners of Ruby
  • Document a specification

RCRchive so far hasn’t worked out, because a few people took it too seriously.

  • Ruby will stay Ruby, but it shouldn’t be a vague idea. Rationale, analysis, discussion, with a prototype implementation
  • I (Matz) will stay the benevolent dictator, but will promise to be as open as possible.
  • 80-90% compatibility. It can break, but we have to keep the same philosophy that we love.

Hey, we need optional explicit typing in Ruby! So what!

Look at Python, it has a very organized (ahem, strict, something [laughter]) PEP system that seems to be working well. Mailing lists are more suitable for discussion. Use RCRchive as a starting point, but use a new system/channel for each proposal. Prototypes are needed. Running code accelerates the discussion.

Why should we start this game? I want to share language design with the community. I’m tired of the slow evolution of Ruby. We’re using 3-year-old technology. We need to catch up, so that people aren’t saying “Ruby, we had that language a long time ago!” Educating the developer community is a great venue for learning about language design.

I may be hit by a truck someday, in which case Ruby will be Ruby 1.8 forever! In that case you can do whatever you want with 1.9.

Considering a submission deadline – 2007-04-30, but it might not be needed. Classify proposals under GBU (Good, Bad, Ugly), plus version (1.9 or 2.0). Implement them. Merge them. If it doesn’t work out, we’ll try another thing. We lose nothing but time. See you next year, hopefully with good news! Ruby 1.9.1 stable (but bleeding edge) will be out Christmas 2007.

Q & A

Q. Following the Agile tenets, perhaps we should submit tests and specs that document the feature? [Missed the answer, perhaps it was agreement on the part of Matz?]

Q. Maybe you should leave 1.8 for others to maintain? I’ll stay on 1.8 and there will be contributors to help with the new features going forward.

Q. You mentioned parse.y as ugly – how would the parser become more accessible? I have no concrete plan, but I’d like to create a simpler parser. Would you be open to non-BC parser changes? If someone raises their hand to make a new parser, I’ll discuss the compromises needed. What about participation in the VM design? (e.g., I like Smalltalk; can I try that out?) I have no opposition to alternate implementations of 1.8 or 1.9, but it’s difficult to target 1.9 (moving target). Koichi took on the role of chasing the moving target, and he suffers a lot. I’m sorry for him.

Q. With corporate support and alternative implementations, how worried are you about fragmentation? I don’t worry about it. I don’t have any trademark on Ruby or any business on it. It’s more like a competition – may the best interpreter win. Some will be fast, some will be stable, we can compare them.

Q. I like the aesthetic in Ruby. Can you push forward the beauty of the language, or are you happy as it is? I’m not sure how much room we have left. 1.8 may be good enough. During the discussion, we’ll see.


RubyConf: Natural language generation and processing in Ruby

Posted by Nick Sieger Sun, 22 Oct 2006 00:07:34 GMT

Speaker: Michael Granger

Michael’s talk was full of excellent pre-recorded video demos, and thus was difficult to take notes on. Instead, here are links to most of the pieces of software he discussed for your perusal:


RubyConf: I18n, M17n, Unicode, and all that

Posted by Nick Sieger Sun, 22 Oct 2006 00:06:00 GMT

Tim Bray is going to talk about characters and strings. He will gladly talk about text until your arm falls off, and buy half the beer.

Introduction

English is no longer the majority language on the web. It’s nonsensical to ignore i18n issues with new apps.

This is probably a bug:

    /[a-zA-Z]+/

Problems to solve to help us with i18n

  • Identifying characters
  • Byte-character mapping and storage
  • A good string API

References

  • The World’s Writing Systems
  • Character model for the web (W3C)
  • The Unicode 5.0 Standard (forthcoming book) (same as ISO 10646)

Unicode

  • Characters identified by numbers called code points (room for > 1,000,000)
  • 17 Planes each with 64k
  • Original characters (available in any computer anywhere before Unicode was invented) in Basic Multilingual Plane (BMP, first plane)
  • Characters identified as U + (4 hex digits)
  • Unicode character database
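The arithmetic behind the plane layout is easy to check. The describe helper below is a hypothetical name used only for this illustration.

```ruby
PLANES = 17
PLANE_SIZE = 0x10000                    # 65,536 code points per plane
TOTAL_CODE_POINTS = PLANES * PLANE_SIZE # 1,114,112, i.e. U+0000 through U+10FFFF

# Hypothetical helper: show a character's code point and the plane it lives in.
def describe(char)
  code_point = char.ord
  format("U+%04X plane %d", code_point, code_point / PLANE_SIZE)
end

describe("A")          # a BMP character, plane 0
describe("\u{1F600}")  # an emoji outside the BMP, plane 1
```

Characters outside the BMP need more than four hex digits, which is why the U+ notation is padded to at least four rather than fixed at four.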

Benefits

  • Repertoire
  • Room for growth – lots of space left in the middle planes
  • Private use
  • Sane process –
  • Character database
  • Ubiquitous standards and tools

Difficulties

  • Combining forms – need to normalize the characters for comparison (1/2 vs. ½) and this is not something you want to do in your String#== method
  • Awkward historical compromises with other encodings
  • Han unification (note: Tim considers the Wikipedia article to be biased) – characters that might mean something different were given one code point by Asian linguistic specialists.
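The combining-forms problem is easy to reproduce in a current Ruby, which gained String#unicode_normalize well after this talk. A small sketch, using escapes so the source file’s own encoding does not matter:

```ruby
composed   = "\u00E9"   # é as a single code point
decomposed = "e\u0301"  # e followed by COMBINING ACUTE ACCENT

same_before = (composed == decomposed)                          # compares code points
same_after  = (composed == decomposed.unicode_normalize(:nfc))  # compares after NFC

puts same_before, same_after
```

The two strings render identically, yet == only agrees once both sides are in the same normalization form, which is the cost Tim argues should stay out of String#==.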

Storage

  • Official: UTF-8, UTF-16, UTF-32
  • Practical: ASCII, EBCDIC, Shift-JIS, Big5, EUC-JP, EUC-KR, MS code pages, ISO-8859-*, etc.
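To get a feel for what the three official encodings cost in bytes, here is a short sketch using String#encode and String#bytesize. The four sample characters are arbitrary choices for the example.

```ruby
samples = ["A", "\u00E9", "\u65E5", "\u{1F600}"]  # A, é, 日, and an emoji
encodings = %w[UTF-8 UTF-16LE UTF-32LE]

# For each character, the number of bytes it occupies in each encoding.
sizes = samples.map do |char|
  encodings.map { |name| char.encode(name).bytesize }
end

sizes.each_with_index do |row, i|
  puts format("U+%04X  %s", samples[i].ord, row.join(" / "))
end
```

UTF-8 grows from one to four bytes, UTF-16 needs a surrogate pair outside the BMP, and UTF-32 is always four bytes.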

But with Video the largest bandwidth eater, does text size really matter?

Identification

How to identify what text is coming in over the wire?

  • Guess – browsers, python lib
  • Charset headers, which are known to be wrong
  • Trust – two partners agree in advance for a pattern of exchange
  • XML [this last one was quite obvious right]

Language approaches

  • Java – design flaw; characters are UTF-16, which is unfortunate. Implementation is sound and well tested, but Java is clunky.
  • Perl 5 has excellent support in theory, but practically speaking, it’s difficult to round-trip text through a DB without some breakage.
  • Python has byte arrays and strings, some string-like methods on byte arrays, binary or text data, glosses over issue of platform file encoding

Ruby

  • Some core string methods have i18n problems due to counting, regexp, equality and whitespace concerns.
  • String#each_char seems to be a missing method; string class maybe should be aware of its encoding.
  • Behavior of String#[] seems ok for byte buffers but it probably doesn’t need to be efficient for characters. Most of the use-cases for String iteration should be for characters (exception: Expat)
  • Case-changing methods – avoid them at all costs in a mixed language environment!
  • Regexps need Unicode properties for safer matching (\p{L} for letters, \p{N} for numbers)
  • Does Ruby need a Character class or a Charset class?
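Several of these wishes were met in later Ruby versions, which make strings encoding-aware and provide String#each_char and Unicode properties in regexps. A short sketch of the difference on a current Ruby (the sample strings are arbitrary):

```ruby
word = "caf\u00E9"  # "café", written with an escape so the file encoding does not matter

ascii_only = word[/[a-zA-Z]+/]  # stops at the first non-ASCII letter
any_letter = word[/\p{L}+/]     # matches letters from any script
digits     = "Room \u0663\u0664"[/\p{N}+/]  # Arabic-Indic digits count as numbers

chars = word.each_char.to_a  # iterates characters, not bytes
puts word.length, word.bytesize
```

The ASCII character class silently truncates the word, which is the kind of bug the introduction warns about.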

What is next for Ruby? [Tune your divining rods toward ruby-talk for the rest of the story!] Matz has m17n; Julik, Manfred and crew now have ActiveSupport::MultiByte in Rails; JRuby is built on a platform that already has a Unicode string, so the discussion is heating up.

Update: slides available here.

Q & A

Q. What if I have a stream of bytes with no knowledge of encoding? Don’t try to impose an encoding lens above the level of a string, programmers want to treat a string as a string with associated methods.

Q. What if I need to change case of text? Get used to the fact that it won’t work reliably. What about characterizing the finite number of languages? Java does that, and it’s still not really possible. Shouldn’t the string class know the encoding? Couldn’t it optimize better? Couldn’t you raise an exception? Just don’t do it! Isn’t there a body of knowledge that could be acquired about case? Hmm, next question that isn’t about case!

Q. Is there a resource for edge cases of processing text in XML? Search for “xml test cases”. The decision is between ignoring the metadata provided, and choosing not to process it.

Q. What is the python library that can guess the charset? It’s in the feedvalidator suite.
