<p>RailsConf 2007: Evan Weaver: Going Off Grid (Nick Sieger, May 18, 2007)</p>
<p>Evan is talking about leaving Rails as a full-stack framework and remixing bits and pieces for integration projects. He’s doing it in the context of a case study on Bio: a project at the University of Delaware working with DNA data in large SQL databases. Evan states that all of bioinformatics is an integration problem. (Me: That’s probably true of any research project where data is coming from multiple, varied sources. So where does Rails fit in this?)</p>
<p>So how do you cope with this? Use the Rails console as an admin interface, mapping ActiveRecord (AR) classes onto the legacy schema.</p>
<p>Shadow (<code>gem install shadow</code>) is a RESTful record server – a small Mongrel handler that lets you manipulate the database remotely. It uses dynamic ActiveRecord classes that are created and discarded for each request.</p>
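<p>This is not Shadow’s code or API, and it does not use ActiveRecord. It is a minimal pure-Ruby sketch of the per-request dynamic-class idea, assuming a hypothetical in-memory hash (<code>DB</code>) in place of a real database and hypothetical helpers <code>record_class_for</code> and <code>handle_get</code>. A record class is built from a table’s columns with <code>Class.new</code>, used for one request, and then dropped.</p>

```ruby
# Hypothetical in-memory stand-in for a legacy database:
# table name => array of row hashes.
DB = {
  "genes" => [
    { "id" => 1, "symbol" => "BRCA1", "chromosome" => "17" },
    { "id" => 2, "symbol" => "TP53", "chromosome" => "17" }
  ]
}

# Build a throwaway record class for one table. Accessors are derived from
# the columns of the first row, standing in for reading the real schema.
def record_class_for(table)
  rows = DB.fetch(table)
  columns = rows.first.keys
  Class.new do
    columns.each { |col| attr_reader col }

    define_method(:initialize) do |row|
      columns.each { |col| instance_variable_set("@#{col}", row[col]) }
    end

    define_method(:to_h) do
      columns.map { |col| [col, instance_variable_get("@#{col}")] }.to_h
    end

    # Class-level finder over the table's rows.
    define_singleton_method(:find) do |id|
      row = rows.find { |r| r["id"] == id }
      row && new(row)
    end
  end
end

# Handle a request path like "/genes/2". The class is built for this call
# only and becomes garbage once the method returns.
def handle_get(path)
  _, table, id = path.split("/")
  record = record_class_for(table).find(Integer(id))
  record && record.to_h
end
```

<p>Building the class per request means a schema change is picked up on the next call without restarting the server, at the cost of redoing the reflection every time.</p>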
<p>Parallelization – uses the Sun N1 Grid Engine, which distributes shell scripts across 128 nodes. It’s used for job and backend processing.</p>
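<p>To make the fan-out concrete, here is a hedged Ruby sketch that deals work items round-robin into one shell script per node. The 128-node count comes from the talk. The <code>./process_sequence</code> command and the <code>scripts_for</code> helper are hypothetical, and submitting the scripts to the grid engine is left out.</p>

```ruby
require "shellwords"

NODES = 128 # node count mentioned in the talk

# Deal work items round-robin into one shell script per node.
# "./process_sequence" is a hypothetical placeholder command.
def scripts_for(items, nodes = NODES)
  buckets = Array.new(nodes) { [] }
  items.each_with_index { |item, i| buckets[i % nodes] << item }

  buckets.each_with_index.map do |bucket, n|
    lines = ["#!/bin/sh", "# node #{n}: #{bucket.size} item(s)"]
    # Escape each item so odd file names can't break the generated script.
    bucket.each { |item| lines << "./process_sequence #{Shellwords.escape(item)}" }
    lines.join("\n") + "\n"
  end
end
```

<p>Round-robin keeps per-node counts within one item of each other, which is a reasonable default when individual jobs take roughly the same time.</p>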
<p>BioRuby/BioPerl/Biopython – bioinformatics libraries in different languages. BioRuby is not complete, but we still want to use Ruby, so he looked at ways of integrating Ruby with other languages. There’s no RubyInline for Perl or Python and no up-to-date direct/C bindings, so he ended up building a socket-level interface to Python.</p>
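<p>A socket-level bridge can be as small as newline-delimited messages over a stream socket. The sketch below is an assumption about how such an interface might look, not Evan’s protocol. The JSON-per-line format, the <code>call_remote</code> and <code>serve</code> helpers, and the <code>reverse_complement</code> method are all hypothetical, and a Ruby thread stands in for the Python process so the example runs on its own (Unix only, since it uses <code>UNIXSocket.pair</code>).</p>

```ruby
require "socket"
require "json"

# Client side: send one request as a JSON line, read one JSON line back.
# The message format is an assumption made for this sketch.
def call_remote(sock, method, *args)
  sock.puts(JSON.generate({ "method" => method, "args" => args }))
  reply = JSON.parse(sock.gets)
  raise reply["error"] if reply["error"]
  reply["result"]
end

# Stand-in for the Python process: answers requests until the client hangs up.
def serve(sock)
  complement = { "A" => "T", "T" => "A", "G" => "C", "C" => "G" }
  while (line = sock.gets)
    req = JSON.parse(line)
    reply =
      if req["method"] == "reverse_complement"
        seq = req["args"].first
        { "result" => seq.reverse.chars.map { |b| complement.fetch(b, "N") }.join }
      else
        { "error" => "unknown method #{req["method"]}" }
      end
    sock.puts(JSON.generate(reply))
  end
end

client, server = UNIXSocket.pair
worker = Thread.new { serve(server) }
puts call_remote(client, "reverse_complement", "GATTACA") # prints TGTAATC
client.close # server's gets now returns nil, ending its loop
worker.join
server.close
```

<p>In a real setup the Python side would listen on a TCP or Unix socket and the Ruby client would connect to it. Keeping the wire format language-neutral (plain JSON lines here) is what lets either side be swapped out.</p>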
<p>Admin tools to consider – streamlined, active_scaffold, autoadmin, Django (<code>manage.py inspectdb; manage.py syncdb; manage.py runserver</code>). (Wow, come to RailsConf, get a Django demo. Unexpected surprise!)</p>
<p>Extending Rails – <code>has_many_polymorphs</code> for easy creation of directed graphs.</p>
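<p>The plugin’s API isn’t covered here. The sketch below only illustrates the underlying shape, assuming a polymorphic edge record that stores the type and id of both endpoints so that nodes of any class can be linked. <code>Edge</code>, <code>Graph</code>, and the <code>Gene</code>/<code>Protein</code>/<code>Pathway</code> node types are hypothetical plain Ruby, not <code>has_many_polymorphs</code> code.</p>

```ruby
# Hypothetical node types; any object with an #id can be a node.
Gene = Struct.new(:id, :symbol)
Protein = Struct.new(:id, :name)
Pathway = Struct.new(:id, :title)

# One row of a polymorphic edge table: both endpoints carry a type and an id.
Edge = Struct.new(:parent_type, :parent_id, :child_type, :child_id)

class Graph
  def initialize
    @edges = []
  end

  # Add a directed edge parent -> child between nodes of any class.
  def connect(parent, child)
    @edges << Edge.new(parent.class.name, parent.id, child.class.name, child.id)
  end

  # Outgoing neighbours as [type, id] pairs.
  def children_of(node)
    @edges.select { |e| e.parent_type == node.class.name && e.parent_id == node.id }
          .map { |e| [e.child_type, e.child_id] }
  end

  # Incoming neighbours as [type, id] pairs.
  def parents_of(node)
    @edges.select { |e| e.child_type == node.class.name && e.child_id == node.id }
          .map { |e| [e.parent_type, e.parent_id] }
  end
end
```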
<p>Frustrating AR tidbits: <code>has_many_through</code> has a huge case statement, SQL strings everywhere, and tightly intertwined classes. Ugh.</p>
<p>Scaling big webapps: AR/SQL is not the way. Instead, go to a hyper-denormalized model where the DB is just a big hash. This leads to things like BerkeleyDB, memcached, and Madeleine, and MySQL just becomes a persistence store behind memcached. One key is moving joins to write time, so that reads don’t need to re-join associations. You’re essentially duplicating (caching) the data out to each association, but this makes sharding and splitting the data easier. Example: Flickr user photos vs. photos placed in a group.</p>
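<p>The write-time join can be sketched with a plain Ruby hash standing in for the key/value store. Everything here is illustrative: the <code>PhotoStore</code> class, the key layout, and the method names are assumptions modeled on the Flickr example, not code from the talk.</p>

```ruby
# A plain Hash stands in for the key/value store (memcached, BerkeleyDB, ...).
# Keys and method names are hypothetical, modeled on the Flickr example.
class PhotoStore
  def initialize
    @kv = {}
  end

  # Write path: the "join" happens here. The photo is stored under its own
  # key, and a copy is appended to every list that will be read later.
  def add_photo(id:, title:, owner:, groups: [])
    photo = { id: id, title: title, owner: owner }
    @kv["photo:#{id}"] = photo
    append("user:#{owner}:photos", photo)
    groups.each { |group| append("group:#{group}:photos", photo) }
  end

  # Read paths: a single key lookup each, no join.
  def user_photos(owner)
    @kv.fetch("user:#{owner}:photos", [])
  end

  def group_photos(group)
    @kv.fetch("group:#{group}:photos", [])
  end

  private

  # Store a duplicate, as a serialized value in a real cache would be.
  def append(key, photo)
    @kv[key] = @kv.fetch(key, []) + [photo.dup]
  end
end
```

<p>The trade-off shows up on updates: renaming a photo means rewriting every copy. That is the price paid for reads that are a single key lookup and for keys that can live on different shards.</p>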
<p>Evan doesn’t believe that SQL is a viable data store for webapps – I think he means large-scale webapps. Not everyone who’s trying to build a web application will run into these kinds of issues, so your mileage may vary. Still, it’s refreshing to see more people rebel against the incumbent 30-year gorilla of SQL.</p>