RailsConf 2007: Evan Weaver: Going Off Grid

Posted by Nick Sieger Fri, 18 May 2007 19:33:31 GMT

Evan is talking about leaving Rails as a full-stack framework and remixing bits and pieces for integration projects. He’s doing it in the context of a case study on Bio: a project at the University of Delaware working with DNA data in large SQL databases. Evan states that all of bioinformatics is an integration problem. (Me: That’s probably true of any research project where data is coming from multiple, varied sources. So where does Rails fit in this?)

So how do you cope with this? Use the Rails console as an admin interface, mapping AR onto the legacy schema.

Shadow (gem install shadow) is a REST-ful record server -- a small Mongrel handler that allows you to manipulate the database remotely. It uses dynamic ActiveRecord classes that are created and trashed for each request.

Parallelization -- uses the Sun 1 grid engine that distributes shell scripts across 128 nodes. Used for job and backend processing.

bioruby/bioperl/biopython -- bioinformatics libraries in other languages -- bioruby is not complete, but we still want to use Ruby, so he looked at ways of integrating Ruby with other languages. No RubyInline for Perl or Python, no up-to-date direct/C bindings. He ended up building a socket-level interface into python.

Admin tools to consider -- streamlined, active_scaffold, autoadmin, Django (manage.py inspectdb; manage.py syncdb; manage.py runserver). (Wow, come to RailsConf, get a Django demo. Unexpected surprise!)

Extending Rails -- has_many_polymorphs for easy creation directed graphs

Frustrating AR tidbits: has_many_through has a huge case statement, with sql strings everywhere, and tightly intertwined classes. Ugh.

Scaling big webapps: AR/SQL is not the way. Instead, go to a hyper-denormalized model, where the DB is just a big hash. This leads to things like berkeleydb, memcached, madeleine, etc. and MySQL just becomes a persistence store for memcache. One key is moving joins at write-time, so that reads don’t need to re-join associations. You’re essentially duplicating/caching the data out to each association, but this makes sharding/splitting of data easier. Example: Flickr user photos vs. photos placed in a group.

Evan doesn’t believe that SQL is a viable data store for webapps -- I think he means large-scale webapps. Not everyone who’s trying to build a web application will run into these kinds of issues, so your mileage may vary. Still, it’s refreshing to see more people rebel against the incumbent 30-year gorilla of SQL.

Tags ,  | no comments | no trackbacks

Comments

Trackbacks

Use the following link to trackback from your own site:
http://blog.nicksieger.com/articles/trackback/243