geekSessions I: Ruby on Rails: To Scale or Not to Scale
Posted by Nick Sieger Wed, 23 May 2007 05:51:36 GMT
I was fortunate to be in town right after RailsConf and attended the inaugural geekSessions event on Rails scalibility. The event went off without a hitch: it was well attended, City Club is a classy place, and there was decent food and an open bar. I don’t know the SF geek/startup scene, but pretty much all of the few guys I know were there along with a ton of other folks. My only complaint would have been to let it run at least 30 minutes longer. Socializing was good too, but it seemed like the conversation was just getting started.
Here are some notes for you in my typical rapid-fire style -- hope they’re useful to you.
Case study: divine caroline
- Load balancer
- Apache + mongrel
Ruby is slow. Rails is slow. Unoptimized app was slow -- 7 pages/sec with
ab. So how can Rails possibly be? 150 pv/s with a simple text render. This formed a sort of upper-bound, that ruled out fragment/action/partial caching, etc. This brought the throughput to 3500 pv/s. Except for page caching limitations:
- Cache coherency
- Writes are more expensive
- Page caching is not applicable to as many pages as you think
But measure first. Pivotal built a drop-in page caching extension to deal with cache coherency issues (soon to be at http://rubyforge.org/projects/pivotalrb)
Jason somehow has the distinction of the first four commits in the Rails repository. Joyent/TextDrive/Strongspace.
If your application is successful, you’re going to have a lot of machines. What happens when you have 1000s of machines, 100s of TB, 4 locations, etc. Is this really a Rails issue? In a typical Joyent setup, Rails is only one of 26+ processes on the server stack. So scaling it really doesn’t mean much more than scaling any application. Object creation in Ruby is fast, sockets and threads are slow. So forget sockets and threads.
Instead, use DNS, load balancers, evented mongrels, JRuby/Java, DBMSes (not just RDBMS; LDAP, filesystem, etc.), Rails process doing Rails only, static assets going through a static server, federate and separate as much as you can.
Jeremy’s job is about safety nets; about knowing the underlying infrastructure. Is the hardware/OS/stack important? Can you build safety nets around those so that you can spare cycles when you need to intrude into the system to troubleshoot?
Twitter is in a unique position with the volume of traffic to be able to find some pretty tough bugs, like the recent backtrace issue.
Measure first! Like Ian said. Is software information? Or a machine? It’s both. Nothing else in human existence can claim this. 3 weeks after Bryan joined Sun, he was working with Jeff (ZFS architect) debugging an issue when Jeff retorted, “Does it bother you that none of this exists? It’s just a representation of some plastic and metal morass in a backroom” (slightly paraphrased).
We’ve been living with bifurcated code -- “if DEBUG; print something” ad nauseum. But this has a cost. So dev code deviates from production code. But we can’t get the data we want, where it matters, in production. Bryan goes on to describe the aforementioned backtrace issue and how it saved Twitter 33% CPU. So don’t pre-optimize, but you’ve got to be prepared to go get the data. In production.
Q & A
What’s the best way to move from one database to two databases (MySQL), when you scale past the volume of reads that overwhelms one?
Jason doesn’t like the replication approach, it’s not fault tolerant. Reference to Dr Nic’s magic multi-connections gem. Reference to acts_as_readonly. Don’t rely on things that are out of your control, start reading/writing to multiple locations, at the application level. Jeremy: So do you want to be in the business of writing SQL or C extensions to Rails? What about MySQL proxy? Seems ok, but I might not trust it in production. MyTop/InnoTop will tell you about your query volume.
Virtualization: 4 virtual servers w/ web servers on top of a single physical server? Why?
Jason: Free BSD 4.9 on early pentium was the perfect balance of utilization. 18 CPUs by 64G RAM with virtual servers gets us back to that level of utilization. Bryan: Not all virtualization solutions are equivalent! (Solaris containers/zones plug.)
RDBMSes are not good for web applications? Why? Can you give some examples?
Jason: It depends on when you want to join. When people are clicking, or pre-assembled. Look at your application and put the data together before people request it. Why does YouTube need an RDBMS? It serves a file that people can comment on.
Mention of Dabble DB, ZFS, Jabber, Atom, Atom over Jabber, etc. as ways of innovative ways of storing objects, data, etc. GData/GCal most certainly does not store its Atom files in an RDBMS.
Sell Rails apps and have the customer deploy it? What options are available?
Ian: JRuby on Rails with a .war file is an interesting approach. What operational issues/ways to help with scaling remote deployments? Jeremy: Log files are the first line of defense. Jason: Corporate IT are comfortable with Java.
The pessimist in me says that my servers are going to fall over after 5 users. How can I be prepared/not be optimistic about a traffic spike?
Ian: Load test the crap out of the app. Find out the horizontal scaling point. Use solutions like S3 for images. Make sure you can scale by throwing hardware at it. Eventually single points of failure will overcome you (such as a single database), but you can wait until you get to that point before doing something about it.
Jason: You can benchmark your processes, and get an idea of what they can do. Most people that want to do something will be look at your stuff, and maybe signup. So front-load and optimize your signup process, possibly by taking it out of Rails.
Jeremy: Conversations with Zed, DHH, etc. have pointed out that sometimes “Rails isn’t good at that, take it out of Rails.” Same thing for the database. Split those things out into a different application.
Bryan: Do your dry land work, know your toolchain, so that when the moment comes, you can dive in and find the problem.
We have a migration that takes a week to run because of text processing. GC was running after every 10th DB statement. Used Rails bench GC patch to overcome the issue with the migration. Any issue running these?
Jason: We run those GC modifications and a few more in production, and they’re fine.
Most comversations revolve around items like database is slow, or Ruby is slow. How can we use DTrace to streamline the process?
Jeremy: We spent 20 minutes over lunch (plus some preparation) to find a Memcache issue. It’s worth it to spend a little time to learn the tool.
Bryan: “Awk is God’s gift to all of us.” When DTrace was being reviewed inside of Sun, folks commented “This reminds us of awk.” “Thanks!”
Jason: We’re putting a tracing plugin in Rails as a remote process to collect data from a running app. Apple has shown a commitment to get this in Leopard. Textual and graphical output are possible. I believe in DTrace a lot, and the tooling and documentation will go beyond its current state of an experts tool.
Lastly, what one closing thing would you like to say about Rails scalability?
Jason: Don’t use relational databases.
Jeremy: I thought it was a Joyent sales pitch.
Bryan: Use DTrace (with Joyent accelerators of course).