Nick Sieger: geekSessions I: Ruby on Rails: To Scale or Not to Scale http://blog.nicksieger.com/articles/2007/05/23/geeksessions-i-ruby-on-rails-to-scale-or-not-to-scale en-us 40 "geekSessions I: Ruby on Rails: To Scale or Not to Scale" by budowa domów <p>Good article</p> Tue, 29 May 2007 11:55:29 +0000 urn:uuid:2f0f7d2e-0ae3-4394-984e-cfd4efd03492 http://blog.nicksieger.com/articles/2007/05/23/geeksessions-i-ruby-on-rails-to-scale-or-not-to-scale#comment-257 "geekSessions I: Ruby on Rails: To Scale or Not to Scale" by Tomasz Gorski <p>Thanks for very interesting article&#46; btw&#46; I really enjoyed reading all of your posts&#46; It’s interesting to read ideas, and observations from someone else’s point of view… makes you think more&#46; So please keep up the great work&#46; Greetings</p> Sun, 27 May 2007 12:49:49 +0000 urn:uuid:3b12c94e-6890-42e2-8b73-303a19cadeb4 http://blog.nicksieger.com/articles/2007/05/23/geeksessions-i-ruby-on-rails-to-scale-or-not-to-scale#comment-256 geekSessions I: Ruby on Rails: To Scale or Not to Scale <p>I was fortunate to be in town right after RailsConf and attended the inaugural <a href="http://www.geeksessions.com/">geekSessions</a> event on Rails scalability&#46; The event went off without a hitch: it was well attended, City Club is a classy place, and there was decent food and an open bar&#46; I don&#8217;t know the SF geek/startup scene, but pretty much all of the few guys I know were there along with a ton of other folks&#46; My only complaint is that it could have run at least 30 minutes longer&#46; Socializing was good too, but it seemed like the conversation was just getting started&#46;</p> <p>Here are some notes for you in my typical rapid&#45;fire style &#45;&#45; hope they&#8217;re useful to you&#46;</p> <h2>Ian McFarland</h2> <p>Case study: divine caroline</p> <p>Servers:</p> <ul> <li>Load balancer</li> <li>Apache + mongrel</li> <li>MySQL</li> <li>SOLR</li> </ul> <p>Ruby is slow&#46; Rails is slow&#46; Unoptimized 
app was slow &#45;&#45; 7 pages/sec with <code>ab</code>&#46; So how fast can Rails possibly be? 150 pv/s with a simple text render&#46; This formed a sort of upper bound, which ruled out fragment/action/partial caching, etc&#46; Page caching brought the throughput to 3500 pv/s&#46; But page caching has limitations:</p> <ul> <li>Cache coherency</li> <li>Writes are more expensive</li> <li>Page caching is not applicable to as many pages as you think</li> </ul> <p>But measure first&#46; Pivotal built a drop&#45;in page caching extension to deal with cache coherency issues (soon to be at http://rubyforge&#46;org/projects/pivotalrb)</p> <h2>Jason Hoffman</h2> <p>Jason somehow has the distinction of the first four commits in the Rails repository&#46; Joyent/TextDrive/Strongspace&#46;</p> <p>If your application is successful, you&#8217;re going to have a lot of machines&#46; What happens when you have 1000s of machines, 100s of TB, 4 locations, etc&#46;? Is this really a <em>Rails</em> issue? In a typical Joyent setup, Rails is only one of 26+ processes on the server stack&#46; So scaling it really doesn&#8217;t mean much more than scaling any application&#46; Object creation in Ruby is fast; sockets and threads are slow&#46; So forget sockets and threads&#46;</p> <p>Instead, use DNS, load balancers, evented mongrels, JRuby/Java, DBMSes (not just RDBMS; LDAP, filesystem, etc&#46;), Rails process doing Rails only, static assets going through a static server, federate and separate as much as you can&#46;</p> <h2>Jeremy LaTrasse</h2> <p>Jeremy&#8217;s job is about safety nets; about knowing the underlying infrastructure&#46; Is the hardware/OS/stack important? 
Can you build safety nets around those so that you can spare cycles when you need to intrude into the system to troubleshoot?</p> <p>Twitter is in a unique position with the volume of traffic to be able to find some pretty tough bugs, like the recent <a href="http://dev.rubyonrails.org/changeset/6571">backtrace issue</a>&#46;</p> <h2>Bryan Cantrill</h2> <p>Measure first! Like Ian said&#46; Is software information? Or a machine? It&#8217;s both&#46; Nothing else in human existence can claim this&#46; 3 weeks after Bryan joined Sun, he was working with Jeff (ZFS architect) debugging an issue when Jeff retorted, &#8220;Does it bother you that none of this exists? It&#8217;s just a representation of some plastic and metal morass in a backroom&#8221; (slightly paraphrased)&#46;</p> <p>We&#8217;ve been living with bifurcated code &#45;&#45; &#8220;if DEBUG; print something&#8221; ad nauseam&#46; But this has a cost&#46; So dev code deviates from production code, and we can&#8217;t get the data we want where it matters: in production&#46; Bryan goes on to describe the aforementioned <a href="http://dev.rubyonrails.org/changeset/6571">backtrace issue</a> and how it saved Twitter 33% CPU&#46; So don&#8217;t pre&#45;optimize, but you&#8217;ve got to be prepared to go get the data&#46; In production&#46;</p> <h2>Q &amp; A</h2> <p><em>What&#8217;s the best way to move from one database to two databases (MySQL), when you scale past the volume of reads that overwhelms one?</em></p> <p><strong>Jason</strong> doesn&#8217;t like the replication approach; it&#8217;s not fault tolerant&#46; Reference to <a href="http://drnicwilliams.com/2007/04/12/magic-multi-connections-a-facility-in-rails-to-talk-to-more-than-one-database-at-a-time/">Dr Nic&#8217;s magic multi&#45;connections gem</a>&#46; Reference to <a href="http://revolutiononrails.blogspot.com/2007/04/plugin-release-actsasreadonlyable.html">acts_as_readonlyable</a>&#46; Don&#8217;t rely on things that are out of your control, 
start reading/writing to multiple locations, at the application level&#46; <strong>Jeremy</strong>: So do you want to be in the business of writing SQL or C extensions to Rails? What about <a href="http://freshmeat.net/projects/mysql_proxy/">MySQL proxy</a>? Seems ok, but I might not trust it in production&#46; <a href="http://jeremy.zawodny.com/mysql/mytop/" title="mytop - a top clone for MySQL">MyTop</a>/<a href="http://www.xaprb.com/blog/2006/07/02/innotop-mysql-innodb-monitor/">InnoTop</a> will tell you about your query volume&#46;</p> <p><em>Virtualization: 4 virtual servers w/ web servers on top of a single physical server? Why?</em></p> <p><strong>Jason</strong>: FreeBSD 4&#46;9 on an early Pentium was the perfect balance of utilization&#46; 18 CPUs by 64G RAM with virtual servers gets us back to that level of utilization&#46; <strong>Bryan</strong>: Not all virtualization solutions are equivalent! (Solaris containers/zones plug&#46;)</p> <p><em>RDBMSes are not good for web applications? Why? Can you give some examples?</em></p> <p><strong>Jason</strong>: It depends on when you want to do the join: when people are clicking, or pre&#45;assembled ahead of time&#46; Look at your application and put the data together before people request it&#46; Why does YouTube need an RDBMS? It serves a file that people can comment on&#46;</p> <p>Mention of Dabble DB, ZFS, Jabber, Atom, Atom over Jabber, etc&#46; as innovative ways of storing objects, data, etc&#46; GData/GCal most certainly does not store its Atom files in an RDBMS&#46;</p> <p><em>Sell Rails apps and have the customer deploy it? 
What options are available?</em></p> <p><strong>Ian</strong>: JRuby on Rails with a &#46;war file is an interesting approach&#46; <em>What operational issues/ways to help with scaling remote deployments?</em> <strong>Jeremy</strong>: Log files are the first line of defense&#46; <strong>Jason</strong>: Corporate IT are comfortable with Java&#46;</p> <p><em>The pessimist in me says that my servers are going to fall over after 5 users&#46; How can I be prepared/not be optimistic about a traffic spike?</em></p> <p><strong>Ian</strong>: Load test the crap out of the app&#46; Find out the horizontal scaling point&#46; Use solutions like S3 for images&#46; Make sure you can scale by throwing hardware at it&#46; Eventually single points of failure will overcome you (such as a single database), but you can wait until you get to that point before doing something about it&#46;</p> <p><strong>Jason</strong>: You can benchmark your processes, and get an idea of what they can do&#46; Most people who want to do something will look at your stuff, and maybe sign up&#46; So front&#45;load and optimize your signup process, possibly by taking it out of Rails&#46;</p> <p><strong>Jeremy</strong>: Conversations with Zed, DHH, etc&#46; have pointed out that sometimes &#8220;Rails isn&#8217;t good at that, take it out of Rails&#46;&#8221; Same thing for the database&#46; Split those things out into a different application&#46;</p> <p><strong>Bryan</strong>: Do your dry land work, know your toolchain, so that when the moment comes, you can dive in and find the problem&#46;</p> <p><em>We have a migration that takes a week to run because of text processing&#46; GC was running after every 10th DB statement&#46; Used Rails bench GC patch to overcome the issue with the migration&#46; Any issue running these?</em></p> <p><strong>Jason</strong>: We run those GC modifications and a few more in production, and they&#8217;re fine&#46;</p> <p><em>Most conversations revolve around items like 
database is slow, or Ruby is slow&#46; How can we use DTrace to streamline the process?</em></p> <p><strong>Jeremy</strong>: We spent 20 minutes over lunch (plus some preparation) to find a Memcache issue&#46; It&#8217;s worth it to spend a little time to learn the tool&#46;</p> <p><strong>Bryan</strong>: &#8220;Awk is God&#8217;s gift to all of us&#46;&#8221; When DTrace was being reviewed inside of Sun, folks commented &#8220;This reminds us of awk&#46;&#8221; &#8220;Thanks!&#8221;</p> <p><strong>Jason</strong>: We&#8217;re putting a tracing plugin in Rails as a remote process to collect data from a running app&#46; Apple has shown a commitment to get this in Leopard&#46; Textual and graphical output are possible&#46; I believe in DTrace a lot, and the tooling and documentation will go beyond its current state of an expert&#8217;s tool&#46;</p> <p><em>Lastly, what one closing thing would you like to say about Rails scalability?</em></p> <p><strong>Ian</strong>: Measure&#46;<br/> <strong>Jason</strong>: Don&#8217;t use relational databases&#46;<br/> <strong>Jeremy</strong>: I thought it was a Joyent sales pitch&#46;<br/> <strong>Bryan</strong>: Use DTrace (with Joyent accelerators of course)&#46;<br/></p> Wed, 23 May 2007 05:51:36 +0000 urn:uuid:ab53b976-cff8-410d-8047-793abbb363a2 Nick Sieger http://blog.nicksieger.com/articles/2007/05/23/geeksessions-i-ruby-on-rails-to-scale-or-not-to-scale rails ruby http://blog.nicksieger.com/articles/trackback/254