Why DTrace Makes Leopard a Must-Have Upgrade
Nick Sieger
Published 2008-02-05T21:12:00+00:00, updated 2010-11-22T18:20:33+00:00
<p>I feel like I&#8217;m actually a relative late&#45;comer to Leopard, at least in my social circle&#46; A lot of the folks in the Ruby community already had it installed the week after it was out, and were showing it off at RubyConf back in November&#46; I just didn&#8217;t have a compelling reason to upgrade and disrupt my workflow at the time&#46; Plus, mixed reports were coming out about data loss, UI nits, and other instabilities&#46;</p>
<p>By the time I went out to purchase, 10&#46;5&#46;1 was already the version boxed in the stores, and in retrospect, it seemed worth the wait&#46; I haven&#8217;t had a single complaint or major issue with the upgrade so far, and have been enjoying the noticeable zippiness of a freshly&#45;installed system&#46;</p>
<p><a href="http://www.apple.com/macosx/features/timemachine.html">Time Machine</a> has been a widely&#45;publicized feature, and has been <a href="http://www.downloadsquad.com/2007/10/20/time-machine-leopards-best-feature/">touted as one of the top reasons to upgrade</a>&#46; So I bought a small portable drive with some leftover holiday gift cards and set out to try it&#46; Initially it seemed promising, except after a day or two of backups the process would stall out during the &#8220;preparing&#8221; stage&#46; Eventually I noticed that the TM background process, <code>backupd</code>, was eating up 0&#46;5GB of memory and up to 100% of one of the CPUs&#46;</p>
<p>If I wasn&#8217;t a nerd making my living having my way with computers, I probably would have given up on Time Machine at this point, after a couple of hours scouring Google and the Apple discussion boards searching for <a href="http://discussions.apple.com/thread.jspa?messageID=6509623">similar</a> <a
href="http://discussions.apple.com/thread.jspa?messageID=6386478">problems</a>&#46; But I knew that <code>backupd</code> had to be doing something pathological, and I was compelled to find out what&#46;</p>
<p>On Solaris systems, <code>truss</code> is usually the order of the day for problems like this&#46; It vomits an endless listing of the system calls invoked by a process into your terminal window&#46; Except there&#8217;s no <code>truss</code> on OS X&#46; Is there a replacement? Google mentioned <code>ktrace</code>, present on Tiger systems and earlier, but it&#8217;s gone in Leopard&#46; Replaced by? DTrace&#46;</p>
<p>Ahhh, <em>DTrace</em>! Another geeky Leopard&#45;only feature&#46; Certainly DTrace will be able to trace system calls in the same manner as <code>truss</code>&#46; But being a complete DTrace newb, I had no idea where to start&#46; So, like any lazy programmer, I went shopping around for examples&#46; Looking around, <a href="http://www.mactech.com/articles/mactech/Vol.23/23.11/ExploringLeopardwithDTrace/index.html">this article on MacTech</a> looked promising, but didn&#8217;t have what I needed&#46; Eventually, I ended up finding the <a href="http://www.opensolaris.org/os/community/dtrace/dtracetoolkit/">DTrace Toolkit</a> on the OpenSolaris site&#46;</p>
<p>The DTrace Toolkit appears to be your one&#45;stop shop for all things DTrace&#46; If you need a kick&#45;start to get going with DTrace, this is it&#46; In my case, lo and behold, one of the scripts included in the toolkit is called <code>dtruss</code>!</p>
<p>Many of the scripts in the toolkit are tailored towards a Solaris system, and <code>dtruss</code> is a prime example&#46; It won&#8217;t quite work out of the box on Leopard, because a few of the system calls mentioned in the script are non&#45;existent there&#46; Changing the shebang line at the top of the script to <code>#!/bin/bash</code> and running it a few times with
<code>sudo ./dtruss -p &lt;pid&gt;</code> will give you an idea of which ones; I simply commented these out until I could trace a process successfully&#46;</p>
<p>Now, finally we can pop the stack back to my original problem with Time Machine and <code>backupd</code>&#46; I launched a backup run and waited for the process to start consuming large amounts of CPU and memory&#46; I located the PID of the process in Activity Monitor, and started tracing it with my modified <code>dtruss</code> script&#46; And, sure enough, I saw the following output scrolling by endlessly in my terminal:</p>
<pre><code>mmap(0x0, 0x5000, 0x3) = 958464 0
getdirentriesattr(..., ..., ...) = ....
munmap(0xEA000, 0x5000) = 0 0
getxattr(..., ..., 0x0) = -1 Err#93
getxattr(..., ..., 0x0) = -1 Err#93
getxattr(..., ..., 0x0) = -1 Err#93
getxattr(..., ..., 0x0) = -1 Err#93
getxattr(..., ..., 0x0) = -1 Err#93
getxattr(..., ..., 0x0) = -1 Err#93
getxattr(..., ..., 0x0) = -1 Err#93
getxattr(..., ..., 0x0) = -1 Err#93
getxattr(..., ..., 0x0) = -1 Err#93
getxattr(..., ..., 0x0) = -1 Err#93
getxattr(..., ..., 0x0) = -1 Err#93
getxattr(..., ..., 0x0) = -1 Err#93
getxattr(..., ..., 0x0) = -1 Err#93
</code></pre>
<p>(The ellipses stand in for actual memory addresses; I didn&#8217;t save the output&#46;) What was interesting is that the same chunk of memory (the first argument to <code>getxattr</code>) was floating by repeatedly&#46; According to the man page for <code>getxattr</code>, the signature is:</p>
<pre><code>ssize_t getxattr(const char *path, const char *name, void *value, size_t size, u_int32_t position, int options);
</code></pre>
<p>So, the first argument contains the path&#46; Now, how can I get the contents of that memory address?
The answer is inside <code>dtruss</code>&#46; Near the top of the script is this DTrace code:</p>
<pre><code>/* print 3 args, arg0 as a string */
syscall::stat*:return,
syscall::lstat*:return,
syscall::open*:return
/* not on leopard -- syscall::resolvepath:return */
/self-&gt;start/
{
    /* calculate elapsed time */
    this-&gt;elapsed = timestamp - self-&gt;start;
    self-&gt;start = 0;
    this-&gt;cpu = vtimestamp - self-&gt;vstart;
    self-&gt;vstart = 0;
    self-&gt;code = errno == 0 ? "" : "Err#";

    /* print optional fields */
    OPT_printid ? printf("%6d/%d: ", pid, tid) : 1;
    OPT_relative ? printf("%8d ", vtimestamp/1000) : 1;
    OPT_elapsed ? printf("%7d ", this-&gt;elapsed/1000) : 1;
    OPT_cpu ? printf("%6d ", this-&gt;cpu/1000) : 1;

    /* print main data */
    printf("%s(\"%S\", 0x%X, 0x%X)\t\t = %d %s%d\n", probefunc,
        copyinstr(self-&gt;arg0), self-&gt;arg1, self-&gt;arg2, (int)arg0,
        self-&gt;code, (int)errno);
    OPT_stack ? ustack() : 1;
    OPT_stack ? trace("\n") : 1;
    self-&gt;arg0 = 0;
    self-&gt;arg1 = 0;
    self-&gt;arg2 = 0;
}
</code></pre>
<p>I only had to add <code>syscall::getxattr:return</code> to the list of matched probes, and now I could finally inspect the path argument to <code>getxattr</code>:</p>
<pre><code>munmap(0xEA000, 0x5000) = 0 0
getxattr("/Users/nicksieger/Library/Mail/IMAP-nicksieger@gmail.com @imap.gmail.com/[Gmail]/All Mail.imapmbox/Messages/1002680.emlx\0", 0x967288D4, 0x0) = -1 Err#93
getxattr("/Users/nicksieger/Library/Mail/IMAP-nicksieger@gmail.com @imap.gmail.com/[Gmail]/All Mail.imapmbox/Messages/1002681.emlx\0", 0x967288D4, 0x0) = -1 Err#93
getxattr("/Users/nicksieger/Library/Mail/IMAP-nicksieger@gmail.com @imap.gmail.com/[Gmail]/All Mail.imapmbox/Messages/1002682.emlx\0", 0x967288D4, 0x0) = -1 Err#93
getxattr("/Users/nicksieger/Library/Mail/IMAP-nicksieger@gmail.com @imap.gmail.com/[Gmail]/All Mail.imapmbox/Messages/1002683.emlx\0", 0x967288D4, 0x0) = -1 Err#93
getxattr("/Users/nicksieger/Library/Mail/IMAP-nicksieger@gmail.com @imap.gmail.com/[Gmail]/All Mail.imapmbox/Messages/1002684.emlx\0", 0x967288D4, 0x0) = -1 Err#93
getxattr("/Users/nicksieger/Library/Mail/IMAP-nicksieger@gmail.com @imap.gmail.com/[Gmail]/All Mail.imapmbox/Messages/1002685.emlx\0", 0x967288D4, 0x0) = -1 Err#93
getxattr("/Users/nicksieger/Library/Mail/IMAP-nicksieger@gmail.com @imap.gmail.com/[Gmail]/All Mail.imapmbox/Messages/1002686.emlx\0", 0x967288D4, 0x0) = -1 Err#93
getxattr("/Users/nicksieger/Library/Mail/IMAP-nicksieger@gmail.com @imap.gmail.com/[Gmail]/All Mail.imapmbox/Messages/1002687.emlx\0", 0x967288D4, 0x0) = -1 Err#93
</code></pre>
<p>D&#8217;oh! Gmail! That directory has so many files, it took over two minutes just for <code>ls</code> to list all 487356 of them&#46; Hundreds of thousands of email messages, all being re&#45;inspected by Time Machine every time some new messages are added to the directory&#46; I&#8217;ll leave it to someone else to point fingers at what the actual problem is here (a side&#45;effect of TM&#8217;s use of hard links? Mail&#46;app inefficiently storing too many messages in a single directory?), but after all this I just decided that I didn&#8217;t want a backup of my Gmail messages since they&#8217;re stored on the server&#46; So I added the directory to the list of excluded directories in TM, wiped my backup, and started over&#46; (TM had similar problems trying to complete an incremental backup with the existing backed&#45;up copy of my mail on the backup disk, so I decided to wipe it and start fresh&#46;) I won&#8217;t declare the problem completely solved yet, but if it happens again, I&#8217;ll just repeat this process to find the new culprit&#46; Hopefully I don&#8217;t end up excluding my entire home directory!</p>
<p>This whole process was a revelation to me &#45;&#45; the fact that I could pinpoint the exact problem in a piece of system software despite knowing little about the internals of that software&#46; The next time I have a nagging issue in Leopard, DTrace will be my tool of
choice in tracking it down&#46; Let&#8217;s just hope it&#8217;s not a <a href="http://blogs.sun.com/ahl/entry/mac_os_x_and_the">problem with iTunes</a>!</p>
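<p>One possible follow-up to the culprit hunt described above: when the trace scrolls by faster than it can be read, a short script can do the tallying. The sketch below counts the parent directory of every path seen in saved dtruss-style output, so the busiest directory stands out. It is not part of <code>dtruss</code> or the DTrace Toolkit; the function name <code>tally_directories</code>, the regular expression, and the sample paths are all hypothetical, and it assumes the modified script prints the path as a quoted first argument, as in the output shown earlier.</p>

```python
import posixpath
import re
from collections import Counter

# Matches a traced call whose first argument is a quoted string,
# e.g. getxattr("/some/path\0", 0x..., 0x0) = -1 Err#93
# (assumed output shape; calls without a quoted argument are skipped).
CALL_RE = re.compile(r'^(\w+)\("([^"]*)"')


def tally_directories(lines, syscall="getxattr"):
    """Count how often each parent directory appears in traced calls."""
    counts = Counter()
    for line in lines:
        match = CALL_RE.match(line.strip())
        if not match or match.group(1) != syscall:
            continue
        path = match.group(2)
        # The trace prints the terminating NUL as a literal backslash-zero.
        if path.endswith("\\0"):
            path = path[:-2]
        counts[posixpath.dirname(path)] += 1
    return counts


# Hypothetical sample lines in the same shape as the trace output.
sample = [
    'munmap(0xEA000, 0x5000) = 0 0',
    'getxattr("/tmp/example/Messages/1.emlx\\0", 0x967288D4, 0x0) = -1 Err#93',
    'getxattr("/tmp/example/Messages/2.emlx\\0", 0x967288D4, 0x0) = -1 Err#93',
    'getxattr("/tmp/other dir/3.emlx\\0", 0x967288D4, 0x0) = -1 Err#93',
]

for directory, hits in tally_directories(sample).most_common():
    print(hits, directory)
```

<p>With a trace saved to a file, passing the file object to <code>tally_directories</code> and calling <code>most_common(5)</code> on the result would list the five busiest directories.</p>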