Why DTrace Makes Leopard a Must-Have Upgrade

Posted by Nick Sieger Tue, 05 Feb 2008 21:12:00 GMT

I feel like I’m actually a relative late-comer to Leopard, at least in my social circle. A lot of the folks in the Ruby community already had it installed the week after it was out, and were showing it off at RubyConf back in November. I just didn’t have a compelling reason to upgrade and disrupt my workflow at the time. Plus, mixed reports were coming out about data loss, UI nits, and other instabilities.

By the time I went out to purchase, 10.5.1 was already the version boxed in the stores, and in retrospect, it seemed worth the wait. I haven’t had a single complaint or major issue with the upgrade so far, and have been enjoying the noticeable zippiness of a freshly-installed system.

Time Machine has been a widely-publicized feature, and has been touted as one of the top reasons to upgrade. So I bought a small portable drive with some leftover holiday gift cards and set out to try it. Initially it seemed promising, except after a day or two of backups the process would stall out during the “preparing” stage. Eventually I noticed that the TM background process, backupd, was eating up 0.5GB of memory and up to 100% of one of the CPUs.

If I wasn’t a nerd making my living having my way with computers, I probably would have given up on Time Machine at this point, after a couple of hours scouring Google and the Apple discussion boards searching for similar problems. But I knew that backupd had to be doing something pathological, and I was compelled to find out what.

On Solaris systems, truss is usually the order of the day for problems like this. It literally vomits an endless listing of system calls invoked by a process into your terminal window. Except there’s no truss on OS X. Is there a replacement? Google mentioned ktrace, present on Tiger systems and earlier, but it’s gone in Leopard. Replaced by? DTrace.

Ahhh, DTrace! Another geeky Leopard-only feature. Certainly DTrace will be able to trace system calls in the same manner as truss. But being a complete DTrace newb, I had no idea where to start. So, like any lazy programmer does, I started shopping around for examples to get me started. Looking around, this article on MacTech looked promising, but didn’t have what I needed. Eventually, I ended up finding the DTrace Toolkit on the OpenSolaris site.

The DTrace Toolkit appears to be your one-stop shop for all things DTrace. If you need a kick-start reason to take a look at DTrace and get you going, this is it. In my case, lo and behold, one of the scripts included in the toolkit is called dtruss!

Many of the scripts in the toolkit are tailored towards a Solaris system, and dtruss is a prime example. It won’t quite work out of the box on Leopard, because a few of the system calls mentioned in the script are non-existent there. Changing the shebang line at the top of the script to #!/bin/bash, and repeatedly running it a few times with sudo ./dtruss -p <pid> will give you an idea of which ones; I simply commented these out until I was successfully able to trace a process.

Now, finally we can pop the stack back to my original problem with Time Machine and backupd. I launched a backup run and waited for the process to start consuming large amounts of CPU and memory. I located the PID of the process in Activity Monitor, and started tracing it with my modified dtruss script. And, sure enough, I saw the following output scrolling by endlessly in my terminal:

mmap(0x0, 0x5000, 0x3)     = 958464 0
getdirentriesattr(..., ..., ...)    = ....
munmap(0xEA000, 0x5000)    = 0 0
getxattr(..., ..., 0x0)     = -1 Err#93
getxattr(..., ..., 0x0)     = -1 Err#93
getxattr(..., ..., 0x0)     = -1 Err#93
getxattr(..., ..., 0x0)     = -1 Err#93
getxattr(..., ..., 0x0)     = -1 Err#93
getxattr(..., ..., 0x0)     = -1 Err#93
getxattr(..., ..., 0x0)     = -1 Err#93
getxattr(..., ..., 0x0)     = -1 Err#93
getxattr(..., ..., 0x0)     = -1 Err#93
getxattr(..., ..., 0x0)     = -1 Err#93
getxattr(..., ..., 0x0)     = -1 Err#93
getxattr(..., ..., 0x0)     = -1 Err#93
getxattr(..., ..., 0x0)     = -1 Err#93

(The ellipsis were actual memory addresses, I didn’t save the output.) What was interesting is that the same chunk of memory (the first argument to getxattr) was floating by repeatedly. Looking at the man page for getxattr, the signature is:

ssize_t
getxattr(const char *path, const char *name, void *value,
         size_t size, u_int32_t position, int options);

So, the first argument contains the path. Now, how can I get the contents of that memory address? The answer is inside dtruss. Closer to the top of the script is this DTrace code:

/* print 3 args, arg0 as a string */
syscall::stat*:return, 
syscall::lstat*:return, 
syscall::open*:return
/* not on leopard -- syscall::resolvepath:return */
/self->start/
{
    /* calculate elapsed time */
    this->elapsed = timestamp - self->start;
    self->start = 0;
    this->cpu = vtimestamp - self->vstart;
    self->vstart = 0;
    self->code = errno == 0 ? "" : "Err#";

    /* print optional fields */
    OPT_printid  ? printf("%6d/%d:  ", pid, tid) : 1;
    OPT_relative ? printf("%8d ", vtimestamp/1000) : 1;
    OPT_elapsed  ? printf("%7d ", this->elapsed/1000) : 1;
    OPT_cpu      ? printf("%6d ", this->cpu/1000) : 1;

    /* print main data */
    printf("%s(\"%S\", 0x%X, 0x%X)\t\t = %d %s%d\n", probefunc,
        copyinstr(self->arg0), self->arg1, self->arg2, (int)arg0,
        self->code, (int)errno);
    OPT_stack ? ustack()    : 1;
    OPT_stack ? trace("\n") : 1;
    self->arg0 = 0;
    self->arg1 = 0;
    self->arg2 = 0;
}

I only had to add syscall::getxattr:return to the list of matched probes, and now I could finally inspect the path argument to getxattr:

munmap(0xEA000, 0x5000)      = 0 0
getxattr("/Users/nicksieger/Library/Mail/IMAP-nicksieger@gmail.com
@imap.gmail.com/[Gmail]/All Mail.imapmbox/Messages/1002680.emlx\0",
0x967288D4, 0x0)         = -1 Err#93
getxattr("/Users/nicksieger/Library/Mail/IMAP-nicksieger@gmail.com
@imap.gmail.com/[Gmail]/All Mail.imapmbox/Messages/1002681.emlx\0",
0x967288D4, 0x0)         = -1 Err#93
getxattr("/Users/nicksieger/Library/Mail/IMAP-nicksieger@gmail.com
@imap.gmail.com/[Gmail]/All Mail.imapmbox/Messages/1002682.emlx\0",
0x967288D4, 0x0)         = -1 Err#93
getxattr("/Users/nicksieger/Library/Mail/IMAP-nicksieger@gmail.com
@imap.gmail.com/[Gmail]/All Mail.imapmbox/Messages/1002683.emlx\0",
0x967288D4, 0x0)         = -1 Err#93
getxattr("/Users/nicksieger/Library/Mail/IMAP-nicksieger@gmail.com
@imap.gmail.com/[Gmail]/All Mail.imapmbox/Messages/1002684.emlx\0",
0x967288D4, 0x0)         = -1 Err#93
getxattr("/Users/nicksieger/Library/Mail/IMAP-nicksieger@gmail.com
@imap.gmail.com/[Gmail]/All Mail.imapmbox/Messages/1002685.emlx\0",
0x967288D4, 0x0)         = -1 Err#93
getxattr("/Users/nicksieger/Library/Mail/IMAP-nicksieger@gmail.com
@imap.gmail.com/[Gmail]/All Mail.imapmbox/Messages/1002686.emlx\0",
0x967288D4, 0x0)         = -1 Err#93
getxattr("/Users/nicksieger/Library/Mail/IMAP-nicksieger@gmail.com
@imap.gmail.com/[Gmail]/All Mail.imapmbox/Messages/1002687.emlx\0",
0x967288D4, 0x0)         = -1 Err#93

D’oh! GMail! That directory has so many files, it took over two minutes just for ls to list all 487356 of them. Hundreds of thousands of email messages, all being re-inspected by Time Machine every time some new messages are added to the directory. I’ll leave it to someone else to point fingers at what the actual problem is here (a side-effect of TM’s usage of hard links? Mail.app inefficiently storing too many messages in a single directory?), but after all this I just decided that I didn’t want a backup of my GMail messages since they’re stored on the server. So I added the directory to the list of excluded directories in TM, wiped my backup, and started over. (TM had similar problems trying to complete an incremental backup with the existing backed-up copy of my mail on the backup disk, so I decided to wipe it and start fresh.) I won’t declare the problem completely solved yet, but if it happens again, I’ll just repeat this process to find the new culprit. Hopefully I don’t end up excluding my entire home directory!

This whole process was a revelation to me -- the fact that I could pinpoint the exact problem in a piece of system software despite having few notions of the internals of that software. The next time I have a nagging issue in Leopard, DTrace will be my tool of choice in tracking it down. Let’s just hope it’s not a problem with iTunes!

Tags , ,  | 9 comments