224 System Administration

far from their maternal sysadmin, who frequently dials up the system from home in the evening to burp it.

Unix Systems Become Senile in Weeks, Not Years

Unix was developed in a research environment where systems rarely stayed up for several days. It was not designed to stay up for weeks at a time, let alone continuously. Compounding the problem is how Unix utilities and applications (especially those from Berkeley) are seemingly developed: a programmer types in some code, compiles it, runs it, and waits for it to crash. Programs that don’t crash are presumed to be running correctly. Production-style quality assurance, so vital for third-party application developers, wasn’t part of the development culture.

While this approach suffices for a term project in an operating systems course, it simply doesn’t catch code-cancers that appear in production code that has to remain running for days, weeks, or months at a time. It’s not surprising that most major Unix systems suffer from memory leaks, garbage accumulation, and slow corruption of their address space, problems that typically only show themselves after a program has been running for a few days.

The difficulty of attaching a debugger to a running program (and the impossibility of attaching a debugger to a crashed program) prevents interrogating a program that has been running for days, and then suddenly fails. As a result, bugs usually don’t get fixed (or even tracked down), and periodically rebooting Unix is the most reliable way to keep it from exhibiting Alzheimer’s disease.

    Date: Sat, 29 Feb 1992 17:30:41 PST
    From: Richard Mlynarik <mly@lcs.mit.edu>
    To: UNIX-HATERS
    Subject: And I thought it was the leap-year

    So here I am, losing with Unix on the 29th of February:

        % make -k xds
        sh: Bus error
        make: Fatal error: The command `date "+19%y 13 * %m + 32 * %d + 24 * %H + 60 * %M + p" | dc' returned status `19200'
        Compilation exited abnormally with code 1 at Sat Feb 29 17:01:34

    I was started to get really worked-up for a flaming message about Unix choking on leap-year dates, but further examination (and what example of unix lossage does not tempt one into further, pointless, inconclusive, disheartening examination?) shows that the actual bug is that this machine has been up too long.

    The way I discovered this was when the ispell program told me:

        swap space exhausted for mmap data of /usr/lib/libc.so.1.6 is not a known word

    Now, in a blinding flash, it became clear that in fact the poor machine has filled its paging space with non-garbage-collected, non-compactible twinkie crumbs in eleven days, one hour, and ten minutes of core-dumping, debugger-debugging fun.

    It is well past TIME TO BOOT!

What’s so surprising about Richard Mlynarik’s message, of course, is that the version of Unix he was using had not already decided to reboot itself.

You Can’t Tune a Fish

Unix has many parameters to tune its performance for different requirements and operating conditions. Some of these parameters, which set the maximum amount of some system resource, aren’t present in more advanced operating systems that dynamically allocate storage for most system resources. Some parameters are important, such as the relative priority of system processes. A sysadmin’s job includes setting default parameters to the correct values (you’ve got to wonder why most Unix vendors don’t bother setting up the defaults in their software to match their hardware configurations). This process is called “system tuning.” Entire books have been written on the subject.

System tuning sometimes requires recompiling the kernel, or, if you have one of those commercial “open systems” that doesn’t give you the sources, hand-patching your operating system with a debugger.

Average users and sysadmins often never find out about vital parameters because of the poor documentation.
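
To make the idea of a fixed resource ceiling concrete, here is a minimal Python sketch that prints a few per-process limits through the POSIX getrlimit(2) interface, using Python's standard resource module. Treat it as a stand-in under stated assumptions: these per-process limits are not the kernel tables discussed above, and the choice of limits and the names LIMITS, describe, and read_limits are hypothetical, picked only for illustration.

```python
import resource

# Hypothetical selection of limits to inspect. The constants are standard
# getrlimit(2) resources exposed by Python's resource module (Unix only).
LIMITS = {
    "open files": resource.RLIMIT_NOFILE,
    "stack size (bytes)": resource.RLIMIT_STACK,
    "core file size (bytes)": resource.RLIMIT_CORE,
}


def describe(value):
    """Render a limit value, showing RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def read_limits():
    """Return {label: (soft, hard)} for each entry in LIMITS."""
    return {label: resource.getrlimit(res) for label, res in LIMITS.items()}


if __name__ == "__main__":
    for label, (soft, hard) in read_limits().items():
        print(f"{label:24} soft={describe(soft):>12} hard={describe(hard):>12}")
```

Each entry comes back as a soft and a hard value: the soft limit is the one the process actually runs under, and the hard limit is the ceiling up to which an unprivileged process may raise it.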