UFS: The Root of All Evil

UFS lives in a strange world where the computer's hard disk is divided into three different parts: inodes, data blocks, and the free list. Inodes are pointers to blocks on the disk. They store everything interesting about a file—its contents, its owner, group, when it was created, when it was modified, when it was last accessed—everything, that is, except for the file's name.

An oversight? No, it's a deliberate design decision. Filenames are stored in a special file type called a directory, which points to inodes. An inode may reside in more than one directory. Unix calls this a "hard link," which is supposedly one of UFS's big advantages: the ability to have a single file appear in two places. In practice, hard links are a debugging nightmare. You copy data into a file, and all of a sudden—surprise—it gets changed, because the file is really hard linked with another file. Which other file? There's no simple way to tell. Some two-bit moron whose office is three floors up is twiddling your bits. But you can't find him.

The struggle between good and evil, yin and yang, plays itself out on the disks of Unix's file system because system administrators must choose, before the system is running, how to divide the disk into bad (inode) space and good (usable file) space. Once this decision is made, it is set in stone. The system cannot trade between good and evil as it runs, but, as we all know from our own lives, too much or too little of either is not much fun. In Unix's case, when the file system runs out of inodes it won't put new files on the disk, even if there is plenty of room for them! This happens all the time when putting Unix file systems onto floppy disks.

So most people tend to err on the side of caution and over-allocate inode space. (Of course, that means that they run out of disk blocks, but still have plenty of inodes left…) Unix manufacturers, in their continued propaganda to convince us that Unix is "simple to use," simply make the default inode space very large. The result is too much allocated inode space, which decreases the usable disk space, thereby increasing the cost per useful megabyte.
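How lopsided that split is on any given disk is easy to check from a program. Here is a minimal sketch (not from the book) using only the standard statvfs(3) call; the mount point "/" is just an example. It prints the two separate pools the text is complaining about: data blocks and inodes, each with its own total and its own free count.

#include <stdio.h>
#include <sys/statvfs.h>

int main(void)
{
    struct statvfs vfs;

    /* Ask the kernel about the file system mounted at "/" (example path). */
    if (statvfs("/", &vfs) < 0) {
        perror("statvfs");
        return 1;
    }

    /* Data blocks: what df normally reports as free space. */
    printf("data blocks: %llu total, %llu free\n",
           (unsigned long long)vfs.f_blocks,
           (unsigned long long)vfs.f_bfree);

    /* Inodes: a separate, fixed-size pool decided when the file system
     * was made.  When f_ffree hits zero, new files are refused with
     * ENOSPC no matter how large f_bfree still is. */
    printf("inodes:      %llu total, %llu free\n",
           (unsigned long long)vfs.f_files,
           (unsigned long long)vfs.f_ffree);

    return 0;
}

On most systems, df -i reports the same pair of numbers from the shell, which is how you discover that your half-empty disk is "full."

The hard-link complaint above is just as easy to demonstrate. The sketch below uses only the standard link(2) and stat(2) calls; the filenames original.txt and alias.txt are invented for the demonstration. It creates one file, gives it a second name, and shows that the inode can tell you how many names point at it, but not where they are.

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/stat.h>

int main(void)
{
    struct stat a, b;

    /* Create a file under one name... */
    int fd = open("original.txt", O_CREAT | O_WRONLY | O_TRUNC, 0644);
    if (fd < 0) { perror("open"); return 1; }
    write(fd, "hello\n", 6);
    close(fd);

    /* ...and give the same inode a second name: a hard link. */
    if (link("original.txt", "alias.txt") < 0) { perror("link"); return 1; }

    if (stat("original.txt", &a) < 0 || stat("alias.txt", &b) < 0) {
        perror("stat");
        return 1;
    }

    /* Same inode number, link count of 2: one file, twice named. */
    printf("original.txt inode %lu, %lu links\n",
           (unsigned long)a.st_ino, (unsigned long)a.st_nlink);
    printf("alias.txt    inode %lu, %lu links\n",
           (unsigned long)b.st_ino, (unsigned long)b.st_nlink);

    /* st_nlink says how many other names exist; nothing says where.
     * Finding them means walking the whole tree comparing inode numbers. */
    return 0;
}

On most systems the only real recourse is that walk: find / -inum N, which grinds over every directory on the machine while your bits keep getting twiddled.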
UFS maintains a free list of doubly-linked data blocks not currently in use. Unix needs this free list because there isn't enough online storage space to track all the blocks that are free on the disk at any instant. Unfortunately, it is very expensive to keep the free list consistent: to create a new file, the kernel needs to find a block B on the free list, remove the block from the free list by fiddling with the pointers on the blocks in front of and behind B, and then create a directory entry that points to the inode of the newly un-freed block. To ensure files are not lost or corrupted, these operations must be performed atomically and in order; otherwise data can be lost if the computer crashes while the update is taking place. (Interrupting these sorts of operations can be like interrupting John McEnroe during a serve: both yield startling and unpredictable results.)
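The ordering problem is easier to see written out. The sketch below is purely illustrative in-memory toy code, not UFS source: a doubly-linked free list with a sentinel head, the two pointer updates needed to splice a block out of it, and the spots where a crash would leave the on-disk pointers disagreeing about whether the block is free.

/* Illustrative only.  Each commented "disk write" would be a separate
 * write to the disk in the real thing; a crash between any two of them
 * leaves the free list telling two different stories. */
#include <stdio.h>

struct block {
    int           blkno;  /* which disk block this node stands for */
    struct block *prev;   /* previous block on the free list */
    struct block *next;   /* next block on the free list */
};

/* Splice block b out of the free list so a new file can use it. */
static void allocate_block(struct block *b)
{
    b->prev->next = b->next;  /* disk write #1: the block in front skips B */
    /* ...crash here and half the list still claims B is free...           */
    b->next->prev = b->prev;  /* disk write #2: the block behind skips B   */
    /* disk write #3 (not shown): only now may a directory entry point at
     * the block; do it any earlier and a crash can leave a file whose data
     * block is still sitting on the free list, waiting to be handed out
     * again. */
    b->next = b->prev = NULL;
}

int main(void)
{
    /* A toy free list: sentinel <-> block 7 <-> block 8 <-> sentinel. */
    struct block head = { -1, NULL, NULL }, b7 = { 7 }, b8 = { 8 };
    head.next = &b7;  b7.prev = &head;
    b7.next   = &b8;  b8.prev = &b7;
    b8.next   = &head; head.prev = &b8;

    allocate_block(&b7);   /* "create a file" using block 7 */

    printf("free list now starts at block %d\n", head.next->blkno);  /* 8 */
    return 0;
}

Deciding, after a crash, which of those half-finished stories to believe is exactly the job of the tool described next.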
No matter! The people who designed the Unix File System didn't think that the computer would crash very often. Rather than taking the time to design UFS so that it would run fast and keep the disk consistent (it is possible to do this), they designed it simply to run fast. As a result, the hard disk is usually in an inconsistent state. As long as you don't crash during one of these moments, you're fine. Orderly Unix shutdowns cause no problems. What about power failures and glitches? What about goonball technicians and other incompetent people unplugging the wrong server in the machine room? What about floods in the sewers of Chicago? Well, you're left with a wet pile of noodles where your file system used to be.

The tool that tries to rebuild your file system from those wet noodles is fsck (pronounced "F-sick"), the file system consistency checker. It scans the entire file system looking for damage that a crashing Unix typically exacts on its disk. Usually fsck can recover the damage. Sometimes it can't. (If you've been having intermittent hardware failures, SCSI termination problems, and incomplete block transfers, frequently it can't.) In any event, fsck can take 5, 10, or 20 minutes to find out. During this time, Unix is literally holding your computer hostage.

Here's a message that was forwarded to UNIX-HATERS by MLY; it originally appeared on the Usenet newsgroup comp.arch in July 1990:

    Date: 13 Jul 90 16:58:55 GMT
    From: aglew@oberon.crhc.uiuc.edu (Andy Glew)²
    Subject: Fast Re-booting
    Newsgroups: comp.arch

    A few years ago a customer gave us a 30 second boot after power
    cycle requirement, for a real-time OS. They wanted 10.

    This DECstation 3100, with 16MB of memory, and an approximately
    300Mb local SCSI disk, took 8:19 (eight minutes and nineteen
    seconds) to reboot after powercycle. That included fsck'ing the
    disk. Time measured from the time I flicked the switch to the
    time I could log in.

² Forwarded to UNIX-HATERS by Richard Mlynarik.