One can almost detect an emergent intelligence, as in “Colossus: The Forbin Project.” Unix managed to purge from itself the documents that prove it’s buggy.

Unix’s method for updating the data and pointers that it stores on the disk allows inconsistencies and incorrect pointers on the disk as a file is being created or modified. When the system crashes before updating the disk with all the appropriate changes, which is always, the file system image on disk becomes corrupt and inconsistent. The corruption is visible during the reboot after a system crash: the Unix boot script automatically runs fsck to put the file system back together again.

Many Unix sysadmins don’t realize that inconsistencies also occur during a system dump to tape. The backup program takes a snapshot of the current file system. If any users or processes modify files during the backup, the file system on disk will be inconsistent for short periods of time. Since the dump isn’t instantaneous (and usually takes hours), the snapshot becomes a blurry image. It’s like photographing the Indy 500 with a 1-second shutter speed, with similar results: the most important files, the ones that people were actively modifying, are the ones you can’t restore.

Because Unix lacks facilities to back up a “live” file system, a proper backup requires taking the system down to its stand-alone or single-user mode, where no processes on the system will be changing files on disk during the backup. For systems with gigabytes of disk space, this translates into hours of downtime every day. (With a sysadmin getting paid to watch the tapes whirr.) Clearly, Unix is not a serious option for applications with continuous uptime requirements.
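The “blurry snapshot” problem can at least be made visible. The following is a minimal sketch, not how dump works: it copies ordinary files and compares each file’s size and modification time before and after the copy, flagging any file that changed in between. The names `file_signature`, `copy_and_flag_changes`, and the `during_copy` hook are hypothetical and exist only for illustration; the check detects a suspect copy but does nothing to prevent one.

```python
import os
import shutil
import tempfile


def file_signature(path):
    """Return (size, mtime in nanoseconds) so that changes can be noticed."""
    st = os.stat(path)
    return (st.st_size, st.st_mtime_ns)


def copy_and_flag_changes(paths, dest_dir, during_copy=None):
    """Copy each file into dest_dir and return the paths that changed mid-copy.

    during_copy is a hypothetical hook, called right after each copy, that
    lets a demo simulate a user editing the file while the backup runs.
    """
    suspect = []
    for path in paths:
        before = file_signature(path)
        shutil.copy2(path, os.path.join(dest_dir, os.path.basename(path)))
        if during_copy is not None:
            during_copy(path)
        after = file_signature(path)
        if before != after:
            # The copy on "tape" may not match any state the file was ever in.
            suspect.append(path)
    return suspect


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
        quiet = os.path.join(src, "quiet.txt")
        busy = os.path.join(src, "busy.txt")
        for p in (quiet, busy):
            with open(p, "w") as f:
                f.write("original\n")

        def edit_busy_file(path):
            # Simulate someone modifying one file during the backup window.
            if path == busy:
                with open(path, "a") as f:
                    f.write("edited during backup\n")

        changed = copy_and_flag_changes([quiet, busy], dst, during_copy=edit_busy_file)
        print("changed during backup:", [os.path.basename(p) for p in changed])
```

A backup script built along these lines could re-copy or at least report the flagged files. It still cannot guarantee a consistent image of a live file system, which is the point the text makes.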
One set of Unix systems that needed continuous uptime was forced to tell its users in /etc/motd to “expect anomalies” during backup periods:

    SunOS Release 4.1.1 (DIKUSUN4CS) #2:Sun Sep 22 20:48:55 MET DST 1991

    --- BACKUP PLAN ----------------------------------------------------
    Skinfaxe:     24. Aug, 9.00-12.00   Please note that anomalies can
    Freja & Ask:  31. Aug, 9.00-13.00   be expected when using the Unix
    Odin:          7. Sep, 9.00-12.00   systems during the backups.
    Rimfaxe:      14. Sep, 9.00-12.00
    Div. Sun4c:   21. Sep, 9.00-13.00
    --------------------------------------------------------------------

1. This message is reprinted without the permission of Keith Bostic, who said “As far as I can tell, [reprinting the message] is not going to do either the CSRG or me any good.” He’s right: the backups, made with the Berkeley tape backup program, were also bad.
Putting data on backup tapes is only half the job. For getting it back, Berkeley Unix blesses us with its restore program. Restore has a wonderful interactive mode that lets you chdir around a phantom file system and tag the files you want retrieved, then type a magic command to set the tapes spinning. But if you want to restore the files from the command line, like a real Unix guru, beware.

    Date: Thu, 30 May 91 18:35:57 PDT
    From: Gumby Vinayak Wallace <gumby@cygnus.com>
    To: UNIX-HATERS
    Subject: Unix’s Berkeley FFS

    Have you ever had the misfortune of trying to retrieve a file from backup? Apart from being slow and painful, someone here discovered to his misfortune that a wildcard, when passed to the restore program, retrieves only the first file it matches, not every matching file!

    But maybe that’s considered featureful “minimalism” for a file system without backup bits.

More Sticky Tape

Suppose that you wanted to copy a 500-page document. You want a perfect copy, so you buy a new ream of paper, and copy the document one page at a time, making sure each page is perfect. What do you do if you find a page with a smudge? If you have more intelligence than a bowling ball, you recopy the page and continue. If you are Unix, you give up completely, buy a new ream of paper, and start over. No kidding. Even if the document is 500 pages long, and you’ve successfully copied the first 499 pages.

Unix uses magnetic tape to make copies of its disks, not paper, but the analogy is extremely apt. Occasionally, there will be a small imperfection on a tape that can’t be written on. Sometimes Unix discovers this after spending a few hours to dump 2 gigabytes. Unix happily reports the bad spot, asks you to replace the tape with a new one, destroy the evil tape, and start over.

Yep, Unix considers an entire tape unusable if it can’t write on one inch of it. Other, more robust operating systems can use these “bad” tapes.
They skip over the bad spot when they reach it and continue. The Unix way translates into lost time and money.

Unix names a tape many ways. You might think that something as simple as /dev/tape would be used. Not a chance in the Berkeley version of Unix. It encodes specific parameters of tape drives into the name of the device specifier. Instead of a single name like “tape,” Unix uses a different name