“It Can’t Be a Bug, My Makefile Depends on It!”

Filename Expansion

There is one exception to Unix’s each-program-is-self-contained rule: filename expansion. Very often, one wants Unix utilities to operate on one or more files. The Unix shells provide a shorthand for naming groups of files that are expanded by the shell, producing a list of files that is passed to the utility.

For example, say your directory contains the files A, B, and C. To remove all of these files, you might type rm *. The shell will expand “*” to “A B C” and pass these arguments to rm. There are many, many problems with this approach, which we discussed in the previous chapter. You should know, though, that using the shell to expand filenames is not an historical accident: it was a carefully reasoned design decision. In “The Unix Programming Environment” by Kernighan and Mashey (IEEE Computer, April 1981), the authors claim that, “Incorporating this mechanism into the shell is more efficient than duplicating it everywhere and ensures that it is available to programs in a uniform way.”3

Excuse me? The Standard I/O library (stdio in Unix-speak) is “available to programs in a uniform way.” What would have been wrong with having library functions to do filename expansion? Haven’t these guys heard of linkable code libraries? Furthermore, the efficiency claim is completely vacuous since they don’t present any performance numbers to back it up. They don’t even explain what they mean by “efficient.” Does having filename expansion in the shell produce the most efficient system for programmers to write small programs, or does it simply produce the most efficient system imaginable for deleting the files of untutored novices?

Most of the time, having the shell expand file names doesn’t matter since the outcome is the same as if the utility program did it. But like most things in Unix, it sometimes bites. Hard.

Say you are a novice user with two files in a directory, A.m and B.m.
You’re used to MS-DOS and you want to rename the files to A.c and B.c. Hmm. There’s no rename command, but there’s this mv command that looks like it does the same thing. So you type mv *.m *.c. The shell expands this to mv A.m B.m and mv overwrites B.m with A.m. This is a bit of a shame since you had been working on B.m for the last couple of hours and that was your only copy.

3 Note that this decision flies in the face of the other lauded Unix decision to let any user run any shell. You can’t run any shell: you have to run a shell that performs star-name expansion. (Eds.)
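
The expansion in the mv example can be sketched in a few lines of Python. This is only an illustration, not how any real shell is implemented: the function name expand and the in-memory directory listing are hypothetical, and the rule that an unmatched pattern is silently dropped is an assumption chosen to match the behavior described above (shells differ on this point).

```python
from fnmatch import fnmatchcase


def expand(argv, directory):
    """Expand wildcard arguments before the utility ever sees them.

    Hypothetical helper for illustration. Assumption: a pattern that
    matches no file is dropped, as in the mv example in the text.
    """
    expanded = []
    for arg in argv:
        if any(ch in arg for ch in "*?["):
            # Replace the pattern with the sorted names it matches.
            expanded.extend(sorted(n for n in directory if fnmatchcase(n, arg)))
        else:
            expanded.append(arg)
    return expanded


directory = ["A.m", "B.m"]            # the novice's two files
typed = ["mv", "*.m", "*.c"]          # what the user typed
received = expand(typed, directory)   # what mv is actually handed

print(received)  # ['mv', 'A.m', 'B.m']
```

Nothing in the list that mv receives records that the user ever typed “*.c”, so mv has no way to tell a rename request from an ordinary two-file move.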
Spend a few moments thinking about this problem and you can convince yourself that it is theoretically impossible to modify the Unix mv command so that it would have the functionality of the MS-DOS “rename” command. So much for software tools.

Robustness, or “All Lines Are Shorter Than 80 Characters”

There is an amusing article in the December 1990 issue of Communications of the ACM entitled “An Empirical Study of the Reliability of Unix Utilities” by Miller, Fredriksen, and So. They fed random input to a number of Unix utility programs and found that they could make 24-33% (depending on which vendor’s Unix was being tested) of the programs crash or hang. Occasionally the entire operating system panicked.

The whole article started out as a joke. One of the authors was trying to get work done over a noisy phone connection, and the line noise kept crashing various utility programs. He decided to do a more systematic investigation of this phenomenon.

Most of the bugs were due to a number of well-known idioms of the C programming language. In fact, much of the inherent brain damage in Unix can be attributed to the C language. Unix’s kernel and all its utilities are written in C. The noted linguistic theorist Benjamin Whorf said that our language determines what concepts we can think. C has this effect on Unix: it prevents programmers from writing robust software by making such a thing unthinkable.

The C language is minimal. It was designed to be compiled efficiently on a wide variety of computer hardware and, as a result, has language constructs that map easily onto computer hardware.

At the time Unix was created, writing an operating system’s kernel in a high-level language was a revolutionary idea. The time has come to write one in a language that has some form of error checking.

C is a lowest-common-denominator language, built at a time when the lowest common denominator was quite low. If a PDP-11 didn’t have it, then C doesn’t have it.
The last few decades of programming language research have shown that adding linguistic support for things like error handling, automatic memory management, and abstract data types can make it dramatically easier to produce robust, reliable software. C incorporates none of these findings. Because of C’s popularity, there has been little motivation to add features such as data tags or hardware support for garbage collection into the last, current and next generation of microprocessors: these