    a_function()
    {
        char c,buff[80];
        int i = 0;

        while ((c = getchar()) != '\n')
            buff[i++] = c;
        buff[i] = '\000';
        do_it(buff);
    }

Code like this litters Unix. Note how the stack buffer is 80 characters long, because most Unix files only have lines that are 80 characters long. Note also how there is no bounds check before a new character is stored in the character array and no test for an end-of-file condition. The bounds check is probably missing because the programmer likes how the assignment statement (c = getchar()) is embedded in the loop conditional of the while statement. There is no room to check for end-of-file because that line of code is already testing for the end of a line. Believe it or not, some people actually praise C for just this kind of terseness, understandability and maintainability be damned! Finally, do_it is called, and the character array suddenly becomes a pointer, which is passed as the first function argument.

Exercise for the reader: What happens to this function when an end-of-file condition occurs in the middle of a line of input?

When Unix users discover these built-in limits, they tend not to think that the bugs should be fixed. Instead, users develop ways to cope with the situation. For example, tar, the Unix “tape archiver,” can’t deal with path names longer than 100 characters (including directories). Solution: don’t use tar to archive directories to tape; use dump. Better solution: don’t use deep subdirectories, so that a file’s absolute path name is never longer than 100 characters. The ultimate example of careless Unix programming will probably occur at 10:14:07 p.m. on January 18, 2038, when Unix’s 32-bit timeval field overflows…

To continue with our example, let’s imagine that our function is called upon to read a line of input that is 85 characters long. The function will read the 85 characters with no problem, but where do the last 5 characters end up?
The answer is that they end up scribbling over whatever happened to be in the 5 bytes right after the character array. What was there before?

The two variables, c and i, might be allocated right after the character array and therefore might be corrupted by the 85-character input line. What about an 850-character input line? It would probably overwrite important bookkeeping information that the C runtime system stores on the stack, such as addresses for returning from subroutine calls. At best, corrupting this information will probably cause a program to crash.

We say “probably” because you can corrupt the runtime stack to achieve an effect that the original programmer never intended. Imagine that our function was called upon to read a really long line, over 2,000 characters, and that this line was set up to overwrite the bookkeeping information on the call stack so that when the C function returns, it will call a piece of code that was also embedded in the 2,000-character line. This embedded piece of code may do something truly useful, like exec a shell that can run commands on the machine.

Robert T. Morris’s Unix Worm employed exactly this mechanism (among others) to gain access to Unix computers. Why anyone would want to do that remains a mystery.

    Date: Thu, 2 May 91 18:16:44 PDT
    From: Jim McDonald jlm%missoula@lucid.com
    To: UNIX-HATERS
    Subject: how many fingers on your hands?

    Sad to say, this was part of a message to my manager today:

        The bug was that a program used to update Makefiles had a pointer that stepped past the array it was supposed to index and scribbled onto some data structures used to compute the dependency lists it was auto-magically writing into a Makefile. The net result was that later on the corrupted Makefile didn’t compile everything it should, so necessary .o files weren’t being written, so the build eventually died. One full day wasted because some idiot thought 10 includes was the most anyone would ever use, and then dangerously optimized code that was going to run for less than a millisecond in the process of creating X Makefiles!

    The disadvantage of working over networks is that you can’t so easily go into someone else’s office and rip their bloody heart out.
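For contrast with the unbounded line-reading loop discussed earlier in this section, here is a minimal sketch of a bounded version. It is only one possible approach, not something the original text prescribes. The name read_line, its signature, and the choice to truncate overlong lines rather than report them are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * read_line: hypothetical bounded replacement for the loop in a_function().
 * Reads one line from `in` into `buf`, which has room for `size` bytes.
 * The newline is consumed but not stored. Characters that do not fit are
 * read and discarded, so a long line is truncated instead of being
 * written past the end of the array.
 * Returns the number of characters stored, or -1 if end-of-file (or a
 * read error) is hit before any character of a line has been read.
 */
long read_line(FILE *in, char *buf, size_t size)
{
    int c;            /* int, not char, so that EOF stays distinguishable */
    size_t i = 0;     /* characters stored in buf so far */
    size_t seen = 0;  /* characters read from this line so far */

    assert(in != NULL && buf != NULL && size > 0);

    while ((c = getc(in)) != EOF && c != '\n') {
        seen++;
        if (i < size - 1)       /* bounds check: keep one byte for '\0' */
            buf[i++] = (char)c;
    }
    buf[i] = '\0';

    if (c == EOF && seen == 0)
        return -1;
    return (long)i;
}
```

With an 80-byte buffer, an 85-character line comes back as its first 79 characters plus the terminator, and a final line that ends at end-of-file without a newline is still returned. Whether silent truncation is acceptable depends on the caller; reporting it would need an extra return value or flag.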
Exceptional Conditions

The main challenge of writing robust software is gracefully handling errors and other exceptions. Unfortunately, C provides almost no support for handling exceptional conditions. As a result, few people learning programming in today’s schools and universities know what exceptions are.
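To give a rough idea of what handling errors by hand looks like in C, the sketch below checks each library call explicitly and reports failure through a return value. The helper count_lines is hypothetical and not taken from the text. Relying on errno after a failed fopen assumes a POSIX-like system, since ISO C does not require fopen to set it.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/*
 * count_lines: hypothetical helper used only for illustration.
 * Stores the number of newline characters in the file at `path` in *out.
 * Returns 0 on success and -1 on failure. On POSIX-like systems errno
 * then describes the failure; that is an assumption, not an ISO C guarantee.
 */
int count_lines(const char *path, long *out)
{
    FILE *fp;
    long n = 0;
    int c;

    /* Programmer errors are asserted; runtime errors are returned. */
    assert(path != NULL && out != NULL);

    fp = fopen(path, "r");
    if (fp == NULL)
        return -1;              /* the caller must remember to check this */

    while ((c = getc(fp)) != EOF)
        if (c == '\n')
            n++;

    if (ferror(fp)) {           /* EOF alone cannot tell error from end */
        int saved = errno;
        fclose(fp);
        errno = saved;          /* keep the original cause for the caller */
        return -1;
    }
    if (fclose(fp) != 0)
        return -1;

    *out = n;
    return 0;
}
```

Nothing in the language forces a caller to look at the return value, so every call site has to repeat the check by convention.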