“It Can’t Be a Bug, My Makefile Depends on It!”

features would amount to nothing more than wasted silicon since the majority of programs, written in C, wouldn’t use them.

Recall that C has no way to handle integer overflow. The solution when using C is simply to use integers that are larger than the problem you have to deal with, and hope that the problem doesn’t get larger during the lifetime of your program.

C doesn’t really have arrays either. It has something that looks like an array but is really a pointer to a memory location. There is an array indexing expression, array[index], that is merely shorthand for the expression (*(array + index)). Therefore it’s equally valid to write index[array], which is also shorthand for (*(array+index)). Clever, huh? This duality can be commonly seen in the way C programs handle character arrays. Array variables are used interchangeably as pointers and as arrays.

To belabor the point, if you have:

    char *str = "bugy";

…then the following equivalencies are also true:

    0[str] == 'b'
    *(str+1) == 'u'
    *(2+str) == 'g'
    str[3] == 'y'

Isn’t C grand?

The problem with this approach is that C doesn’t do any automatic bounds checking on the array references. Why should it? The arrays are really just pointers, and you can have pointers to anywhere in memory, right? Well, you might want to ensure that a piece of code doesn’t scribble all over arbitrary pieces of memory, especially if the piece of memory in question is important, like the program’s stack.

This brings us to the first source of bugs mentioned in the Miller paper. Many of the programs that crashed did so while reading input into a character buffer that was allocated on the call stack. Many C programs do this; the following C function reads a line of input into a stack-allocated array and then calls do_it on the line of input.
    a_function()
    {
        char c,buff[80];
        int i = 0;

        while ((c = getchar()) != '\n')
            buff[i++] = c;
        buff[i] = '\000';
        do_it(buff);
    }

Code like this litters Unix. Note how the stack buffer is 80 characters long, because most Unix files only have lines that are 80 characters long. Note also how there is no bounds check before a new character is stored in the character array and no test for an end-of-file condition. The bounds check is probably missing because the programmer likes how the assignment statement (c = getchar()) is embedded in the loop conditional of the while statement. There is no room to check for end-of-file because that line of code is already testing for the end of a line. Believe it or not, some people actually praise C for just this kind of terseness, understandability and maintainability be damned! Finally, do_it is called, and the character array suddenly becomes a pointer, which is passed as the first function argument.

Exercise for the reader: What happens to this function when an end-of-file condition occurs in the middle of a line of input?

When Unix users discover these built-in limits, they tend not to think that the bugs should be fixed. Instead, users develop ways to cope with the situation. For example, tar, the Unix “tape archiver,” can’t deal with path names longer than 100 characters (including directories). Solution: don’t use tar to archive directories to tape; use dump. Better solution: Don’t use deep subdirectories, so that a file’s absolute path name is never longer than 100 characters. The ultimate example of careless Unix programming will probably occur at 10:14:07 p.m. on January 18, 2038, when Unix’s 32-bit timeval field overflows…

To continue with our example, let’s imagine that our function is called upon to read a line of input that is 85 characters long. The function will read the 85 characters with no problem, but where do the last 5 characters end up?
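For comparison, here is a minimal sketch of the same loop with the two missing checks put back. The name read_line, the FILE * parameter, the return convention, and the choice to silently discard characters that do not fit are all assumptions made for illustration; none of them comes from the function above.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper (not from the original text): reads one line from
 * `in` into `buf`, storing at most cap - 1 characters plus a terminating
 * '\0'. Characters that do not fit are read and thrown away.
 * Returns the number of characters stored, or -1 if end-of-file (or a
 * read error) is hit before any character of the line has been read.
 */
int read_line(FILE *in, char *buf, size_t cap)
{
    int c;              /* int, not char, so EOF stays distinguishable */
    size_t i = 0;
    int got_any = 0;    /* did this call consume at least one character? */

    assert(buf != NULL && cap > 0);

    while ((c = getc(in)) != EOF && c != '\n') {
        got_any = 1;
        if (i < cap - 1)        /* bounds check before every store */
            buf[i++] = (char)c;
    }
    buf[i] = '\0';

    if (c == EOF && !got_any)
        return -1;
    return (int)i;
}
```

With a helper along these lines, a_function could call read_line(stdin, buff, sizeof buff) and skip do_it when the result is negative. Note that c is declared int here: storing the result of getchar() in a char, as the original does, makes end-of-file impossible to test for reliably.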
The answer is that they end up scribbling over whatever happened to be in the 5 bytes right after the character array. What was there before? The two variables, c and i, might be allocated right after the character array and therefore might be corrupted by the 85-character input line. What about an 850-character input line? It would probably overwrite important