Programming in Plato’s Cave 177 Unix programmers are like mathematicians. It’s a curious phenomenon we call “Programming by Implication.” Once we were talking to a Unix pro- grammer about how nice it would be to have a utility that could examine a program and then answer questions such as: “What functions call function foo?” or “Which functions modify the global variable bar?” He agreed that it would be useful and then observed that, “You could write a program like that.” To be fair, the reason he said “You could write a program like that” instead of actually writing the program is that some properties of the C language and the Unix “Programming Environment” combine synergistically to make writing such a utility a pain of epic proportion. You may think we exaggerate, and that this utility could be easily imple- mented by writing a number of small utility programs and then piping them together, but we’re not, and it can’t. Parsing with yacc “Yacc” was what I felt like doing after I learned how to use yacc(1). —Anonymous “YACC” stands for Yet Another Compiler Compiler. It takes a context- free grammar describing a language to be parsed and computes a state machine for a universal pushdown automaton. When the state machine is run, one gets a parser for the language. The theory is well understood since one of the big research problems in the olden days of computer science was reducing the time it took to write compilers. This scheme has one small problem: most programming languages are not context-free. Thus, yacc users must specify code fragments to be run at certain state transitions to handle the cases where context-free grammars blow up. (Type checking is usually done this way.) Most C compilers today have a yacc-generated parser the yacc grammar for GCC 2.1 (an otherwise fine compiler written by the Free Software Foundation) is about 1650 lines long. The actual code output by yacc and the code for the uni- versal pushdown automaton that runs the yacc output are much larger. Some programming languages are easier to parse. Lisp, for example, can be parsed by a recursive-descent parser. “Recursive-descent” is computer jargon for “simple enough to write on a liter of Coke.” As an experiment,
178 Programming we wrote a recursive-descent parser for Lisp. It took about 250 lines of C. If the parser had been written in Lisp, it would not have even filled a page. The olden days mentioned above were just around the time that the editors of this book were born. Dinosaurs ruled the machine room and Real Men programmed with switches on the front panel. Today, sociologists and his- torians are unable to determine why the seemingly rational programmers of the time designed, implemented, and disseminated languages that were so hard to parse. Perhaps they needed open research problems and writing parsers for these hard-to-parse languages seemed like a good one. It kind of makes you wonder what kinds of drugs they were doing back in the olden days. A program to parse C programs and figure out which functions call which functions and where global variables are read and modified is the equiva- lent of a C compiler front end. C compiler front ends are complex artifacts the complexity of the C language and the difficulty of using tools like yacc make them that way. No wonder nobody is rushing to write this program. Die-hard Unix aficionados would say that you don’t need this program since grep is a perfectly good solution. Plus, you can use grep in shell pipelines. Well, the other day we were looking for all uses of the min func- tion in some BSD kernel code. Here’s an example of what we got: % grep min netinet/ip_icmp.c icmplen = oiplen + min(8, oip-ip_len) * that not corrupted and of at least minimum length. * If the incoming packet was addressed directly to us, * to the incoming interface. * Retrieve any source routing from the incoming packet % Yep, grep finds all of the occurrences of min, and then some. “Don’t know how to make love. Stop.” The ideal programming tool should be quick and easy to use for common tasks and, at the same time, powerful enough to handle tasks beyond that for which it was intended. Unfortunately, in their zeal to be general, many Unix tools forget about the quick and easy part. Make is one such tool. In abstract terms, make’s input is a description of a dependency graph. Each node of the dependency graph contains a set of commands to be run when that node is out of date with respect to the nodes that it depends on. Nodes corresponds to files, and the file dates determine
Previous Page Next Page