208 C++

Syntax Syrup of Ipecac

Syntactic sugar causes cancer of the semi-colon.
    -- Alan Perlis

Practically every kind of syntax error you can make in the C programming language has been redefined in C++, so that now it produces compilable code. Unfortunately, these syntax errors don’t always produce valid code. The reason is that people aren’t perfect. They make typos. In C, no matter how bad it is, these typos are usually caught by the compiler. In C++ they slide right through, promising headaches when somebody actually tries to run the code.

C++’s syntactical stew owes itself to the language’s heritage. C++ was never formally designed: it grew. As C++ evolved, a number of constructs were added that introduced ambiguities into the language. Ad hoc rules were used to disambiguate these. The result is a language with nonsensical rules that are so complicated they can rarely be learned. Instead, most programmers keep them on a ready-reference card, or simply refuse to use all of C++’s features and merely program with a restricted subset.

For example, there is a C++ rule that says any string that can be parsed as either a declaration or an expression statement is to be treated as a declaration. Parser experts cringe when they read things like that because they know that such rules are very difficult to implement correctly. AT&T didn’t even get some of these rules correct. For instance, when Jim Roskind was trying to figure out the meanings of particular constructs (pieces of code that he thought reasonable humans might interpret differently), he wrote them up and fed them to AT&T’s “cfront” compiler. Cfront crashed.
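A small C++ sketch shows what the declaration rule does in practice. `Widget`, `widget_value`, and `shadow_demo` are hypothetical names invented for this illustration, not anything from cfront or Roskind's test cases, and the brace-initialization remedy is from C++11, long after cfront. Both marked statements read naturally as expressions, yet each is parsed as a declaration:

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical type, made up for illustration.
struct Widget {
    int value = 42;
};

int widget_value() {
    // Looks like "default-construct a Widget named w", but it can be parsed
    // as a declaration, so it is one: w is a function taking no arguments
    // and returning Widget.
    Widget w();
    static_assert(std::is_function<decltype(w)>::value,
                  "w was parsed as a function declaration");
    // return w.value;  // would not compile: w is not an object

    // C++11 brace initialization leaves only the object interpretation.
    Widget x{};
    return x.value;
}

int shadow_demo() {
    int n = 7;
    {
        // Looks like a cast of the outer n to int, but it parses as the
        // declaration "int n;" with redundant parentheses: a new,
        // uninitialized n that hides the outer one.
        int(n);
        n = 1;  // assigns the inner n
    }
    return n;  // the outer n is untouched
}
```

Here `widget_value()` returns 42 and `shadow_demo()` returns 7: the assignment `n = 1` lands on the inner variable introduced by `int(n);`, never on the outer one.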
Indeed, if you pick up Jim Roskind’s free grammar for C++ from the Internet host ics.uci.edu, you will find the following note in the file c++grammar2.0.tar.Z in the directory ftp/pub:

    “It should be noted that my grammar cannot be in constant agreement with such implementations as cfront because a) my grammar is internally consistent (mostly courtesy of its formal nature and yacc verification), and b) yacc generated parsers don’t dump core. (I will probably take a lot of flack for that last snipe, but… every time I have had difficulty figuring what was meant syntactically by some construct that the ARM was vague about, and I fed it to cfront, cfront dumped core.)”
Date: Sun, 21 May 89 18:02:14 PDT
From: tiemann (Michael Tiemann)
To: sdm@cs.brown.edu
Cc: UNIX-HATERS
Subject: C++ Comments

    Date: 21 May 89 23:59:37 GMT
    From: sdm@cs.brown.edu (Scott Meyers)
    Newsgroups: comp.lang.c++
    Organization: Brown University Dept. of Computer Science

    Consider the following C++ source line:

        //**********************

    How should this be treated by the C++ compiler? The GNU g++ compiler treats this as a comment-to-EOL followed by a bunch of asterisks, but the AT&T compiler treats it as a slash followed by an open-comment delimiter. I want the former interpretation, and I can’t find anything in Stroustrup’s book that indicates that any other interpretation is to be expected.

    Actually, compiling -E quickly shows that the culprit is the preprocessor, so my questions are:

    1. Is this a bug in the AT&T preprocessor? If not, why not? If so, will it be fixed in 2.0, or are we stuck with it?

    2. Is it a bug in the GNU preprocessor? If so, why?

    Scott Meyers
    sdm@cs.brown.edu

There is an ancient rule for lexing UNIX that the token that should be accepted be the longest one acceptable. Thus ‘foo’ is not parsed as three identifiers, ‘f,’ ‘o,’ and ‘o,’ but as one, namely, ‘foo.’ See how useful this rule is in the following program (and what a judicious choice ‘/*’ was for delimiting comments):

    double qdiv (p, q)
        double *p, *q;
    {
        return *p/*q;
    }

So why is the same rule not being applied in the case of C++? Simple. It’s a bug.
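Tiemann's longest-match rule, and both readings of Meyers's comment line, can be reproduced with a toy lexer. This is a hypothetical sketch: `lex` and its `cpp_comments` switch are invented for illustration, it knows only identifiers, the comment openers, and the characters `/` and `*`, and it is not how cfront, the AT&T preprocessor, or g++ is actually implemented. With `cpp_comments` off it models a preprocessor that has never heard of `//`:

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy lexer applying the longest-match rule.
// cpp_comments == false: models a C-only preprocessor with no "//" token.
// cpp_comments == true:  also recognizes "//" as a line-comment opener.
// A comment opener is emitted as a token and the comment body is skipped.
std::vector<std::string> lex(const std::string& src, bool cpp_comments) {
    std::vector<std::string> puncts = {"/*", "/", "*"};
    if (cpp_comments) puncts.push_back("//");

    std::vector<std::string> tokens;
    std::size_t i = 0;
    while (i < src.size()) {
        unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) { ++i; continue; }

        if (std::isalpha(c) || c == '_') {
            // Identifier: keep consuming while the token can still grow.
            std::size_t j = i;
            while (j < src.size() &&
                   (std::isalnum(static_cast<unsigned char>(src[j])) || src[j] == '_'))
                ++j;
            tokens.push_back(src.substr(i, j - i));
            i = j;
            continue;
        }

        // Otherwise choose the longest candidate that matches at position i.
        std::string best;
        for (const std::string& p : puncts)
            if (p.size() > best.size() && src.compare(i, p.size(), p) == 0)
                best = p;
        if (best.empty()) best = src.substr(i, 1);  // anything else: one char
        tokens.push_back(best);
        i += best.size();

        if (best == "//") {
            // Line comment: skip to the end of the line.
            std::size_t nl = src.find('\n', i);
            i = (nl == std::string::npos) ? src.size() : nl + 1;
        } else if (best == "/*") {
            // Block comment: skip past the closing "*/", or to the end of
            // the input if the comment is never closed.
            std::size_t end = src.find("*/", i);
            i = (end == std::string::npos) ? src.size() : end + 2;
        }
    }
    return tokens;
}
```

With `cpp_comments` false, `lex("//****", false)` yields `/` followed by `/*`, the AT&T reading Meyers complains about; with it true, the whole line is one `//` comment, the g++ reading. On `return *p/*q;` even the `//`-aware version takes `/*` as the longest match and swallows the rest of the input, which is the trap in Tiemann's `qdiv`; writing `*p / *q` with spaces lexes as intended.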