Syntax Syrup of Ipecac

    Syntactic sugar causes cancer of the semi-colon.
    -- Alan Perlis

Practically every kind of syntax error you can make in the C programming language has been redefined in C++, so that now it produces compilable code. Unfortunately, these syntax errors don’t always produce valid code. The reason is that people aren’t perfect. They make typos. In C, no matter how bad it is, these typos are usually caught by the compiler. In C++ they slide right through, promising headaches when somebody actually tries to run the code.

C++’s syntactical stew owes itself to the language’s heritage. C++ was never formally designed: it grew. As C++ evolved, a number of constructs were added that introduced ambiguities into the language. Ad hoc rules were used to disambiguate these. The result is a language with nonsensical rules that are so complicated they can rarely be learned. Instead, most programmers keep them on a ready-reference card, or simply refuse to use all of C++’s features and merely program with a restricted subset.

For example, there is a C++ rule that says any string that can be parsed as either a declaration or a statement is to be treated as a declaration. Parser experts cringe when they read things like that because they know that such rules are very difficult to implement correctly. AT&T didn’t even get some of these rules correct. For example, when Jim Roskind was trying to figure out the meanings of particular constructs (pieces of code that he thought reasonable humans might interpret differently), he wrote them up and fed them to AT&T’s “cfront” compiler. Cfront crashed.

Indeed, if you pick up Jim Roskind’s free grammar for C++ from the Internet host ics.uci.edu, you will find the following note in the file c++grammar2.0.tar.Z in the directory ftp/pub:

    “It should be noted that my grammar cannot be in constant agreement with such implementations as cfront because a) my grammar is internally consistent (mostly courtesy of its formal nature and yacc verification), and b) yacc generated parsers don’t dump core. (I will probably take a lot of flack for that last snipe, but… every time I have had difficulty figuring what was meant syntactically by some construct that the ARM was vague about, and I fed it to cfront, cfront dumped core.)”

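To make the declaration-or-statement rule concrete, here is a minimal sketch in C++. The names Probe, n, and which_constructor_ran are invented for this illustration; they come from neither Roskind's grammar nor cfront. The line `Probe(n);` can be read as an expression that builds a temporary Probe from the global int n, or as a declaration of a new local variable n with redundant parentheses. Under the rule, the declaration reading wins.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical type that records which constructor was used.
struct Probe {
    std::string how;
    Probe() : how("default-constructed") {}
    explicit Probe(int) : how("constructed from int") {}
};

int n = 42;  // a global int that the ambiguous line appears to use

std::string which_constructor_ran() {
    // Reading 1 (expression statement): build a temporary Probe from ::n.
    // Reading 2 (declaration): "Probe n;" written with extra parentheses.
    // Anything that can be a declaration is one, so reading 2 applies and
    // this line declares a local Probe named n that hides the global int.
    Probe(n);

    static_assert(std::is_same<decltype(n), Probe>::value,
                  "inside this function, n names the local Probe");
    assert(::n == 42);  // the global is untouched, merely hidden
    return n.how;
}
```

Calling which_constructor_ran() should return "default-constructed": the int constructor never runs, even though the source line appears to pass n to it. Some current compilers warn about the redundant parentheses, but the code is accepted.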
Date: Sun, 21 May 89 18:02:14 PDT
From: tiemann (Michael Tiemann)
To: sdm@cs.brown.edu
Cc: UNIX-HATERS
Subject: C++ Comments

    Date: 21 May 89 23:59:37 GMT
    From: sdm@cs.brown.edu (Scott Meyers)
    Newsgroups: comp.lang.c++
    Organization: Brown University Dept. of Computer Science

    Consider the following C++ source line:

        //**********************

    How should this be treated by the C++ compiler? The GNU g++ compiler treats this as a comment-to-EOL followed by a bunch of asterisks, but the AT&T compiler treats it as a slash followed by an open-comment delimiter. I want the former interpretation, and I can’t find anything in Stroustrup’s book that indicates that any other interpretation is to be expected.

    Actually, compiling -E quickly shows that the culprit is the preprocessor, so my questions are:

    1. Is this a bug in the AT&T preprocessor? If not, why not? If so, will it be fixed in 2.0, or are we stuck with it?

    2. Is it a bug in the GNU preprocessor? If so, why?

    Scott Meyers
    sdm@cs.brown.edu

There is an ancient rule for lexing UNIX that the token that should be accepted be the longest one acceptable. Thus ‘foo’ is not parsed as three identifiers, ‘f,’ ‘o,’ and ‘o,’ but as one, namely, ‘foo.’ See how useful this rule is in the following program (and what a judicious choice ‘/*’ was for delimiting comments):

    double qdiv (p, q)
    double *p, *q;
    {
        return *p/*q;
    }

So why is the same rule not being applied in the case of C++? Simple. It’s a bug.
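
The longest-match rule described above can be sketched as a toy scanner. This is an illustration only, not how g++ or cfront is implemented. The function munch and its knows_line_comments flag are hypothetical. The flag stands in for a preprocessor that was written for C and does not recognize `//`; that is an assumption about the AT&T behavior Meyers reports, which the message itself does not confirm.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Toy longest-match ("maximal munch") scanner. It knows identifiers, block
// comments, optional // line comments, and single-character tokens.
std::vector<std::string> munch(const std::string& src,
                               bool knows_line_comments = true) {
    std::vector<std::string> out;
    std::size_t i = 0;
    while (i < src.size()) {
        const unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) {
            ++i;  // whitespace only separates tokens
        } else if (std::isalpha(c) || c == '_') {
            // Keep extending the identifier for as long as possible.
            std::size_t j = i + 1;
            while (j < src.size() &&
                   (std::isalnum(static_cast<unsigned char>(src[j])) ||
                    src[j] == '_')) {
                ++j;
            }
            out.push_back(src.substr(i, j - i));
            i = j;
        } else if (knows_line_comments && src.compare(i, 2, "//") == 0) {
            // The two-character "//" beats a lone '/'; skip to end of line.
            const std::size_t end = src.find('\n', i);
            i = (end == std::string::npos) ? src.size() : end + 1;
            out.push_back("<line comment>");
        } else if (src.compare(i, 2, "/*") == 0) {
            // Likewise "/*" beats '/' followed by '*'; skip to "*/" or EOF.
            const std::size_t end = src.find("*/", i + 2);
            i = (end == std::string::npos) ? src.size() : end + 2;
            out.push_back("<block comment>");
        } else {
            out.push_back(std::string(1, src[i]));  // single-character token
            ++i;
        }
    }
    return out;
}

// Usage: the cases discussed in the message, written as checks.
void show_examples() {
    using Tokens = std::vector<std::string>;
    assert((munch("foo") == Tokens{"foo"}));
    assert((munch("return *p / *q") ==
            Tokens{"return", "*", "p", "/", "*", "q"}));
    // Without the spaces, "/*" opens a comment that swallows the divisor.
    assert((munch("return *p/*q") ==
            Tokens{"return", "*", "p", "<block comment>"}));
    assert((munch("//*****") == Tokens{"<line comment>"}));
    // A scanner that has never heard of "//" sees '/' and then "/*".
    assert((munch("//*****", false) == Tokens{"/", "<block comment>"}));
}
```

With the flag on, `//*****` is a single line comment, which is the g++ reading. With it off, the same text becomes a slash followed by an unterminated block comment, which is the reading Meyers reports from the AT&T compiler.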