Why can't C++ be parsed with an LR(1) parser?


I was reading about parsers and parser generators and found this statement on Wikipedia's LR parsing page:

Many programming languages can be parsed

6 answers
  •  情歌与酒
    2020-11-22 02:23

    The problem is never framed this way, although it would be an interesting question:

    What is the smallest set of modifications to the C++ grammar that would allow the new grammar to be parsed cleanly by a "non-context-free" yacc parser? (That is, a parser using only one 'hack': type-name/identifier disambiguation, with the parser informing the lexer of every typedef/class/struct.)

    I see a few:
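The single 'hack' mentioned above can be sketched roughly as follows. This is only an illustration under stated assumptions: `TypeTable`, `NameToken`, `declare_type` and `classify` are hypothetical names, not part of yacc or of any real C++ front end.

```cpp
#include <string>
#include <unordered_set>

// Hypothetical token kinds the lexer can return for a name.
enum class NameToken { Identifier, TypeName };

// Hypothetical table shared between the parser and the lexer.
class TypeTable {
public:
    // The parser would call this after reducing a typedef/class/struct declaration.
    void declare_type(const std::string& name) { types_.insert(name); }

    // The lexer would call this for every name it scans.
    NameToken classify(const std::string& name) const {
        return types_.count(name) != 0 ? NameToken::TypeName
                                       : NameToken::Identifier;
    }

private:
    std::unordered_set<std::string> types_;  // names currently known to be types
};
```

With such a table, the same spelling is returned as a different token before and after its declaration, which is what makes the overall parser "non-context-free".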

    1. Type Type; is forbidden. An identifier declared as a type name cannot later become a non-type identifier (note that struct Type Type; is not ambiguous and could still be allowed).

      There are three kinds of name tokens:

      • types: built-in types, or names introduced by a typedef/class/struct
      • template-functions
      • identifiers: functions/methods and variables/objects

      Treating template-functions as a separate token kind resolves the func< ambiguity. If func is a template-function name, then < must begin a template parameter list; otherwise func is an ordinary identifier (a function pointer, for example) and < is the comparison operator.

    2. Type a(2); is an object instantiation. Type a(); and Type a(int) are function prototypes.

    3. int (k); is forbidden outright; it must be written int k;

    4. typedef int func_type(); and typedef int (func_type)(); are forbidden.

      A function typedef must be a function pointer typedef: typedef int (*func_ptr_type)();

    5. Template recursion depth is limited to 1024; a higher limit could be passed to the compiler as an option.

    6. int a,b,c[9],*d,(*f)(), (*g[9])(), h(char); could be forbidden too, and replaced by int a,b,c[9],*d; int (*f)();

      int (*g[9])();

      int h(char);

      that is, one line per function prototype or function pointer declaration.

      A much better alternative would be to change the awkward function pointer syntax, with

      int (MyClass::*MethodPtr)(char*);

      being rewritten as:

      int (MyClass::*)(char*) MethodPtr;

      which is consistent with the cast syntax (int (MyClass::*)(char*))

    7. typedef int type, *type_ptr; could be forbidden too: one typedef per line. It would thus become

      typedef int type;

      typedef int *type_ptr;

    8. The sizes of int, char, long long and so on could be declared in each source file. Each source file that uses the type int would then begin with

      #type int : signed_integer(4)

      and unsigned_integer(4) would be forbidden outside of that #type directive. This would be a big step toward resolving the sizeof int ambiguity present in so many C++ headers.
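The three kinds of name tokens from item 1 could drive the func< decision along these lines. This is a minimal sketch, not how any existing compiler does it; `NameKind`, `LessMeaning`, `SymbolKinds` and `meaning_of_less_after` are names invented for the illustration.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical classification of names, following the three kinds in item 1.
enum class NameKind { Identifier, TypeName, TemplateFunction };

// What a `<` that directly follows a name would mean.
enum class LessMeaning { Comparison, TemplateArgumentList };

// Hypothetical symbol table: name -> kind recorded at declaration time.
using SymbolKinds = std::unordered_map<std::string, NameKind>;

// Decide how to read `name <` using only the recorded kind of `name`.
LessMeaning meaning_of_less_after(const SymbolKinds& symbols,
                                  const std::string& name) {
    const auto it = symbols.find(name);
    if (it != symbols.end() && it->second == NameKind::TemplateFunction) {
        return LessMeaning::TemplateArgumentList;
    }
    // Unknown names and ordinary identifiers fall back to the comparison operator.
    return LessMeaning::Comparison;
}
```

The decision needs no lookahead past the `<`, which is the point of making template-functions their own token kind.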

    A compiler implementing the resyntaxed C++ would, on encountering a C++ source file that uses ambiguous syntax, move source.cpp to an ambiguous_syntax folder and automatically create an unambiguous translation of source.cpp before compiling it.

    Please add any ambiguous C++ syntax you know of!
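For reference, several of the constructs that items 1, 3, 4, 6 and 7 propose to forbid are valid in standard C++ today. The sketch below gathers them in one place, with static_assert checks on the resulting types; the namespace `legal_today`, the function `shadowing_example` and the name `func_type2` are placeholders added for the example.

```cpp
#include <type_traits>

namespace legal_today {

struct Type {};

void shadowing_example() {
    Type Type;           // item 1: a variable named after its own type
    struct Type other;   // the elaborated form still refers to the type
    (void)Type;
    (void)other;
}

int (k);                               // item 3: same declaration as `int k;`
typedef int func_type();               // item 4: typedef of a function type
typedef int (func_type2)();            // item 4: same type, parenthesized name
int a, b, c[9], *d, (*f)(), h(char);   // item 6: mixed declarators on one line
typedef int type, *type_ptr;           // item 7: two typedefs in one declaration

static_assert(std::is_same<decltype(k), int>::value, "k is a plain int");
static_assert(std::is_same<func_type, int()>::value, "function type");
static_assert(std::is_same<func_type, func_type2>::value, "same function type");
static_assert(std::is_same<decltype(d), int*>::value, "d is a pointer to int");
static_assert(std::is_same<decltype(f), int (*)()>::value, "f is a function pointer");
static_assert(std::is_same<decltype(h), int(char)>::value, "h is a function");
static_assert(std::is_same<type_ptr, int*>::value, "type_ptr is int*");

}  // namespace legal_today
```

A translator of the kind described above would have to rewrite each of these declarations into its one-per-line, unparenthesized form.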
