I was reading about parsers and parser generators and found this statement on Wikipedia's LR parsing page:
Many programming languages can be parsed
The problem is never framed the following way, although it would be interesting:
What is the smallest set of modifications to the C++ grammar that would be necessary so that the new grammar could be perfectly parsed by a "non-context-free" yacc parser? (Making use of only one 'hack': the typename/identifier disambiguation, with the parser informing the lexer of every typedef/class/struct.)
I see a few:
Type Type; is forbidden. An identifier declared as a type name cannot become a non-type identifier (note that struct Type Type is not ambiguous and could still be allowed).
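For reference, here is a small sketch of what today's C++ accepts, which is exactly what the rule above would outlaw. The names Type, other and type_type_demo are made up for the illustration.

```cpp
struct Type { int v; };

int type_type_demo() {
    Type Type{3};          // valid today: from here on, Type names the variable
    struct Type other{4};  // the elaborated form still unambiguously names the struct
    return Type.v + other.v;
}
```

After the first declaration, a parser relying on the lexer to tell type names from identifiers would have to reclassify Type mid-scope, which is the situation the proposed restriction avoids.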
There are 3 kinds of name tokens:
types: built-in types, or names introduced by a typedef/class/struct
Considering template-functions as different tokens solves the func< ambiguity. If func is a template-function name, then < must be the beginning of a template argument list; otherwise func is a function pointer and < is the comparison operator.
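A minimal sketch of how such a name table could look, assuming the parser registers names as it reduces declarations and the lexer queries the table before emitting a token. Everything here (NameKind, NameTable, meaning_of_angle_after) is hypothetical, and treating "ordinary identifier" as the fallback kind is my assumption.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical token kinds for names; Identifier is the assumed fallback.
enum class NameKind { TypeName, TemplateFunction, Identifier };

// Hypothetical table shared by parser (writer) and lexer (reader).
class NameTable {
public:
    // Called by the parser after a typedef/class/struct declaration.
    void declare_type(const std::string& name) {
        kinds_[name] = NameKind::TypeName;
    }
    // Called by the parser after a function template declaration.
    void declare_template_function(const std::string& name) {
        kinds_[name] = NameKind::TemplateFunction;
    }
    // Called by the lexer to decide which token to emit for a name.
    NameKind classify(const std::string& name) const {
        auto it = kinds_.find(name);
        return it == kinds_.end() ? NameKind::Identifier : it->second;
    }
private:
    std::unordered_map<std::string, NameKind> kinds_;
};

enum class AngleMeaning { TemplateArgumentList, LessThan };

// Decides what a '<' means when it directly follows the given name.
AngleMeaning meaning_of_angle_after(const NameTable& table, const std::string& name) {
    return table.classify(name) == NameKind::TemplateFunction
               ? AngleMeaning::TemplateArgumentList
               : AngleMeaning::LessThan;
}
```

With this, the grammar never sees a bare "name <" conflict: the lexer hands over two different tokens and the parser picks the matching rule.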
Type a(2); is an object instantiation.
Type a(); and Type a(int) are function prototypes.
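This matches how current C++ already resolves these declarations, as the following sketch checks at compile time. The struct Type and the function name are invented for the example.

```cpp
#include <type_traits>

struct Type {
    int v;
    Type() : v(0) {}
    explicit Type(int x) : v(x) {}
};

int object_or_prototype_demo() {
    Type a(2);    // object, initialised with 2
    Type b();     // function prototype: no parameters, returns Type
    Type c(int);  // function prototype: one int parameter, returns Type

    static_assert(!std::is_function<decltype(a)>::value, "a is an object");
    static_assert(std::is_function<decltype(b)>::value, "b is a function");
    static_assert(std::is_function<decltype(c)>::value, "c is a function");
    return a.v;
}
```

Some compilers warn about the Type b(); line, since it is often written by people who meant a default-constructed object.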
int (k); is completely forbidden and should be written int k;
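To show why the parenthesised form is troublesome, here is a small sketch in current C++ where the same "name (k)" shape is a call in one line and a declaration in the other, depending only on what the name is. The function names are made up.

```cpp
int twice(int x) { return 2 * x; }

int paren_demo() {
    int k = 1;
    int r = twice (k);  // twice is a function: this is a call
    {
        int (k) = 5;    // int is a type: this declares a new k in the inner scope
        r += k;         // uses the inner k
    }
    return r + k;       // the outer k is untouched
}
```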
typedef int func_type(); and typedef int (func_type)(); are forbidden.
A function typedef must be a function pointer typedef: typedef int (*func_ptr_type)();
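For comparison, this sketch shows what the three typedef forms mean in current C++. The second typedef is renamed func_type_paren here so that all three can coexist; the function names are invented.

```cpp
#include <type_traits>

typedef int func_type();           // a function type
typedef int (func_type_paren)();   // the same function type, redundant parentheses
typedef int (*func_ptr_type)();    // a pointer-to-function type

static_assert(std::is_same<func_type, func_type_paren>::value,
              "the parentheses change nothing");
static_assert(std::is_same<func_type*, func_ptr_type>::value,
              "the pointer typedef is just func_type*");

func_type declared_via_typedef;    // declares: int declared_via_typedef();
int declared_via_typedef() { return 7; }

int forty_two() { return 42; }

int call_through(func_ptr_type p) { return p(); }
```

The line func_type declared_via_typedef; is the kind of declaration that disappears if only the pointer form is allowed.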
Template recursion is limited to a depth of 1024; an increased maximum could be passed as an option to the compiler.
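As an illustration of what "template recursion depth" means, here is a small sketch with a made-up Depth template. The depth of 100 is chosen only to stay well under typical compiler defaults. GCC and Clang already expose such a limit through the -ftemplate-depth=N option, and 1024 is the minimum the C++11 standard recommends in its annex on implementation quantities.

```cpp
// Each instantiation of Depth<N> requires Depth<N - 1>, down to Depth<0>.
template <unsigned N>
struct Depth {
    static constexpr unsigned value = 1 + Depth<N - 1>::value;
};

// Explicit specialisation that ends the recursion.
template <>
struct Depth<0> {
    static constexpr unsigned value = 0;
};

static_assert(Depth<100>::value == 100, "100 nested instantiations");

unsigned depth_demo() { return Depth<100>::value; }
```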
int a,b,c[9],*d,(*f)(), (*g)()[9], h(char); could be forbidden too, replaced by
int a,b,c[9],*d;
int (*f)();
int (*g)()[9];
int h(char);
with one line per function prototype or function pointer declaration.
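The sketch below spells out what each declarator in such a mixed line declares in current C++. One caveat: a function returning an array, as in (*g)()[9], is rejected by C++ compilers, so the sketch assumes an array of 9 function pointers, (*g[9])(), was meant. That substitution is my assumption.

```cpp
#include <type_traits>

// One declaration, seven different kinds of entity.
int a, b, c[9], *d, (*f)(), (*g[9])(), h(char);

static_assert(std::is_same<decltype(a), int>::value, "a: int");
static_assert(std::is_same<decltype(b), int>::value, "b: int");
static_assert(std::is_same<decltype(c), int[9]>::value, "c: array of 9 int");
static_assert(std::is_same<decltype(d), int*>::value, "d: pointer to int");
static_assert(std::is_same<decltype(f), int (*)()>::value, "f: pointer to function");
static_assert(std::is_same<decltype(g), int (*[9])()>::value,
              "g: array of 9 pointers to function");
static_assert(std::is_same<decltype(h), int(char)>::value, "h: function prototype");
```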
A highly preferred alternative would be to change the awful function pointer syntax, with
int (MyClass::*MethodPtr)(char*);
being rewritten as:
int (MyClass::*)(char*) MethodPtr;
This would be consistent with the cast operator (int (MyClass::*)(char*)).
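The proposed "type first, name last" order can already be approximated in C++11 with an alias declaration, as in this sketch. MyClass::first_char, MethodPtrType and MethodPtr2 are names invented for the example.

```cpp
struct MyClass {
    int first_char(char* s) { return s[0]; }
};

// Classic syntax: the declared name sits in the middle of the type.
int (MyClass::*MethodPtr)(char*) = &MyClass::first_char;

// Alias syntax: the type is written exactly as in a cast, and the name comes last.
using MethodPtrType = int (MyClass::*)(char*);
MethodPtrType MethodPtr2 = &MyClass::first_char;

int call_both(MyClass& obj, char* s) {
    return (obj.*MethodPtr)(s) + (obj.*MethodPtr2)(s);
}
```

This does not remove the old syntax from the grammar, so it only shows that the reordered form is workable, not that the ambiguity is gone.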
typedef int type, *type_ptr; could be forbidden too: one line per typedef. Thus it would become
typedef int type;
typedef int *type_ptr;
sizeof(int), sizeof(char), sizeof(long long) and co. could be declared in each source file. Thus, each source file making use of the type int should begin with
#type int : signed_integer(4)
and unsigned_integer(4) would be forbidden outside of that #type directive.
This would be a big step out of the stupid sizeof(int) ambiguity present in so many C++ headers.
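The #type directive is hypothetical, but a rough approximation in current C++ is to state the size assumptions at the top of each source file and let compilation fail when they do not hold. The sketch below assumes a platform with 8-bit bytes and a 4-byte int; on other platforms the assertions are meant to fire.

```cpp
#include <climits>
#include <cstdint>

// Size assumptions this source file relies on, checked at compile time.
static_assert(CHAR_BIT == 8, "this file assumes 8-bit bytes");
static_assert(sizeof(int) == 4, "this file assumes a 4-byte int");

// Fixed-width alternatives whose size does not depend on the platform.
static_assert(sizeof(std::int32_t) == 4, "int32_t is always 4 bytes here");
static_assert(sizeof(std::uint32_t) == 4, "uint32_t is always 4 bytes here");
```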
The compiler implementing the resyntaxed C++ would, on encountering a C++ source that uses ambiguous syntax, move source.cpp to an ambiguous_syntax folder and automatically create an unambiguous translated source.cpp before compiling it.
Please add your own ambiguous C++ syntaxes if you know of any!