flex code:
%option noyywrap nodefault yylineno case-insensitive
%{
#include "stdio.h"
#include "tp.tab.h"
%}

%%
"{"
Short answer:
yylval.strval = yytext;
You can't use yytext like that. The string it points to is private to the lexer and will change as soon as the flex action finishes. You need to do something like:
yylval.strval = strdup(yytext);
and then you need to make sure you free the memory afterwards.
Longer answer:
yytext is actually a pointer into the buffer containing the input. To make yytext work as though it were a NUL-terminated string, the flex framework overwrites the character following the token with a NUL before it runs the action, and then restores the original character when the action terminates. So strdup will work fine inside the action. But if you store the yytext pointer itself, then outside the action (in your bison code) you have a pointer to the part of the buffer starting with the token, no longer terminated at the token's end. And it gets worse later, since flex will read the next part of the source into the same buffer, and now your pointer is to random garbage. There are several possible scenarios, depending on flex options, but none of them are pretty.
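To make the mechanism concrete, here is a toy model in plain C of the terminate-then-restore trick described above. It is only a sketch: toy_scanner and the toy_* functions are hypothetical names invented for illustration, and flex's real buffer handling is more involved.

```c
#define _POSIX_C_SOURCE 200809L  /* for strdup */
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the scanner state (not flex's real layout). */
struct toy_scanner {
    char *buf;    /* writable input buffer */
    char *text;   /* plays the role of yytext */
    size_t len;   /* plays the role of yyleng */
    char saved;   /* character temporarily replaced by NUL */
};

/* Before the action: point text at the token and NUL-terminate it in place. */
static void toy_begin_action(struct toy_scanner *s, size_t start, size_t len) {
    s->text = s->buf + start;
    s->len = len;
    s->saved = s->text[len];
    s->text[len] = '\0';
}

/* After the action: put the overwritten character back. */
static void toy_end_action(struct toy_scanner *s) {
    s->text[s->len] = s->saved;
}

/* Run one "action" on the token at [start, start + len). It hands back
   both the raw pointer (the bug) and a strdup copy (the fix). The caller
   owns *copy and must free it; *copy is NULL if strdup fails. */
static void toy_scan(struct toy_scanner *s, size_t start, size_t len,
                     const char **raw, char **copy) {
    toy_begin_action(s, start, len);
    *raw = s->text;           /* like yylval.strval = yytext;         */
    *copy = strdup(s->text);  /* like yylval.strval = strdup(yytext); */
    toy_end_action(s);
}
```

If buf holds "foo = bar;" and you call toy_scan(&s, 0, 3, &raw, &copy), then after the call copy is "foo" but raw reads as "foo = bar;", because the NUL after the token has been replaced by the original space. Overwrite buf with new input and raw changes again, while copy is unaffected.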
So the golden rule: yytext is only valid until the end of the action. If you want to keep it, copy it, and then make sure you free the storage for the copy when you no longer need it.
In almost all the lexers I've written, the ID token actually finds the identifier in a symbol table (or puts it there) and returns a pointer into the symbol table, which simplifies memory management. But you still have essentially the same memory management issue with, for example, character string literals.
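A minimal sketch of that symbol table idea, in plain C. Everything here is an assumption made for illustration: the symbol and symtab types, symtab_intern, and symtab_free are hypothetical names, and a linked list stands in for whatever structure a real lexer would use (typically a hash table).

```c
#define _POSIX_C_SOURCE 200809L  /* for strdup */
#include <stdlib.h>
#include <string.h>

/* Hypothetical symbol table entry; a real one would carry more fields. */
struct symbol {
    char *name;           /* copy of the identifier text, owned by the table */
    struct symbol *next;
};

struct symtab {
    struct symbol *head;
};

/* Return the entry for name, creating it the first time the name is seen.
   The text is copied, so the caller may pass yytext directly.
   Returns NULL if allocation fails. */
static struct symbol *symtab_intern(struct symtab *t, const char *name) {
    for (struct symbol *s = t->head; s != NULL; s = s->next)
        if (strcmp(s->name, name) == 0)
            return s;

    struct symbol *s = malloc(sizeof *s);
    if (s == NULL)
        return NULL;
    s->name = strdup(name);
    if (s->name == NULL) {
        free(s);
        return NULL;
    }
    s->next = t->head;
    t->head = s;
    return s;
}

/* Release every entry in one place, once parsing is finished. */
static void symtab_free(struct symtab *t) {
    struct symbol *s = t->head;
    while (s != NULL) {
        struct symbol *next = s->next;
        free(s->name);
        free(s);
        s = next;
    }
    t->head = NULL;
}
```

In the flex action for ID you would then write something like yylval.sym = symtab_intern(&table, yytext); where sym is an assumed struct symbol * member of the bison union and table is an assumed global. The parser never owns the string, so no individual semantic value needs to be freed; the table releases everything at the end.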