Writing a re-entrant lexer with Flex

I'm new to flex and am trying to write a simple re-entrant lexer/scanner with it. The lexer definition is below. Compilation fails with the errors shown after it (the yyg issue):

reentrant.l:

/* Definitions */

digit           [0-9]
letter          [a-zA-Z]
alphanum        [a-zA-Z0-9]
identifier      [a-zA-Z_][a-zA-Z0-9_]+
integer         [0-9]+
natural         [0-9]*[1-9][0-9]*
decimal         ([0-9]+\.|\.[0-9]+|[0-9]+\.[0-9]+)

%{
    #include <stdio.h>

    #define ECHO fwrite(yytext, yyleng, 1, yyout)

    int totalNums = 0;
%}

%option reentrant
%option prefix="simpleit_"

%%

^(.*)\r?\n     printf("%d\t%s", yylineno++, yytext);

%%
/* Routines */

int yywrap(yyscan_t yyscanner)
{
    return 1;
}

int main(int argc, char* argv[])
{
    yyscan_t yyscanner;

    if(argc < 2) {
        printf("Usage: %s fileName\n", argv[0]);
        return -1;
    }

    yyin = fopen(argv[1], "rb");

    yylex(yyscanner);

    return 0;
}

Compilation errors:

vietlq@mylappie:~/Desktop/parsers/reentrant$ gcc lex.simpleit_.c 
reentrant.l: In function ‘main’:
reentrant.l:44: error: ‘yyg’ undeclared (first use in this function)
reentrant.l:44: error: (Each undeclared identifier is reported only once
reentrant.l:44: error: for each function it appears in.)

In a reentrant lexer, every call must carry the scanner state, which lives inside the scanner object (a yyscan_t).

Outside the actions (e.g. inside main) you reach the state variables through accessor functions such as yyset_in and yyget_in, passing your scanner as an argument. E.g., in your original reentrant.l, you can do this:

yyscan_t scanner;
yylex_init(&scanner);                     /* allocate the scanner state */
yyset_in(fopen(argv[1], "rb"), scanner);  /* instead of assigning to yyin */
yylex(scanner);
yylex_destroy(scanner);                   /* free the scanner state */

I have renamed the variable to scanner to avoid confusion with yyscanner in the actions. Unlike ordinary C code, all your actions are placed inside one giant function called yylex, which receives your scanner under the name yyscanner. Thus, yyscanner is available to all your actions. In addition, yylex has a local variable called yyg, a pointer to the struct that holds the entire state, and most macros conveniently refer to yyg.

While it is true that you can use the yyin macro inside main by defining yyg as you did in your own Answer, that is not recommended. For a reentrant lexer, the macros are meant for actions only.
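The init / set input / lex / destroy sequence above can be modelled in plain C to show why keeping all state behind a handle makes a scanner reentrant. This is only a sketch: every toy_* name below is a hypothetical stand-in (for yyscan_t, yylex_init, yyset_in, yylex and yylex_destroy respectively), not part of flex, and the "lexer" merely consumes one line per call.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for yyscan_t: an opaque handle to the state. */
typedef void *toy_scan_t;

struct toy_state {
    FILE *in;      /* plays the role of yyin */
    int   lineno;  /* plays the role of yylineno */
};

/* Allocate a fresh state; returns 0 on success, as yylex_init does. */
int toy_lex_init(toy_scan_t *out)
{
    struct toy_state *s = calloc(1, sizeof *s);
    if (s == NULL)
        return 1;
    s->in = stdin;
    s->lineno = 1;
    *out = s;
    return 0;
}

/* Accessors take the handle explicitly; there is no global state. */
void toy_set_in(FILE *in, toy_scan_t scanner)
{
    assert(scanner != NULL);
    ((struct toy_state *)scanner)->in = in;
}

int toy_get_lineno(toy_scan_t scanner)
{
    assert(scanner != NULL);
    return ((struct toy_state *)scanner)->lineno;
}

/* Consume one line. Returns 1 if anything was read, 0 at end of input. */
int toy_lex(toy_scan_t scanner)
{
    struct toy_state *s = scanner;
    int c, seen = 0;

    while ((c = fgetc(s->in)) != EOF) {
        seen = 1;
        if (c == '\n') {
            s->lineno++;
            break;
        }
    }
    return seen;
}

/* Release the state allocated by toy_lex_init. */
int toy_lex_destroy(toy_scan_t scanner)
{
    free(scanner);
    return 0;
}
```

Because each handle owns its own input and line counter, two scanners can be created side by side and their toy_lex calls interleaved without disturbing each other, which is the property the reentrant option is meant to provide.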

To see how this is implemented, you can always view the generated code:


/* For convenience, these vars
   are macros in the reentrant scanner. */
#define yyin yyg->yyin_r
...

/* Holds the entire state of the reentrant scanner. */
struct yyguts_t
...

#define YY_DECL int yylex (yyscan_t yyscanner)

/** The main scanner function which does all the work.
 */
YY_DECL
{
    struct yyguts_t * yyg = (struct yyguts_t*)yyscanner;
...
}
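The excerpt above also explains the original error: yyin expands to yyg->yyin_r, and yyg is a local variable of yylex, so the macro cannot compile inside main. A stripped-down model of that arrangement is sketched below. The mini_* names are illustrative assumptions rather than flex identifiers, and mini_lineno simply follows the same pattern for illustration (the real generated code may store the line number differently).

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical miniature of the generated scanner's state struct. */
typedef void *mini_scan_t;

struct mini_guts {
    FILE *in_r;
    int   lineno_r;
};

/* Modelled on "#define yyin yyg->yyin_r": these macros are only
   meaningful where a local pointer named g is in scope. */
#define mini_in     g->in_r
#define mini_lineno g->lineno_r

/* Plays the role of yylex: declaring g makes the macros usable here. */
int mini_lex(mini_scan_t scanner)
{
    struct mini_guts *g = (struct mini_guts *)scanner;
    int c = fgetc(mini_in);

    if (c == '\n')
        mini_lineno++;
    return c;
}

/* Code outside mini_lex has no g, so it goes through an accessor
   that takes the handle explicitly (the analogue of yyset_in). */
void mini_set_in(FILE *in, mini_scan_t scanner)
{
    assert(scanner != NULL);
    ((struct mini_guts *)scanner)->in_r = in;
}
```

Writing mini_in inside any function that lacks a local g would fail with an "undeclared" error for g, which mirrors the 'yyg' undeclared message in the question.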

There is a lot more on the reentrant option in the flex docs, which include a cleanly compiling example. (Google "flex reentrant", and look for the flex.sourceforge link.) Unlike bison, flex has a fairly straightforward model for reentrancy. I strongly suggest using reentrant flex with Lemon Parser, rather than with yacc/bison.

Source: https://stackoverflow.com/questions/2634998/writing-re-entrant-lexer-with-flex
