问题
In order to make my question easy to understand I want to use the following example:
The following code is called nonblock do-loop in fortran language
DO 20 I=1, N ! line 1
DO 20 J=1, N ! line 2
! more codes
20 CONTINUE ! line 4
Pay attention that the label 20
at line 4 means the end of both the inner do-loop and the outer do-loop.
I want my flex program to parse the feature correctly: when flex reads the label 20
, it will return ENDDO
terminal twice.
Firstly, because I also use bison, so every time bison calls yylex()
to get one terminal. If I can ask bison to get terminals from yylex()
in some cases, and from another function in other cases, maybe I could solve this problem, however, I got no idea here then.
Of course there are some workarounds, for eample, I can use flex's start condition but I don't think it is a good solution. So I ask if there's any way to solve my question without a workaround?
回答1:
It is easy enough to modify the lexical scanner produced by (f)lex to implement a token queue, but that is not necessarily the optimal solution. (See below for a better solution.) (Also, it is really not clear to me that for your particular problem, fabricating the extra token in the lexer is truly appropriate.)
The general approach is to insert code at the top of the yylex
function, which you can do by placing the code immediately after the %%
line and before the first rule. (The code must be indented so that it is not interpreted as a rule.) For non-reentrant scanners, this will typically involve the use of a local static
variable to hold the queue. For a simple but dumb example, using the C API but compiling with C++ so as to have access to the C++ standard library:
%%
/* This code will be executed each time `yylex` is called, before
* any generated code. It may include declarations, even if compiled
* with C89.
*/
static std::deque<int> tokenq;
if (!tokenq.empty()) {
int token = tokenq.front();
tokenq.pop_front();
return token;
}
[[:digit:]]+ { /* match a number and return that many HELLO tokens */
int n = atoi(yytext);
for (int i = 0; i < n; ++i)
tokenq.push_back(HELLO);
}
The above code makes no attempt to provide a semantic value for the queued tokens; you could achieve that using something like a std::queue<std::pair<int, YYSTYPE>>
for the token queue, but the fact that YYSTYPE
is typically a union
will make for some complications. Also, if that were the only reason to use the token queue, it is obvious that it could be replaced with a simple counter, which would be much more efficient. See, for example, this answer which does something vaguely similar to your question (and take note of the suggestions in Note 1 of that answer).
Better alternative: Use a push parser
Although the token queue solution is attractive and simple, it is rarely the best solution. In most cases, code will be clearer and easier to write if you request bison to produce a "push parser". With a push parser, the parser is called by the lexer every time a token is available. This makes it trivial to return multiple tokens from a lexer action; you just call the parser for each token. Similarly, if a rule doesn't produce any tokens, it simply fails to call the parser. In this model, the only lexer action which actually return
s is the <<EOF>>
rule, and it only does so after calling the parser with the END
token to indicate that parsing is complete.
Unfortunately, the interface for push parsers is not only subject to change, as that manual link indicates; it is also very badly documented. So here is a simple but complete example which shows how it is done.
The push parser keeps its state in a yypstate
structure, which needs to be passed to the parser on each call. Since the lexer is called only once for each input file, it is reasonable for the lexer to own that structure, which can be done as above with a local static variable [Note 1]: the parser state is initialized when yylex
is called, and the EOF
rule deletes the parser state in order to reclaim whatever memory it is using.
It is usually most convenient to build a reentrant push parser, which means that the parser does not rely on the global yylval
variable [Note 2]. Instead, a pointer to the semantic value must be provided as an additional argument to yypush_parse
. If your parser doesn't refer to the semantic value for the particular token type, you can provide NULL for this argument. Or, as in the code below, you can use a local semantic value variable in the lexer. It is not necessary that every call to the push parser provide the same pointer. In all, the changes to the scanner definition are minimal:
%%
/* Initialize a parser state object */
yypstate* pstate = yypstate_new();
/* A semantic value which can be sent to the parser on each call */
YYSTYPE yylval;
/* Some example scanner actions */
"keyword" { /* Simple keyword which just sends a value-less token */
yypush_parse(pstate, TK_KEYWORD, NULL); /* See Note 3 */
}
[[:digit:]]+ { /* Token with a semantic value */
yylval.num = atoi(yytext);
yypush_parse(pstate, TK_NUMBER, &yylval);
}
"dice-roll" { /* sends three random numbers */
for (int i = 0; i < 2; ++i) {
yylval.num = rand() % 6;
yypush_parse(pstate, TK_NUMBER, &yylval);
}
<<EOF>> { /* Obligatory EOF rule */
/* Send the parser the end token (0) */
int status = yypush_parse(pstate, 0, NULL);
/* Free the pstate */
yypstate_delete(pstate);
/* return the parser status; 0 is success */
return status;
}
In the parser, not much needs to be changed at all, other than adding the necessary declarations: [Note 4]
%define api.pure full
%define api.push-pull push
Notes
If you were building a reentrant lexer as well, you would use the extra data section of the lexer state object instead of static variables.
If you are using location objects in your parser to track source code locations, this also applies to
yylloc
.The example code does not do a good job of detecting errors, since it doesn't check return codes from the calls to
yypush_parse
. One solution I commonly use is some variant on the macroSEND
:#define SEND(token) do { \ int status = yypush_parse(pstate, token, &yylval); \ if (status != YYPUSH_MORE) { \ yypstate_delete(pstate); \ return status; \ } \ } while (0)
It's also possible to use a
goto
to avoid the multiple instances of theyypstate_delete
andreturn
. YMMV.You may have to modify the prototype of
yyerror
. If you are using locations and/or providing extra parameters to the push_parser, the location object and/or the extra parameters will also be present in theyyerror
call. (The error string is always the last parameter.) For whatever reason, the parser state object is not provided toyyerror
, which means that theyyerror
function no longer has access to variables such asyych
, which are now members of theyypstate
structure rather than being global variables, so if you use these variables in your error reporting (which is not really recommended practice), then you will have to find an alternative solution.
回答2:
Thanks to one of my friends, he provide a way to achieve
If I can ask bison to get terminals from yylex() in some cases, and from another function in other cases
In flex generated flex.cpp code, there is a macro
/* Default declaration of generated scanner - a define so the user can
* easily add parameters.
*/
#ifndef YY_DECL
#define YY_DECL_IS_OURS 1
extern int yylex (void);
#define YY_DECL int yylex (void)
#endif /* !YY_DECL */
so I can "rename" flex's yylex()
function to another function like pure_yylex()
.
So my problem is solved by:
- push all terminals I want to give bison to a global
vector<int>
- implement a
yylex()
function by myself, when bison callyylex()
, this function will firstly try to get terminals from a that globalvector<int>
- if
vector<int>
is empty,yylex()
callspure_yylex()
, and flex starts to work
来源:https://stackoverflow.com/questions/42434603/how-can-flex-return-multiple-terminals-at-one-time