I'd like to use the same flex/bison scanner/parser for an interpreter and for loading a file to be interpreted. I cannot get the newline parsing to work correctly in both cases.
The main problem is that your parser function yyparse does not return until it reduces the entire language to the start symbol.
If the top level of your grammar is something like:
language : commands ;
commands : command commands | /* empty */ ;
then of course the parser will expect a complete script (terminated by you hitting Ctrl-D). If your interpreter uses logic like this:
loop:
    print("prompt>")
    yyparse()
    if (empty statement)
        break
it won't work, since yyparse consumes the whole script before returning.
The return 0; solves the problem for this interactive mode because the token value 0 indicates EOF to the parser, making it think the script has ended.
I do not agree with the solution of making \n a token. It will only complicate the grammar (a hitherto insignificant piece of whitespace is now significant) and ultimately not work, because the yyparse function will still want to process the complete grammar. That is to say, if you have newline as a token, but the grammar's start symbol represents the entire script, yyparse will still not return to your interactive prompt loop.
A quick and dirty hack is to let the lexer know whether interactive mode is in effect. Then it can conditionally return 0; for every instance of a newline if it is in interactive mode. If the input isn't a complete statement, there will be a syntax error, since the script as a whole ends at the newline. In normal file reading mode, your lexer can eat all whitespace without returning, as before, allowing the whole file to be processed with a single yyparse.
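As a rough illustration of that mode flag, here is a small hand-written C scanner standing in for the flex-generated one. The names interactive_mode, next_token and TOK_WORD are made up for this sketch, and the token value 258 is arbitrary; in a real flex file the same idea would be a newline rule whose action is roughly: if (interactive_mode) return 0;

```c
#include <ctype.h>

/* 0 is the value yyparse treats as end of input; 258 is an arbitrary token code. */
enum { TOK_EOF = 0, TOK_WORD = 258 };

/* Hypothetical flag, set by the main program before parsing starts. */
static int interactive_mode = 0;

/* Return the next token from *cursor, advancing it past the consumed text. */
int next_token(const char **cursor)
{
    for (;;) {
        unsigned char c = (unsigned char)**cursor;

        if (c == '\0')
            return TOK_EOF;            /* real end of input */

        if (c == '\n') {
            (*cursor)++;
            if (interactive_mode)
                return TOK_EOF;        /* pretend the script ends at this newline */
            continue;                  /* file mode: newline is plain whitespace */
        }

        if (isspace(c)) {
            (*cursor)++;
            continue;
        }

        /* Anything else: consume a run of non-whitespace as one word token. */
        while (**cursor != '\0' && !isspace((unsigned char)**cursor))
            (*cursor)++;
        return TOK_WORD;
    }
}
```

With the flag set, each call to the parser sees one line as a complete script; with it clear, newlines disappear like any other whitespace.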
If you want interactive input and file reading without implementing two modes of behavior in the lexer, you can change the grammar so that it parses only one statement of the language: the yyparse function then returns for every top-level statement. (And the lexer eats newlines like before, without returning 0.) That is, the start symbol of the grammar is just one statement (possibly empty). Your file parser must then be implemented as a loop (written by you) which calls yyparse to get all the statements from the file until yyparse encounters an empty input. The downside of this approach is that if the user types incomplete syntax (e.g. a dangling open parenthesis), the parser will keep scanning the input until it is satisfied. This is unfriendly, like programs that use scanf for interactive user input (it's the same problem: scanf is a parser that doesn't return until it is satisfied).
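The file-reading loop for the one-statement grammar might look something like the sketch below. Everything named here is hypothetical: saw_empty_statement is a flag I am assuming the grammar action for the empty statement would set, run_all_statements is the loop, and fake_parse_one is only a stand-in so the sketch compiles without a generated parser. It also assumes the parser does not need to read past the end of one statement to recognize it.

```c
/* Hypothetical flag: the action for the empty-statement rule would set this to 1. */
static int saw_empty_statement = 0;

/*
 * Call parse_one (yyparse in a real program) once per statement until it
 * reports an empty statement. Returns the number of statements parsed,
 * or -1 if the parser reported an error (nonzero return).
 */
int run_all_statements(int (*parse_one)(void))
{
    int count = 0;

    for (;;) {
        saw_empty_statement = 0;
        if (parse_one() != 0)
            return -1;                 /* syntax error or other parser failure */
        if (saw_empty_statement)
            break;                     /* nothing left in the input */
        count++;
    }
    return count;
}

/* Stand-in for yyparse: pretends the input holds three statements. */
static int fake_remaining = 3;

static int fake_parse_one(void)
{
    if (fake_remaining == 0)
        saw_empty_statement = 1;       /* input exhausted: "empty statement" */
    else
        fake_remaining--;              /* one statement consumed */
    return 0;
}
```

In a real program the call would be run_all_statements(yyparse), and the interactive prompt loop can use the same flag to decide when to stop.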
Another possibility is to have an interactive mode which performs its own user input rather than calling yyparse to get the input and parse it. In this mode, you read the user's input into a line buffer, then have the parser process that buffer. Processing a line buffer instead of a FILE * stream is perfectly possible; you just have to write custom input handling (your own definition of the YY_INPUT macro). This is the approach you will end up needing anyway if you implement a decent interactive mode with line editing and history recall, e.g. using libedit or GNU readline.
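A minimal sketch of such a YY_INPUT replacement follows, assuming the line has already been read into a NUL-terminated string (by fgets, readline, or whatever you use). The helper names set_input_line and read_from_line are invented for this example; only YY_INPUT itself is flex's. The definition would go in the definitions section of the .l file so that it takes the place of flex's default.

```c
#include <stddef.h>
#include <string.h>

/* The line currently being fed to the scanner (hypothetical helper state). */
static const char *line_buf = "";
static size_t line_len = 0;
static size_t line_pos = 0;

/* Point the scanner's input at a new NUL-terminated line. */
void set_input_line(const char *line)
{
    line_buf = line;
    line_len = strlen(line);
    line_pos = 0;
}

/*
 * Copy up to max_size bytes of the current line into buf.
 * Returns the number of bytes copied; 0 means the line is used up,
 * which the scanner treats as end of input.
 */
size_t read_from_line(char *buf, size_t max_size)
{
    size_t remaining = line_len - line_pos;
    size_t n = remaining < max_size ? remaining : max_size;

    memcpy(buf, line_buf + line_pos, n);
    line_pos += n;
    return n;
}

/* flex calls this whenever its buffer needs refilling. */
#define YY_INPUT(buf, result, max_size) \
    ((result) = read_from_line((buf), (size_t)(max_size)))
```

The interactive loop would then call set_input_line on each line the user enters and invoke yyparse afterwards. Depending on how the scanner handles end of input, it may need to be restarted between lines; that detail is left out here.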
If pressing ENTER terminates a command then the lexer should return a token for \n. Returning 0 tells the parser the input source is complete (end of file for a file or ^D for a terminal). Add an end-of-line token to your grammar and have the lexer return that when it sees \n.
ETA: But don't forget to handle the case of the last line not ending in ENTER. Have your lexer return an end-of-line token at the end of file unless the last character is \n.