flex-lexer

Lexing The VHDL ' (tick) Token

泄露秘密 submitted on 2019-12-22 08:45:08
Question: In VHDL the ' character can either delimit a character literal, e.g. '.', or act as an attribute separator (somewhat similar to C++'s :: token), e.g. string'("hello"). The issue comes up when parsing an attribute name that contains character literals, e.g. string'('a','b','c'). In this case a naive lexer will incorrectly tokenize the first '(' as a character literal, and all of the actual character literals that follow will be tokenized incorrectly. There is a thread in the comp.lang.vhdl Google group from 2007 which asks a similar ...
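One common way to resolve this kind of ambiguity is to let the lexer remember the previous token: a tick that directly follows an identifier or a closing parenthesis is taken as a separator, and only otherwise may it open a character literal. The sketch below is a hand-written C illustration of that idea, not the solution from the thread mentioned above; the names `TokKind`, `tick_is_separator`, and `lex_vhdl` are hypothetical, and the rule set is deliberately simplified (real VHDL has more cases).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token kinds, for this sketch only. */
typedef enum { TK_NONE, TK_IDENT, TK_TICK, TK_CHAR, TK_LPAREN, TK_RPAREN,
               TK_COMMA, TK_OTHER } TokKind;

/* Assumption: a tick right after an identifier or ')' is a separator;
   anywhere else it may start a character literal. */
static int tick_is_separator(TokKind prev) {
    return prev == TK_IDENT || prev == TK_RPAREN;
}

/* Tokenize src, writing at most cap token kinds to out; returns the count. */
size_t lex_vhdl(const char *src, TokKind *out, size_t cap) {
    assert(src != NULL && out != NULL);
    size_t n = 0;
    TokKind prev = TK_NONE;
    const char *p = src;
    while (*p != '\0' && n < cap) {
        TokKind k;
        if (isspace((unsigned char)*p)) { p++; continue; }
        if (isalpha((unsigned char)*p)) {
            while (isalnum((unsigned char)*p) || *p == '_') p++;
            k = TK_IDENT;
        } else if (*p == '\'') {
            /* Character literal only if context allows it and the input
               has the shape 'x'. */
            if (!tick_is_separator(prev) && p[1] != '\0' && p[2] == '\'') {
                p += 3;
                k = TK_CHAR;
            } else {
                p++;
                k = TK_TICK;
            }
        } else if (*p == '(') { p++; k = TK_LPAREN; }
        else if (*p == ')') { p++; k = TK_RPAREN; }
        else if (*p == ',') { p++; k = TK_COMMA; }
        else { p++; k = TK_OTHER; }
        out[n++] = k;
        prev = k;
    }
    return n;
}
```

With this rule, the tick after `string` is a separator, so the following `(` is lexed as a parenthesis and `'a'`, `'b'`, `'c'` come out as character literals. In a flex scanner the same context could be kept in a variable updated by each rule's action or expressed with start conditions.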

Flex/Bison Error: request for member `str' in something not a structure or union

杀马特。学长 韩版系。学妹 submitted on 2019-12-21 21:34:34
Question: I'm learning flex/bison. I wrote the following program but am getting errors.

    %{
    #include <stdio.h>
    typedef struct node {
        struct node *left;
        struct node *right;
        char *token;
    } node;
    node *mknode(node *left, node *right, char *token);
    void printtree(node *tree);
    #define YYSTYPE struct node *
    %}
    %union {
        char* str;
        int num;
    }
    %start lines
    %token <str> WORD
    %token <str> PLUS MINUS TIMES DIVIDE POWER
    %token <str> LEFT_PARENTHESIS RIGHT_PARENTHESIS
    %token <str> END
    %left PLUS MINUS
    %left TIMES DIVIDE
    ...

Is it possible to set priorities for rules to avoid the “longest-earliest” matching pattern?

五迷三道 submitted on 2019-12-21 03:32:07
Question: Another simple question: is there any way to tell flex to prefer a rule that matches a short thing over a rule that matches a longer thing? I can't find any good documentation about that. Here is why I need it: I parse a file in a pseudo-language that contains some keywords corresponding to control instructions. I'd like them to have absolute priority so that they're not parsed as parts of an expression. I actually need this priority mechanism because I don't have to write a full grammar ...

Lex regex gets some extra characters

為{幸葍}努か submitted on 2019-12-20 06:01:07
Question: I have the following definition in my lex file:

    L [a-zA-Z_]
    A [a-zA-Z_0-9]
    %%
    {L}{A}* { yylval.id = yytext; return IDENTIFIER; }

And I do the following in my YACC file:

    primary_expression
        : IDENTIFIER { puts("IDENTIFIER: "); printf("%s", $1); }

My source code (the one I'm analyzing) has the following assignment:

    ab= 10;

For some reason, the printf("%s", $1); part is printing ab= and not only ab. I'm pretty sure that's the section that is printing ab= because when I delete the printf("%s", ...
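A likely explanation for output such as ab=, assuming the usual flex behaviour: yytext points into the scanner's own buffer and is only valid until the next call to yylex(). The scanner terminates the current match by temporarily writing a NUL after it and restores the overwritten character on the next call, so a saved yytext pointer "grows" once the parser has read its lookahead token. The sketch below simulates that mechanism in plain C; `scan_buf`, `simulate_match`, and `copy_lexeme` are hypothetical names for illustration, not flex APIs.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for the scanner's input buffer (assumption: simplified model). */
static char scan_buf[] = "ab= 10;";
static char held_char;   /* character overwritten by the temporary NUL */
static char *held_pos;   /* where that NUL was written, or NULL */

/* Simulate matching len bytes at offset start: restore the previously
   overwritten character, then NUL-terminate the new match in place. */
static char *simulate_match(size_t start, size_t len) {
    if (held_pos != NULL)
        *held_pos = held_char;
    held_pos = scan_buf + start + len;
    held_char = *held_pos;
    *held_pos = '\0';
    return scan_buf + start;
}

/* Copy exactly len bytes of a match into fresh, NUL-terminated storage.
   The caller owns the result and must free() it. */
char *copy_lexeme(const char *text, size_t len) {
    assert(text != NULL);
    char *s = malloc(len + 1);
    if (s == NULL)
        return NULL;
    memcpy(s, text, len);
    s[len] = '\0';
    return s;
}
```

In this model, a pointer returned by `simulate_match(0, 2)` reads as ab at first, but after `simulate_match(2, 1)` scans the = token the same pointer reads as ab=, while a copy made with `copy_lexeme` still reads ab. In an actual flex rule the equivalent would be to store a copy of the text (for example with `strdup(yytext)` on POSIX systems) instead of the pointer itself, and to free it when the value is no longer needed.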

Parser - Segmentation fault when calling yytext

自古美人都是妖i submitted on 2019-12-20 05:47:09
Question: My parser is recognizing the grammar and indicating the correct error line using yylineno. I want to print the symbol which caused the error.

    int yyerror(string s)
    {
        extern int yylineno;  // defined and maintained in lex.yy.c
        extern char *yytext;  // defined and maintained in lex.yy.c
        cerr << "error: " << s << " -> " << yytext << " @ line " << yylineno << endl;
        //exit(1);
    }

I get this error when I write something not acceptable by the grammar:

    error: syntax error -> Segmentation fault

Am I not ...

Flex/Lex - How to know if a variable was declared

依然范特西╮ submitted on 2019-12-20 01:38:50
Question: My grammar allows:

    C → id := E      // assign a value/expression to a variable (VAR)
    C → print(id)    // print variable (VAR) values

To get it done, my lex file is:

    [a-z] { yylval.var_index = get_var_index(yytext); return VAR; }

get_var_index returns the index of the variable in the list; if the variable does not exist, it creates one. It is working! The problem is: every time a variable is matched in the lex file, it creates an index for that variable. I have to report if 'print(a)' is called and 'a' was not ...

Writing re-entrant lexer with Flex

安稳与你 submitted on 2019-12-19 06:59:08
Question: I'm new to flex. I'm trying to write a simple re-entrant lexer/scanner with flex. The lexer definition is given below. I'm stuck on the compilation errors shown below (a yyg issue).

reentrant.l:

    /* Definitions */
    digit      [0-9]
    letter     [a-zA-Z]
    alphanum   [a-zA-Z0-9]
    identifier [a-zA-Z_][a-zA-Z0-9_]+
    integer    [0-9]+
    natural    [0-9]*[1-9][0-9]*
    decimal    ([0-9]+\.|\.[0-9]+|[0-9]+\.[0-9]+)
    %{
    #include <stdio.h>
    #define ECHO fwrite(yytext, yyleng, 1, yyout)
    int totalNums = 0;
    %}
    %option reentrant
    %option ...

Shift Reduce Conflict

99封情书 submitted on 2019-12-18 09:38:48
Question: I'm having trouble fixing a shift/reduce conflict in my grammar. I ran the parser generator with -v to read its report on the issue; it points me to State 0 and mentions that my INT and FLOAT are reduced to variable_definitions by rule 9. I cannot see the conflict and I'm having trouble finding a solution.

    %{
    #include <stdio.h>
    #include <stdlib.h>
    %}
    %token INT FLOAT
    %token ADDOP MULOP INCOP
    %token WHILE IF ELSE RETURN
    %token NUM ID
    %token INCLUDE
    %token STREAMIN ENDL STREAMOUT
    %token CIN COUT
    %token ...

yytext contains characters not in match

谁都会走 submitted on 2019-12-18 09:31:19
Question:

Background

I am using flex to generate a lexer for a programming language I am implementing. I have some problems with this rule for identifiers:

    [a-zA-Z_][a-zA-Z_0-9]* {
        printf("yytext is %s\n", yytext);
        yylval.s = yytext;
        return TOK_IDENTIFIER;
    }

The rule works as it should when my parser is parsing expressions like this:

    var0 = var1 + var2;

The printf statement will print out this:

    yytext is 'var0'
    yytext is 'var1'
    yytext is 'var2'

which is what it should print.

The problem

But when my parser is ...

Should I avoid “|” in flex patterns?

点点圈 submitted on 2019-12-18 09:18:04
Question: I've heard that the "|" operator slows down regex matching, and it certainly seems to be true in Perl, for example. Do I have to worry about that when I build scanners with tools like the Flex lexer generator?

Answer 1: Absolutely not. Flex (like the lex lexer generator on which it was based, and most other similar compiler-construction tools) compiles all of the regular expressions in the scanner into a single Deterministic Finite State Automaton (DFA). The DFA never backs up during the scan of a ...
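To make the DFA point concrete, here is a small hand-written sketch of a DFA for the alternation cat|car|dog. It is only an illustration: flex generates compressed transition tables rather than a switch statement, and the names `step` and `matches_cat_car_dog` are hypothetical. The alternatives share states for their common prefix, so each input character is examined exactly once, however many alternatives the pattern has.

```c
#include <assert.h>
#include <stddef.h>

enum { DEAD = -1, START = 0, ACCEPT = 5 };

/* Transition function: one state change per input character.
   States: 0 start, 1 "c", 2 "ca", 3 "d", 4 "do", 5 accepted. */
static int step(int state, char c) {
    switch (state) {
    case 0: return c == 'c' ? 1 : c == 'd' ? 3 : DEAD;
    case 1: return c == 'a' ? 2 : DEAD;
    case 2: return (c == 't' || c == 'r') ? ACCEPT : DEAD;
    case 3: return c == 'o' ? 4 : DEAD;
    case 4: return c == 'g' ? ACCEPT : DEAD;
    default: return DEAD;  /* no transitions out of the accepting state */
    }
}

/* Returns 1 if s is exactly "cat", "car", or "dog", otherwise 0. */
int matches_cat_car_dog(const char *s) {
    assert(s != NULL);
    int state = START;
    for (; *s != '\0' && state != DEAD; s++)
        state = step(state, *s);
    return state == ACCEPT;
}
```

Note that cat and car are not tried one after the other: both follow the same path through states 1 and 2 and diverge only on the last character, which is why adding alternatives does not add rescanning.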