lex

Lex: identifier vs integer

一曲冷凌霜 submitted on 2020-01-13 16:02:00
Question: I'm trying to create my own simple programming language, and for this I need to write some regular expressions in Lex. I'm using the following rules to match identifiers and integers: `[a-zA-Z][a-zA-Z0-9]* /* identifier */ return IDENTIFIER;` and `("+"|"-")?[0-9]+ /* integer */ return INTEGER;`. Now, when I check an illegal identifier such as `0a = 1;`, the leading zero is recognized as an integer and the `a` that follows is recognized as an identifier. Instead, I want the token `0a` to be recognized as an…
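
Flex picks the rule with the longest match and, on a tie, the rule listed first. One common approach is therefore to add a rule such as `[0-9]+[a-zA-Z][a-zA-Z0-9]*` that returns an error token: for `0a` it matches two characters, while the integer rule matches only one. The Python sketch below imitates that selection policy to show the effect. It is not flex itself, and the `BAD_IDENTIFIER`, `ASSIGN`, `SEMI` and `WS` rule names are hypothetical.

```python
import re

# Rules in priority order. On equal-length matches the earlier rule wins,
# mirroring flex's tie-breaking. BAD_IDENTIFIER is a hypothetical error rule.
RULES = [
    ("BAD_IDENTIFIER", re.compile(r"[0-9]+[a-zA-Z][a-zA-Z0-9]*")),
    ("IDENTIFIER", re.compile(r"[a-zA-Z][a-zA-Z0-9]*")),
    ("INTEGER", re.compile(r"[+-]?[0-9]+")),
    ("ASSIGN", re.compile(r"=")),
    ("SEMI", re.compile(r";")),
    ("WS", re.compile(r"[ \t\n]+")),
]


def tokenize(text):
    """Return (kind, lexeme) pairs, choosing the longest match at each position."""
    tokens, pos = [], 0
    while pos < len(text):
        best_kind, best_end = None, pos
        for kind, pattern in RULES:
            m = pattern.match(text, pos)
            # Strict ">" keeps the earlier rule when two matches tie in length.
            if m and m.end() > best_end:
                best_kind, best_end = kind, m.end()
        if best_kind is None:
            raise ValueError(f"unexpected character {text[pos]!r} at offset {pos}")
        if best_kind != "WS":  # whitespace separates tokens but is not returned
            tokens.append((best_kind, text[pos:best_end]))
        pos = best_end
    return tokens
```

For example, `tokenize("0a = 1;")` returns `[('BAD_IDENTIFIER', '0a'), ('ASSIGN', '='), ('INTEGER', '1'), ('SEMI', ';')]`. The parser, or the rule's action, can then report `0a` as an illegal identifier.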

How to make lex/flex recognize tokens not separated by whitespace?

不羁的心 submitted on 2020-01-12 08:07:43
Question: I'm taking a course in compiler construction, and my current assignment is to write the lexer for the language we're implementing. I can't figure out how to satisfy the requirement that the lexer must recognize concatenated tokens, that is, tokens not separated by whitespace. For example, the string `39if` is supposed to be recognized as the number `39` followed by the keyword `if`. At the same time, the lexer must `exit(1)` when it encounters invalid input. A simplified version of my code: `%{ #include`…
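
A flex-style lexer needs no special treatment for concatenated tokens. Each token starts where the previous one ended, the longest match wins, and among equal-length matches the earlier rule wins. Listing keywords before the identifier rule therefore makes `if` a keyword while `ifx` stays an identifier. A final catch-all rule (in flex, typically `.`) can report invalid input and exit. The following Python sketch imitates that policy. The token set is hypothetical, and `sys.exit(1)` stands in for the C `exit(1)`.

```python
import re
import sys

# Hypothetical token set. KEYWORD precedes ID so that an equal-length match
# ("if" as keyword vs. "if" as identifier) resolves to the keyword.
RULES = [
    ("KEYWORD", re.compile(r"if|then|else")),
    ("ID", re.compile(r"[a-z][a-z0-9]*")),
    ("NUMBER", re.compile(r"[0-9]+")),
    ("WS", re.compile(r"\s+")),
]


def lex(text):
    """Tokenize text without requiring whitespace between tokens."""
    tokens, pos = [], 0
    while pos < len(text):
        best_kind, best_end = None, pos
        for kind, pattern in RULES:
            m = pattern.match(text, pos)
            if m and m.end() > best_end:  # longest match, earliest rule on ties
                best_kind, best_end = kind, m.end()
        if best_kind is None:
            # Catch-all error case, like a final "." rule calling exit(1) in flex.
            print(f"invalid input {text[pos]!r} at offset {pos}", file=sys.stderr)
            sys.exit(1)
        if best_kind != "WS":
            tokens.append((best_kind, text[pos:best_end]))
        # The next token starts right where this one ended.
        pos = best_end
    return tokens
```

With these rules, `lex("39if")` returns `[('NUMBER', '39'), ('KEYWORD', 'if')]`, while `lex("39$")` prints an error and exits with status 1.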

ANTLR lexer can't lookahead at all

老子叫甜甜 submitted on 2020-01-11 09:23:10
Question: I have the following grammar: `rule: 'aaa' | 'a' 'a';`. It successfully parses the string `aaa`, but it fails to parse `aa` with the following error: `line 1:2 mismatched character '<EOF>' expecting 'a'`. FYI, this is the lexer's problem, not the parser's, because I don't even call the parser. The main function looks like: `@members { public static void main(String[] args) throws Exception { RecipeLexer lexer = new RecipeLexer(new ANTLRInputStream(System.in)); for (Token t = lexer.nextToken(); t`…
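
The error at position 2 suggests that the generated lexer commits to the `'aaa'` token after seeing two `a` characters and does not back up when the third one is missing. The asker expects the behavior of a lexer that only considers literals which match completely and then takes the longest one. The minimal Python sketch below illustrates that policy. It shows the desired tokenization, not how ANTLR's lexer is implemented.

```python
# The grammar's literals, treated as implicit lexer tokens.
LITERALS = ["aaa", "a"]


def tokenize(text):
    """Split text into literals, taking the longest literal that fully matches."""
    tokens, pos = [], 0
    while pos < len(text):
        # Only literals present in full at this position are candidates, so
        # "aaa" is never chosen when fewer than three 'a's remain.
        candidates = [lit for lit in LITERALS if text.startswith(lit, pos)]
        if not candidates:
            raise ValueError(f"no token matches at offset {pos}")
        lit = max(candidates, key=len)
        tokens.append(lit)
        pos += len(lit)
    return tokens
```

With this policy, `aa` yields `['a', 'a']` and `aaa` yields `['aaa']`, and the parser rule `'aaa' | 'a' 'a'` accepts both.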

PLY lexer for numbers always returns double

痞子三分冷 submitted on 2020-01-07 07:53:20
Question: I am having trouble telling int and double apart in PLY lex with the following program. `DOUBLE_VAL` is returned for `1`, whereas I expected `INT_VAL`. If I swap the order of the `INT_VAL` and `DOUBLE_VAL` functions, I get an error on the decimal point. How can I resolve this? `tokens = ( 'VERSION', 'ID', 'INT_VAL', 'DOUBLE_VAL' ) t_ignore = ' \t' def t_VERSION(t): r'VERSION' return t def t_DOUBLE_VAL(t): '[-+]?[0-9]+(\.[0-9]+)?([eE][-+]?[0-9]+)?' return t def t_INT_VAL(t): r'[-+]?[0-9]+' return t def t_ID(t): r'[a-zA-Z_`…
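
PLY adds function-defined token rules to its master regular expression in definition order, and that alternation takes the first rule that matches rather than the longest. Because the fractional part in `t_DOUBLE_VAL` is optional, that pattern also matches `1`. Putting `t_INT_VAL` first instead matches the `1` of `1.5` and leaves the `.` unmatched. One fix is to keep `t_DOUBLE_VAL` first but require either a fraction or an exponent. The sketch below reproduces this with plain `re` rather than PLY itself; the combined pattern and the `WS` group (standing in for `t_ignore`) are assumptions. The question's `t_DOUBLE_VAL` docstring should also be a raw string (`r'...'`), since it contains `\.`.

```python
import re

# Corrected DOUBLE_VAL: requires a fractional part or an exponent, so a bare
# integer such as "1" no longer matches it.
DOUBLE_VAL = r"[-+]?[0-9]+(?:\.[0-9]+(?:[eE][-+]?[0-9]+)?|[eE][-+]?[0-9]+)"
INT_VAL = r"[-+]?[0-9]+"

# Like PLY's master regex, alternatives are tried left to right and the first
# one that matches wins, so DOUBLE_VAL must come before INT_VAL.
MASTER = re.compile("|".join(
    f"(?P<{name}>{pattern})"
    for name, pattern in [("DOUBLE_VAL", DOUBLE_VAL), ("INT_VAL", INT_VAL), ("WS", r"[ \t]+")]
))


def tokenize(text):
    """Return (token type, lexeme) pairs; WS plays the role of t_ignore."""
    tokens, pos = [], 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if not m:
            raise ValueError(f"illegal character {text[pos]!r}")
        if m.lastgroup != "WS":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens
```

Here `tokenize("1 2.5 3e10 -4")` classifies `1` and `-4` as `INT_VAL` and `2.5` and `3e10` as `DOUBLE_VAL`. Non-capturing groups `(?:...)` keep `lastgroup` pointing at the rule name.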

No error while parsing an empty file with yacc/lex

核能气质少年 submitted on 2020-01-06 02:58:06
Question: I have a parser generated with yacc/lex. It works fine for all the rules I have set except one case: if the file it is parsing is empty, it gives an error. I want to add a rule so that it does not give an error when the file is empty. I have not added any checks for this in either my .l or my .y file. How can this be done with yacc/lex? Thanks in advance! Answer 1: The lexer should recognize the end of input and return a token accordingly (i.e. `EOF`). Your grammar's start rule could look…
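
In yacc, a start rule with an empty alternative, for example `program : /* empty */ | program statement ;`, accepts a file that contains no statements at all. The Python sketch below is a hand-written recursive-descent stand-in for that rule, not yacc output. The token tuples and the `ID ';'` statement form are hypothetical.

```python
def parse_program(tokens):
    """program : /* empty */ | program statement ;  (zero or more statements)"""
    statements, pos = [], 0
    # The loop may run zero times, which is what makes an empty file valid.
    while tokens[pos][0] != "EOF":
        pos, stmt = parse_statement(tokens, pos)
        statements.append(stmt)
    return statements


def parse_statement(tokens, pos):
    """statement : ID ';'  (a hypothetical statement form)"""
    # tokens always ends with an EOF tuple, so pos + 1 exists after an ID.
    if tokens[pos][0] == "ID" and tokens[pos + 1][0] == "SEMI":
        return pos + 2, tokens[pos][1]
    raise SyntaxError(f"unexpected token {tokens[pos]!r}")
```

An empty file corresponds to the token list `[("EOF", "")]`, for which `parse_program` returns an empty list instead of raising an error.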

Lex and Yacc issue with comments

℡╲_俬逩灬. submitted on 2020-01-05 10:13:57
Question: I am trying to locate the root cause of an issue. I have the following line that needs to be parsed: `sample format "string";`. Here `sample` and `format` need to be tokenized, and whatever is inside the double quotes needs to be passed to the parser. There is a catch, however: if the string contains a Perl-style comment character `#`, I get an error. In lexer.l, I have the following: `stringIdentifier [^"]+ <STRING_S>{stringIdentifier} { strncpy(yylval.str, yytext,1023); yylval.str[1023] =`…
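
The question is cut off before its cause is known. One plausible culprit is a start condition declared as inclusive with `%s STRING_S`. In flex, rules without a start condition (such as a `#.*` comment rule) stay active inside an inclusive condition and can out-match the string rule, for instance when `#` is the first character of the string. Declaring the condition exclusive with `%x STRING_S` restricts matching to rules tagged `<STRING_S>`. The Python sketch below shows the intended behavior with an explicit string state. The `WORD`, `STRING` and `SEMI` token names are hypothetical.

```python
def scan(line):
    """Tokenize one line; '#' starts a comment only outside a quoted string."""
    tokens, i, n = [], 0, len(line)
    while i < n:
        c = line[i]
        if c == '"':
            # String state: everything up to the closing quote is content,
            # so a '#' in here is ordinary text rather than a comment.
            end = line.find('"', i + 1)
            if end == -1:
                raise SyntaxError("unterminated string")
            tokens.append(("STRING", line[i + 1:end]))
            i = end + 1
        elif c == "#":
            break  # initial state: the comment runs to the end of the line
        elif c == ";":
            tokens.append(("SEMI", ";"))
            i += 1
        elif c.isspace():
            i += 1
        else:
            # A bare word ends at whitespace or at any character handled above.
            j = i
            while j < n and not line[j].isspace() and line[j] not in '";#':
                j += 1
            tokens.append(("WORD", line[i:j]))
            i = j
    return tokens
```

For `sample format "a # b";`, the `#` stays inside the `STRING` token, while a `#` after the semicolon still starts a comment.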

Lex: yylineno returning 1

末鹿安然 submitted on 2020-01-05 08:25:26
Question: I have tried a lot of the solutions given online, including the one from this link: Flex yylineno set to 1. But none of them seem to work for my symbol-table code: the yylineno value doesn't change and keeps showing 1. The input I provided in the input file was: `main() while varrrr if`. This is my code snippet: `%% {pound}{includekey}{openarrow}{alpha}+{closearrow} {printf("\n %s : Preprocessor Directive at line no: %d!", yytext, yylineno); newfunction(yytext,`…
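
Flex only maintains `yylineno` when asked to, typically with `%option yylineno`. Otherwise the variable stays at its initial value of 1 unless a rule increments it on each newline. The Python sketch below shows the counting that this option performs, adding one to the line number for every newline consumed. It uses a hypothetical multi-line version of the asker's input and a simplified token set.

```python
import re

# Simplified, hypothetical token set; NEWLINE is matched separately so it can be counted.
TOKEN_RE = re.compile(r"(?P<NEWLINE>\n)|(?P<WS>[ \t]+)|(?P<WORD>[A-Za-z_]\w*)|(?P<PUNCT>[()])")


def symbol_table(text):
    """Return (lexeme, line) pairs, bumping the line count on every newline."""
    lineno, pos, table = 1, 0, []
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if not m:
            raise ValueError(f"line {lineno}: unexpected {text[pos]!r}")
        if m.lastgroup == "NEWLINE":
            lineno += 1  # the step that is missing when yylineno never changes
        elif m.lastgroup != "WS":
            table.append((m.group(), lineno))
        pos = m.end()
    return table
```

For the input `"main()\nwhile\nvarrrr\nif\n"`, the entries for `while`, `varrrr` and `if` are recorded on lines 2, 3 and 4.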

gcc giving error on printf while compiling lex output

倾然丶 夕夏残阳落幕 submitted on 2020-01-05 08:19:12
Question: For the example.l lex file I get the error below. If I comment out the `printf`, it goes away. I thought that the top section of the lex specification could contain any arbitrary C code between `%{` and `%}`. I need to be able to print some output before lex matches anything. What is wrong with what I have done, and how do I fix it? `$ cat example.l %{ #include <stdio.h> printf("foobar\n"); %} %% . ECHO; $ lex example.l $ gcc -g -L/usr/lib/flex-2.5.4a -lfl -o example lex.yy.c example.l:3: error:`…