lexical-analysis | 易学教程

How to combine Regexp and keywords in Scala parser combinators

Question: I've seen two approaches to building parsers in Scala. The first is to extend RegexParsers and define your own lexical patterns. The issue I see with this is that I don't really understand how it deals with keyword ambiguities. For example, if my keywords match the same pattern as identifiers, then it processes the keywords as identifiers. To counter that, I've seen posts like this one that show how to use StandardTokenParsers to specify keywords. But then, I don't understand how to specify
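The keyword/identifier ambiguity described above is not specific to Scala. A common way around it, sketched here in Python rather than with parser combinators, is to match every word with the identifier pattern and then reclassify it if it appears in a keyword set. The `KEYWORDS` set, the token names, and the `tokenize` function are all hypothetical choices for illustration, not part of any library mentioned in the question.

```python
import re

# Hypothetical keyword set for a toy language (an assumption, not from the question).
KEYWORDS = {"if", "else", "while"}

# One alternative per token kind; leading whitespace is skipped.
TOKEN_RE = re.compile(
    r"\s*(?:(?P<ident>[A-Za-z_]\w*)|(?P<number>\d+)|(?P<op>[-+*/=(){};]))"
)

def tokenize(text):
    """Split text into (kind, value) pairs, reclassifying keywords after matching."""
    tokens = []
    text = text.rstrip()
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError("unexpected character at offset %d" % pos)
        kind = m.lastgroup
        value = m.group(kind)
        # The identifier pattern also matches keywords, so check the set afterwards.
        if kind == "ident" and value in KEYWORDS:
            kind = "keyword"
        tokens.append((kind, value))
        pos = m.end()
    return tokens

if __name__ == "__main__":
    print(tokenize("if x1 = 42 else iffy"))
```

Because the whole word is matched before the lookup, a name such as `iffy` stays an identifier even though it starts with a keyword.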

DFAs vs Regexes when implementing a lexical analyzer?

Question: (I'm just learning how to write a compiler, so please correct me if I make any incorrect claims.) Why would anyone still implement DFAs in code (goto statements, table-driven implementations) when they can simply use regular expressions? As far as I understand, lexical analyzers take in a string of characters and churn out a list of tokens which, in the language's grammar definition, are terminals, making it possible for them to be described by a regular expression. Wouldn't it be easier to
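To make the comparison in the question concrete, here is a minimal sketch of the two styles recognizing the same token: a table-driven DFA and a regular expression. The token (an unsigned number with an optional fractional part), the state numbering, and all function names are assumptions chosen for illustration.

```python
import re

# States: 0 = start, 1 = integer part, 2 = just saw '.', 3 = fraction part.
ACCEPTING = {1, 3}
TRANSITIONS = {
    (0, "digit"): 1,
    (1, "digit"): 1,
    (1, "dot"): 2,
    (2, "digit"): 3,
    (3, "digit"): 3,
}

def char_class(ch):
    """Map a character to the input class used by the transition table."""
    if "0" <= ch <= "9":
        return "digit"
    if ch == ".":
        return "dot"
    return "other"

def dfa_accepts(text):
    """Run the table-driven DFA over text; a missing transition rejects."""
    state = 0
    for ch in text:
        state = TRANSITIONS.get((state, char_class(ch)))
        if state is None:
            return False
    return state in ACCEPTING

# The same language written as a regular expression.
NUMBER_RE = re.compile(r"[0-9]+(?:\.[0-9]+)?")

def regex_accepts(text):
    return NUMBER_RE.fullmatch(text) is not None

if __name__ == "__main__":
    for sample in ["42", "3.14", "1.", ".5"]:
        print(sample, dfa_accepts(sample), regex_accepts(sample))
```

Both functions accept the same strings. The regex is shorter to write, while the explicit table makes every state and transition visible and gives a natural place to add per-state actions.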

error handling in YACC

Question: Hi there, I'm trying to make a simple parser using lex and yacc. The thing is, I want to print my own error messages rather than the error symbol used by yacc, which prints "syntax error". For example, this is my yacc code:

%{
#include <stdio.h>
#include <string.h>
#include "y.tab.h"
extern FILE *yyin;
extern int linenum;
%}
%token INTRSW IDENTIFIER INTEGER ASSIGNOP SEMICOLON DOUBLEVAL DOUBLERSW COMMA
%token IF ELSE WHILE FOR
%token CLOSE_BRA OPEN_BRA CLOSE_PARA OPEN_PARA EQ LE GE
%token SUM MINUS

How to use yylval with strings in yacc

Question: I want to pass the actual string of a token. If I have a token called ID, then I want my yacc file to actually know what ID is called. I think I have to pass a string using yylval to the yacc file from the flex file. How do I do that?

Answer 1: See the Flex manual section on Interfacing with YACC.

15 Interfacing with Yacc

One of the main uses of flex is as a companion to the yacc parser-generator. yacc parsers expect to call a routine named yylex() to find the next input token. The routine is

How can I modify the text of tokens in a CommonTokenStream with ANTLR?

Question: I'm trying to learn ANTLR and at the same time use it for a current project. I've gotten to the point where I can run the lexer on a chunk of code and output it to a CommonTokenStream. This is working fine, and I've verified that the source text is being broken up into the appropriate tokens. Now, I would like to be able to modify the text of certain tokens in this stream, and display the modified source code. For example, I've tried:

import org.antlr.runtime.*;
import java.util.*;
public

How to use backslash escape char for new line in JavaCC?

Question: I have an assignment to create a lexical analyser and I've got everything working except for one bit. I need to create a string that will accept a new line, and the string is delimited by double quotes. The string accepts any number, letter, some specified punctuation, backslashes and double quotes within the delimiters. I can't seem to figure out how to escape a new line character. Is there a certain way of escaping characters like new line and tab? Here's some of my code that might help: <
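The usual shape of a token rule for such strings is "an opening quote, then any run of ordinary characters or backslash escape pairs, then a closing quote". The sketch below shows that shape as a Python regular expression rather than JavaCC syntax; the set of supported escapes in `ESCAPES` and the names `STRING_RE` and `unescape` are assumptions for illustration, not taken from the assignment.

```python
import re

# A double-quoted string: ordinary characters (no quote, backslash, or raw
# newline) or a backslash followed by any single character.
STRING_RE = re.compile(r'"(?:[^"\\\n]|\\.)*"')

# Hypothetical escape table; the real assignment may allow a different set.
ESCAPES = {"n": "\n", "t": "\t", '"': '"', "\\": "\\"}

def unescape(literal):
    """Strip the delimiters and translate backslash escapes to real characters."""
    body = literal[1:-1]

    def replace(match):
        ch = match.group(1)
        if ch not in ESCAPES:
            raise ValueError("unknown escape: " + ch)
        return ESCAPES[ch]

    return re.sub(r"\\(.)", replace, body)

if __name__ == "__main__":
    source = r'say "line1\nline2 \"quoted\"" end'
    literal = STRING_RE.search(source).group(0)
    print(unescape(literal))
```

The key point is that the escape pair `\\.` is its own alternative, so an escaped double quote never terminates the string, and the two-character sequence backslash plus `n` is turned into a newline only after the token has been matched.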

Writing re-entrant lexer with Flex

Question: I'm a newbie to flex. I'm trying to write a simple re-entrant lexer/scanner with flex. The lexer definition is below. I'm stuck with the compilation errors shown below (a yyg issue).

reentrant.l:

/* Definitions */
digit       [0-9]
letter      [a-zA-Z]
alphanum    [a-zA-Z0-9]
identifier  [a-zA-Z_][a-zA-Z0-9_]+
integer     [0-9]+
natural     [0-9]*[1-9][0-9]*
decimal     ([0-9]+\.|\.[0-9]+|[0-9]+\.[0-9]+)
%{
#include <stdio.h>
#define ECHO fwrite(yytext, yyleng, 1, yyout)
int totalNums = 0;
%}
%option reentrant
%option

C#/.NET Lexer Generators

Question: I'm looking for a decent lexical scanner generator for C#/.NET, something that supports Unicode character categories and generates somewhat readable and efficient code. Does anyone know of one? EDIT: I need support for Unicode categories, not just Unicode characters. There are currently 1421 characters in the Lu (Letter, Uppercase) category alone, and I need to match many different categories very specifically, and would rather not hand-write the character sets necessary for it. Also,

Tips for creating “Context Free Grammar”

Question: I am new to CFGs. Can someone give me tips on creating a CFG that generates a given language? For example, L = {a^m b^n | m >= n}. What I got is:

S_0 -> a | aS_0 | aS_1 | e
S_1 -> b | bS_1 | e

but I think this is wrong, because there is a chance that the number of b's can be greater than the number of a's.

Answer 1: How to write a CFG, with the example a^m b^n. L = {a^m b^n | m >= n}. Language description: a^m b^n consists of a's followed by b's, where the number of a's is equal to or greater than the number of b's. Some example strings: {^
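The asker's suspicion can be checked mechanically. The sketch below enumerates every short string a grammar derives and compares it against the language definition. It reads the question's grammar as S0 -> a | a S0 | a S1 | e and S1 -> b | b S1 | e, taking `e` to mean the empty string (an assumption). The second grammar, S -> aS | aSb | empty, is a standard grammar for this language and is offered as a candidate fix; it is not taken from the truncated answer. All names in the code are hypothetical.

```python
from collections import deque

def generate(grammar, start, max_len):
    """Return all terminal strings with at most max_len symbols derivable from start.

    Sentential forms are bounded in length, which suits grammars like the ones
    below where each production has at most one nonterminal.
    """
    seen = set()
    results = set()
    queue = deque([(start,)])
    while queue:
        form = queue.popleft()
        # Expand the leftmost nonterminal; a form without one is a finished string.
        idx = next((i for i, sym in enumerate(form) if sym in grammar), None)
        if idx is None:
            results.add("".join(form))
            continue
        for production in grammar[form[idx]]:
            new_form = form[:idx] + production + form[idx + 1:]
            terminals = sum(1 for sym in new_form if sym not in grammar)
            if (terminals <= max_len and len(new_form) <= max_len + 1
                    and new_form not in seen):
                seen.add(new_form)
                queue.append(new_form)
    return results

def in_language(s):
    """True if s has the form a^m b^n with m >= n."""
    m = len(s) - len(s.lstrip("a"))
    rest = s[m:]
    return set(rest) <= {"b"} and m >= len(rest)

# The grammar from the question; () is the empty production.
ASKER = {
    "S0": [("a",), ("a", "S0"), ("a", "S1"), ()],
    "S1": [("b",), ("b", "S1"), ()],
}

# Candidate fix: every b is introduced together with a matching a.
CANDIDATE = {"S": [("a", "S"), ("a", "S", "b"), ()]}

if __name__ == "__main__":
    bad = sorted(s for s in generate(ASKER, "S0", 4) if not in_language(s))
    print("strings outside L from the question's grammar:", bad)
    print("candidate grammar:", sorted(generate(CANDIDATE, "S", 4)))
```

The question's grammar fails because S1 can add b's without limit once a single a has been produced, for example S0 -> a S1 -> a b S1 -> abb. In the candidate grammar each b arrives through aSb, so the count of b's can never exceed the count of a's.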