lexer

antlr4 can't extract literal into token

余生长醉 posted on 2020-01-05 09:05:10
Question: I have the following grammar and am trying to start out slowly, working up to more complex arguments.

    grammar Command;
    commands : command+ EOF;
    command : NAME args NL;
    args : arg | ;
    arg : DASH LOWER | LOWER;
    //arg : DASH 'a' | 'x';
    NAME : [_a-zA-Z0-9]+;
    NL : '\n';
    WS : [ \t\r]+ -> skip ; // spaces, tabs, newlines
    DASH : '-';
    LOWER: [a-z];//'a' .. 'z';

I was hoping (for now) to parse files like this:

    cmd1
    cmd3 -a

If I run that input through grun I get an error: $ java org.antlr.v4.gui.TestRig
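A likely factor in errors of this kind is the way an ANTLR lexer chooses tokens: it takes the longest match, and when two rules match the same length, the rule listed first wins. The Python sketch below is not ANTLR. It is a toy lexer that mimics that tie-breaking with the five lexer rules from the grammar in the question; the names RULES and tokenize are made up for illustration.

```python
import re

# Lexer rules in the same order as the grammar in the question.
# Assumption: this toy loop only mimics ANTLR's tie-breaking
# (longest match wins; on equal length, the rule defined first wins).
RULES = [
    ("NAME", re.compile(r"[_a-zA-Z0-9]+")),
    ("NL", re.compile(r"\n")),
    ("WS", re.compile(r"[ \t\r]+")),
    ("DASH", re.compile(r"-")),
    ("LOWER", re.compile(r"[a-z]")),
]


def tokenize(text, rules=RULES):
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in rules:
            m = pattern.match(text, pos)
            # Only a strictly longer match replaces the current best,
            # so an earlier rule keeps priority when lengths are equal.
            if m and m.end() - pos > best_len:
                best_name, best_len = name, m.end() - pos
        if best_name is None:
            raise ValueError(f"no rule matches at offset {pos}")
        if best_name != "WS":  # WS is skipped, as in the grammar
            tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens


if __name__ == "__main__":
    print(tokenize("cmd3 -a\n"))
```

With the rules in the question's order, the lone "a" is reported as NAME, so a parser rule expecting LOWER never sees one. Moving LOWER above NAME in the list turns "a" into LOWER while "cmd3" stays NAME, because the longer match still wins.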

ANTLR: How to skip multiline comments

丶灬走出姿态 posted on 2020-01-04 04:12:19
Question: Given the following lexer:

    lexer grammar CodeTableLexer;
    @header { package ch.bsource.ice.parsers; }
    CodeTabHeader : OBracket Code ' ' Table ' ' Version CBracket;
    CodeTable : Code ' '* Table;
    EndCodeTable : 'end' ' '* Code ' '* Table;
    Code : 'code';
    Table : 'table';
    Version : '1.0';
    Row : 'row';
    Tabdef : 'tabdef';
    Override : 'override' | 'no_override';
    Obsolete : 'obsolete';
    Substitute : 'substitute';
    Status : 'activ' | 'inactive';
    Pkg : 'include_pkg' | 'exclude_pkg';
    Ddic : 'include_ddic' |
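The excerpt stops before the comment syntax is shown, so the following is only a sketch under the assumption that comments are C-style /* ... */ blocks. Skipping them comes down to a non-greedy match that is allowed to cross line breaks, which is the same idea as an ANTLR 4 lexer rule of the form '/*' .*? '*/' -> skip. The Python function name strip_block_comments is hypothetical.

```python
import re

# Non-greedy ".*?" stops at the first "*/"; re.DOTALL lets "." cross newlines.
# Assumption: comments are C-style /* ... */ and do not nest.
BLOCK_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)


def strip_block_comments(source):
    """Return source with every /* ... */ block removed."""
    return BLOCK_COMMENT.sub("", source)


if __name__ == "__main__":
    print(strip_block_comments("code /* spans\n two lines */ table"))
```

A greedy pattern (".*" instead of ".*?") would swallow everything between the first "/*" and the last "*/" in the input, including real tokens, which is why the non-greedy form matters here.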

Haskell lexical layout rule implementation

天大地大妈咪最大 posted on 2020-01-03 11:13:09
Question: I have been working on a pet language, which has Haskell-like syntax. One of the neat things which Haskell does, which I have been trying to replicate, is its insertion of {, } and ; tokens based on code layout, before the parsing step. I found http://www.haskell.org/onlinereport/syntax-iso.html, which includes a specification of how to implement the layout program, and have made a version of it (modified, of course, for my (much simpler) language). Unfortunately, I am getting an incorrect
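For orientation, here is a reduced Python sketch of the layout function from the linked Haskell report. It assumes the token stream has already been annotated the way the report describes: ("block", n) stands for the report's {n} marker (the column of the first token after a layout keyword) and ("line", n) stands for the <n> marker (the column of the first token on a new line). It deliberately leaves out explicit braces and the parse-error(t) rule, so it is not the asker's implementation and not the complete algorithm. The function name layout and the tuple encoding are made up for illustration.

```python
def layout(items):
    """Insert "{", "}" and ";" into a pre-annotated token list.

    items holds plain token strings, ("block", n) for the report's {n}
    marker, and ("line", n) for the report's <n> marker.
    """
    out = []
    stack = []  # enclosing layout contexts (columns), innermost last
    queue = list(items)
    i = 0
    while i < len(queue):
        item = queue[i]
        if isinstance(item, tuple) and item[0] == "line":
            n = item[1]
            if stack and stack[-1] == n:
                out.append(";")  # same column: next item in the block
                i += 1
            elif stack and n < stack[-1]:
                out.append("}")  # dedent: close the block and
                stack.pop()      # re-check the same marker
            else:
                i += 1           # deeper column: line continuation
        elif isinstance(item, tuple) and item[0] == "block":
            n = item[1]
            if (stack and n > stack[-1]) or (not stack and n > 0):
                out.append("{")
                stack.append(n)
                i += 1
            else:
                # Block not indented past its parent: emit an empty
                # block and treat the marker as a new line instead.
                out.extend(["{", "}"])
                queue[i] = ("line", n)
        else:
            out.append(item)
            i += 1
    out.extend("}" for _ in stack)  # close blocks still open at end of input
    return out


if __name__ == "__main__":
    src = [("block", 1), "f", "=", "do", ("block", 3), "a",
           ("line", 3), "b", ("line", 1), "g", "=", "c"]
    print(" ".join(layout(src)))
```

For the sample input, which encodes "f = do" followed by two indented lines "a" and "b" and then "g = c" at column 1, the output is { f = do { a ; b } ; g = c }. A frequent source of wrong output in hand-written versions is forgetting to re-check the same line marker after closing a block, which is why the dedent branch does not advance i.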

Haskell lexical layout rule implementation

戏子无情 posted on 2020-01-03 11:12:12
Question: I have been working on a pet language, which has Haskell-like syntax. One of the neat things which Haskell does, which I have been trying to replicate, is its insertion of {, } and ; tokens based on code layout, before the parsing step. I found http://www.haskell.org/onlinereport/syntax-iso.html, which includes a specification of how to implement the layout program, and have made a version of it (modified, of course, for my (much simpler) language). Unfortunately, I am getting an incorrect

Simple XML parser in bison/flex

ぐ巨炮叔叔 posted on 2020-01-02 06:15:09
Question: I would like to create a simple XML parser using bison/flex. I don't need validation, comments, or arguments, only <tag>value</tag>, where value can be a number, a string, or another <tag>value</tag>. So for example:

    <div>
      <mul>
        <num>20</num>
        <add>
          <num>1</num>
          <num>5</num>
        </add>
      </mul>
      <id>test</id>
    </div>

If it helps, I know the names of all tags that may occur, and I know how many sub-tags a given tag can hold. Is it possible to create a bison parser that would do something like this: - new Tag("num",
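To make the target data structure concrete, here is a small stack-based sketch in Python rather than bison/flex. It only handles the restricted <tag>value</tag> form described in the question, with no attributes or comments. The Tag class, the TOKEN pattern and the parse function are hypothetical names chosen for this illustration, not part of any library.

```python
import re
from dataclasses import dataclass, field

# One token is an opening tag, a closing tag, or a run of text.
TOKEN = re.compile(r"<(/?)(\w+)>|([^<>]+)")


@dataclass
class Tag:
    name: str
    children: list = field(default_factory=list)  # Tag objects or strings


def parse(text):
    """Build a tree of Tag objects from <tag>value</tag> markup.

    Assumption: input is well formed apart from nesting errors; a stray
    "<" or ">" is silently skipped by finditer in this sketch.
    """
    root = Tag("#root")
    stack = [root]  # open tags, innermost last
    for m in TOKEN.finditer(text):
        closing, name, value = m.groups()
        if value is not None:
            if value.strip():  # ignore whitespace between tags
                stack[-1].children.append(value.strip())
        elif closing:
            if len(stack) == 1 or stack[-1].name != name:
                raise ValueError(f"unexpected </{name}>")
            stack.pop()
        else:
            tag = Tag(name)
            stack[-1].children.append(tag)
            stack.append(tag)
    if len(stack) != 1:
        raise ValueError(f"unclosed <{stack[-1].name}>")
    return root.children


if __name__ == "__main__":
    print(parse("<add> <num>1</num> <num>5</num> </add>"))
```

In a bison grammar the same stack discipline is implicit: a rule that matches an opening tag, a list of children and the matching closing tag can build the node in its semantic action.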

What would be a good Delphi lexer/parser for Javascript language file? [closed]

这一生的挚爱 posted on 2020-01-02 05:07:52
Question: Background: I want to be able to parse JavaScript source in a Delphi application. I need to be able to identify variables and functions within the source for the purpose of making changes to the code through later code. I understand that I probably need to use a lexer for this purpose but have not had much luck

When parsing Javascript, what determines the meaning of a slash?

三世轮回 posted on 2019-12-28 02:45:06
Question: JavaScript has a tricky grammar to parse. Forward slashes can mean a number of different things: division operator, regular expression literal, comment introducer, or line-comment introducer. The last two are easy to distinguish: if the slash is followed by a star, it starts a multiline comment. If the slash is followed by another slash, it is a line comment. But the rules for disambiguating division and regex literal are escaping me. I can't find it in the ECMAScript standard. There the
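In the ECMAScript specification the choice depends on syntactic context: the lexical grammar has separate goal symbols (InputElementDiv and InputElementRegExp), and the parser's position decides which one applies. Standalone lexers often approximate this by looking at the previous significant token. The Python sketch below shows that approximation only; it is a heuristic, not the specification's rule, and the function name and keyword set are choices made for this illustration.

```python
import re

# Keywords after which an expression, and therefore a regex literal, can start.
# Assumption: a deliberately small, illustrative set.
EXPRESSION_KEYWORDS = {
    "return", "typeof", "instanceof", "in", "new", "delete",
    "void", "throw", "case", "do", "else",
}

# Identifiers and numeric literals end an operand, so "/" after them divides.
OPERAND = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*|[0-9][A-Za-z0-9_.]*")


def slash_starts_regex(prev_token):
    """Guess whether "/" begins a regex literal.

    prev_token is the text of the previous non-whitespace, non-comment
    token, or None at the start of input.
    """
    if prev_token is None:
        return True
    if prev_token in EXPRESSION_KEYWORDS:
        return True
    if prev_token in {")", "]", "}"}:
        return False  # usually the end of an operand
    if OPERAND.fullmatch(prev_token):
        return False
    return True  # operators, "(", ",", ";" and similar


if __name__ == "__main__":
    for tok in ["=", "x", ")", "return", None]:
        print(repr(tok), slash_starts_regex(tok))
```

The heuristic is known to misjudge some inputs: after the ")" that closes an if condition, or after the "}" that closes a block statement, a regex literal is legal even though the function reports division. Handling those cases requires tracking what the bracket closed, which is effectively the context the specification relies on.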

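Learning Oracle full-text indexing

孤者浪人 posted on 2019-12-25 11:50:14
About string searching: In many cases INSTR and LIKE are a quick and convenient way to search strings, especially when the search covers only a small table, for example:

    SELECT * FROM mytext WHERE INSTR (thetext, 'Oracle') > 0;
    SELECT * FROM mytext WHERE thetext LIKE '%Oracle%';

On large tables, however, these text-matching approaches cause a full table scan, which is expensive in resources, and the search capabilities they offer are very limited. Oracle Text, Oracle's full-text indexing feature, addresses the efficiency and functionality limits of the approaches above. (For a performance comparison, see http://viralpatel.net/blogs/oracle-index-usage-like-operator-domain-indexes/ )

Oracle full-text search: Full-text search is a text retrieval method that matches search terms against all the text in a document. In Oracle, users can run text-based queries with the Oracle server's context option; the available methods include wildcard search, fuzzy matching, relevance classification, proximity search, conditional weighting, and word-meaning expansion. In Oracle 8.0.x it was called ConText; in Oracle8i it was called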
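The cost difference described above can be pictured with a toy example in Python. This is not how Oracle Text is implemented and uses no Oracle API; it only contrasts a scan that inspects every row, as a LIKE '%...%' predicate must, with a lookup in a word index built once in advance. All names are hypothetical.

```python
from collections import defaultdict


def scan_search(rows, term):
    """Full scan: every row is inspected on every query."""
    return [i for i, text in enumerate(rows) if term in text]


def build_index(rows):
    """Inverted index: map each lower-cased word to the rows containing it."""
    index = defaultdict(set)
    for i, text in enumerate(rows):
        for word in text.lower().split():
            index[word].add(i)
    return index


def index_search(index, term):
    """Index lookup: one dictionary access, independent of table size."""
    return sorted(index.get(term.lower(), ()))


if __name__ == "__main__":
    rows = ["Oracle Text overview", "plain note", "about oracle indexes"]
    print(scan_search(rows, "Oracle"))
    print(index_search(build_index(rows), "Oracle"))
```

The index costs time and space to build and maintain, but each query then avoids touching rows that cannot match. Splitting text into words is itself a lexing step, and in this sketch it is also where case is normalized, which is why the indexed lookup finds the lower-case "oracle" row that the case-sensitive scan misses.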

Ordering lexer rules in a grammar using ANTLR4

三世轮回 posted on 2019-12-25 10:55:15
Question: I'm using ANTLR4 to generate a parser. I am new to parser grammars. I've read the very helpful ANTLR Mega Tutorial but I am still stuck on how to properly order (and/or write) my lexer and parser rules. I want the parser to be able to handle something like this:

    Hello << name >>, how are you?

At runtime I will replace "<< name >>" with the user's name. So mostly I am parsing text words (and punctuation, symbols, etc.), except with the occasional "<< something >>" tag, which I am calling a
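The tokenization the question is after can be pictured as two kinds of token: runs of plain text and "<< name >>" tags. The Python sketch below shows that split and the runtime substitution. It is not an ANTLR grammar, and it assumes tag names consist of word characters only, which the excerpt does not confirm. The names split_template and render are hypothetical.

```python
import re

# Assumption: a tag is "<<", optional spaces, a word, optional spaces, ">>".
TAG = re.compile(r"<<\s*(\w+)\s*>>")


def split_template(template):
    """Split a template into ("TEXT", chunk) and ("TAG", name) tokens."""
    tokens = []
    pos = 0
    for m in TAG.finditer(template):
        if m.start() > pos:
            tokens.append(("TEXT", template[pos:m.start()]))
        tokens.append(("TAG", m.group(1)))
        pos = m.end()
    if pos < len(template):
        tokens.append(("TEXT", template[pos:]))
    return tokens


def render(template, values):
    """Replace each tag with its value from the values mapping."""
    return "".join(
        values[text] if kind == "TAG" else text
        for kind, text in split_template(template)
    )


if __name__ == "__main__":
    print(render("Hello << name >>, how are you?", {"name": "Ada"}))
```

In an ANTLR 4 lexer the same separation is commonly achieved by giving the tag delimiters their own rules and keeping the catch-all text rule from consuming "<<", since a text rule that can match the delimiter characters will otherwise win on length.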

ANTLR4 - Need an explanation on this String Literals

只愿长相守 posted on 2019-12-25 07:39:20
Question: In my assignment, I have this description for the string lexer: "String literals consist of zero or more characters enclosed by double quotes ("). Use escape sequences (listed below) to represent special characters within a string. It is a compile-time error for a new line or EOF character to appear inside a string literal. All the supported escape sequences are as follows:

    \b  backspace
    \f  formfeed
    \r  carriage return
    \n  newline
    \t  horizontal tab
    \"  double quote
    \\  backslash

The following are
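The description above maps fairly directly onto a regular expression, shown here as a Python sketch rather than an ANTLR rule. It assumes the backslash escape is written as two backslashes and that only the line feed character counts as a "new line". The names STRING and match_string are made up for this illustration.

```python
import re

# Opening quote, then any number of either
#   - an ordinary character (not a quote, backslash, or line feed), or
#   - a backslash followed by one of the supported escape letters,
# then the closing quote.
STRING = re.compile(r'"(?:[^"\\\n]|\\[bfrnt"\\])*"')


def match_string(source, pos=0):
    """Return the string literal starting at pos, or None if there is none.

    None covers the error cases from the description: an unsupported
    escape, a raw line feed inside the literal, or end of input before
    the closing quote.
    """
    m = STRING.match(source, pos)
    return m.group(0) if m else None


if __name__ == "__main__":
    print(match_string(r'"tab\there" rest'))
    print(match_string(r'"bad \q escape"'))
```

A real lexer would usually go one step further and use separate rules for the unclosed-string and illegal-escape cases so that each produces its own error message, instead of a single "no match" result.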