ply

Parsing python with PLY, how to code the indent and dedent part

孤人 submitted on 2019-12-07 11:32:08
Question: I was trying to parse Python function definitions with PLY, and I am running into issues with indentation. For instance, for a for statement, I would like to be able to know when the block ends. I read the Python grammar here: http://docs.python.org/2/reference/grammar.html
The grammar for this part is:

    for_stmt: 'for' exprlist 'in' testlist ':' suite ['else' ':' suite]
    suite: simple_stmt | NEWLINE INDENT stmt+ DEDENT

I don't know how to describe the INDENT and

Match unicode in ply's regexes

非 Y 不嫁゛ submitted on 2019-12-06 04:08:08
Question: I'm matching identifiers, but now I have a problem: my identifiers are allowed to contain Unicode characters, so the old way of doing things is not enough:

    t_IDENTIFIER = r"[A-Za-z](\\.|[A-Za-z_0-9])*"

In my markup-language parser I match Unicode characters by allowing all characters except those I explicitly use, because my markup language has only two or three characters I need to escape that way. How do I match all Unicode characters with Python regexes and PLY? Also, is this a

Python PLY zero or more occurrences of a parsing item

巧了我就是萌 submitted on 2019-12-06 01:08:14
I am using Python with PLY to parse LISP-like S-expressions, and when parsing a function call there can be zero or more arguments. How can I express this in the yacc code? This is my function so far:

    def p_EXPR(p):
        '''EXPR : NUMBER
                | STRING
                | LPAREN funcname [EXPR] RPAREN'''
        if len(p) == 2:
            p[0] = p[1]
        else:
            p[0] = ("Call", p[2], p[3:-1])

I need to replace "[EXPR]" with something that allows zero or more EXPRs. How can I do this?

How about this:

    EXPR : NUMBER
         | STRING
         | LPAREN funcname EXPR_REPEAT RPAREN
    EXPR_REPEAT : /*nothing*/
                | EXPR EXPR_REPEAT

Are you sure you want a Context Free Grammar and
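A minimal PLY sketch of the "zero or more" rule, assuming the question's NUMBER, STRING, LPAREN and RPAREN tokens and its funcname nonterminal are defined elsewhere; the EXPR_LIST name and the split into one function per production are my own choices. It uses left recursion, which LR parsers such as PLY's handle without the stack growing with the list length, and an empty production. Note that `/*nothing*/` above is yacc comment syntax; in PLY an empty alternative is written as an empty right-hand side.

```python
# Grammar rules for ply.yacc: each function's docstring is one production.
# NUMBER, STRING, LPAREN, RPAREN and funcname come from the question and are
# assumed to be defined elsewhere; EXPR_LIST is a hypothetical new nonterminal.

def p_expr_atom(p):
    '''EXPR : NUMBER
            | STRING'''
    p[0] = p[1]


def p_expr_call(p):
    '''EXPR : LPAREN funcname EXPR_LIST RPAREN'''
    # p[3] is already a Python list holding zero or more argument values.
    p[0] = ("Call", p[2], p[3])


def p_expr_list_empty(p):
    '''EXPR_LIST :'''
    # Empty right-hand side: a call with no arguments.
    p[0] = []


def p_expr_list_more(p):
    '''EXPR_LIST : EXPR_LIST EXPR'''
    # Left recursion: extend the list built so far with one more argument.
    p[1].append(p[2])
    p[0] = p[1]


if __name__ == "__main__":
    # Plain lists stand in for PLY's production object (p[0] is the result).
    empty = [None]
    p_expr_list_empty(empty)
    args = [None, empty[0], 42]
    p_expr_list_more(args)
    call = [None, "(", "add", args[0], ")"]
    p_expr_call(call)
    print(call[0])  # ('Call', 'add', [42])
```

With these rules a call such as `(f)` reduces EXPR_LIST through the empty production, so the semantic value carries `[]` as its argument list instead of needing a separate "no arguments" alternative.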

How to understand and fix conflicts in PLY

最后都变了- submitted on 2019-12-05 20:12:48
I am working on a SystemVerilog parser and I am running into many PLY conflicts (both shift/reduce and reduce/reduce). I currently have 170+ conflicts, and my problem is that I don't really understand the parser.out file generated by PLY. Without understanding it properly there is little I can do, so my goal is to understand what PLY is reporting. The PLY documentation on this is brief and not very explanatory... Here is one of my states, apparently the first one where a conflict is found:

    state 24
    (134) attribute_instance_optional_list -> attribute_instance_list .
    (136) attribute
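In parser.out, each numbered line inside a state is an LR item: `(134) attribute_instance_optional_list -> attribute_instance_list .` means rule 134 has been matched up to the dot, and a dot at the very end means the parser may reduce by that rule in this state. A conflict means that, for the same lookahead token, the state allows both a shift and a reduce (shift/reduce) or two different reductions (reduce/reduce). With 170+ conflicts it can help to first list them per state. The sketch below assumes, as in PLY 3.x output, that each state starts with a `state N` line and that conflicts are reported on lines beginning with `!` that mention "conflict"; check this against your own file. The sample text is made up apart from the item quoted in the question, and LPAREN is a hypothetical token.

```python
import re
from collections import defaultdict

STATE_HEADER = re.compile(r"^state (\d+)\s*$")


def conflicts_by_state(parser_out_text):
    """Return {state number: [conflict lines]} from the text of a parser.out file.

    Assumes each state begins with a 'state N' line and that conflict reports
    are lines starting with '!' that contain the word 'conflict'.
    """
    found = defaultdict(list)
    state = None
    for line in parser_out_text.splitlines():
        header = STATE_HEADER.match(line)
        if header:
            state = int(header.group(1))
        elif state is not None and line.lstrip().startswith("!") and "conflict" in line:
            found[state].append(line.strip())
    return dict(found)


# Made-up sample in the assumed format; in practice read your parser.out file.
SAMPLE = """\
state 24

    (134) attribute_instance_optional_list -> attribute_instance_list .

  ! shift/reduce conflict for LPAREN resolved as shift

state 25
"""

if __name__ == "__main__":
    print(conflicts_by_state(SAMPLE))
```

Once a state is flagged, compare the reduce items (dot at the end) with the items whose dot sits before the conflicting token; those are the two rules competing for that lookahead.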

RegEx with variable data in it - ply.lex

允我心安 submitted on 2019-12-05 19:12:39
I'm using the Python module ply.lex to write a lexer. I have some of my tokens specified with regular expressions, but now I'm stuck. I have a list of keywords that should each be recognised as a token. data is a list of about 1000 keywords which should all be recognised as one kind of keyword, for example _Function1, _UDFType2 and so on. The words in the list are separated by whitespace, nothing else. I just want the lexer to recognise the words in this list and return a token of type KEYWORD.

    data = 'Keyword1 Keyword2 Keyword3 Keyword4'

    def t_KEYWORD(t):
        # ...
        r'\$' + data ??
        return t

    text =
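Rather than building one regex out of roughly 1000 alternatives, a common PLY idiom (the one the PLY documentation uses for reserved words) is to match every word with a single identifier rule and then reclassify the ones found in a set. A minimal sketch, assuming the keywords look like ordinary identifiers (letters, digits and underscores, as in _Function1 or _UDFType2); the ID token name and the identifier pattern are assumptions, and the `\$` prefix from the question is not handled because its role is unclear from the excerpt.

```python
from types import SimpleNamespace

# Keyword data as in the question: one whitespace-separated string.
data = 'Keyword1 Keyword2 Keyword3 Keyword4'
KEYWORDS = frozenset(data.split())  # fast membership tests, even for ~1000 words

# Token list for ply.lex; ID is a hypothetical token for non-keyword words.
tokens = ('KEYWORD', 'ID')


def t_ID(t):
    r'[A-Za-z_][A-Za-z_0-9]*'
    # ply.lex uses the docstring above as this token's regex and sets
    # t.type to 'ID' before calling; switch it for known keywords.
    if t.value in KEYWORDS:
        t.type = 'KEYWORD'
    return t


if __name__ == "__main__":
    # SimpleNamespace stands in for a ply.lex LexToken in this demo.
    for word in ("Keyword2", "something_else"):
        tok = t_ID(SimpleNamespace(type='ID', value=word))
        print(word, tok.type)
```

If you do want a single combined pattern, ply.lex provides a TOKEN decorator for attaching a pattern built at runtime; keep in mind that regex alternation is ordered, so a shorter keyword listed first can match as a prefix of a longer one.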

Parsing python with PLY, how to code the indent and dedent part

北慕城南 submitted on 2019-12-05 17:57:47
I was trying to parse Python function definitions with PLY, and I am running into issues with indentation. For instance, for a for statement, I would like to be able to know when the block ends. I read the Python grammar here: http://docs.python.org/2/reference/grammar.html
The grammar for this part is:

    for_stmt: 'for' exprlist 'in' testlist ':' suite ['else' ':' suite]
    suite: simple_stmt | NEWLINE INDENT stmt+ DEDENT

I don't know how to describe the INDENT and DEDENT tokens with PLY. I was trying something like:

    def t_indentation(t):
        r' |\t'
        #some special
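A single regex rule cannot know how deeply the previous line was indented, so INDENT and DEDENT are usually synthesized by a small filter that keeps a stack of indentation widths. This is the algorithm the Python Language Reference describes for its own tokenizer: push and emit INDENT when a line is indented deeper than the top of the stack, pop and emit DEDENT for each level a line closes, and reject a dedent that lands between levels. The sketch below works line by line on the raw source; in a PLY setup you would apply the same logic to NEWLINE and whitespace tokens inside a wrapper object providing input() and token() that you pass to the parser. It is simplified: blank and comment-only lines are skipped, a tab counts as one column, line continuations are not handled, and the tuple token format is a placeholder.

```python
def indentation_tokens(source):
    """Yield ('INDENT',), ('DEDENT',) and ('LINE', text) tuples for source text.

    Simplified sketch: blank and comment-only lines are skipped, a tab counts
    as one column, and line continuations are not handled.
    """
    stack = [0]  # indentation widths of the currently open blocks
    for lineno, raw in enumerate(source.splitlines(), start=1):
        text = raw.lstrip(" \t")
        if not text or text.startswith("#"):
            continue  # such lines never open or close a block
        width = len(raw) - len(text)
        if width > stack[-1]:
            stack.append(width)
            yield ("INDENT",)
        else:
            while width < stack[-1]:
                stack.pop()
                yield ("DEDENT",)
            if width != stack[-1]:
                raise IndentationError(
                    f"line {lineno}: unindent does not match any outer level")
        yield ("LINE", text)
    # At end of input, close every block that is still open.
    while len(stack) > 1:
        stack.pop()
        yield ("DEDENT",)


if __name__ == "__main__":
    src = "for x in y:\n    a = x\n    print(a)\nprint('done')\n"
    for tok in indentation_tokens(src):
        print(tok)
```

The end of the for block is exactly where the DEDENT appears, which is what the `suite: NEWLINE INDENT stmt+ DEDENT` rule needs.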

Does Pyparsing Support Context-Sensitive Grammars?

旧巷老猫 submitted on 2019-12-05 00:31:50
Question: Forgive me if I have the terminology wrong; perhaps just finding the "right" words to describe what I want will be enough for me to find the answer on my own. I am working on a parser for ODL (Object Description Language), an arcane language that, as far as I can tell, is now used only by NASA PDS (Planetary Data System; it's how NASA makes its data available to the public). Fortunately, PDS is finally moving to XML, but I still have to write software for a mission that fell just before the

Match unicode in ply's regexes

跟風遠走 submitted on 2019-12-04 10:13:21
I'm matching identifiers, but now I have a problem: my identifiers are allowed to contain Unicode characters, so the old way of doing things is not enough:

    t_IDENTIFIER = r"[A-Za-z](\\.|[A-Za-z_0-9])*"

In my markup-language parser I match Unicode characters by allowing all characters except those I explicitly use, because my markup language has only two or three characters I need to escape that way. How do I match all Unicode characters with Python regexes and PLY? Also, is this a good idea at all? I'd like people to be able to use identifiers such as Ω » « ° foo² väli π.
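In Python 3, `\w` in a str pattern already matches Unicode letters and digits (including ä, π, Ω and ²), so identifiers can be matched without listing character ranges. Characters such as », « and ° are not word characters, though, so accepting them needs the "everything except reserved characters" approach from the question. Both patterns below are sketches: the reserved set is a hypothetical stand-in for the markup language's own special characters, the strict pattern drops the original backslash-escape alternative, and `identifiers` is a demo helper. Either string could be assigned to t_IDENTIFIER; ply.lex compiles token patterns with re.VERBOSE by default, so neither relies on literal spaces. On Python 2 you would also pass re.UNICODE in the reflags argument of ply.lex.lex().

```python
import re

# Strict option: a letter or underscore followed by word characters.
# In Python 3, \w on str patterns is Unicode-aware, and [^\W\d] means
# "a word character that is not a decimal digit".
IDENT_STRICT = r"[^\W\d]\w*"

# Permissive option: any run of characters except whitespace and the
# language's reserved characters. This reserved set is hypothetical.
RESERVED = '()=;"'
IDENT_PERMISSIVE = r"[^\s" + re.escape(RESERVED) + r"]+"


def identifiers(text, pattern=IDENT_PERMISSIVE):
    """Return every identifier the pattern finds in text (demo helper)."""
    return re.findall(pattern, text)


if __name__ == "__main__":
    sample = "Ω » « ° foo² väli π"
    print(identifiers(sample))                # permissive: all seven items
    print(identifiers(sample, IDENT_STRICT))  # strict: skips », « and °
```

The permissive pattern answers "is this a good idea" with a trade-off: it accepts symbols like » as names, but every new special character in the language must be added to the reserved set or it silently becomes part of identifiers.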

How to write a regular expression to match a string literal where the escape is a doubling of the quote character?

旧城冷巷雨未停 submitted on 2019-12-04 03:56:37
I am writing a parser using PLY that needs to identify FORTRAN string literals. These are delimited by single quotes, and a single quote inside the string is escaped by doubling it, e.g. 'I don''t understand what you mean' is a valid escaped FORTRAN string. PLY takes token definitions as regular expressions. My attempt so far does not work, and I don't understand why:

    t_STRING_LITERAL = r"'[^('')]*'"

Any ideas?

A string literal is an opening single quote, followed by any number of doubled single quotes and non-single-quote characters, then a closing single quote. Thus, our regex is:

    r"'(''|[^'])*'"

You want something like this: r"
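The original attempt fails because `[^('')]` is a character class: it matches one character that is not `(`, `'` or `)`, so the doubled quote inside the brackets has no special meaning and the match stops at the first `'`. The answer's pattern instead alternates between a doubled quote and any non-quote character. A small sketch, written with a non-capturing group (equivalent for matching) plus a hypothetical helper that turns the matched literal into its value; in a PLY lexer the pattern string would be assigned to t_STRING_LITERAL.

```python
import re

# Opening quote, then any mix of doubled quotes ('') or non-quote characters,
# then the closing quote. (?:...) avoids an extra capture group.
STRING_LITERAL = r"'(?:''|[^'])*'"

# The question's attempt, kept for comparison.
BROKEN = r"'[^('')]*'"


def fortran_string_value(literal):
    """Strip the outer quotes and collapse each doubled quote (hypothetical helper)."""
    return literal[1:-1].replace("''", "'")


if __name__ == "__main__":
    src = "'I don''t understand what you mean'"
    print(re.fullmatch(BROKEN, src))          # None: stops at the first quote
    m = re.fullmatch(STRING_LITERAL, src)
    print(fortran_string_value(m.group()))    # I don't understand what you mean
```

Because the two alternatives start with different characters (`'` versus anything else), the pattern never has to backtrack between them, and a lone quote always ends the literal.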

Does Pyparsing Support Context-Sensitive Grammars?

邮差的信 submitted on 2019-12-03 15:44:17
Forgive me if I have the terminology wrong; perhaps just finding the "right" words to describe what I want will be enough for me to find the answer on my own. I am working on a parser for ODL (Object Description Language), an arcane language that, as far as I can tell, is now used only by NASA PDS (Planetary Data System; it's how NASA makes its data available to the public). Fortunately, PDS is finally moving to XML, but I still have to write software for a mission that fell just before the cutoff. ODL defines objects in something like the following manner:

    OBJECT = TABLE
    ROWS = 128
    ROW_BYTES =
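Pyparsing does offer matchPreviousLiteral (match_previous_literal in pyparsing 3) for back-references of this kind. A library-independent alternative is to parse OBJECT and END_OBJECT like any other assignment and enforce the matching name with a stack, which keeps the grammar itself context-free and moves the context-sensitive check into a semantic action. The sketch assumes, as in PDS3 labels, that a block closes with `END_OBJECT = <same name>` and that the label ends with `END`; it handles only one `KEY = VALUE` per line, keeps values as strings, lets a repeated key overwrite the earlier one, and `parse_odl` is a hypothetical name.

```python
import re

# One KEY = VALUE pair per line (a simplification of real ODL syntax).
ASSIGNMENT = re.compile(r"^\s*([A-Za-z_]\w*)\s*=\s*(.*?)\s*$")


def parse_odl(text):
    """Parse simplified ODL into nested dicts, checking OBJECT/END_OBJECT names."""
    root = {}
    stack = [("<label>", root)]  # (object name, dict receiving its keys)
    for lineno, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped == "END":
            continue
        match = ASSIGNMENT.match(line)
        if match is None:
            raise SyntaxError(f"line {lineno}: expected KEY = VALUE, got {stripped!r}")
        key, value = match.groups()
        if key == "OBJECT":
            child = {}
            stack[-1][1][value] = child  # nest the new object under its name
            stack.append((value, child))
        elif key == "END_OBJECT":
            # The context-sensitive part: the closing name must match the opening one.
            if len(stack) == 1 or stack[-1][0] != value:
                raise SyntaxError(f"line {lineno}: END_OBJECT = {value} does not close "
                                  f"OBJECT = {stack[-1][0]}")
            stack.pop()
        else:
            stack[-1][1][key] = value
    if len(stack) > 1:
        raise SyntaxError(f"OBJECT = {stack[-1][0]} is never closed")
    return root


if __name__ == "__main__":
    label = "OBJECT = TABLE\n  ROWS = 128\nEND_OBJECT = TABLE\nEND\n"
    print(parse_odl(label))  # {'TABLE': {'ROWS': '128'}}
```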