ply

PLY - return multiple tokens

Submitted by 旧街凉风 on 2019-12-24 03:34:34
Question: AFAIK, the technique for lexing Python source code is: when the current line's indentation level is less than the previous line's, produce a DEDENT; produce multiple DEDENTs if it closes multiple INDENTs; when the end of input is reached, produce DEDENT(s) for any unclosed INDENT(s). Now, using PLY: How do I return multiple tokens from a t_ definition? How do I make a t_ definition that's called when EOF is reached? A simple \Z doesn't work -- PLY complains that it matches the empty string. Answer 1: As far as I
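For reference, a minimal sketch of the two workarounds commonly used with PLY, assuming PLY 3.x: a wrapper object that drains a queue of synthetic tokens (yacc only requires that its lexer expose token() and input()), and a t_eof rule for end of input. IndentLexer, _new_token, and emit_dedents are hypothetical names, not part of PLY:

import ply.lex as lex

tokens = ('NAME', 'NEWLINE', 'INDENT', 'DEDENT')

t_NAME = r'[A-Za-z_]\w*'
t_ignore = ' \t'

def t_NEWLINE(t):
    r'\n+'
    t.lexer.lineno += len(t.value)
    return t

def t_eof(t):
    # PLY 3.x calls t_eof when the input is exhausted; returning a token
    # emits one more, returning None ends lexing.  Closing DEDENTs for
    # unclosed INDENTs could be emitted here (omitted in this sketch).
    return None

def t_error(t):
    t.lexer.skip(1)

def _new_token(type_, lineno):
    # Manufacture a LexToken by hand; these four fields are what PLY
    # expects on every token.
    tok = lex.LexToken()
    tok.type = type_
    tok.value = None
    tok.lineno = lineno
    tok.lexpos = 0
    return tok

class IndentLexer(object):
    """Hypothetical wrapper: token() drains queued synthetic tokens
    (e.g. DEDENTs) before asking the real lexer for the next one."""
    def __init__(self, lexer):
        self.lexer = lexer
        self.queue = []

    def input(self, data):
        self.lexer.input(data)

    def emit_dedents(self, count, lineno):
        self.queue.extend(_new_token('DEDENT', lineno)
                          for _ in range(count))

    def token(self):
        if self.queue:
            return self.queue.pop(0)
        return self.lexer.token()

lexer = IndentLexer(lex.lex())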

How to understand and fix conflicts in PLY

Submitted by 不想你离开。 on 2019-12-22 10:45:23
Question: I am working on a SystemVerilog parser and I am running into many PLY conflicts (both shift/reduce and reduce/reduce). I currently have 170+ conflicts, and the problem is that I don't really understand the parser.out file generated by PLY. Without understanding it properly there is little I can do, so my goal is to understand what PLY is reporting. The PLY documentation is brief and not very explanatory... Here is one of my states, the first where a conflict is found
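The state dumps in parser.out are easier to read against a tiny grammar that provokes the same kind of conflict. A self-contained sketch (not from the question): with the precedence table commented out, yacc.yacc() reports shift/reduce conflicts, and parser.out shows the states where a shift on PLUS or TIMES competes with a reduce by the binop rule for the same lookahead:

import ply.lex as lex
import ply.yacc as yacc

tokens = ('NUMBER', 'PLUS', 'TIMES')

t_PLUS = r'\+'
t_TIMES = r'\*'
t_ignore = ' '

def t_NUMBER(t):
    r'\d+'
    t.value = int(t.value)
    return t

def t_error(t):
    t.lexer.skip(1)

# Remove this table and yacc reports shift/reduce conflicts; the
# corresponding states in parser.out list both actions with one marked
# "resolved as shift".
precedence = (
    ('left', 'PLUS'),
    ('left', 'TIMES'),
)

def p_expr_binop(p):
    '''expr : expr PLUS expr
            | expr TIMES expr'''
    p[0] = (p[2], p[1], p[3])

def p_expr_number(p):
    'expr : NUMBER'
    p[0] = p[1]

def p_error(p):
    pass

lexer = lex.lex()
parser = yacc.yacc(debug=True)        # writes parser.out next to the script
print(parser.parse('1 + 2 * 3'))      # -> ('+', 1, ('*', 2, 3))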

ply lexmatch regular expression has different groups than a usual re

Submitted by 不羁岁月 on 2019-12-22 05:14:56
Question: I am using PLY and have noticed a strange discrepancy between the token re match stored in t.lexer.lexmatch, as compared with an sre_pattern defined in the usual way with the re module. The group(x)'s seem to be off by 1. I have defined a simple lexer to illustrate the behavior I am seeing:

import ply.lex as lex

tokens = ('CHAR',)

def t_CHAR(t):
    r'.'
    t.value = t.lexer.lexmatch
    return t

l = lex.lex()

(I get a warning about t_error but ignore it for now.) Now I feed some input into the lexer and
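The offset comes from how PLY builds its lexer: all t_ rules are concatenated into one master regular expression, each wrapped in its own named group, so group numbers seen through lexmatch are shifted relative to the rule's own pattern. A sketch, assuming a single rule so the shift is exactly one:

import re
import ply.lex as lex

tokens = ('CHAR',)

def t_CHAR(t):
    r'.'
    t.value = t.lexer.lexmatch    # match object from the master regex
    return t

def t_error(t):
    t.lexer.skip(1)

lexer = lex.lex()
lexer.input('a')
tok = lexer.token()
# The rule itself has no groups, yet group(1) exists, because the master
# regex wrapped the rule roughly as (?P<t_CHAR>.):
print(tok.value.group(1))                 # -> 'a'
# The same pattern compiled directly has only group(0):
print(re.match(r'.', 'a').group(0))       # -> 'a'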

Ply Lex parsing problem

Submitted by 牧云@^-^@ on 2019-12-19 18:24:53
Question: I'm using PLY as my lexer. My specification is the following:

t_WHILE = r'while'
t_THEN = r'then'
t_ID = r'[a-zA-Z_][a-zA-Z0-9_]*'
t_NUMBER = r'\d+'
t_LESSEQUAL = r'<='
t_ASSIGN = r'='
t_ignore = r' \t'

When I try to parse the string "while n <= 0 then h = 1", it gives the following output:

LexToken(ID,'while',1,0)
LexToken(ID,'n',1,6)
LexToken(LESSEQUAL,'<=',1,8)
LexToken(NUMBER,'0',1,11)
LexToken(ID,'hen',1,14) ------> PROBLEM!
LexToken(ID,'h',1,18)
LexToken(ASSIGN,'=',1,20)
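Two details of PLY explain this output. t_ignore is a plain string of characters to skip, not a regex, so r' \t' silently ignores spaces, backslashes, and the letter t — which is how 'then' becomes 'hen'. And simple string rules are tried in order of decreasing regex length, so the longer ID pattern wins over 'while' and 'then'. The reserved-words idiom from the PLY manual fixes both:

import ply.lex as lex

reserved = {'while': 'WHILE', 'then': 'THEN'}
tokens = ['ID', 'NUMBER', 'LESSEQUAL', 'ASSIGN'] + list(reserved.values())

t_LESSEQUAL = r'<='
t_ASSIGN = r'='
t_ignore = ' \t'          # a plain string of characters to skip, NOT a regex

def t_ID(t):
    r'[a-zA-Z_][a-zA-Z0-9_]*'
    t.type = reserved.get(t.value, 'ID')   # keywords beat identifiers
    return t

def t_NUMBER(t):
    r'\d+'
    t.value = int(t.value)
    return t

def t_error(t):
    t.lexer.skip(1)

lexer = lex.lex()
lexer.input("while n <= 0 then h = 1")
for tok in lexer:
    print(tok)
# Now 'while' comes out as WHILE and 'then' as THEN, intact.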

yacc - Precedence of a rule with no operator?

Submitted by 纵然是瞬间 on 2019-12-19 11:25:46
Question: Thinking about parsing regular expressions using yacc (I'm actually using PLY), some of the rules would look like the following:

expr : expr expr
expr : expr '|' expr
expr : expr '*'

The problem is that the first rule (concatenation) must take precedence over the second rule, but not over the third. However, the concatenation rule has no operator in it. How can I specify the precedence correctly in this case? Thank you! EDIT: I modified the rules to avoid the issue, but I'm still curious what was the
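The standard yacc device here, shown for unary minus in the PLY manual, is a fictitious precedence token combined with %prec. In the sketch below, BAR and STAR are assumed token names, and CONCAT never appears in the input; it exists only to give the concatenation rule a precedence level between alternation and repetition:

precedence = (
    ('left', 'BAR'),       # '|' alternation: binds loosest
    ('left', 'CONCAT'),    # fictitious token, no rule ever produces it
    ('left', 'STAR'),      # '*' repetition: binds tightest
)

def p_expr_concat(p):
    "expr : expr expr %prec CONCAT"
    p[0] = ('concat', p[1], p[2])

def p_expr_alt(p):
    "expr : expr BAR expr"
    p[0] = ('alt', p[1], p[3])

def p_expr_star(p):
    "expr : expr STAR"
    p[0] = ('star', p[1])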

How to write a regular expression to match a string literal where the escape is a doubling of the quote character?

Submitted by 北城余情 on 2019-12-14 00:22:06
Question: I am writing a parser using PLY that needs to identify FORTRAN string literals. These are quoted with single quotes, the escape character being a doubled single quote; i.e. 'I don''t understand what you mean' is a valid escaped FORTRAN string. PLY takes its input as regular expressions. My attempt so far does not work and I don't understand why:

t_STRING_LITERAL = r"'[^('')]*'"

Any ideas? Answer 1: A string literal is: an open single-quote, followed by: any number of doubled-single-quotes and non
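The trap is that [...] is a character class matching single characters, so [^('')] only means "any character except (, ', or )"; it cannot express "a doubled quote or a non-quote". The usual pattern alternates the two explicitly. A quick check outside PLY (the same pattern string can be assigned to t_STRING_LITERAL, assuming literals do not span lines):

import re

# ''  -> an escaped quote; [^'] -> any other character; the literal is
# an opening quote, any number of those two alternatives, a closing quote.
pattern = r"'(?:''|[^'])*'"

m = re.match(pattern, "'I don''t understand what you mean'")
print(m.group(0))     # the whole literal, doubled quotes included

# In the lexer:
# t_STRING_LITERAL = r"'(?:''|[^'])*'"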

getter setter as function in python class giving “no attribute found” error

Submitted by 风格不统一 on 2019-12-13 03:59:53
Question:

import operator
import re
from ply import lex, yacc

class Lexer(object):
    tokens = [
        'COMMA', 'TILDE', 'PARAM', 'LP', 'RP', 'FUNC'
    ]

    # Regular expression rules for simple tokens
    t_COMMA = r'\,'
    t_TILDE = r'\~'
    t_PARAM = r'[^\s\(\),&:\"\'~]+'

    def __init__(self, dict_obj):
        self.dict_obj = dict_obj

    def t_LP(self, t):
        r'\('
        return t

    def t_RP(self, t):
        r'\)'
        return t

    def t_FUNC(self, t):
        # I want to generate the token for FUNC from the keys of the model map
        # For e.g.: r'key1|key2'
        r'(?i)FUNC'
        return t
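Building the FUNC pattern from the keys of a dict passed to __init__ is awkward, because PLY reads the rule docstrings when lex.lex(module=self) runs. One way to sidestep dynamic docstrings entirely is to reclassify a generic match by dictionary lookup, the same idiom PLY's manual uses for reserved words; this sketch assumes the dict keys are lowercase:

from ply import lex

class Lexer(object):
    tokens = ['COMMA', 'TILDE', 'PARAM', 'LP', 'RP', 'FUNC']

    t_COMMA = r'\,'
    t_TILDE = r'\~'
    t_LP = r'\('
    t_RP = r'\)'
    t_ignore = ' \t'

    def __init__(self, dict_obj):
        self.dict_obj = dict_obj          # keys name the functions
        self.lexer = lex.lex(module=self)

    def t_PARAM(self, t):
        r'[^\s\(\),&:\"\'~]+'
        # Reclassify the word as FUNC when it is one of the dict keys,
        # instead of baking the keys into a regex.
        if t.value.lower() in self.dict_obj:
            t.type = 'FUNC'
        return t

    def t_error(self, t):
        t.lexer.skip(1)

lx = Lexer({'key1': 1, 'key2': 2})
lx.lexer.input('key1(a,b)')
for tok in lx.lexer:
    print(tok.type, tok.value)   # FUNC key1, LP (, PARAM a, COMMA ,, ...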

How to handle multiple rules for one token with PLY

Submitted by 不打扰是莪最后的温柔 on 2019-12-11 16:47:49
Question: I'm working with a jison file and converting it to a parser generator using the lex module from Python PLY. I've noticed that in this jison file, certain tokens have multiple rules associated with them. For example, for the token CONTENT, the file specifies the following three rules:

[^\x00]*?/("{{") {
    if (yytext.slice(-2) === "\\\\") {
        strip(0, 1);
        this.begin("mu");
    } else if (yytext.slice(-1) === "\\") {
        strip(0, 1);
        this.begin("emu");
    } else {
        this.begin("mu");
    }
    if (yytext) return 'CONTENT';
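In PLY the equivalent is usually built from two features: several differently named rule functions may return the same token type, and lexer states play the role of jison's begin(). jison's trailing context /("{{") maps to a Python lookahead such as (?=\{\{), since PLY patterns are ordinary Python regexes. A rough sketch with a single 'mu' state (state and token names are assumptions; the emu branch is omitted):

import ply.lex as lex

tokens = ('CONTENT', 'OPEN', 'CLOSE')

states = (('mu', 'exclusive'),)    # counterpart of this.begin("mu")

def t_CONTENT(t):
    r'[^\x00{]+'                   # plain text before a possible "{{"
    return t

def t_OPEN(t):
    r'\{\{'
    t.lexer.begin('mu')            # enter the mustache state
    return t

def t_mu_CONTENT(t):
    r'[^}]+'                       # a second rule, same token type
    return t

def t_mu_CLOSE(t):
    r'\}\}'
    t.lexer.begin('INITIAL')
    return t

t_ignore = ''
t_mu_ignore = ''

def t_error(t):
    t.lexer.skip(1)

def t_mu_error(t):
    t.lexer.skip(1)

lexer = lex.lex()
lexer.input('hello {{name}} world')
for tok in lexer:
    print(tok.type, repr(tok.value))
# CONTENT 'hello ', OPEN '{{', CONTENT 'name', CLOSE '}}', CONTENT ' world'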

Python PLY zero or more occurrences of a parsing item

Submitted by 馋奶兔 on 2019-12-07 13:01:10
Question: I am using Python with PLY to parse LISP-like S-expressions, and when parsing a function call there can be zero or more arguments. How can I express this in the yacc code? This is my function so far:

def p_EXPR(p):
    '''EXPR : NUMBER
            | STRING
            | LPAREN funcname [EXPR] RPAREN'''
    if len(p) == 2:
        p[0] = p[1]
    else:
        p[0] = ("Call", p[2], p[3:-1])

I need to replace "[EXPR]" with something that allows zero or more EXPRs. How can I do this? Answer 1: How about this: EXPR : NUMBER | STRING | LPAREN funcname
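Yacc grammars have no repetition operator, so "zero or more" is written as a recursive helper nonterminal. A sketch in that direction, with EXPRLIST as an assumed name; left recursion is the usual choice because it keeps the LALR parser's stack shallow:

# NUMBER, STRING, LPAREN, RPAREN and funcname are as in the question.

def p_expr_atom(p):
    '''EXPR : NUMBER
            | STRING'''
    p[0] = p[1]

def p_expr_call(p):
    'EXPR : LPAREN funcname EXPRLIST RPAREN'
    p[0] = ("Call", p[2], p[3])

def p_exprlist_empty(p):
    'EXPRLIST :'
    p[0] = []                    # zero occurrences

def p_exprlist_more(p):
    'EXPRLIST : EXPRLIST EXPR'
    p[0] = p[1] + [p[2]]         # one more occurrence, left-recursively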

RegEx with variable data in it - ply.lex

Submitted by 此生再无相见时 on 2019-12-07 12:35:34
Question: I'm using the Python module ply.lex to write a lexer. I have some of my tokens specified with regular expressions, but now I'm stuck. I have a list of keywords that should each be a token. data is a list of about 1000 keywords, all of which should be recognised as one kind of keyword. These can be, for example, _Function1, _UDFType2, and so on. All words in the list are separated by whitespace, that's it. I just want the lexer to recognise the words in this list, so that it would return a token of type
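Two common options are generating one big alternation and feeding it to PLY's @TOKEN decorator, or looking words up in a set inside a generic identifier rule (as in the reserved-words example further up). A sketch of the first, with a two-item stand-in for the ~1000-keyword list:

import re
import ply.lex as lex
from ply.lex import TOKEN

data = ['_Function1', '_UDFType2']   # stand-in for the full keyword list

tokens = ('KEYWORD', 'ID')

# Escape each keyword and join longest-first, so a short keyword never
# shadows a longer one that shares its prefix.
keyword_pattern = '|'.join(map(re.escape,
                               sorted(data, key=len, reverse=True)))

@TOKEN(keyword_pattern)
def t_KEYWORD(t):
    return t

def t_ID(t):
    r'[A-Za-z_][A-Za-z0-9_]*'
    return t

t_ignore = ' \t\n'

def t_error(t):
    t.lexer.skip(1)

lexer = lex.lex()
lexer.input('_Function1 foo _UDFType2')
for tok in lexer:
    print(tok.type, tok.value)   # KEYWORD, ID, KEYWORD

With a list this large, the lookup variant is usually the better design: a single ~1000-branch alternation makes the master regex big, while a set membership test inside one identifier rule stays O(1) per token.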