RegEx with variable data in it - ply.lex

I'm using the Python module ply.lex to write a lexer. I have some of my tokens specified with regular expressions, but now I'm stuck. I have a list of keywords that should each be recognised as a token. data holds about 1000 keywords, all of which should be recognised as one kind of keyword, for example _Function1, _UDFType2 and so on. The words are simply separated by whitespace. I just want the lexer to recognise the words in this list and return a token of type `KEYWORD`.

import ply.lex as lex

tokens = ('KEYWORD',)  # plus the names of the other tokens
data = 'Keyword1 Keyword2 Keyword3 Keyword4'
def t_KEYWORD(t):
    # ... r'\$' + data ??
    return t

text = '''
Some test data


even more

$var = 2231




$[]Test this 2.31 + / &
'''

autoit = lex.lex()
autoit.input(text)
while True:
    tok = autoit.token()
    if not tok: break
    print(tok)

So I tried to add the variable to that regex, but it didn't work. I always get: No regular expression defined for rule 't_KEYWORD'.

Thank you in advance! John

As @DSM suggests, you can use the TOKEN decorator. The regular expression that matches tokens like cat or dog is 'cat|dog' (that is, the words separated by '|' rather than by a space). So try:

from ply.lex import TOKEN
data = data.split() #make data a list of keywords

@TOKEN('|'.join(data))
def t_KEYWORD(t):
    return t
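Joining with '|' works, but with around 1000 keywords a plain alternation has two pitfalls: regex metacharacters inside a keyword (such as `$` or `.`) change the pattern's meaning, and because alternation takes the first branch that matches, a keyword that is a prefix of another (say `Func` and `Function`) can cut a longer word short. Below is a minimal sketch that escapes each word, tries longer words first and wraps the alternation in word boundaries. It is checked with the standard `re` module rather than ply, `build_keyword_regex` is a hypothetical helper name, and `\b` assumes every keyword starts and ends with a word character (letter, digit or `_`).

```python
import re

def build_keyword_regex(keywords):
    """Hypothetical helper: return a regex string matching any keyword as a whole word."""
    # Longest first, so a shorter keyword that is a prefix of a longer one
    # is not tried before it (this also matters if you drop the \b anchors).
    words = sorted(set(keywords), key=len, reverse=True)
    # re.escape neutralises metacharacters; \b stops matches inside longer words.
    return r'\b(?:' + '|'.join(re.escape(w) for w in words) + r')\b'

data = 'Func _Function1 _UDFType2 Function'.split()
pattern = build_keyword_regex(data)

# Plain join: the first listed branch wins, so "Func" matches only part of "Function".
naive = '|'.join(data)
print(re.match(naive, 'Function').group())    # Func
print(re.match(pattern, 'Function').group())  # Function
print(re.match(pattern, 'Functional'))        # None, \b rejects a partial word
```

With ply, the resulting string can be passed to the decorator in the same way, e.g. `@TOKEN(build_keyword_regex(data))`.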

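ply.lex normally takes a rule function's regular expression from its docstring; the TOKEN decorator lets you supply that regular expression as an expression instead of a string literal. Note that the order in which you define the rule functions determines their precedence, which is usually important to manage.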
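To see why the order matters: ply.lex combines the function rules, in the order they are defined, into a master regular expression made of named groups, and Python's `re` uses the first alternative that matches. The sketch below imitates that with plain `re`; the rule names, patterns and the `first_rule` helper are illustrative assumptions, and ply's real implementation has more machinery (for example, rules given as strings are added after the functions, sorted by decreasing regex length).

```python
import re

KEYWORDS = r'\b(?:_Function1|_UDFType2)\b'
IDENT = r'[A-Za-z_][A-Za-z0-9_]*'

def first_rule(rules, text):
    """Return the name of the first rule, in list order, that matches at the start of text."""
    # One alternation of named groups, similar in spirit to a lexer's master regex.
    master = re.compile('|'.join('(?P<%s>%s)' % (name, rx) for name, rx in rules))
    m = master.match(text)
    return m.lastgroup if m else None

# Keyword rule defined first: keywords become KEYWORD, other words ID.
good_order = [('KEYWORD', KEYWORDS), ('ID', IDENT)]
print(first_rule(good_order, '_Function1'))  # KEYWORD
print(first_rule(good_order, 'counter'))     # ID

# Identifier rule defined first: it swallows every keyword.
bad_order = [('ID', IDENT), ('KEYWORD', KEYWORDS)]
print(first_rule(bad_order, '_Function1'))   # ID
```

For a large keyword set, the ply documentation suggests a different route for reserved words: a single identifier rule whose function looks the matched text up in a dictionary and sets the type accordingly (`t.type = reserved.get(t.value, 'ID')`), which avoids both a huge alternation and the ordering problem.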

The docstring at the top of a function cannot be an expression (it must be a string literal), but you can assign the docstring after the function is defined, one token definition at a time.

We can test this in the interpreter:

def f():
    "this is " + "my help"  # not a docstring :(
print(f.__doc__)  # None
f.__doc__ = "this is " + "my help"  # now it is!

Hence this ought to work:

def t_KEYWORD(token):
    return token
t_KEYWORD.__doc__ = r'REGULAR EXPRESSION HERE'  # can be an expression

I'm not sure whether this works with ply, but a function's docstring is just its __doc__ attribute, so if you write a decorator that takes a string expression and assigns it to the function's __doc__ attribute, ply might use that.
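A minimal sketch of such a decorator, under the assumption that ply reads the pattern from `__doc__` as it does for undecorated rule functions; `regex_doc` is a hypothetical name, and ply's own `TOKEN` decorator serves the same purpose.

```python
def regex_doc(pattern):
    """Hypothetical decorator: store a computed regex as the function's docstring."""
    def decorate(func):
        func.__doc__ = pattern  # a rule function's regex normally lives in its docstring
        return func
    return decorate

data = 'Keyword1 Keyword2 Keyword3 Keyword4'.split()

@regex_doc('|'.join(data))
def t_KEYWORD(t):
    return t

print(t_KEYWORD.__doc__)  # Keyword1|Keyword2|Keyword3|Keyword4
```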

Source: https://stackoverflow.com/questions/12217816/regex-with-variable-data-in-it-ply-lex

Tags

python

lexer

ply