Question
I have the following parsing problem. My keywords can be prefixed with an underscore to deactivate the option, block, etc.
#coding: utf8
from pyparsing import Keyword, Combine, pyparsing_common, Literal, Suppress, Group, OneOrMore
test_string = r'''
keyword1list {
keyword1 {
option 213
}
_keyword1 {
option 214
}
}
'''
This can happen to any keyword, here keyword1list, keyword1 or option. What I would like to achieve is either to leave those blocks out during parsing, or to parse them but capture the deactivation prefix.
Currently, I can successfully parse the "activated" test_string with the following code, but, for obvious reasons, it fails on the underscored keyword.
lparent = Suppress(Literal('{'))
rparent = Suppress(Literal('}'))
kw1_block = Keyword('keyword1') + lparent
kw1_block = kw1_block + Keyword('option') + pyparsing_common.number.setResultsName('option')
kw1_block = Group(kw1_block + rparent).setResultsName('keyw1')
kw2_block = Keyword('keyword1list') + lparent
kw2_block = kw2_block+ OneOrMore(kw1_block) + rparent
kw2_block = Group(kw2_block).setResultsName('keyword1list', listAllMatches=True)
result = kw2_block.parseString(test_string)
print(result.dump())
tmp = kw2_block.runTests(test_string.replace('\n', '\\n'))
print(tmp[0])
My current solution is to put all keywords in a list and build a dictionary that pairs each keyword with its underscored variant, giving each alternative a results name that acts as a flag.
#coding: utf8
from pyparsing import Keyword, Combine, pyparsing_common, Literal, Suppress, Group, OneOrMore, ZeroOrMore
test_string = r'''
keyword1list {
_keyword1 {
option 1
}
keyword1 {
option 2
}
_keyword1 {
option 3
}
keyword1 {
option 4
}
keyword1 {
option 5
}
}
'''
kwlist = ['keyword1', 'keyword1list', 'option']
keywords = {}
for k in kwlist:
    keywords[k] = Keyword('_' + k).setResultsName('deactivated') | Keyword(
        k).setResultsName('activated')
lparent = Suppress(Literal('{'))
rparent = Suppress(Literal('}'))
kw1_block = keywords['keyword1'] + lparent
kw1_block = kw1_block + keywords[
'option'] + pyparsing_common.number.setResultsName('option') + rparent
kw1_block = Group(kw1_block).setResultsName('keyword1', listAllMatches=True)
kw2_block = keywords['keyword1list'] + lparent
kw2_block = kw2_block + ZeroOrMore(kw1_block) + rparent
kw2_block = Group(kw2_block).setResultsName('keyword1list')
result = kw2_block.parseString(test_string)
print(result.dump())
tmp = kw2_block.runTests(test_string.replace('\n', '\\n'))
print(tmp[0])
While this lets me parse everything correctly, I have to reimplement the logic afterwards (find the deactivated keywords and drop them from the result), which I would like to avoid. I believe I need a parse action on the underscored keywords that somehow drops those tokens, but I currently cannot figure out how to do this.
Any help is greatly appreciated.
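A minimal sketch of the one-pass approach the question describes, assuming pyparsing 2.4 or 3.x: every keyword is matched either plain or with the underscore prefix, and the underscored alternative is wrapped in Suppress, so its body is still parsed but its tokens are discarded. A parse action that returns an empty list would have the same effect; Suppress is simply the built-in form of it. The helper name switchable and the results name value are hypothetical, chosen only for this illustration, and the input is the question's second sample.

```python
from pyparsing import Group, Keyword, Literal, Suppress, ZeroOrMore, pyparsing_common

# The question's second sample input.
test_string = '''
keyword1list {
_keyword1 {
option 1
}
keyword1 {
option 2
}
_keyword1 {
option 3
}
keyword1 {
option 4
}
keyword1 {
option 5
}
}
'''

lbrace = Suppress(Literal('{'))
rbrace = Suppress(Literal('}'))
number = pyparsing_common.number


def switchable(name, body):
    """Hypothetical helper: match `name body` or `_name body`.

    The active form is kept as a named group. The underscored form is still
    parsed (so its syntax is checked), but Suppress discards its tokens.
    """
    active = Group(Keyword(name) + body).setResultsName(name, listAllMatches=True)
    deactivated = Suppress(Keyword('_' + name) + body)
    return deactivated | active


# Build the grammar from the inside out; any level can be switched off.
option = switchable('option', number('value'))
keyword1 = switchable('keyword1', lbrace + ZeroOrMore(option) + rbrace)
keyword1list = switchable('keyword1list', lbrace + ZeroOrMore(keyword1) + rbrace)

result = keyword1list.parseString(test_string)
print(result.asList())
```

Because the deactivated alternative parses its full body, a malformed block is still reported as a syntax error even when it is switched off, and no filtering step is needed after parsing.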
Answer 1:
When I see a parser that is intended for filtering out selected blocks of text, my first approach is usually to write a parser that will match just the selected part, and then use transformString with a suppressed form of that parser:
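from pyparsing import Keyword, MatchFirst, Word, nestedExpr, nums
# test_string is the second sample from the question
kwlist = ['keyword1', 'keyword1list', 'option']
# one alternative per keyword, each with the underscore prefix
to_suppress = MatchFirst(Keyword('_' + k) for k in kwlist)
# a deactivated keyword is followed by either a {...} block or a number
kw_body = nestedExpr("{", "}") | Word(nums)
kw_filter = (to_suppress + kw_body).suppress()
# transformString replaces every match with its (now empty) tokens
print(kw_filter.transformString(test_string))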
Running this with your test string gives:
keyword1list {
keyword1 {
option 2
}
keyword1 {
option 4
}
keyword1 {
option 5
}
}
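The filter can also feed the question's original grammar as a two-pass parse: transformString removes the deactivated blocks from the text, and the grammar that only knows activated keywords parses what is left. This is a sketch assuming the question's second sample input; kw_filter and cleaned are illustrative names, and the second pass reuses the question's first grammar unchanged.

```python
from pyparsing import (Group, Keyword, Literal, MatchFirst, OneOrMore,
                       Suppress, Word, nestedExpr, nums, pyparsing_common)

# The question's second sample input.
test_string = '''
keyword1list {
_keyword1 {
option 1
}
keyword1 {
option 2
}
_keyword1 {
option 3
}
keyword1 {
option 4
}
keyword1 {
option 5
}
}
'''

# Pass 1: the answer's filter deletes every underscored keyword together
# with its {...} block or numeric value.
kwlist = ['keyword1', 'keyword1list', 'option']
to_suppress = MatchFirst(Keyword('_' + k) for k in kwlist)
kw_body = nestedExpr("{", "}") | Word(nums)
kw_filter = (to_suppress + kw_body).suppress()
cleaned = kw_filter.transformString(test_string)

# Pass 2: the question's original grammar, which only knows active keywords.
lparent = Suppress(Literal('{'))
rparent = Suppress(Literal('}'))
kw1_block = Keyword('keyword1') + lparent
kw1_block = kw1_block + Keyword('option') + pyparsing_common.number.setResultsName('option')
kw1_block = Group(kw1_block + rparent).setResultsName('keyw1')
kw2_block = Keyword('keyword1list') + lparent
kw2_block = kw2_block + OneOrMore(kw1_block) + rparent
kw2_block = Group(kw2_block).setResultsName('keyword1list', listAllMatches=True)

result = kw2_block.parseString(cleaned)
print(result.dump())
```

The trade-off is that the filter only checks that a deactivated block has balanced braces, not that its contents are valid, and the text is scanned twice. Also, Word(nums) matches only unsigned integers, so a deactivated option with a value such as 2.5 would leave .5 behind in the cleaned text.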
Source: https://stackoverflow.com/questions/60835663/how-to-treat-prefixed-keywords