Question
The rules below are generated for each sentence, and we have to group them sentence by sentence. The input is in a file, and the output should also be written to a file.
sentenceid=2
NP--->N_NNP
NP--->N_NN_S_NU
NP--->N_NNP
NP--->N_NNP
NP--->N_NN_O_NU
VGF--->V_VM_VF
sentenceid=3
NP--->N_NN
VGNF--->V_VM_VNF
JJP--->JJ
NP--->N_NN_S_NU
NP--->N_NN
VGF--->V_VM_VF
sentenceid=4
NP--->N_NNP
NP--->N_NN_S_NU
NP--->N_NNP_O_M
VGF--->V_VM_VF
The section above is the input, which is the grammar generated for each sentence. I want to group adjacent rules that share the same left-hand side, sentence by sentence. The output should look like this:
sentenceid=2
NP--->N_NNP N_NN_S_NU N_NNP N_NNP N_NN_O_NU
VGF--->V_VM_VF
sentenceid=3
NP--->N_NN
VGNF--->V_VM_VNF
JJP--->JJ
NP--->N_NN_S_NU N_NN
VGF--->V_VM_VF
sentenceid=4
NP--->N_NNP N_NN_S_NU N_NNP_O_M
VGF--->V_VM_VF
How can I implement this? I need the rules for almost 1000 sentences for a probability calculation. This is the CFG grammar for each sentence, and I want to group adjacent rules sentence-wise.
Answer 1:
How about this, assuming the rules are stored in a file (here named sentence)?
#!/usr/bin/env python3
import re

marker = '--->'

def parse_it(sen):
    """Map each sentence id to its runs of adjacent rules sharing a left-hand side."""
    total_dic = dict()   # dicts keep insertion order on Python 3.7+
    marker_memory = ''   # sentence id currently being filled
    mem = None           # left-hand side of the current run
    lo = list()          # (lhs, rhs) pairs of the current run
    with open(sen, 'r') as fh:
        for line in fh:
            if not line.strip():
                continue
            match = re.search(r'(sentenceid=\d+)', line)
            if match:
                # A new sentence starts: flush the pending run into the previous one.
                if lo:
                    total_dic[marker_memory].append(lo)
                marker_memory = match.group(0)
                total_dic[marker_memory] = []
                # Reset so a run never carries over into the next sentence.
                mem = None
                lo = []
            else:
                k, v = [x.strip() for x in line.strip().split(marker)]
                if mem is None or mem == k:
                    lo.append((k, v))
                else:
                    total_dic[marker_memory].append(lo)
                    lo = [(k, v)]
                mem = k
    # Flush the last run of the last sentence.
    if lo:
        total_dic[marker_memory].append(lo)
    return total_dic

dic = parse_it('sentence')
for kin, lol in dic.items():
    print(kin)
    for i in lol:
        k, v = zip(*i)
        print('%s%s%s' % (k[0], marker, ' '.join(v)))
Output:
sentenceid=2
NP--->N_NNP N_NN_S_NU N_NNP N_NNP N_NN_O_NU
VGF--->V_VM_VF
sentenceid=3
NP--->N_NN
VGNF--->V_VM_VNF
JJP--->JJ
NP--->N_NN_S_NU N_NN
VGF--->V_VM_VF
sentenceid=4
NP--->N_NNP N_NN_S_NU N_NNP_O_M
VGF--->V_VM_VF
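Since the question asks for the result to be written to a file, here is a minimal alternative sketch built on itertools.groupby, which merges only consecutive items with the same key. The names group_rules and group_file and the file names in the usage note are made up for illustration and are not part of the original answer. It assumes every non-blank line is either a sentenceid=N header or a rule containing --->.

```python
import re
from itertools import groupby

MARKER = '--->'
SENTENCE_RE = re.compile(r'sentenceid=\d+')


def group_rules(lines):
    """Return output lines with adjacent same-LHS rules merged, per sentence."""
    out = []
    pending = []  # (lhs, rhs) pairs of the sentence currently being read

    def flush():
        # groupby merges only consecutive pairs with an equal left-hand side,
        # so two NP runs separated by another rule stay separate.
        for lhs, run in groupby(pending, key=lambda pair: pair[0]):
            out.append(lhs + MARKER + ' '.join(rhs for _, rhs in run))
        pending.clear()

    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        match = SENTENCE_RE.search(line)
        if match:
            flush()  # finish the previous sentence before starting a new one
            out.append(match.group(0))
        else:
            lhs, rhs = (part.strip() for part in line.split(MARKER, 1))
            pending.append((lhs, rhs))
    flush()  # finish the last sentence
    return out


def group_file(src_path, dst_path):
    """Read rules from src_path and write the grouped rules to dst_path."""
    with open(src_path) as src:
        grouped = group_rules(src)
    with open(dst_path, 'w') as dst:
        dst.write('\n'.join(grouped) + '\n')
```

For example, group_file('sentence', 'grouped.txt') would read the input shown in the question and write one grouped rule per line. Sentences stay in input order because the output is built as a list rather than a dictionary.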
Source: https://stackoverflow.com/questions/21472527/grouping-of-cfg-grammar-rules-sentencewise