Question
I am using spaCy for custom sentence splitting, and I need to parameterize the custom delimiter/word used for splitting, but I couldn't find how to pass it as an argument. Here is the function:
# Manual or Custom Based
def mycustom_boundary(docx):
    for token in docx[:-1]:
        if token.text == '...':
            docx[token.i+1].is_sent_start = True
    return docx

# Adding the rule before parsing
nlp.add_pipe(mycustom_boundary, before='parser')
Please let me know how I can pass a list of custom delimiters to the function as an argument.
Answer 1:
You could turn your component into a class that can be initialized with a list of delimiters. For example:
class MyCustomBoundary(object):
    def __init__(self, delimiters):
        self.delimiters = delimiters

    def __call__(self, doc):  # this is applied when you call it on a Doc
        for token in doc[:-1]:
            if token.text in self.delimiters:
                doc[token.i+1].is_sent_start = True
        return doc
You can then add it to your pipeline like this:
mycustom_boundary = MyCustomBoundary(delimiters=['...', '---'])
nlp.add_pipe(mycustom_boundary, before='parser')
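If a class feels heavier than needed, a closure should work just as well: a function takes the delimiters and returns the component. This is only a sketch. `make_boundary_component` is a hypothetical helper name, not part of spaCy, and it assumes the same function-based `nlp.add_pipe` style used above.

```python
def make_boundary_component(delimiters):
    """Return a pipeline component bound to the given delimiters.

    Hypothetical helper for illustration; not a spaCy API.
    """
    delimiter_set = set(delimiters)  # set gives fast membership tests

    def custom_boundary(doc):
        # Skip the last token: it has no following token to mark.
        for token in doc[:-1]:
            if token.text in delimiter_set:
                doc[token.i + 1].is_sent_start = True
        return doc

    return custom_boundary


# Usage, in the same style as above (assumes an existing `nlp` object):
# nlp.add_pipe(make_boundary_component(['...', '---']), before='parser')
```

The delimiters are captured once when the component is created, so you can build several components with different lists from the same function.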
Source: https://stackoverflow.com/questions/54529875/spacy-custom-sentence-spliting