spaCy custom sentence splitting

Submitted by 五迷三道 on 2021-01-28 17:50:54

Question


I am using spaCy for custom sentence splitting, and I need to parametrize the custom delimiter/word used for splitting, but I couldn't find how to pass it as an argument. Here is the function:

# Manual or Custom Based
def mycustom_boundary(docx):
    for token in docx[:-1]:
        if token.text == '...':
            docx[token.i+1].is_sent_start = True
    return docx

# Adding the rule before parsing
nlp.add_pipe(mycustom_boundary,before='parser')

How can I pass a list of custom delimiters to this function as an argument?


Answer 1:


You could turn your component into a class that is initialized with a list of delimiters. For example:

class MyCustomBoundary(object):
    def __init__(self, delimiters):
        self.delimiters = delimiters

    def __call__(self, doc):  # this is applied when you call it on a Doc
        for token in doc[:-1]:
            if token.text in self.delimiters:
                doc[token.i+1].is_sent_start = True
        return doc

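You can then add it to your pipeline like this (the callable is passed directly, as in spaCy v2; spaCy v3 expects the name of a registered component instead):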

mycustom_boundary = MyCustomBoundary(delimiters=['...', '---'])
nlp.add_pipe(mycustom_boundary, before='parser')
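If a class feels heavier than needed, the same parametrization can probably be done with a closure: a factory function that captures the delimiters and returns the component function. This is only a minimal sketch of that alternative, not something from the original answer. The name make_boundary_setter is hypothetical, and the code assumes nothing beyond what the snippets above already use (slicing the doc, token.text, token.i and is_sent_start).

```python
def make_boundary_setter(delimiters):
    """Return a component-style callable that starts a new sentence
    after every token whose text is one of `delimiters`.

    `make_boundary_setter` is a hypothetical helper name, not a spaCy API.
    """
    # Copy into a frozenset: fast membership tests, and later changes
    # to the caller's list do not affect the component.
    delimiter_set = frozenset(delimiters)

    def set_boundaries(doc):
        # Skip the last token: nothing follows it that could be marked.
        for token in doc[:-1]:
            if token.text in delimiter_set:
                doc[token.i + 1].is_sent_start = True
        return doc

    return set_boundaries
```

The returned function would be added in the same way as the class instance above, for example nlp.add_pipe(make_boundary_setter(['...', '---']), before='parser'). The class version is easier to extend if the component later needs more state; the closure is shorter when the delimiters are the only parameter.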


Source: https://stackoverflow.com/questions/54529875/spacy-custom-sentence-spliting
