Remove a word in a span from SpaCy?

情深已故 2020-12-31 14:26

I am parsing a sentence with spaCy as follows:

import spacy
# "en" is a spaCy v2 shortcut; on v3+ load a package name such as "en_core_web_sm"
nlp = spacy.load("en")
span = nlp("This is some text.")

I am wondering

2 Answers
  • 2020-12-31 14:37

    The other answer requires you to lose POS information.

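    def remove_i_element_from_span(span, index):
        # Drop the token at `index`, then re-parse the remaining text with the
        # module-level `nlp` pipeline so the POS tags are recomputed.
        tokens = list(span)
        del tokens[index]
        # text_with_ws keeps each token's original trailing whitespace
        return nlp("".join(t.text_with_ws for t in tokens))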
    
  • 2020-12-31 14:39

    There is a workaround for that.

    The idea is to export the doc's token attributes to a numpy array, delete the row for the token you don't want, and then build a new doc from the remaining words and the reduced array.

    import spacy
    from spacy.attrs import LOWER, POS, ENT_TYPE, IS_ALPHA
    from spacy.tokens import Doc
    import numpy
    
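    def remove_span(doc, index):
        # Attributes to carry over from the original doc
        attrs = [LOWER, POS, ENT_TYPE, IS_ALPHA]
        np_array = doc.to_array(attrs)
        # Drop the row that belongs to the token at `index`
        np_array_2 = numpy.delete(np_array, index, axis=0)
        # Rebuild the doc from the remaining words, then restore the attributes
        doc2 = Doc(doc.vocab, words=[t.text for i, t in enumerate(doc) if i != index])
        doc2.from_array(attrs, np_array_2)
        return doc2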
    
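    # Load the English model ('en' is a spaCy v2 shortcut; on v3+ use a
    # package name such as 'en_core_web_sm')
    nlp = spacy.load('en')
    doc = nlp("This is some text")
    new_doc = remove_span(doc, 3)  # drops the token at index 3, "text"
    print(new_doc)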
    

    Hope it helps!
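
One thing to watch with this approach: a `Doc` built from `words` alone puts a space after every token, so the original spacing around punctuation is not carried over. Below is a minimal sketch of a variant that also passes `spaces`. The name `remove_token` and the choice of `POS` and `LEMMA` as the copied attributes are assumptions made for illustration, not part of the answer above, and the demo uses `spacy.blank("en")` with hand-set tags only so that it runs without a downloaded model.

```python
import numpy
import spacy
from spacy.attrs import LEMMA, POS
from spacy.tokens import Doc


def remove_token(doc, index):
    """Return a new Doc without the token at `index`, keeping the spacing."""
    keep = [i for i in range(len(doc)) if i != index]
    words = [doc[i].text for i in keep]
    # True where the original token was followed by whitespace
    spaces = [bool(doc[i].whitespace_) for i in keep]
    new_doc = Doc(doc.vocab, words=words, spaces=spaces)
    # Copy the chosen attributes, minus the row of the removed token
    attrs = [POS, LEMMA]  # illustrative choice, extend as needed
    new_doc.from_array(attrs, numpy.delete(doc.to_array(attrs), index, axis=0))
    return new_doc


# Hypothetical demo: a hand-built doc stands in for real pipeline output
nlp = spacy.blank("en")
doc = Doc(nlp.vocab, words=["This", "is", "some", "text", "."],
          spaces=[True, True, True, False, False])
for token, pos in zip(doc, ["PRON", "AUX", "DET", "NOUN", "PUNCT"]):
    token.pos_ = pos
new_doc = remove_token(doc, 2)  # remove "some"
print(new_doc.text)
print([t.pos_ for t in new_doc])
```

With a real pipeline you would pass the output of `nlp(...)` directly. Attributes that refer to other tokens, such as dependency heads, would need extra care after a deletion and are left out of this sketch.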
