How to use the spaCy lemmatizer to get a word into its basic form

青春惊慌失措 2021-02-02 08:25

I am new to spaCy and I want to use its lemmatizer function, but I don't know how to use it. For example, I want to pass in a string of words and get back the string with the basic form of the w

5 answers
  • 2021-02-02 08:44

    Code (old spacy.en API, Python 2 print statement):

    import os
    from spacy.en import English, LOCAL_DATA_DIR
    
    data_dir = os.environ.get('SPACY_DATA', LOCAL_DATA_DIR)
    
    nlp = English(data_dir=data_dir)
    
    doc3 = nlp(u"this is spacy lemmatize testing. programming books are more better than others")
    
    for token in doc3:
        print token, token.lemma, token.lemma_
    

    Output:

    this 496 this
    is 488 be
    spacy 173779 spacy
    lemmatize 1510965 lemmatize
    testing 2900 testing
    . 419 .
    programming 3408 programming
    books 1011 book
    are 488 be
    more 529 more
    better 615 better
    than 555 than
    others 871 others
    

    Example Ref: here

  • 2021-02-02 08:45

    I used:

    import spacy
    import en_core_web_sm  # the model is installed and imported as its own package
    
    nlp = en_core_web_sm.load()
    doc = nlp("did displaying words")
    print(" ".join([token.lemma_ for token in doc]))
    # prints: do display word
    

    But it raised OSError: [E050] Can't find model 'en_core_web_sm'. It doesn't seem to be a shortcut link, a Python package or a valid path to a data directory. I used:

    pip3 install https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.2.0/en_core_web_sm-2.2.0.tar.gz
    

    to get rid of the error.
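Since E050 mentions that the model is looked up as a Python package, one way to check up front whether it is installed is to ask importlib. This is only a sketch: model_installed is a made-up helper name, and it only tells you the package is importable, not that it matches your spaCy version. The hint it prints uses spaCy's own download command as an alternative to the direct pip URL above.

```python
import importlib.util


def model_installed(package_name):
    """Return True if a top-level package with this name can be imported."""
    return importlib.util.find_spec(package_name) is not None


model_name = "en_core_web_sm"
if model_installed(model_name):
    print(f"{model_name} is installed")
else:
    # spaCy's CLI can fetch the model instead of a hand-written pip URL.
    print(f"{model_name} is missing; try: python -m spacy download {model_name}")
```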

  • 2021-02-02 08:46

    If you want to use just the Lemmatizer, you can do that in the following way:

    from spacy.lemmatizer import Lemmatizer
    from spacy.lang.en import LEMMA_INDEX, LEMMA_EXC, LEMMA_RULES
    
    lemmatizer = Lemmatizer(LEMMA_INDEX, LEMMA_EXC, LEMMA_RULES)
    lemmas = lemmatizer(u'ducks', u'NOUN')
    print(lemmas)
    

    Output:

    ['duck']
    

    Update

    Since spaCy version 2.2, LEMMA_INDEX, LEMMA_EXC, and LEMMA_RULES have been bundled into a Lookups object:

    import spacy
    nlp = spacy.load('en')
    
    print(nlp.vocab.lookups)
    # <spacy.lookups.Lookups object at 0x7f89a59ea810>
    print(nlp.vocab.lookups.tables)
    # ['lemma_lookup', 'lemma_rules', 'lemma_index', 'lemma_exc']
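To make the roles of the index, exceptions and rules tables a bit more concrete, here is a toy, standard-library-only sketch of how three such tables can combine to produce a lemma. Everything in it is invented for illustration (the TOY_* tables and the toy_lemmatize function are not part of spaCy), and the real lemmatizer is more involved than this.

```python
# Toy illustration only: these tables are made up and far smaller than the
# ones spaCy ships, and this is not spaCy's actual implementation.
TOY_INDEX = {"NOUN": {"duck", "book", "bus"}, "VERB": {"display", "do"}}
TOY_EXC = {"NOUN": {"mice": ["mouse"]}, "VERB": {"did": ["do"]}}
TOY_RULES = {
    "NOUN": [("ses", "s"), ("s", "")],
    "VERB": [("ing", ""), ("ed", ""), ("s", "")],
}


def toy_lemmatize(word, pos):
    """Return candidate lemmas for `word` given a coarse POS tag."""
    word = word.lower()
    # 1. Irregular forms come straight from the exceptions table.
    exceptions = TOY_EXC.get(pos, {})
    if word in exceptions:
        return list(exceptions[word])
    # 2. Otherwise apply each suffix rule and keep results found in the index.
    index = TOY_INDEX.get(pos, set())
    candidates = []
    for suffix, replacement in TOY_RULES.get(pos, []):
        if word.endswith(suffix):
            form = word[: len(word) - len(suffix)] + replacement
            if form in index and form not in candidates:
                candidates.append(form)
    # 3. Fall back to the word itself when no rule yields a known form.
    return candidates or [word]


print(toy_lemmatize("ducks", "NOUN"))       # ['duck']
print(toy_lemmatize("displaying", "VERB"))  # ['display']
print(toy_lemmatize("did", "VERB"))         # ['do']
```

The point of the index is visible with "buses": the "s" rule alone would give "buse", which is rejected because it is not a known noun, while the "ses" rule gives "bus".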
    

    You can still use the lemmatizer directly with a word and a POS (part of speech) tag:

    from spacy.lemmatizer import Lemmatizer, ADJ, NOUN, VERB
    
    lemmatizer = nlp.vocab.morphology.lemmatizer
    print(lemmatizer('ducks', NOUN))
    # ['duck']
    

    You can pass the POS tag as the imported constant, as above, or as a string:

    print(lemmatizer('ducks', 'NOUN'))
    # ['duck']
    


  • 2021-02-02 08:52

    The previous answer is convoluted and can't be edited, so here's a more conventional one.

    # make sure you've downloaded the English model with "python -m spacy download en"
    
    import spacy
    nlp = spacy.load('en')
    
    doc = nlp(u"Apples and oranges are similar. Boots and hippos aren't.")
    
    for token in doc:
        # token.lemma is the integer ID of the lemma, token.lemma_ its text
        print(token, token.lemma, token.lemma_)
    

    Output:

    Apples 6617 apples
    and 512 and
    oranges 7024 orange
    are 536 be
    similar 1447 similar
    . 453 .
    Boots 4622 boot
    and 512 and
    hippos 98365 hippo
    are 536 be
    n't 538 not
    . 453 .
    

    From the official Lightning tour

  • 2021-02-02 08:55

    I use spaCy version 2.x:

    import spacy
    nlp = spacy.load('en_core_web_sm', disable=['parser', 'ner'])
    doc = nlp('did displaying words')
    print(" ".join([token.lemma_ for token in doc]))
    

    and the output:

    do display word
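Since the question asks for something that takes a string and gives back a string, the join above can be wrapped in a small helper. This is a minimal sketch: lemmatize_text is a made-up name, and nlp is assumed to be whatever pipeline you have already loaded (anything callable that returns tokens with a lemma_ attribute).

```python
def lemmatize_text(nlp, text):
    """Run `text` through the pipeline `nlp` and join the token lemmas."""
    # Tokens are joined with single spaces, so the original spacing and
    # punctuation attachment are not preserved.
    return " ".join(token.lemma_ for token in nlp(text))
```

With the pipeline loaded as above, lemmatize_text(nlp, 'did displaying words') should give the same "do display word" string.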
    

    Hope it helps :)
