POS-Tagger is incredibly slow

忘掉有多难 2020-12-10 16:21

I am using NLTK to generate n-grams from sentences after first removing the given stop words. However, nltk.pos_tag() is extremely slow, taking up to 0.6 s.

3 answers
  •  囚心锁ツ
    2020-12-10 16:58

    In NLTK, pos_tag is defined as:
    from nltk.tag.perceptron import PerceptronTagger
    def pos_tag(tokens, tagset=None):
        tagger = PerceptronTagger()
        return _pos_tag(tokens, tagset, tagger)
    

    So each call to pos_tag instantiates a new PerceptronTagger, which takes up much of the computation time. You can save this time by creating the tagger once and calling tagger.tag directly:

    from nltk.tokenize import word_tokenize
    from nltk.tag.perceptron import PerceptronTagger

    # Build the tagger once and reuse it for every sentence.
    tagger = PerceptronTagger()
    sentence_pos = tagger.tag(word_tokenize(sentence))
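
If you tag many sentences, the same idea can be wrapped in a small helper so the tagger is built only once per process. This is just a sketch: `get_tagger` and `tag_sentences` are names I made up for illustration, and the default whitespace tokenizer is an assumption to keep the example short (pass `nltk.word_tokenize` for real text).

```python
from functools import lru_cache


@lru_cache(maxsize=1)
def get_tagger():
    """Build the PerceptronTagger once; later calls return the cached instance."""
    # Imported lazily so this module can be imported before NLTK is set up.
    from nltk.tag.perceptron import PerceptronTagger
    return PerceptronTagger()


def tag_sentences(sentences, tagger=None, tokenize=str.split):
    """Tag every sentence using one shared tagger instance.

    `tagger` is any object with a .tag(tokens) method; if omitted,
    the cached PerceptronTagger from get_tagger() is used.
    `tokenize` defaults to whitespace splitting (an assumption of this sketch).
    """
    if tagger is None:
        tagger = get_tagger()
    return [tagger.tag(tokenize(sentence)) for sentence in sentences]
```

Calling `tag_sentences(list_of_sentences, tokenize=word_tokenize)` should then pay the tagger construction cost only on the first call rather than once per sentence.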
    
