POS-Tagger is incredibly slow

忘掉有多难 2020-12-10 16:21

I am using NLTK to generate n-grams from sentences after first removing the given stop words. However, nltk.pos_tag() is extremely slow, taking up to 0.6 s

3 Answers
  • 2020-12-10 16:58
    NLTK's pos_tag is defined as:
    from nltk.tag.perceptron import PerceptronTagger
    def pos_tag(tokens, tagset=None):
        tagger = PerceptronTagger()
        return _pos_tag(tokens, tagset, tagger)
    

    So each call to pos_tag instantiates a new PerceptronTagger, and constructing it (which loads the trained model) takes up most of the computation time. You can save this time by creating the tagger once and calling tagger.tag directly:

    from nltk import word_tokenize
    from nltk.tag.perceptron import PerceptronTagger

    # Build the tagger once and reuse it for every sentence.
    tagger = PerceptronTagger()
    sentence_pos = tagger.tag(word_tokenize(sentence))
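
If the tagging code is spread over several functions, one way to keep the "build once, reuse" idea is to cache the tagger behind a small accessor. This is only a sketch: `get_tagger` and `tag_many` are hypothetical helper names, not part of NLTK, and the NLTK import is done lazily inside `get_tagger` on the assumption that NLTK and its tagger model are installed.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def get_tagger():
    """Build the tagger on first use; later calls return the cached instance."""
    # Imported lazily, so NLTK is only required when the default tagger is used.
    from nltk.tag.perceptron import PerceptronTagger
    return PerceptronTagger()


def tag_many(token_lists, tagger=None):
    """Tag every token list with a single shared tagger.

    `tagger` can be any object with a .tag(tokens) method; when omitted,
    the cached PerceptronTagger from get_tagger() is used.
    """
    if tagger is None:
        tagger = get_tagger()
    return [tagger.tag(tokens) for tokens in token_lists]
```

With NLTK available, something like `tag_many([word_tokenize(s) for s in sentences])` would then pay the model-loading cost only once per process.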
    
  • 2020-12-10 16:59

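    Use pos_tag_sents to tag multiple sentences. It creates the tagger once and reuses it for every sentence, so tagging ten sentences costs about the same as tagging one (Python 2 session):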

    >>> import time
    >>> from nltk.corpus import brown
    >>> from nltk import pos_tag
    >>> from nltk import pos_tag_sents
    >>> sents = brown.sents()[:10]
    >>> start = time.time(); pos_tag(sents[0]); print time.time() - start
    0.934092998505
    >>> start = time.time(); [pos_tag(s) for s in sents]; print time.time() - start
    9.5061340332
    >>> start = time.time(); pos_tag_sents(sents); print time.time() - start 
    0.939551115036
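
The session above uses Python 2 print statements. A rough Python 3 equivalent of the same comparison might look like the sketch below; `time_call` and `compare_tagging` are hypothetical helper names, and the tagging functions are passed in as arguments so the harness itself does not depend on NLTK.

```python
import time


def time_call(func, *args):
    """Run func(*args) once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


def compare_tagging(tag_one, tag_batch, sents):
    """Time per-sentence tagging against one batched call.

    tag_one tags a single token list (e.g. nltk.pos_tag) and tag_batch
    tags a list of token lists (e.g. nltk.pos_tag_sents).
    """
    looped, loop_secs = time_call(lambda: [tag_one(s) for s in sents])
    batched, batch_secs = time_call(tag_batch, sents)
    return {
        "loop_secs": loop_secs,
        "batch_secs": batch_secs,
        "same_output": looped == batched,
    }
```

Assuming NLTK and the Brown corpus are installed, `compare_tagging(pos_tag, pos_tag_sents, brown.sents()[:10])` reproduces the measurement; the size of the gap will depend on the machine and the NLTK version.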
    
  • 2020-12-10 17:05

    If you are looking for another fast POS tagger in Python, you might want to try RDRPOSTagger. For example, on English POS tagging, its single-threaded Python implementation tags about 8K words per second on a 2.4 GHz Core 2 Duo machine. You can get higher tagging speed by using the multi-threaded mode. RDRPOSTagger achieves accuracy competitive with state-of-the-art taggers and now supports pre-trained models for 40 languages. See the experimental results in this paper.
