How to improve speed with Stanford NLP Tagger and NLTK

[愿得一人] 2020-12-05 05:35

Is there any way to use the Stanford Tagger in a more performant fashion?

Each call to NLTK's wrapper starts a new Java instance per analyzed string, which is very slow.

2 Answers
  • 2020-12-05 06:12

    Found the solution. It is possible to run the POS Tagger in servlet mode and then connect to it via HTTP. Perfect.

    http://nlp.stanford.edu/software/pos-tagger-faq.shtml#d

    example

    start server in background

    nohup java -mx1000m -cp /var/stanford-postagger-full-2014-01-04/stanford-postagger.jar edu.stanford.nlp.tagger.maxent.MaxentTaggerServer -model /var/stanford-postagger-full-2014-01-04/models/german-dewac.tagger -port 2020 >& /dev/null &
    

    adjust firewall to limit access to port 2020 from localhost only

    iptables -A INPUT -p tcp -s localhost --dport 2020 -j ACCEPT
    iptables -A INPUT -p tcp --dport 2020 -j DROP
    

    test it with wget

    wget -qO- "http://localhost:2020/?die welt ist schön"
    

    shutdown server

    pkill -f stanford
    

    restore iptables settings

    iptables -D INPUT -p tcp -s localhost --dport 2020 -j ACCEPT
    iptables -D INPUT -p tcp --dport 2020 -j DROP
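
As an alternative to wget, the running server can be queried from Python so that no new Java process is started per string. This is only a minimal sketch: it assumes the server accepts one newline-terminated line per connection on a plain TCP socket, replies with the tagged text, and then closes the connection. The function names, the UTF-8 encoding, and the `_` tag separator are assumptions for illustration and may need adjusting for your model and server options.

```python
import socket


def tag_via_server(sentence, host="localhost", port=2020,
                   encoding="utf-8", timeout=10.0):
    """Send one sentence to an already running tagger server; return the raw reply.

    Assumption: one request per connection, terminated by a newline,
    and the server closes the connection after replying.
    """
    with socket.create_connection((host, port), timeout=timeout) as conn:
        # Collapse embedded newlines so the request stays a single line.
        request = sentence.replace("\n", " ") + "\n"
        conn.sendall(request.encode(encoding))
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # connection closed by the server: reply is complete
                break
            chunks.append(data)
    return b"".join(chunks).decode(encoding).strip()


def parse_tagged(reply, separator="_"):
    """Split 'word_TAG word_TAG' output into (word, tag) pairs.

    The separator is an assumption; tokens without it get an empty tag.
    """
    pairs = []
    for token in reply.split():
        word, found, tag = token.rpartition(separator)
        if not found:
            word, tag = token, ""
        pairs.append((word, tag))
    return pairs
```

With the server from the example above listening on port 2020, `parse_tagged(tag_via_server("die welt ist schön"))` would return a list of (word, tag) pairs; the actual tags depend on the loaded model.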
    
  • 2020-12-05 06:23

    Use nltk.tag.stanford.POSTagger.tag_sents() to tag multiple sentences in a single call, so that one Java process handles the whole batch instead of one process per string.

    tag_sents has replaced the old batch_tag function; see https://github.com/nltk/nltk/blob/develop/nltk/tag/stanford.py#L61
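
A rough sketch of the batching idea follows. The helper names `build_tagger` and `tag_texts` are made up for illustration, whitespace splitting stands in for real tokenization, and the fallback import assumes that later NLTK releases expose the class as `StanfordPOSTagger`; check the names against the NLTK version you have installed.

```python
def build_tagger(model_path, jar_path):
    """Create the NLTK wrapper around the Stanford tagger.

    The import is done lazily so this module loads even without NLTK.
    Assumption: the class is StanfordPOSTagger in later NLTK releases
    and POSTagger (as named in this answer) in older ones.
    """
    try:
        from nltk.tag.stanford import StanfordPOSTagger as Tagger
    except ImportError:
        from nltk.tag.stanford import POSTagger as Tagger
    return Tagger(model_path, path_to_jar=jar_path)


def tag_texts(tagger, texts):
    """Tag many texts with a single tag_sents() call.

    tag_sents() expects a list of token lists and returns one list of
    (word, tag) pairs per input sentence.
    """
    sentences = [text.split() for text in texts]  # naive tokenization
    return tagger.tag_sents(sentences)
```

Calling `tag_texts(build_tagger(model_path, jar_path), texts)` once for the whole list is the point of the change; calling `tag()` in a loop would start Java again for every string.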


    DEPRECATED:

    Tag the sentences using batch_tag instead of tag, see http://www.nltk.org/_modules/nltk/tag/stanford.html#StanfordTagger.batch_tag
