Is there any way to use the Stanford Tagger in a more performant fashion?
Each call to NLTK's wrapper starts a new Java instance per analyzed string, which is very slow.
Found the solution. It is possible to run the POS Tagger in servlet mode and then connect to it via HTTP. Perfect.
http://nlp.stanford.edu/software/pos-tagger-faq.shtml#d
example
start server in background
nohup java -mx1000m -cp /var/stanford-postagger-full-2014-01-04/stanford-postagger.jar edu.stanford.nlp.tagger.maxent.MaxentTaggerServer -model /var/stanford-postagger-full-2014-01-04/models/german-dewac.tagger -port 2020 >& /dev/null &
adjust firewall to limit access to port 2020 from localhost only
iptables -A INPUT -p tcp -s localhost --dport 2020 -j ACCEPT
iptables -A INPUT -p tcp --dport 2020 -j DROP
test it with wget
wget "http://localhost:2020/?die welt ist schön"
shutdown server
pkill -f stanford
restore iptables settings
iptables -D INPUT -p tcp -s localhost --dport 2020 -j ACCEPT
iptables -D INPUT -p tcp --dport 2020 -j DROP
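While the server from the example above is running, it can also be queried from Python instead of wget. The following is only a rough sketch: it skips HTTP and talks to the port over a plain TCP socket, assuming the server reads one line of text, replies with the tagged text, and then closes the connection. That behaviour, the UTF-8 encoding, and the `_` separator between word and tag are assumptions to check against your tagger version; `tag_via_server` and `parse_tagged` are hypothetical helper names.

```python
import socket


def tag_via_server(text, host="localhost", port=2020, encoding="utf-8"):
    """Send one line of text to the tagger server and return its raw reply."""
    # Assumption: the server handles exactly one line per connection,
    # so embedded newlines are collapsed into single spaces first.
    line = " ".join(text.split()) + "\n"
    with socket.create_connection((host, port), timeout=10) as conn:
        conn.sendall(line.encode(encoding))
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # empty read: the server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks).decode(encoding).strip()


def parse_tagged(output, sep="_"):
    """Split output such as 'die_ART welt_NN' into (word, tag) pairs."""
    # rsplit keeps separators inside the word itself; tokens without a
    # separator are skipped instead of producing malformed pairs.
    return [tuple(tok.rsplit(sep, 1)) for tok in output.split() if sep in tok]
```

With the server listening on port 2020, `parse_tagged(tag_via_server("die welt ist schön"))` should give a list of (word, tag) tuples. The JVM and the model are loaded once at server start, so each call only pays for a local socket round trip.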
Use nltk.tag.stanford.POSTagger.tag_sents() to tag multiple sentences in a single call.
tag_sents has replaced the old batch_tag function, see https://github.com/nltk/nltk/blob/develop/nltk/tag/stanford.py#L61
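A minimal sketch of batching with tag_sents, assuming the POSTagger class named above (newer NLTK releases expose it as StanfordPOSTagger) and that its first two arguments are the model path and the jar path. `build_tagger` and `tag_texts` are hypothetical helper names, and the whitespace tokenisation is for illustration only.

```python
def build_tagger(model_path, jar_path):
    """Create the NLTK wrapper around the Stanford tagger."""
    # Imported lazily so tag_texts() below works with any object that
    # offers tag_sents(). Newer NLTK versions name the class
    # StanfordPOSTagger; adjust the import accordingly.
    from nltk.tag.stanford import POSTagger
    return POSTagger(model_path, jar_path)


def tag_texts(tagger, texts):
    """Tag many texts with one tag_sents() call instead of one tag() per text."""
    # tag_sents() expects a list of token lists; splitting on whitespace
    # is a naive stand-in for a real tokenizer.
    sentences = [text.split() for text in texts]
    return tagger.tag_sents(sentences)
```

For example, `tag_texts(build_tagger(model, jar), ["die welt ist schön", "das ist ein test"])`, where `model` and `jar` point to your local tagger files, hands both sentences to the tagger in one batch, so the Java start-up cost is paid once rather than once per string.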
DEPRECATED:
Tag the sentences using batch_tag instead of tag, see http://www.nltk.org/_modules/nltk/tag/stanford.html#StanfordTagger.batch_tag