I am working on a project that requires me to tag tokens using nltk and python. So I wanted to use this. But came up with a few problems. I went through a lot of other already a
The original answer was written for Stanford POS Tagger Version 3.6.0, Date 2015-12-09
There is a new Version (3.7.0, released 2016-10-31). Here's the code for the newer version:
from nltk.tag import StanfordPOSTagger
from nltk import word_tokenize
# Add the jar and model via their path (instead of setting environment variables):
jar = 'your_path/stanford-postagger-full-2016-10-31/stanford-postagger.jar'
model = 'your_path/stanford-postagger-full-2016-10-31/models/english-left3words-distsim.tagger'
pos_tagger = StanfordPOSTagger(model, jar, encoding='utf8')
text = pos_tagger.tag(word_tokenize("What's the airspeed of an unladen swallow ?"))
print(text)
I had the same problem (but using OS X and PyCharm), finally got it to work. Here's what I've pieced together from the StanfordPOSTagger Documentation and alvas' work on the issue (big thanks!):
from nltk.internals import find_jars_within_path
from nltk.tag import StanfordPOSTagger
from nltk import word_tokenize
# Alternatively to setting the CLASSPATH add the jar and model via their path:
jar = '/Users/nischi/PycharmProjects/stanford-postagger-full-2015-12-09/stanford-postagger.jar'
model = '/Users/nischi/PycharmProjects/stanford-postagger-full-2015-12-09/models/english-left3words-distsim.tagger'
pos_tagger = StanfordPOSTagger(model, jar)
# Add other jars from Stanford directory
stanford_dir = pos_tagger._stanford_jar.rpartition('/')[0]
stanford_jars = find_jars_within_path(stanford_dir)
pos_tagger._stanford_jar = ':'.join(stanford_jars)
text = pos_tagger.tag(word_tokenize("What's the airspeed of an unladen swallow ?"))
print(text)
Hope this helps.