NLTK was unable to find stanford-postagger.jar! Set the CLASSPATH environment variable

孤街浪徒 2021-02-19 20:37

I am working on a project that requires me to tag tokens using NLTK and Python, so I wanted to use this, but I ran into a few problems. I went through a lot of other already a

2 Answers
  • 2021-02-19 20:59

    I use Jupyter Notebook with PyCharm. I tried setting the environment variable in PyCharm's Run Configuration, but that did not work, so I set it in the code with os.environ instead:

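    import os

    from nltk.tag import StanfordPOSTagger

    # Replace "yourPath" with the directory that holds the Stanford downloads.
    # Entries are separated by ":" (use ";" on Windows).
    os.environ["CLASSPATH"] = "yourPath/stanford-parser-full-2018-10-17:yourPath/stanford-postagger-full-2018-10-16:yourPath/stanford-ner-2018-10-16"
    os.environ["STANFORD_MODELS"] = "yourPath/stanford-postagger-full-2018-10-16/models:yourPath/stanford-ner-2018-10-16/classifiers"

    # The model file name is looked up in the STANFORD_MODELS directories.
    stanford_tagger = StanfordPOSTagger('english-bidirectional-distsim.tagger')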
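
    The ":" separator in the strings above is specific to Unix-like systems; the Java class path uses ";" on Windows. As a rough sketch (not part of the original answer), the same two variables can be built from lists with os.pathsep so the code does not depend on the platform. The helper name build_search_path and the "yourPath" directories are placeholders for illustration only.

```python
import os


def build_search_path(directories):
    """Join directories with the platform's path-list separator.

    os.pathsep is ':' on Unix-like systems and ';' on Windows.
    """
    return os.pathsep.join(directories)


# Placeholder locations; replace "yourPath" with the real install directory.
stanford_dirs = [
    "yourPath/stanford-postagger-full-2018-10-16",
    "yourPath/stanford-ner-2018-10-16",
]
model_dirs = [
    "yourPath/stanford-postagger-full-2018-10-16/models",
    "yourPath/stanford-ner-2018-10-16/classifiers",
]

# Set both variables before constructing any Stanford tagger.
os.environ["CLASSPATH"] = build_search_path(stanford_dirs)
os.environ["STANFORD_MODELS"] = build_search_path(model_dirs)
```

    The variables only affect the current Python process, which is why this works in a notebook even when the IDE's run configuration is not picked up.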
    

    Hope it will help!

  • 2021-02-19 21:13

    Update

    The original answer was written for Stanford POS Tagger version 3.6.0 (released 2015-12-09).

    A newer version (3.7.0, released 2016-10-31) is now available. Here's the code for the newer version:

    from nltk.tag import StanfordPOSTagger
    from nltk import word_tokenize
    
    # Add the jar and model via their path (instead of setting environment variables):
    jar = 'your_path/stanford-postagger-full-2016-10-31/stanford-postagger.jar'
    model = 'your_path/stanford-postagger-full-2016-10-31/models/english-left3words-distsim.tagger'
    
    pos_tagger = StanfordPOSTagger(model, jar, encoding='utf8')
    
    text = pos_tagger.tag(word_tokenize("What's the airspeed of an unladen swallow ?"))
    print(text)
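
    When the jar and model are passed as explicit paths like this, a wrong path is the most likely cause of the "unable to find" error. The following is a minimal sketch, assuming the same placeholder paths as above, that checks both files before the tagger is built. The helper missing_files is a hypothetical name, not an NLTK function.

```python
import os


def missing_files(*paths):
    """Return the paths that do not point to an existing regular file."""
    return [p for p in paths if not os.path.isfile(p)]


# Placeholder paths; replace "your_path" with the real download location.
jar = 'your_path/stanford-postagger-full-2016-10-31/stanford-postagger.jar'
model = 'your_path/stanford-postagger-full-2016-10-31/models/english-left3words-distsim.tagger'

missing = missing_files(jar, model)
if missing:
    print("Not found:", missing)
else:
    print("Both files exist; the paths can be passed to StanfordPOSTagger.")
```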
    

    Original answer

    I had the same problem (but on OS X with PyCharm) and finally got it to work. Here's what I've pieced together from the StanfordPOSTagger documentation and alvas' work on the issue (big thanks!):

    from nltk.internals import find_jars_within_path
    from nltk.tag import StanfordPOSTagger
    from nltk import word_tokenize
    
    # Alternatively to setting the CLASSPATH add the jar and model via their path:
    jar = '/Users/nischi/PycharmProjects/stanford-postagger-full-2015-12-09/stanford-postagger.jar'
    model = '/Users/nischi/PycharmProjects/stanford-postagger-full-2015-12-09/models/english-left3words-distsim.tagger'
    
    pos_tagger = StanfordPOSTagger(model, jar)
    
    # Add other jars from Stanford directory
    stanford_dir = pos_tagger._stanford_jar.rpartition('/')[0]
    stanford_jars = find_jars_within_path(stanford_dir)
    pos_tagger._stanford_jar = ':'.join(stanford_jars)
    
    text = pos_tagger.tag(word_tokenize("What's the airspeed of an unladen swallow ?"))
    print(text)
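
    The jar-collection step above relies on rpartition('/') and a ':' separator, both of which assume a Unix-like system. Here is a standard-library-only sketch of the same idea: walk the directory that contains the main jar, gather every .jar file, and join them with the platform's separator. The function collect_jars is a hypothetical helper for illustration, not NLTK's find_jars_within_path, and the path is a placeholder.

```python
import fnmatch
import os


def collect_jars(directory):
    """Return a class path string listing every .jar file under `directory`."""
    jars = []
    for root, _dirs, files in os.walk(directory):
        for name in fnmatch.filter(files, "*.jar"):
            jars.append(os.path.join(root, name))
    # os.pathsep is ':' on Unix-like systems and ';' on Windows.
    return os.pathsep.join(sorted(jars))


# Placeholder path; replace "your_path" with the real download location.
jar = 'your_path/stanford-postagger-full-2015-12-09/stanford-postagger.jar'

# os.path.dirname is a portable alternative to rpartition('/')[0].
stanford_dir = os.path.dirname(jar)
classpath = collect_jars(stanford_dir)
```

    If the directory does not exist, collect_jars returns an empty string, so it is worth checking the result before using it.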
    

    Hope this helps.
