Stanford Parser and NLTK

既然无缘 2020-11-22 01:32

Is it possible to use the Stanford Parser in NLTK? (I am not talking about the Stanford POS tagger.)

18 Answers
  •  伪装坚强ぢ
    2020-11-22 02:30

    Note that this answer applies to NLTK v3.0, and not to more recent versions.

    I cannot leave this as a comment because of my low reputation, but since I spent (wasted?) some time solving this, I would rather share my problem and solution for getting this parser to work in NLTK.

    In the excellent answer from alvas, it is mentioned that:

    e.g. for the Parser, there won't be a model directory.

    This led me wrongly to:

    • not pay attention to the value I assigned to STANFORD_MODELS (and care only about my CLASSPATH)
    • leave the ../path/to/stanford-parser-full-2015-12-09/models directory *virtually empty* (or containing a jar file whose name did not match the NLTK regex)!

    If the OP, like me, just wants to use the parser, it may be confusing that, without downloading anything else (no POS tagger, no NER, ...) and after following all these instructions, we still get an error.

    Eventually, whatever CLASSPATH I set (following the examples and explanations in the answers in this thread), I would still get the error:

    NLTK was unable to find stanford-parser-(\d+)(.(\d+))+-models.jar! Set the CLASSPATH environment variable. For more information, on stanford-parser-(\d+)(.(\d+))+-models.jar, see: http://nlp.stanford.edu/software/lex-parser.shtml

    OR:

    NLTK was unable to find stanford-parser.jar! Set the CLASSPATH environment variable. For more information, on stanford-parser.jar, see: http://nlp.stanford.edu/software/lex-parser.shtml
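    The first message prints the file-name pattern NLTK is searching for. As a quick sketch, assuming the pattern is stanford-parser-(\d+)(\.(\d+))+-models\.jar (the one shown in the message, with the dots escaped) and that it is applied as a prefix match with re.match, you can check which jar names it accepts:

```python
import re

# Assumption: the models-jar pattern from the error message, with the dots escaped
MODEL_JAR_PATTERN = r'stanford-parser-(\d+)(\.(\d+))+-models\.jar'


def matches_model_pattern(filename):
    """Return True if filename looks like a parser models jar the pattern accepts."""
    return re.match(MODEL_JAR_PATTERN, filename) is not None


for name in ['stanford-parser-3.6.0-models.jar',   # True
             'stanford-parser-3.5.2-models.jar',   # True
             'stanford-corenlp-3.6.0-models.jar',  # False
             'stanford-parser.jar']:               # False
    print(name, matches_model_pattern(name))
```

    Only names of the form stanford-parser-X.Y.Z-models.jar match; a CoreNLP models jar does not.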

    Importantly, though, I could load and use the parser correctly if I called the constructor with all arguments and paths fully specified, as in:

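    from nltk.parse.stanford import StanfordParser

    # Give both jar paths explicitly, so NLTK does not need to find them via environment variables
    stanford_parser_jar = '../lib/stanford-parser-full-2015-04-20/stanford-parser.jar'
    stanford_model_jar = '../lib/stanford-parser-full-2015-04-20/stanford-parser-3.5.2-models.jar'
    parser = StanfordParser(path_to_jar=stanford_parser_jar,
                            path_to_models_jar=stanford_model_jar)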
    

    Solution for the parser alone:

    So the error came from NLTK and how it looks for jars using the supplied STANFORD_MODELS and CLASSPATH environment variables. To solve this, the *-models.jar, named in the correct format (matching the regex in the NLTK code, so no -corenlp-....jar), must be located in the folder designated by STANFORD_MODELS.
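    To see why the location of that jar matters, here is a simplified sketch of this kind of lookup. It is not NLTK's actual code: the function find_jar_like_nltk is hypothetical, and it assumes that CLASSPATH entries are checked as individual jar paths (by file name), while a variable such as STANFORD_MODELS names a directory whose files are scanned for the pattern.

```python
import os
import re

MODEL_JAR_PATTERN = r'stanford-parser-(\d+)(\.(\d+))+-models\.jar'


def find_jar_like_nltk(pattern, env_vars=('STANFORD_MODELS', 'CLASSPATH'), environ=None):
    """Hypothetical, simplified search for a jar whose file name matches pattern.

    CLASSPATH entries are treated as jar paths and matched by their basename;
    any other variable is treated as a list of directories whose files are scanned.
    Returns the first matching path, or None if nothing matches.
    """
    environ = os.environ if environ is None else environ
    for var in env_vars:
        value = environ.get(var)
        if not value:
            continue
        for entry in value.split(os.pathsep):
            if var == 'CLASSPATH':
                # A CLASSPATH entry only counts if the jar itself is listed
                if re.match(pattern, os.path.basename(entry)):
                    return entry
            elif os.path.isdir(entry):
                # A directory variable: look at every file inside it
                for name in sorted(os.listdir(entry)):
                    if re.match(pattern, name):
                        return os.path.join(entry, name)
    return None
```

    With STANFORD_MODELS pointing at an empty models directory and CLASSPATH listing only stanford-parser.jar, this sketch returns None for the models pattern, mirroring the first error message; once the models jar is copied into that directory, the directory scan finds it.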

    Namely, I first created:

    mkdir stanford-parser-full-2015-12-09/models
    

    Then I added to .bashrc:

    export STANFORD_MODELS=/path/to/stanford-parser-full-2015-12-09/models
    

    And finally, I copied stanford-parser-3.6.0-models.jar (or the corresponding version) into:

    /path/to/stanford-parser-full-2015-12-09/models/
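    If you prefer to script these steps, here is a rough Python equivalent. The helper setup_stanford_models and its arguments are hypothetical; it creates the models directory, sets STANFORD_MODELS, and copies the models jar in. Unlike the .bashrc export, setting os.environ only affects the current Python process (and anything it launches), and this assumes NLTK reads the variable when the parser is constructed.

```python
import os
import shutil


def setup_stanford_models(parser_dir, models_jar):
    """Hypothetical helper mirroring the mkdir / export / copy steps.

    parser_dir: the unpacked stanford-parser-full-* directory
    models_jar: path to a stanford-parser-X.Y.Z-models.jar file
    Returns the path of the copied jar inside the models directory.
    """
    models_dir = os.path.join(parser_dir, 'models')
    os.makedirs(models_dir, exist_ok=True)          # mkdir .../models
    os.environ['STANFORD_MODELS'] = models_dir      # export, for this process only
    shutil.copy(models_jar, models_dir)             # copy the models jar into it
    return os.path.join(models_dir, os.path.basename(models_jar))
```

    For example, calling setup_stanford_models('/path/to/stanford-parser-full-2015-12-09', '/path/to/stanford-parser-3.6.0-models.jar') and then, in the same session and with CLASSPATH pointing to stanford-parser.jar, constructing StanfordParser() should behave like the .bashrc setup.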
    

    After that, I could get StanfordParser to load smoothly in Python with the classic CLASSPATH pointing to stanford-parser.jar. In fact, set up this way, you can call StanfordParser with no parameters and the defaults just work.
