Is it possible to use the Stanford Parser in NLTK? (I am not talking about the Stanford POS Tagger.)
Note that this answer applies to NLTK v3.0, and not to more recent versions.
I cannot leave this as a comment because of my reputation, but since I spent (wasted?) some time solving this, I would rather share my problem and solution for getting this parser to work in NLTK.
In the excellent answer from alvas, it is mentioned that:
e.g. for the Parser, there won't be a model directory.
This led me wrongly to:
- not pay attention to the value I gave to STANFORD_MODELS (and only care about my CLASSPATH);
- leave the ../path/to/stanford-parser-full-2015-12-09/models directory *virtually empty* (or holding a jar file whose name did not match the NLTK regex)!
If the OP, like me, just wanted to use the parser, it may be confusing that, even without downloading anything else (no POS tagger, no NER, ...) and after following all these instructions, we still get an error.
Eventually, for any CLASSPATH I gave (following the examples and explanations in the answers in this thread), I would still get the error:
NLTK was unable to find stanford-parser-(\d+)(.(\d+))+-models.jar! Set the CLASSPATH environment variable. For more information, on stanford-parser-(\d+)(.(\d+))+-models.jar,
see: http://nlp.stanford.edu/software/lex-parser.shtml
OR:
NLTK was unable to find stanford-parser.jar! Set the CLASSPATH environment variable. For more information, on stanford-parser.jar, see: http://nlp.stanford.edu/software/lex-parser.shtml
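Both messages mean the jar search came up empty. Before touching NLTK, it can help to see which jars are actually reachable through the two environment variables. The sketch below is a hypothetical diagnostic helper (`diagnose_stanford_env` is not part of NLTK); it only lists what is on disk and does not reproduce NLTK's exact search logic.

```python
import os
from pathlib import Path


def diagnose_stanford_env(env=None):
    """List the jars reachable through CLASSPATH and STANFORD_MODELS.

    Hypothetical diagnostic helper, not part of NLTK; it does not
    reproduce NLTK's own search, it only shows what is on disk.
    """
    env = os.environ if env is None else env
    report = {"classpath_jars": [], "models_dir_jars": []}

    # Entries are separated by os.pathsep (':' on Unix, ';' on Windows)
    for entry in env.get("CLASSPATH", "").split(os.pathsep):
        path = Path(entry)
        if entry and path.is_file() and path.suffix == ".jar":
            report["classpath_jars"].append(path.name)
        elif entry and path.is_dir():
            # Directory entry: list the jars directly inside it
            report["classpath_jars"].extend(
                sorted(p.name for p in path.glob("*.jar")))

    # Every existing STANFORD_MODELS directory: list the jars it holds
    for entry in env.get("STANFORD_MODELS", "").split(os.pathsep):
        path = Path(entry)
        if entry and path.is_dir():
            report["models_dir_jars"].extend(
                sorted(p.name for p in path.glob("*.jar")))
    return report
```

Calling `diagnose_stanford_env()` with no argument inspects the current process environment. An empty `models_dir_jars` list, or one without a `stanford-parser-X.Y.Z-models.jar` name in it, corresponds to the situation described in this answer.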
Importantly, though, I could correctly load and use the parser if I called the constructor with all arguments and paths fully specified, as in:
from nltk.parse.stanford import StanfordParser
# Explicit paths to both jars, so NLTK does not need to search CLASSPATH / STANFORD_MODELS
stanford_parser_jar = '../lib/stanford-parser-full-2015-04-20/stanford-parser.jar'
stanford_model_jar = '../lib/stanford-parser-full-2015-04-20/stanford-parser-3.5.2-models.jar'
parser = StanfordParser(path_to_jar=stanford_parser_jar,
                        path_to_models_jar=stanford_model_jar)
Therefore, the error came from NLTK and how it looks for jars using the supplied STANFORD_MODELS and CLASSPATH environment variables. To solve this, the *-models.jar, with the correct name format (matching the regex in the NLTK code, so no -corenlp-....jar), must be located in the folder designated by STANFORD_MODELS.
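To check whether a given file name would pass that name test, the pattern from the error message can be tried directly. This is a sketch under two assumptions: that the dots in the printed pattern were escaped in NLTK's source (the message above appears to have lost its backslashes), and that matching the whole file name with `re.fullmatch` is close enough to NLTK's own check. The example file names are illustrative.

```python
import re

# Pattern as printed in the NLTK v3.0 error message, with the dots escaped
# (assumption: the extracted message lost its backslashes).
MODEL_JAR_PATTERN = re.compile(r"stanford-parser-(\d+)(\.(\d+))+-models\.jar")


def is_parser_models_jar(filename):
    """Return True if filename has the stanford-parser-X.Y.Z-models.jar shape."""
    return MODEL_JAR_PATTERN.fullmatch(filename) is not None


for name in ["stanford-parser-3.6.0-models.jar",   # accepted
             "stanford-corenlp-3.6.0-models.jar",  # wrong prefix
             "stanfor-parser-3.5.2-models.jar",    # misspelled
             "stanford-parser.jar"]:               # the code jar, not the models jar
    print(name, is_parser_models_jar(name))
```

Only the first name has the shape the error message asks for, which is why a models folder containing just a CoreNLP models jar still triggers the "unable to find" error.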
Namely, I first created:
mkdir stanford-parser-full-2015-12-09/models
Then I added to .bashrc:
export STANFORD_MODELS=/path/to/stanford-parser-full-2015-12-09/models
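If editing .bashrc is not an option (for example in a notebook), the same variables can be set from Python. This is a minimal sketch, assuming NLTK reads CLASSPATH and STANFORD_MODELS from the process environment when the parser is constructed, so the helper must run before `StanfordParser()` is called; `configure_stanford_env` is a hypothetical name, and the folder layout (stanford-parser.jar at the top, models/ underneath) follows the setup in this answer.

```python
import os
from pathlib import Path


def configure_stanford_env(parser_dir):
    """Point CLASSPATH and STANFORD_MODELS at a Stanford Parser folder.

    Hypothetical helper: a Python-side stand-in for the .bashrc export.
    Call it before constructing StanfordParser.
    """
    parser_dir = Path(parser_dir).resolve()
    # Overwrites any existing CLASSPATH; join with os.pathsep to keep other jars
    os.environ["CLASSPATH"] = str(parser_dir / "stanford-parser.jar")
    os.environ["STANFORD_MODELS"] = str(parser_dir / "models")
    return os.environ["CLASSPATH"], os.environ["STANFORD_MODELS"]
```

For example, `configure_stanford_env('/path/to/stanford-parser-full-2015-12-09')` followed by `StanfordParser()` should behave like the .bashrc setup, under the assumption above.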
And finally, by copying stanford-parser-3.6.0-models.jar (or the corresponding version) into:
/path/to/stanford-parser-full-2015-12-09/models/
I could get StanfordParser to load smoothly in Python with the classic CLASSPATH that points to stanford-parser.jar. In fact, with this setup you can call StanfordParser with no parameters; the defaults will just work.
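The mkdir and copy steps can also be scripted. This sketch assumes, as in the stanford-parser-full-2015-12-09 download, that the models jar sits next to stanford-parser.jar at the top of the parser folder; `install_models_jar` is a hypothetical helper and its `version` default must match the jar you actually downloaded.

```python
import shutil
from pathlib import Path


def install_models_jar(parser_dir, version="3.6.0"):
    """Copy stanford-parser-<version>-models.jar into <parser_dir>/models.

    Hypothetical helper mirroring the manual mkdir/copy steps. Assumes the
    models jar ships at the top level of parser_dir.
    """
    parser_dir = Path(parser_dir)
    jar_name = f"stanford-parser-{version}-models.jar"
    source = parser_dir / jar_name
    if not source.is_file():
        raise FileNotFoundError(f"{source} not found; check the version string")

    models_dir = parser_dir / "models"
    models_dir.mkdir(exist_ok=True)  # like `mkdir models`, but safe to re-run
    shutil.copy2(source, models_dir / jar_name)
    return models_dir  # the folder STANFORD_MODELS should point to
```

The returned path is the value to export as STANFORD_MODELS; copying (rather than moving) leaves the original download untouched.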