Stanford Parser and NLTK

后端未结

关注

 18  2362

Is it possible to use Stanford Parser in NLTK? (I am not talking about Stanford POS.)

Deprecated Answer

The answer below is deprecated, please use the solution on https://stackoverflow.com/a/51981566/610569 for NLTK v3.3 and above.

EDITED

Note: The following answer will only work on:

NLTK version >=3.2.4
Stanford Tools compiled since 2015-04-20
Python 2.7, 3.4 and 3.5 (Python 3.6 is not yet officially supported)

As both tools changes rather quickly and the API might look very different 3-6 months later. Please treat the following answer as temporal and not an eternal fix.

Always refer to https://github.com/nltk/nltk/wiki/Installing-Third-Party-Software for the latest instruction on how to interface Stanford NLP tools using NLTK!!

TL;DR

cd $HOME

# Update / Install NLTK
pip install -U nltk

# Download the Stanford NLP tools
wget http://nlp.stanford.edu/software/stanford-ner-2015-04-20.zip
wget http://nlp.stanford.edu/software/stanford-postagger-full-2015-04-20.zip
wget http://nlp.stanford.edu/software/stanford-parser-full-2015-04-20.zip
# Extract the zip file.
unzip stanford-ner-2015-04-20.zip 
unzip stanford-parser-full-2015-04-20.zip 
unzip stanford-postagger-full-2015-04-20.zip


export STANFORDTOOLSDIR=$HOME

export CLASSPATH=$STANFORDTOOLSDIR/stanford-postagger-full-2015-04-20/stanford-postagger.jar:$STANFORDTOOLSDIR/stanford-ner-2015-04-20/stanford-ner.jar:$STANFORDTOOLSDIR/stanford-parser-full-2015-04-20/stanford-parser.jar:$STANFORDTOOLSDIR/stanford-parser-full-2015-04-20/stanford-parser-3.5.2-models.jar

export STANFORD_MODELS=$STANFORDTOOLSDIR/stanford-postagger-full-2015-04-20/models:$STANFORDTOOLSDIR/stanford-ner-2015-04-20/classifiers

Then:

>>> from nltk.tag.stanford import StanfordPOSTagger
>>> st = StanfordPOSTagger('english-bidirectional-distsim.tagger')
>>> st.tag('What is the airspeed of an unladen swallow ?'.split())
[(u'What', u'WP'), (u'is', u'VBZ'), (u'the', u'DT'), (u'airspeed', u'NN'), (u'of', u'IN'), (u'an', u'DT'), (u'unladen', u'JJ'), (u'swallow', u'VB'), (u'?', u'.')]

>>> from nltk.tag import StanfordNERTagger
>>> st = StanfordNERTagger('english.all.3class.distsim.crf.ser.gz') 
>>> st.tag('Rami Eid is studying at Stony Brook University in NY'.split())
[(u'Rami', u'PERSON'), (u'Eid', u'PERSON'), (u'is', u'O'), (u'studying', u'O'), (u'at', u'O'), (u'Stony', u'ORGANIZATION'), (u'Brook', u'ORGANIZATION'), (u'University', u'ORGANIZATION'), (u'in', u'O'), (u'NY', u'O')]


>>> from nltk.parse.stanford import StanfordParser
>>> parser=StanfordParser(model_path="edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz")
>>> list(parser.raw_parse("the quick brown fox jumps over the lazy dog"))
[Tree('ROOT', [Tree('NP', [Tree('NP', [Tree('DT', ['the']), Tree('JJ', ['quick']), Tree('JJ', ['brown']), Tree('NN', ['fox'])]), Tree('NP', [Tree('NP', [Tree('NNS', ['jumps'])]), Tree('PP', [Tree('IN', ['over']), Tree('NP', [Tree('DT', ['the']), Tree('JJ', ['lazy']), Tree('NN', ['dog'])])])])])])]

>>> from nltk.parse.stanford import StanfordDependencyParser
>>> dep_parser=StanfordDependencyParser(model_path="edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz")
>>> print [parse.tree() for parse in dep_parser.raw_parse("The quick brown fox jumps over the lazy dog.")]
[Tree('jumps', [Tree('fox', ['The', 'quick', 'brown']), Tree('dog', ['over', 'the', 'lazy'])])]

In Long:

Firstly, one must note that the Stanford NLP tools are written in Java and NLTK is written in Python. The way NLTK is interfacing the tool is through the call the Java tool through the command line interface.

Secondly, the NLTK API to the Stanford NLP tools have changed quite a lot since the version 3.1. So it is advisable to update your NLTK package to v3.1.

Thirdly, the NLTK API to Stanford NLP Tools wraps around the individual NLP tools, e.g. Stanford POS tagger, Stanford NER Tagger, Stanford Parser.

For the POS and NER tagger, it DOES NOT wrap around the Stanford Core NLP package.

For the Stanford Parser, it's a special case where it wraps around both the Stanford Parser and the Stanford Core NLP (personally, I have not used the latter using NLTK, i would rather follow @dimazest's demonstration on http://www.eecs.qmul.ac.uk/~dm303/stanford-dependency-parser-nltk-and-anaconda.html )

Note that as of NLTK v3.1, the STANFORD_JAR and STANFORD_PARSER variables is deprecated and NO LONGER used

In Longer:

STEP 1

Assuming that you have installed Java appropriately on your OS.

Now, install/update your NLTK version (see http://www.nltk.org/install.html):

Using pip: sudo pip install -U nltk
Debian distro (using apt-get): sudo apt-get install python-nltk

For Windows (Use the 32-bit binary installation):

Install Python 3.4: http://www.python.org/downloads/ (avoid the 64-bit versions)
Install Numpy (optional): http://sourceforge.net/projects/numpy/files/NumPy/ (the version that specifies pythnon3.4)
Install NLTK: http://pypi.python.org/pypi/nltk
Test installation: Start>Python34, then type import nltk

(Why not 64 bit? See https://github.com/nltk/nltk/issues/1079)

Then out of paranoia, recheck your nltk version inside python:

from __future__ import print_function
import nltk
print(nltk.__version__)

Or on the command line:

python3 -c "import nltk; print(nltk.__version__)"

Make sure that you see 3.1 as the output.

For even more paranoia, check that all your favorite Stanford NLP tools API are available:

from nltk.parse.stanford import StanfordParser
from nltk.parse.stanford import StanfordDependencyParser
from nltk.parse.stanford import StanfordNeuralDependencyParser
from nltk.tag.stanford import StanfordPOSTagger, StanfordNERTagger
from nltk.tokenize.stanford import StanfordTokenizer

(Note: The imports above will ONLY ensure that you are using a correct NLTK version that contains these APIs. Not seeing errors in the import doesn't mean that you have successfully configured the NLTK API to use the Stanford Tools)

STEP 2

Now that you have checked that you have the correct version of NLTK that contains the necessary Stanford NLP tools interface. You need to download and extract all the necessary Stanford NLP tools.

TL;DR, in Unix:

cd $HOME

# Download the Stanford NLP tools
wget http://nlp.stanford.edu/software/stanford-ner-2015-04-20.zip
wget http://nlp.stanford.edu/software/stanford-postagger-full-2015-04-20.zip
wget http://nlp.stanford.edu/software/stanford-parser-full-2015-04-20.zip
# Extract the zip file.
unzip stanford-ner-2015-04-20.zip 
unzip stanford-parser-full-2015-04-20.zip 
unzip stanford-postagger-full-2015-04-20.zip

In Windows / Mac:

Download and unzip the parser from http://nlp.stanford.edu/software/lex-parser.shtml#Download
Download and unizp the FULL VERSION tagger from http://nlp.stanford.edu/software/tagger.shtml#Download
Download and unizp the NER tagger from http://nlp.stanford.edu/software/CRF-NER.shtml#Download

STEP 3

Setup the environment variables such that NLTK can find the relevant file path automatically. You have to set the following variables:

Add the appropriate Stanford NLP .jar file to the CLASSPATH environment variable.
- e.g. for the NER, it will be stanford-ner-2015-04-20/stanford-ner.jar
- e.g. for the POS, it will be stanford-postagger-full-2015-04-20/stanford-postagger.jar
- e.g. for the parser, it will be stanford-parser-full-2015-04-20/stanford-parser.jar and the parser model jar file, stanford-parser-full-2015-04-20/stanford-parser-3.5.2-models.jar
Add the appropriate model directory to the STANFORD_MODELS variable (i.e. the directory where you can find where the pre-trained models are saved)
- e.g. for the NER, it will be in stanford-ner-2015-04-20/classifiers/
- e.g. for the POS, it will be in stanford-postagger-full-2015-04-20/models/
- e.g. for the Parser, there won't be a model directory.

In the code, see that it searches for the STANFORD_MODELS directory before appending the model name. Also see that, the API also automatically tries to search the OS environments for the `CLASSPATH)

Note that as of NLTK v3.1, the STANFORD_JAR variables is deprecated and NO LONGER used. Code snippets found in the following Stackoverflow questions might not work:

Stanford Dependency Parser Setup and NLTK
nltk interface to stanford parser
trouble importing stanford pos tagger into nltk
Stanford Entity Recognizer (caseless) in Python Nltk
How to improve speed with Stanford NLP Tagger and NLTK
How can I get the stanford NLTK python module?
Stanford Parser and NLTK windows
Stanford Named Entity Recognizer (NER) functionality with NLTK
Stanford parser with NLTK produces empty output
Extract list of Persons and Organizations using Stanford NER Tagger in NLTK
Error using Stanford POS Tagger in NLTK Python

TL;DR for STEP 3 on Ubuntu

export STANFORDTOOLSDIR=/home/path/to/stanford/tools/

export CLASSPATH=$STANFORDTOOLSDIR/stanford-postagger-full-2015-04-20/stanford-postagger.jar:$STANFORDTOOLSDIR/stanford-ner-2015-04-20/stanford-ner.jar:$STANFORDTOOLSDIR/stanford-parser-full-2015-04-20/stanford-parser.jar:$STANFORDTOOLSDIR/stanford-parser-full-2015-04-20/stanford-parser-3.5.2-models.jar

export STANFORD_MODELS=$STANFORDTOOLSDIR/stanford-postagger-full-2015-04-20/models:$STANFORDTOOLSDIR/stanford-ner-2015-04-20/classifiers

(For Windows: See https://stackoverflow.com/a/17176423/610569 for instructions for setting environment variables)

You MUST set the variables as above before starting python, then:

>>> from nltk.tag.stanford import StanfordPOSTagger
>>> st = StanfordPOSTagger('english-bidirectional-distsim.tagger')
>>> st.tag('What is the airspeed of an unladen swallow ?'.split())
[(u'What', u'WP'), (u'is', u'VBZ'), (u'the', u'DT'), (u'airspeed', u'NN'), (u'of', u'IN'), (u'an', u'DT'), (u'unladen', u'JJ'), (u'swallow', u'VB'), (u'?', u'.')]

>>> from nltk.tag import StanfordNERTagger
>>> st = StanfordNERTagger('english.all.3class.distsim.crf.ser.gz') 
>>> st.tag('Rami Eid is studying at Stony Brook University in NY'.split())
[(u'Rami', u'PERSON'), (u'Eid', u'PERSON'), (u'is', u'O'), (u'studying', u'O'), (u'at', u'O'), (u'Stony', u'ORGANIZATION'), (u'Brook', u'ORGANIZATION'), (u'University', u'ORGANIZATION'), (u'in', u'O'), (u'NY', u'O')]


>>> from nltk.parse.stanford import StanfordParser
>>> parser=StanfordParser(model_path="edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz")
>>> list(parser.raw_parse("the quick brown fox jumps over the lazy dog"))
[Tree('ROOT', [Tree('NP', [Tree('NP', [Tree('DT', ['the']), Tree('JJ', ['quick']), Tree('JJ', ['brown']), Tree('NN', ['fox'])]), Tree('NP', [Tree('NP', [Tree('NNS', ['jumps'])]), Tree('PP', [Tree('IN', ['over']), Tree('NP', [Tree('DT', ['the']), Tree('JJ', ['lazy']), Tree('NN', ['dog'])])])])])])]

Alternatively, you could try add the environment variables inside python, as the previous answers have suggested but you can also directly tell the parser/tagger to initialize to the direct path where you kept the .jar file and your models.

There is NO need to set the environment variables if you use the following method BUT when the API changes its parameter names, you will need to change accordingly. That is why it is MORE advisable to set the environment variables than to modify your python code to suit the NLTK version.

For example (without setting any environment variables):

# POS tagging:

from nltk.tag import StanfordPOSTagger

stanford_pos_dir = '/home/alvas/stanford-postagger-full-2015-04-20/'
eng_model_filename= stanford_pos_dir + 'models/english-left3words-distsim.tagger'
my_path_to_jar= stanford_pos_dir + 'stanford-postagger.jar'

st = StanfordPOSTagger(model_filename=eng_model_filename, path_to_jar=my_path_to_jar) 
st.tag('What is the airspeed of an unladen swallow ?'.split())


# NER Tagging:
from nltk.tag import StanfordNERTagger

stanford_ner_dir = '/home/alvas/stanford-ner/'
eng_model_filename= stanford_ner_dir + 'classifiers/english.all.3class.distsim.crf.ser.gz'
my_path_to_jar= stanford_ner_dir + 'stanford-ner.jar'

st = StanfordNERTagger(model_filename=eng_model_filename, path_to_jar=my_path_to_jar) 
st.tag('Rami Eid is studying at Stony Brook University in NY'.split())

# Parsing:
from nltk.parse.stanford import StanfordParser

stanford_parser_dir = '/home/alvas/stanford-parser/'
eng_model_path = stanford_parser_dir  + "edu/stanford/nlp/models/lexparser/englishRNN.ser.gz"
my_path_to_models_jar = stanford_parser_dir  + "stanford-parser-3.5.2-models.jar"
my_path_to_jar = stanford_parser_dir  + "stanford-parser.jar"

parser=StanfordParser(model_path=eng_model_path, path_to_models_jar=my_path_to_models_jar, path_to_jar=my_path_to_jar)

0 讨论(0)

庸人自扰

2020-11-22 02:21
Note that this answer applies to NLTK v 3.0, and not to more recent versions.

Here is the windows version of alvas's answer
```
sentences = ('. '.join(['this is sentence one without a period','this is another foo bar sentence '])+'.').encode('ascii',errors = 'ignore')
catpath =r"YOUR CURRENT FILE PATH"

f = open('stanfordtemp.txt','w')
f.write(sentences)
f.close()

parse_out = os.popen(catpath+r"\nlp_tools\stanford-parser-2010-08-20\lexparser.bat "+catpath+r"\stanfordtemp.txt").readlines()

bracketed_parse = " ".join( [i.strip() for i in parse_out if i.strip() if i.strip()[0] == "("] )
bracketed_parse = "\n(ROOT".join(bracketed_parse.split(" (ROOT")).split('\n')
aa = map(lambda x :ParentedTree.fromstring(x),bracketed_parse)
```
NOTES:
- In lexparser.bat you need to change all the paths into absolute path to avoid java errors such as "class not found"
- I strongly recommend you to apply this method under windows since I Tried several answers on the page and all the methods communicates python with Java fails.
- wish to hear from you if you succeed on windows and wish you can tell me how you overcome all these problems.
- search python wrapper for stanford coreNLP to get the python version
0 讨论(0)
发布评论:

提交评论
- 加载中...
花落未央

2020-11-22 02:23
Note that this answer applies to NLTK v 3.0, and not to more recent versions.

A slight update (or simply alternative) on danger89's comprehensive answer on using Stanford Parser in NLTK and Python

With stanford-parser-full-2015-04-20, JRE 1.8 and nltk 3.0.4 (python 2.7.6), it appears that you no longer need to extract the englishPCFG.ser.gz from stanford-parser-x.x.x-models.jar or setting up any os.environ
```
from nltk.parse.stanford import StanfordParser

english_parser = StanfordParser('path/stanford-parser.jar', 'path/stanford-parser-3.5.2-models.jar')

s = "The real voyage of discovery consists not in seeking new landscapes, but in having new eyes."

sentences = english_parser.raw_parse_sents((s,))
print sentences #only print <listiterator object> for this version

#draw the tree
for line in sentences:
    for sentence in line:
        sentence.draw()
```
0 讨论(0)
发布评论:

提交评论
- 加载中...
抹茶落季

2020-11-22 02:25
I am on a windows machine and you can simply run the parser normally as you do from the command like but as in a different directory so you don't need to edit the lexparser.bat file. Just put in the full path.
```
cmd = r'java -cp \Documents\stanford_nlp\stanford-parser-full-2015-01-30 edu.stanford.nlp.parser.lexparser.LexicalizedParser -outputFormat "typedDependencies" \Documents\stanford_nlp\stanford-parser-full-2015-01-30\stanford-parser-3.5.1-models\edu\stanford\nlp\models\lexparser\englishFactored.ser.gz stanfordtemp.txt'
parse_out = os.popen(cmd).readlines()
```
The tricky part for me was realizing how to run a java program from a different path. There must be a better way but this works.
0 讨论(0)
发布评论:

提交评论
- 加载中...
北恋

2020-11-22 02:26

There is python interface for stanford parser

http://projects.csail.mit.edu/spatial/Stanford_Parser

0 讨论(0)
发布评论:

提交评论
- 加载中...
北荒

2020-11-22 02:27
Note that this answer applies to NLTK v 3.0, and not to more recent versions.

Since nobody really mentioned and it's somehow troubled me a lot, here is an alternative way to use Stanford parser in python:
```
stanford_parser_jar = '../lib/stanford-parser-full-2015-04-20/stanford-parser.jar'
stanford_model_jar = '../lib/stanford-parser-full-2015-04-20/stanford-parser-3.5.2-models.jar'    
parser = StanfordParser(path_to_jar=stanford_parser_jar, 
                        path_to_models_jar=stanford_model_jar)
```
in this way, you don't need to worry about the path thing anymore.

For those who cannot use it properly on Ubuntu or run the code in Eclipse.
0 讨论(0)
发布评论:

提交评论
- 加载中...