Is it possible to use Stanford Parser in NLTK? (I am not talking about Stanford POS.)
The answer below is deprecated, please use the solution on https://stackoverflow.com/a/51981566/610569 for NLTK v3.3 and above.
Note: The following answer will only work on:
As both tools changes rather quickly and the API might look very different 3-6 months later. Please treat the following answer as temporal and not an eternal fix.
Always refer to https://github.com/nltk/nltk/wiki/Installing-Third-Party-Software for the latest instruction on how to interface Stanford NLP tools using NLTK!!
The follow code comes from https://github.com/nltk/nltk/pull/1735#issuecomment-306091826
In terminal:
wget http://nlp.stanford.edu/software/stanford-corenlp-full-2016-10-31.zip
unzip stanford-corenlp-full-2016-10-31.zip && cd stanford-corenlp-full-2016-10-31
java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer \
-preload tokenize,ssplit,pos,lemma,parse,depparse \
-status_port 9000 -port 9000 -timeout 15000
In Python:
>>> from nltk.tag.stanford import CoreNLPPOSTagger, CoreNLPNERTagger
>>> from nltk.parse.corenlp import CoreNLPParser
>>> stpos, stner = CoreNLPPOSTagger(), CoreNLPNERTagger()
>>> stpos.tag('What is the airspeed of an unladen swallow ?'.split())
[(u'What', u'WP'), (u'is', u'VBZ'), (u'the', u'DT'), (u'airspeed', u'NN'), (u'of', u'IN'), (u'an', u'DT'), (u'unladen', u'JJ'), (u'swallow', u'VB'), (u'?', u'.')]
>>> stner.tag('Rami Eid is studying at Stony Brook University in NY'.split())
[(u'Rami', u'PERSON'), (u'Eid', u'PERSON'), (u'is', u'O'), (u'studying', u'O'), (u'at', u'O'), (u'Stony', u'ORGANIZATION'), (u'Brook', u'ORGANIZATION'), (u'University', u'ORGANIZATION'), (u'in', u'O'), (u'NY', u'O')]
>>> parser = CoreNLPParser(url='http://localhost:9000')
>>> next(
... parser.raw_parse('The quick brown fox jumps over the lazy dog.')
... ).pretty_print() # doctest: +NORMALIZE_WHITESPACE
ROOT
|
S
_______________|__________________________
| VP |
| _________|___ |
| | PP |
| | ________|___ |
NP | | NP |
____|__________ | | _______|____ |
DT JJ JJ NN VBZ IN DT JJ NN .
| | | | | | | | | |
The quick brown fox jumps over the lazy dog .
>>> (parse_fox, ), (parse_wolf, ) = parser.raw_parse_sents(
... [
... 'The quick brown fox jumps over the lazy dog.',
... 'The quick grey wolf jumps over the lazy fox.',
... ]
... )
>>> parse_fox.pretty_print() # doctest: +NORMALIZE_WHITESPACE
ROOT
|
S
_______________|__________________________
| VP |
| _________|___ |
| | PP |
| | ________|___ |
NP | | NP |
____|__________ | | _______|____ |
DT JJ JJ NN VBZ IN DT JJ NN .
| | | | | | | | | |
The quick brown fox jumps over the lazy dog .
>>> parse_wolf.pretty_print() # doctest: +NORMALIZE_WHITESPACE
ROOT
|
S
_______________|__________________________
| VP |
| _________|___ |
| | PP |
| | ________|___ |
NP | | NP |
____|_________ | | _______|____ |
DT JJ JJ NN VBZ IN DT JJ NN .
| | | | | | | | | |
The quick grey wolf jumps over the lazy fox .
>>> (parse_dog, ), (parse_friends, ) = parser.parse_sents(
... [
... "I 'm a dog".split(),
... "This is my friends ' cat ( the tabby )".split(),
... ]
... )
>>> parse_dog.pretty_print() # doctest: +NORMALIZE_WHITESPACE
ROOT
|
S
_______|____
| VP
| ________|___
NP | NP
| | ___|___
PRP VBP DT NN
| | | |
I 'm a dog
Please take a look at http://www.nltk.org/_modules/nltk/parse/corenlp.html for more information on of the Stanford API. Take a look at the docstrings!
The answer below is deprecated, please use the solution on https://stackoverflow.com/a/51981566/610569 for NLTK v3.3 and above.
As of the current Stanford parser (2015-04-20), the default output for the lexparser.sh
has changed so the script below will not work.
But this answer is kept for legacy sake, it will still work with http://nlp.stanford.edu/software/stanford-parser-2012-11-12.zip though.
I suggest you don't mess with Jython, JPype. Let python do python stuff and let java do java stuff, get the Stanford Parser output through the console.
After you've installed the Stanford Parser in your home directory ~/
, just use this python recipe to get the flat bracketed parse:
import os
sentence = "this is a foo bar i want to parse."
os.popen("echo '"+sentence+"' > ~/stanfordtemp.txt")
parser_out = os.popen("~/stanford-parser-2012-11-12/lexparser.sh ~/stanfordtemp.txt").readlines()
bracketed_parse = " ".join( [i.strip() for i in parser_out if i.strip()[0] == "("] )
print bracketed_parse
If I remember well, the Stanford parser is a java library, therefore you must have a Java interpreter running on your server/computer.
I used it once a server, combined with a php script. The script used php's exec() function to make a command-line call to the parser like so:
<?php
exec( "java -cp /pathTo/stanford-parser.jar -mx100m edu.stanford.nlp.process.DocumentPreprocessor /pathTo/fileToParse > /pathTo/resultFile 2>/dev/null" );
?>
I don't remember all the details of this command, it basically opened the fileToParse, parsed it, and wrote the output in the resultFile. PHP would then open the result file for further use.
The end of the command directs the parser's verbose to NULL, to prevent unnecessary command line information from disturbing the script.
I don't know much about Python, but there might be a way to make command line calls.
It might not be the exact route you were hoping for, but hopefully it'll give you some inspiration. Best of luck.
Note that this answer applies to NLTK v 3.0, and not to more recent versions.
Sure, try the following in Python:
import os
from nltk.parse import stanford
os.environ['STANFORD_PARSER'] = '/path/to/standford/jars'
os.environ['STANFORD_MODELS'] = '/path/to/standford/jars'
parser = stanford.StanfordParser(model_path="/location/of/the/englishPCFG.ser.gz")
sentences = parser.raw_parse_sents(("Hello, My name is Melroy.", "What is your name?"))
print sentences
# GUI
for line in sentences:
for sentence in line:
sentence.draw()
Output:
[Tree('ROOT', [Tree('S', [Tree('INTJ', [Tree('UH', ['Hello'])]), Tree(',', [',']), Tree('NP', [Tree('PRP$', ['My']), Tree('NN', ['name'])]), Tree('VP', [Tree('VBZ', ['is']), Tree('ADJP', [Tree('JJ', ['Melroy'])])]), Tree('.', ['.'])])]), Tree('ROOT', [Tree('SBARQ', [Tree('WHNP', [Tree('WP', ['What'])]), Tree('SQ', [Tree('VBZ', ['is']), Tree('NP', [Tree('PRP$', ['your']), Tree('NN', ['name'])])]), Tree('.', ['?'])])])]
Note 1: In this example both the parser & model jars are in the same folder.
Note 2:
Note 3: The englishPCFG.ser.gz file can be found inside the models.jar file (/edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz). Please use come archive manager to 'unzip' the models.jar file.
Note 4: Be sure you are using Java JRE (Runtime Environment) 1.8 also known as Oracle JDK 8. Otherwise you will get: Unsupported major.minor version 52.0.
Download NLTK v3 from: https://github.com/nltk/nltk. And install NLTK:
sudo python setup.py install
You can use the NLTK downloader to get Stanford Parser, using Python:
import nltk
nltk.download()
Try my example! (don't forget the change the jar paths and change the model path to the ser.gz location)
OR:
Download and install NLTK v3, same as above.
Download the latest version from (current version filename is stanford-parser-full-2015-01-29.zip): http://nlp.stanford.edu/software/lex-parser.shtml#Download
Extract the standford-parser-full-20xx-xx-xx.zip.
Create a new folder ('jars' in my example). Place the extracted files into this jar folder: stanford-parser-3.x.x-models.jar and stanford-parser.jar.
As shown above you can use the environment variables (STANFORD_PARSER & STANFORD_MODELS) to point to this 'jars' folder. I'm using Linux, so if you use Windows please use something like: C://folder//jars.
Open the stanford-parser-3.x.x-models.jar using an Archive manager (7zip).
Browse inside the jar file; edu/stanford/nlp/models/lexparser. Again, extract the file called 'englishPCFG.ser.gz'. Remember the location where you extract this ser.gz file.
When creating a StanfordParser instance, you can provide the model path as parameter. This is the complete path to the model, in our case /location/of/englishPCFG.ser.gz.
Try my example! (don't forget the change the jar paths and change the model path to the ser.gz location)
Note that this answer applies to NLTK v 3.0, and not to more recent versions.
Here is an adaptation of danger98's code that works with nltk3.0.0 on windoze, and presumably the other platforms as well, adjust directory names as appropriate for your setup:
import os
from nltk.parse import stanford
os.environ['STANFORD_PARSER'] = 'd:/stanford-parser'
os.environ['STANFORD_MODELS'] = 'd:/stanford-parser'
os.environ['JAVAHOME'] = 'c:/Program Files/java/jre7/bin'
parser = stanford.StanfordParser(model_path="d:/stanford-grammars/englishPCFG.ser.gz")
sentences = parser.raw_parse_sents(("Hello, My name is Melroy.", "What is your name?"))
print sentences
Note that the parsing command has changed (see the source code at www.nltk.org/_modules/nltk/parse/stanford.html), and that you need to define the JAVAHOME variable. I tried to get it to read the grammar file in situ in the jar, but have so far failed to do that.
I am using nltk version 3.2.4. And following code worked for me.
from nltk.internals import find_jars_within_path
from nltk.tag import StanfordPOSTagger
from nltk import word_tokenize
# Alternatively to setting the CLASSPATH add the jar and model via their
path:
jar = '/home/ubuntu/stanford-postagger-full-2017-06-09/stanford-postagger.jar'
model = '/home/ubuntu/stanford-postagger-full-2017-06-09/models/english-left3words-distsim.tagger'
pos_tagger = StanfordPOSTagger(model, jar)
# Add other jars from Stanford directory
stanford_dir = pos_tagger._stanford_jar.rpartition('/')[0]
stanford_jars = find_jars_within_path(stanford_dir)
pos_tagger._stanford_jar = ':'.join(stanford_jars)
text = pos_tagger.tag(word_tokenize("Open app and play movie"))
print(text)
Output:
[('Open', 'VB'), ('app', 'NN'), ('and', 'CC'), ('play', 'VB'), ('movie', 'NN')]