Stanford Parser and NLTK

后端未结

关注

 18  2358

既然无缘 2020-11-22 01:32

Is it possible to use Stanford Parser in NLTK? (I am not talking about Stanford POS.)

18条回答

难免孤独 (楼主)

2020-11-22 02:12

Deprecated Answer

The answer below is deprecated, please use the solution on https://stackoverflow.com/a/51981566/610569 for NLTK v3.3 and above.

EDITED

Note: The following answer will only work on:

NLTK version ==3.2.5
Stanford Tools compiled since 2016-10-31
Python 2.7, 3.5 and 3.6

As both tools changes rather quickly and the API might look very different 3-6 months later. Please treat the following answer as temporal and not an eternal fix.

Always refer to https://github.com/nltk/nltk/wiki/Installing-Third-Party-Software for the latest instruction on how to interface Stanford NLP tools using NLTK!!

TL;DR

The follow code comes from https://github.com/nltk/nltk/pull/1735#issuecomment-306091826

In terminal:

wget http://nlp.stanford.edu/software/stanford-corenlp-full-2016-10-31.zip
unzip stanford-corenlp-full-2016-10-31.zip && cd stanford-corenlp-full-2016-10-31

java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer \
-preload tokenize,ssplit,pos,lemma,parse,depparse \
-status_port 9000 -port 9000 -timeout 15000

In Python:

>>> from nltk.tag.stanford import CoreNLPPOSTagger, CoreNLPNERTagger
>>> from nltk.parse.corenlp import CoreNLPParser

>>> stpos, stner = CoreNLPPOSTagger(), CoreNLPNERTagger()

>>> stpos.tag('What is the airspeed of an unladen swallow ?'.split())
[(u'What', u'WP'), (u'is', u'VBZ'), (u'the', u'DT'), (u'airspeed', u'NN'), (u'of', u'IN'), (u'an', u'DT'), (u'unladen', u'JJ'), (u'swallow', u'VB'), (u'?', u'.')]

>>> stner.tag('Rami Eid is studying at Stony Brook University in NY'.split())
[(u'Rami', u'PERSON'), (u'Eid', u'PERSON'), (u'is', u'O'), (u'studying', u'O'), (u'at', u'O'), (u'Stony', u'ORGANIZATION'), (u'Brook', u'ORGANIZATION'), (u'University', u'ORGANIZATION'), (u'in', u'O'), (u'NY', u'O')]


>>> parser = CoreNLPParser(url='http://localhost:9000')

>>> next(
...     parser.raw_parse('The quick brown fox jumps over the lazy dog.')
... ).pretty_print()  # doctest: +NORMALIZE_WHITESPACE
                     ROOT
                      |
                      S
       _______________|__________________________
      |                         VP               |
      |                _________|___             |
      |               |             PP           |
      |               |     ________|___         |
      NP              |    |            NP       |
  ____|__________     |    |     _______|____    |
 DT   JJ    JJ   NN  VBZ   IN   DT      JJ   NN  .
 |    |     |    |    |    |    |       |    |   |
The quick brown fox jumps over the     lazy dog  .

>>> (parse_fox, ), (parse_wolf, ) = parser.raw_parse_sents(
...     [
...         'The quick brown fox jumps over the lazy dog.',
...         'The quick grey wolf jumps over the lazy fox.',
...     ]
... )

>>> parse_fox.pretty_print()  # doctest: +NORMALIZE_WHITESPACE
                     ROOT
                      |
                      S
       _______________|__________________________
      |                         VP               |
      |                _________|___             |
      |               |             PP           |
      |               |     ________|___         |
      NP              |    |            NP       |
  ____|__________     |    |     _______|____    |
 DT   JJ    JJ   NN  VBZ   IN   DT      JJ   NN  .
 |    |     |    |    |    |    |       |    |   |
The quick brown fox jumps over the     lazy dog  .

>>> parse_wolf.pretty_print()  # doctest: +NORMALIZE_WHITESPACE
                     ROOT
                      |
                      S
       _______________|__________________________
      |                         VP               |
      |                _________|___             |
      |               |             PP           |
      |               |     ________|___         |
      NP              |    |            NP       |
  ____|_________      |    |     _______|____    |
 DT   JJ   JJ   NN   VBZ   IN   DT      JJ   NN  .
 |    |    |    |     |    |    |       |    |   |
The quick grey wolf jumps over the     lazy fox  .

>>> (parse_dog, ), (parse_friends, ) = parser.parse_sents(
...     [
...         "I 'm a dog".split(),
...         "This is my friends ' cat ( the tabby )".split(),
...     ]
... )

>>> parse_dog.pretty_print()  # doctest: +NORMALIZE_WHITESPACE
        ROOT
         |
         S
  _______|____
 |            VP
 |    ________|___
 NP  |            NP
 |   |         ___|___
PRP VBP       DT      NN
 |   |        |       |
 I   'm       a      dog

Please take a look at http://www.nltk.org/_modules/nltk/parse/corenlp.html for more information on of the Stanford API. Take a look at the docstrings!

0 讨论(0)

查看其它18个回答