Is there a ready-to-use English grammar that I can just load and use in NLTK? I've searched around for examples of parsing with NLTK, but it seems that I have to manually specify a grammar before parsing a sentence.
Did you try POS tagging in NLTK?
import nltk
from nltk import word_tokenize

text = word_tokenize("And now for something completely different")
nltk.pos_tag(text)
The output is something like this:
[('And', 'CC'), ('now', 'RB'), ('for', 'IN'), ('something', 'NN'), ('completely', 'RB'), ('different', 'JJ')]
This example is from NLTK_chapter03.
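If you need structure on top of the flat tags, a small hand-written chunk grammar over those tags gives you a shallow parse. A minimal sketch using NLTK's RegexpParser (the NP rule here is just a toy example):

import nltk

# Toy chunk rule: an NP is an optional determiner, any adjectives, then a noun
grammar = "NP: {<DT>?<JJ>*<NN>}"
chunker = nltk.RegexpParser(grammar)

tagged = nltk.pos_tag(nltk.word_tokenize("And now for something completely different"))
print(chunker.parse(tagged))  # prints a shallow tree with NP chunks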
I've tried NLTK, PyStatParser, and Pattern. IMHO Pattern is the best English parser of the ones mentioned in this thread: it supports pip install, and there is good documentation on its website (http://www.clips.ua.ac.be/pages/pattern-en). I couldn't find reasonable documentation for NLTK (it gave me inaccurate results with its defaults, and I couldn't find how to tune it). In my environment, pyStatParser was much slower than described above (about one minute for initialization, and a couple of seconds to parse long sentences; maybe I didn't use it correctly).
I found out that NLTK works well with the parser grammar developed by Stanford.
Syntax Parsing with Stanford CoreNLP and NLTK
It is very easy to start using Stanford CoreNLP with NLTK. All you need is a small amount of preparation; after that, you can parse sentences with the following code:
from nltk.parse.corenlp import CoreNLPParser

# Connects to the CoreNLP server on http://localhost:9000 (the default)
parser = CoreNLPParser()
parse = next(parser.raw_parse("I put the book in the box on the table."))
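The result is an ordinary NLTK Tree, so the usual tree methods apply; for example, you can print an ASCII rendering of the constituency tree:

parse.pretty_print()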
Preparation:
You can use the following code to run the CoreNLPServer:
import os
from nltk.parse.corenlp import CoreNLPServer
# The server needs to know the location of the following files:
# - stanford-corenlp-X.X.X.jar
# - stanford-corenlp-X.X.X-models.jar
STANFORD = os.path.join("models", "stanford-corenlp-full-2018-02-27")
# Create the server
server = CoreNLPServer(
    os.path.join(STANFORD, "stanford-corenlp-3.9.1.jar"),
    os.path.join(STANFORD, "stanford-corenlp-3.9.1-models.jar"),
)
# Start the server in the background
server.start()
Do not forget to stop the server by executing server.stop().
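A minimal sketch of the whole lifecycle, using only the start() and stop() calls shown above, so the server is shut down even if parsing raises an error:

server.start()
try:
    parser = CoreNLPParser()
    print(next(parser.raw_parse("I put the book in the box on the table.")))
finally:
    server.stop()  # always stop the background server, even on error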
There are a few grammars in the nltk_data distribution. In your Python interpreter, issue nltk.download().
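For example, the large_grammars package in nltk_data ships a ready-made ATIS grammar. A sketch of loading and using it; note that coverage is limited to the grammar's own air-travel domain, so arbitrary sentences may not parse:

import nltk

nltk.download('large_grammars')  # one-time download of the sample grammars
grammar = nltk.data.load('grammars/large_grammars/atis.cfg')
parser = nltk.parse.EarleyChartParser(grammar)

sentence = 'show me northwest flights to detroit .'.split()
for tree in parser.parse(sentence):
    print(tree)
    break  # the grammar is ambiguous; just show the first parse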
There is a library called Pattern. It is quite fast and easy to use.
>>> from pattern.en import parse
>>> s = 'The mobile web is more important than mobile apps.'
>>> s = parse(s, relations=True, lemmata=True)
>>> print(s)
'The/DT/B-NP/O/NP-SBJ-1/the mobile/JJ/I-NP/O/NP-SBJ-1/mobile' ...
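If you want a traversable object instead of the tagged string, Pattern also provides parsetree; a short sketch based on its documented API:

>>> from pattern.en import parsetree
>>> t = parsetree('The mobile web is more important than mobile apps.')
>>> for sentence in t:
...     for chunk in sentence.chunks:
...         print(chunk.type, [(w.string, w.type) for w in chunk.words])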
You can take a look at pyStatParser, a simple Python statistical parser that returns NLTK parse Trees. It comes with public treebanks and generates the grammar model only the first time you instantiate a Parser object (in about 8 seconds). It uses a CKY algorithm and parses average-length sentences (like the one below) in under a second.
>>> from stat_parser import Parser
>>> parser = Parser()
>>> print(parser.parse("How can the net amount of entropy of the universe be massively decreased?"))
(SBARQ
(WHADVP (WRB how))
(SQ
(MD can)
(NP
(NP (DT the) (JJ net) (NN amount))
(PP
(IN of)
(NP
(NP (NNS entropy))
(PP (IN of) (NP (DT the) (NN universe))))))
(VP (VB be) (ADJP (RB massively) (VBN decreased))))
(. ?))
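Since the result is an ordinary NLTK Tree, the usual tree methods apply; for example, to pull the noun phrases out of the parse above:

>>> tree = parser.parse("How can the net amount of entropy of the universe be massively decreased?")
>>> for np in tree.subtrees(lambda t: t.label() == 'NP'):
...     print(' '.join(np.leaves()))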