How to get a parse tree using Python NLTK?

长情又很酷 2021-01-31 12:58

Given the following sentence:

The old oak tree from India fell down.

How can I get a parse tree representation of this sentence using Python and NLTK?

2 Answers
  • 2021-01-31 13:09

    This is an older question, but you can use NLTK together with bllipparser. NLTK has a longer example; after some fiddling, I used the following:

    To install (with nltk already installed):

    sudo python3 -m nltk.downloader bllip_wsj_no_aux
    pip3 install bllipparser
    

    To use:

    from nltk.data import find
    from bllipparser import RerankingParser
    
    model_dir = find('models/bllip_wsj_no_aux').path
    parser = RerankingParser.from_unified_model_dir(model_dir)
    
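    # parse() returns an n-best list of candidate parses, not a single tree
    best = parser.parse("The old oak tree from India fell down.")
    
    # top candidate after reranking, then top candidate from the first-stage parser alone
    print(best.get_reranker_best())
    print(best.get_parser_best())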
    

    Output:

    -80.435259246021 -23.831876011253 (S1 (S (NP (NP (DT The) (JJ old) (NN oak) (NN tree)) (PP (IN from) (NP (NNP India)))) (VP (VBD fell) (PRT (RP down))) (. .)))
    -79.703612178593 -24.505514522222 (S1 (S (NP (NP (DT The) (JJ old) (NN oak) (NN tree)) (PP (IN from) (NP (NNP India)))) (VP (VBD fell) (ADVP (RB down))) (. .)))
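The output above is a Penn Treebank style bracketed string, so it still has to be turned into a tree structure before you can walk it. If NLTK is available, `nltk.Tree.fromstring` does this directly. The block below is only a minimal dependency-free sketch of the same idea: the helper names (`parse_bracketed`, `leaves`, `pretty`) are made up for illustration, and it assumes the two leading scores have already been stripped from the line.

```python
# Bracketed part of the reranker's best parse shown above (scores removed by hand).
BRACKETED = (
    "(S1 (S (NP (NP (DT The) (JJ old) (NN oak) (NN tree)) "
    "(PP (IN from) (NP (NNP India)))) (VP (VBD fell) (PRT (RP down))) (. .)))"
)


def parse_bracketed(text):
    """Parse a bracketed string into nested lists: [label, child, child, ...].

    Leaf words are plain strings. Hypothetical helper, not part of NLTK.
    """
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read_node():
        nonlocal pos
        assert tokens[pos] == "(", "expected '('"
        pos += 1
        node = [tokens[pos]]  # the constituent label, e.g. NP
        pos += 1
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                node.append(read_node())  # nested constituent
            else:
                node.append(tokens[pos])  # leaf word
                pos += 1
        pos += 1  # skip the closing ")"
        return node

    return read_node()


def leaves(node):
    """Return the words of the tree from left to right."""
    if isinstance(node, str):
        return [node]
    words = []
    for child in node[1:]:
        words.extend(leaves(child))
    return words


def pretty(node, depth=0):
    """Render the tree with one constituent per line, indented by depth."""
    indent = "  " * depth
    if isinstance(node, str):
        return indent + node
    if all(isinstance(child, str) for child in node[1:]):
        return indent + "(" + " ".join(node) + ")"  # e.g. (DT The)
    lines = [indent + "(" + node[0]]
    lines.extend(pretty(child, depth + 1) for child in node[1:])
    return "\n".join(lines) + ")"


tree = parse_bracketed(BRACKETED)
print(pretty(tree))
print(leaves(tree))
```

Nested lists keep the sketch short; for real work an `nltk.Tree` is likely the better choice, since it also offers traversal and drawing helpers.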
    
  • 2021-01-31 13:29

    Here is an alternative solution that uses Stanford CoreNLP instead of NLTK. Several libraries are built on top of Stanford CoreNLP; I personally use pycorenlp to parse the sentence.

    First, download the stanford-corenlp-full folder, which contains the *.jar files, and start the server from inside that folder (the default port is 9000).

    export CLASSPATH="`find . -name '*.jar'`"
    java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer [port?] # run server
    

    Then, in Python, run the following to parse the sentence.

    from pycorenlp import StanfordCoreNLP
    nlp = StanfordCoreNLP('http://localhost:9000')
    
    text = "The old oak tree from India fell down."
    
    output = nlp.annotate(text, properties={
      'annotators': 'parse',
      'outputFormat': 'json'
    })
    
    print(output['sentences'][0]['parse']) # bracketed parse tree of the first sentence
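If you would rather not install a wrapper, the same request can be sent with the standard library alone, because the CoreNLP server takes the text as the POST body and the settings as a JSON `properties` query parameter. This is a rough sketch, not pycorenlp's own code: `build_url` and `parse_sentences` are hypothetical names, and it assumes the server from the previous step is listening on `http://localhost:9000`.

```python
import json
import urllib.parse
import urllib.request


def build_url(server_url, properties):
    """Attach the settings as a JSON-encoded 'properties' query parameter."""
    query = urllib.parse.urlencode({"properties": json.dumps(properties)})
    return server_url.rstrip("/") + "/?" + query


def parse_sentences(text, server_url="http://localhost:9000"):
    """POST the text to a running CoreNLP server.

    Returns one bracketed parse string per sentence in the response.
    The default server_url is an assumption; adjust it to your setup.
    """
    url = build_url(server_url, {"annotators": "parse", "outputFormat": "json"})
    # Passing data= makes urllib issue a POST request.
    request = urllib.request.Request(url, data=text.encode("utf-8"))
    with urllib.request.urlopen(request) as response:
        result = json.loads(response.read().decode("utf-8"))
    return [sentence["parse"] for sentence in result["sentences"]]
```

With the server running, `parse_sentences("The old oak tree from India fell down.")` should return a list containing a single parse string. Returning a list rather than only the first element covers input that contains more than one sentence.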
    