Get the K best parses of a sentence with Stanford Parser

滥情空心 2021-01-21 18:03

I want to get the K best parses of a sentence. I figured out that this can be done with the ExhaustivePCFGParser class; the problem is that I don't know how to use this class, ...

2 Answers
  • 2021-01-21 18:42

    This is a workaround I implemented based on Christopher Manning's answer, assuming you wish to use Python. The Python wrapper for CoreNLP does not implement "K-best parse trees", so the alternative is to run the terminal command:

    java -mx500m -cp "*" edu.stanford.nlp.parser.lexparser.LexicalizedParser -printPCFGkBest 20 edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz data/testsent.txt
    

    Note that you need Stanford CoreNLP, with all of its JAR files, downloaded into a directory, and the prerequisite Python libraries installed (see the import statements).

    import os
    import re
    import subprocess
    from nltk.tree import ParentedTree

    ip_sent = "a quick brown fox jumps over the lazy dog."

    # Directory where the CoreNLP JAR files are stored; replace <Your path> with your own location
    corenlp_dir = "<Your path>/stanford-corenlp-full-2018-10-05"
    data_path = os.path.join(corenlp_dir, "data", "testsent.txt")
    with open(data_path, "w") as file:
        file.write(ip_sent)  # The text in this file is fed into the LexicalizedParser

    os.chdir(corenlp_dir)  # Run from the JAR directory so that -cp "*" picks up the JAR files
    # Run the command via the terminal and capture its standard output as bytes
    terminal_op = subprocess.check_output('java -mx500m -cp "*" edu.stanford.nlp.parser.lexparser.LexicalizedParser -printPCFGkBest 5 edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz data/testsent.txt', shell=True)
    op_string = terminal_op.decode('utf-8')  # Convert the bytes to a string
    # Split on the "# Parse N with score S" header lines and drop empty pieces,
    # such as the empty string before the first header
    parse_set = [p for p in re.split(r"# Parse \d+ with score -?\d+\.\d+\n", op_string) if p.strip()]
    print(parse_set)

    # Print the parse trees in a pretty_print format
    for i in parse_set:
        parsetree = ParentedTree.fromstring(i)
        print(type(parsetree))
        parsetree.pretty_print()
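
The header lines also carry the rank and score of each parse, which a plain split throws away. Below is a minimal sketch of keeping them, assuming the output consists of `# Parse N with score S` header lines, each followed by a bracketed tree (the format implied by the pattern in the code above). The names `split_kbest_output` and `sample_output` are hypothetical, and the sample text is made up for illustration; it is not real parser output.

```python
import re

# Header line assumed to precede each tree, e.g. "# Parse 1 with score -55.12".
# The score part also tolerates an exponent, in case one is ever printed.
HEADER_RE = re.compile(
    r"^# Parse (\d+) with score (-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)[ \t]*$",
    re.MULTILINE,
)


def split_kbest_output(output):
    """Return a list of (rank, score, bracketed_tree) tuples."""
    headers = list(HEADER_RE.finditer(output))
    results = []
    for idx, match in enumerate(headers):
        # The tree text runs from the end of this header to the next header
        start = match.end()
        end = headers[idx + 1].start() if idx + 1 < len(headers) else len(output)
        tree_text = output[start:end].strip()
        if tree_text:
            results.append((int(match.group(1)), float(match.group(2)), tree_text))
    return results


# Made-up sample in the assumed format, for illustration only
sample_output = """# Parse 1 with score -20.5
(ROOT (NP (DT a) (NN dog)))

# Parse 2 with score -23.75
(ROOT (FRAG (DT a) (NN dog)))
"""

for rank, score, tree_text in split_kbest_output(sample_output):
    print(rank, score, tree_text)
```

Each `tree_text` can then be passed to `ParentedTree.fromstring` in the same way as the elements of `parse_set`.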
    

    Hope this helps.

  • 2021-01-21 18:50

    In general, you do things via a LexicalizedParser object, which is a "grammar" that provides all of these things (the grammars, lexicon, indices, etc.).

    From the command-line, the following will work:

    java -mx500m -cp "*" edu.stanford.nlp.parser.lexparser.LexicalizedParser -printPCFGkBest 20 edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz data/testsent.txt
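
If the command is launched from a script rather than typed into a shell, it can be passed as an argument list, which avoids shell quoting. The following is a rough sketch in Python, not part of the parser itself: `build_kbest_command` and `run_kbest` are hypothetical helper names, and `corenlp_dir` is assumed to be the directory that contains the CoreNLP JAR files and the `data/testsent.txt` input.

```python
import subprocess


def build_kbest_command(k,
                        model="edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz",
                        input_file="data/testsent.txt",
                        heap="500m"):
    """Build the LexicalizedParser k-best command as an argument list."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    return [
        "java", f"-mx{heap}",
        "-cp", "*",  # passed literally; the java launcher expands the wildcard
        "edu.stanford.nlp.parser.lexparser.LexicalizedParser",
        "-printPCFGkBest", str(k),
        model, input_file,
    ]


def run_kbest(k, corenlp_dir):
    """Run the parser from corenlp_dir (assumed JAR directory) and return stdout."""
    completed = subprocess.run(
        build_kbest_command(k),
        cwd=corenlp_dir,       # run where the JAR files live, so -cp "*" finds them
        capture_output=True,   # collect stdout and stderr instead of printing them
        text=True,             # decode the output to str
        check=True,            # raise CalledProcessError on a non-zero exit code
    )
    return completed.stdout


# Show the command that would be run for k = 20
print(" ".join(build_kbest_command(20)))
```

Calling `run_kbest(20, corenlp_dir)` needs Java and the CoreNLP JAR files to be present; building the command does not.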
    

    At the API level, you need to get a LexicalizedParserQuery object. Once you have a LexicalizedParser lp (as in ParserDemo.java) you can do the following:

    LexicalizedParser lp = ... // Load / train a model
    LexicalizedParserQuery lpq = lp.parserQuery();
    lpq.parse(sentence);
    List<ScoredObject<Tree>> kBest = lpq.getKBestPCFGParses(20);
    

    A LexicalizedParserQuery is roughly equivalent to a Java regex Matcher.

    Note: at present, k-best parsing works well only for PCFG grammars, not for factored grammars.
