How to find the shortest dependency path between two words in Python?

无人共我 2021-02-07 07:25

I am trying to find the dependency path between two words in Python, given the dependency tree of a sentence.

For the sentence

Robots in popular culture are there to remind us of the awesomeness of unbound human agency.

3 Answers
  •  逝去的感伤
    2021-02-07 07:55

    This answer relies on Stanford CoreNLP to obtain the dependency tree of a sentence. It borrows some of the networkx code from HugoMailhot's answer.

    Before running the code, one needs to:

    1. sudo pip install pycorenlp (a Python interface to Stanford CoreNLP)
    2. Download Stanford CoreNLP
    3. Start a Stanford CoreNLP server as follows (a quick connectivity check is sketched after this list):

      java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 50000
      

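    Optionally, one can first verify that the server is reachable. The snippet below is a minimal sketch, assuming the default port 9000 and that the requests package is installed; it sends a tiny document through the tokenizer only:

    import requests

    # Connectivity check: POST a short text to the CoreNLP server started above.
    # The properties query parameter is a JSON string, as expected by the server.
    response = requests.post(
        'http://localhost:9000',
        params={'properties': '{"annotators": "tokenize", "outputFormat": "json"}'},
        data='Hello world.'.encode('utf-8'))
    print(response.status_code)  # expect 200 if the server is running
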
    Then one can run the following code to find the shortest dependency path between two words:

    import networkx as nx
    from pycorenlp import StanfordCoreNLP
    from pprint import pprint
    
    nlp = StanfordCoreNLP('http://localhost:{0}'.format(9000))
    def get_stanford_annotations(text, port=9000,
                                 annotators='tokenize,ssplit,pos,lemma,depparse,parse'):
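        # Send the text to the CoreNLP server; with outputFormat 'json',
        # pycorenlp returns the response parsed into a Python dict.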
        output = nlp.annotate(text, properties={
            "timeout": "10000",
            "ssplit.newlineIsSentenceBreak": "two",
            'annotators': annotators,
            'outputFormat': 'json'
        })
        return output
    
    # The code expects the document to contain exactly one sentence.
    document =  'Robots in popular culture are there to remind us of the awesomeness of '\
                'unbound human agency.'
    print('document: {0}'.format(document))
    
    # Parse the text
    annotations = get_stanford_annotations(document, port=9000,
                                           annotators='tokenize,ssplit,pos,lemma,depparse')
    tokens = annotations['sentences'][0]['tokens']
    
    # Load Stanford CoreNLP's dependency tree into a networkx graph
    edges = []
    dependencies = {}
    for edge in annotations['sentences'][0]['basic-dependencies']:
        edges.append((edge['governor'], edge['dependent']))
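        # Key each dependency by the unordered pair (min, max) of token indices
        # so it can be looked up from either endpoint of an undirected edge.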
        dependencies[(min(edge['governor'], edge['dependent']),
                      max(edge['governor'], edge['dependent']))] = edge
    
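    # Build an undirected graph so the shortest path can follow dependency
    # edges in either direction (dependent to governor or governor to dependent).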
    graph = nx.Graph(edges)
    #pprint(dependencies)
    #print('edges: {0}'.format(edges))
    
    # Find the shortest path
    token1 = 'Robots'
    token2 = 'awesomeness'
    for token in tokens:
        if token1 == token['originalText']:
            token1_index = token['index']
        if token2 == token['originalText']:
            token2_index = token['index']
    
    path = nx.shortest_path(graph, source=token1_index, target=token2_index)
    print('path: {0}'.format(path))
    
    for token_id in path:
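        # CoreNLP token indices are 1-based; the tokens list is 0-based.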
        token = tokens[token_id-1]
        token_text = token['originalText']
        print('Node {0}\ttoken_text: {1}'.format(token_id,token_text))
    

    The output is:

    document: Robots in popular culture are there to remind us of the awesomeness of unbound human agency.
    path: [1, 5, 8, 12]
    Node 1  token_text: Robots
    Node 5  token_text: are
    Node 8  token_text: remind
    Node 12 token_text: awesomeness
    
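    The dependencies dictionary built above can also be used to label each edge of the path with its grammatical relation. The snippet below is a short sketch, assuming the 'dep' field that Stanford CoreNLP returns with each entry of basic-dependencies:

    # Print the dependency relation between each pair of consecutive nodes on the path.
    # (Node 0 would be the artificial ROOT; it does not occur on this particular path.)
    for source, target in zip(path[:-1], path[1:]):
        edge = dependencies[(min(source, target), max(source, target))]
        print('{0} -- {1} -- {2}'.format(tokens[source - 1]['originalText'],
                                         edge['dep'],
                                         tokens[target - 1]['originalText']))
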

    Note that Stanford CoreNLP can be tested online: http://nlp.stanford.edu:8080/parser/index.jsp

    This answer was tested with Stanford CoreNLP 3.6.0, pycorenlp 0.3.0 and Python 3.5 x64 on Windows 7 SP1 x64 Ultimate.
