Visualize Parse Tree Structure

别那么骄傲 2020-12-31 20:48

I would like to visualize the parse (POS tagging) produced by openNLP as a tree structure. Below I provide the parse tree from openNLP.

1 answer
  •  傲寒 (original poster)
    2020-12-31 21:10

    Here is an igraph version. This function takes the result from Parse_Annotator as its input, i.e. ptext in your example. NLP::Tree_parse already creates a nice tree structure, so the idea is to traverse it recursively and build an edgelist to plug into igraph. The edgelist is just a 2-column matrix of head->tail values, one row per parent-child link.
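
    As a rough illustration of the edgelist idea, here is a minimal base-R sketch that walks a toy tree and collects one head->tail row per link. The `node()` helper, the `edges_of()` function and the toy tree are hypothetical stand-ins, not part of NLP or igraph; a real `Tree` object from NLP::Tree_parse is assumed to have the same `value`/`children` layout.

```r
## Hypothetical stand-in for an NLP Tree: a list with `value` and `children`;
## leaf children are plain character strings (assumed layout).
node <- function(value, ...) list(value = value, children = list(...))

toy <- node("S1",
            node("NP2", node("NN3", "bar4")),
            node("VP5", node("VBZ6", "works7")))

## Recursively collect one (head, tail) row per parent-child link
edges_of <- function(nd) {
    if (!is.list(nd)) return(NULL)                 # a leaf has no outgoing edges
    rows <- lapply(nd$children, function(ch) {
        tl <- if (is.list(ch)) ch$value else ch    # label of the child
        rbind(c(nd$value, tl), edges_of(ch))       # this link, then the subtree's links
    })
    do.call(rbind, rows)
}

edgelist <- edges_of(toy)
edgelist   # 2-column character matrix: column 1 = head, column 2 = tail
```

    This version grows the matrix with rbind for brevity; preallocating it and filling rows in place avoids the repeated copying on large trees.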

    In order for igraph to create edges between the proper nodes, they need to have unique identifiers. I did this by appending a sequence of integers (using regmatches<-) to the words in the text prior to using Tree_parse.
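
    To see what that renaming step does on its own, here is a small base-R sketch. The bracketed string `toy` is a made-up example, not output from openNLP.

```r
## Made-up bracketed parse string (an assumption for illustration only)
toy <- "(S (NP (DT the) (NN cat)) (VP (VBZ sees) (NP (DT the) (NN dog))))"

ms    <- gregexpr("[^() ]+", toy)     # every run of characters that is not a bracket or space
words <- regmatches(toy, ms)[[1]]     # tags and words, in order of appearance

tagged <- toy
regmatches(tagged, ms) <- list(paste0(words, seq_along(words)))  # append a running id

tagged
# [1] "(S1 (NP2 (DT3 the4) (NN5 cat6)) (VP7 (VBZ8 sees9) (NP10 (DT11 the12) (NN13 dog14))))"
```

    The two occurrences of "the" (and of NP, DT, NN) now have distinct names, so each one becomes its own vertex instead of being merged.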

    The internal function edgemaker traverses the tree, filling in edgelist as it goes. There are options to color the leaves separately from the rest of the nodes, but if you pass the option vertex.label.color it will color them all the same.

    ## Make a graph from Tree_parse result
    parse2graph <- function(ptext, leaf.color='chartreuse4', label.color='blue4',
                            title=NULL, cex.main=.9, ...) {
        stopifnot(require(NLP) && require(igraph))
    
        ## Replace words with unique versions
        ms <- gregexpr("[^() ]+", ptext)                                      # match every token (anything but brackets and spaces)
        words <- regmatches(ptext, ms)[[1]]                                   # just words
        regmatches(ptext, ms) <- list(paste0(words, seq.int(length(words))))  # add id to words
    
        ## Going to construct an edgelist and pass that to igraph
        ## allocate here since we know the size (number of nodes - 1) and -1 more to exclude 'TOP'
        edgelist <- matrix('', nrow=length(words)-2, ncol=2)
    
        ## Function to fill in edgelist in place
        edgemaker <- (function() {
            i <- 0                                       # row counter
            g <- function(node) {                        # the recursive function
                if (inherits(node, "Tree")) {            # only recurse subtrees
                    if ((val <- node$value) != 'TOP1') { # skip 'TOP' node (added '1' above)
                        for (child in node$children) {
                            childval <- if(inherits(child, "Tree")) child$value else child
                            i <<- i+1
                            edgelist[i,1:2] <<- c(val, childval)
                        }
                    }
                    invisible(lapply(node$children, g))
                }
            }
        })()
    
        ## Create the edgelist from the parse tree
        edgemaker(Tree_parse(ptext))
    
        ## Make the graph, add options for coloring leaves separately
        g <- graph_from_edgelist(edgelist)
        vertex_attr(g, 'label.color') <- label.color  # non-leaf colors
        vertex_attr(g, 'label.color', V(g)[!degree(g, mode='out')]) <- leaf.color
        V(g)$label <- sub("\\d+$", '', V(g)$name)     # strip only the appended id (trailing digits)
        plot(g, layout=layout_as_tree, ...)           # layout.reingold.tilford is the deprecated name
        if (!missing(title)) title(title, cex.main=cex.main)
    }
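
    The leaf coloring inside the function relies on leaves being the vertices with out-degree zero. The same idea can be sketched in base R straight from an edgelist, without igraph: a leaf is a node that never appears in the head column. The small `edgelist` below is a made-up example, and stripping only trailing digits for the labels is a choice made in this sketch.

```r
## Made-up edgelist with id-suffixed names (an assumption for illustration only)
edgelist <- rbind(c("S1",  "NP2"), c("NP2", "NN3"),  c("NN3",  "bar4"),
                  c("S1",  "VP5"), c("VP5", "VBZ6"), c("VBZ6", "works7"))

nodes  <- unique(as.vector(t(edgelist)))    # all node names, in order of first appearance
leaves <- setdiff(nodes, edgelist[, 1])     # never a head => out-degree 0 => leaf

labels <- sub("\\d+$", "", nodes)           # drop the trailing id for display
colors <- ifelse(nodes %in% leaves, "chartreuse4", "blue4")

data.frame(node = nodes, label = labels, color = colors)
```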
    

    So, using your example, take the string x and its annotated version ptext, which looks like this:

    x <- 'Scroll bar does not work the best either.'
    ptext
    # [1] "(TOP (S (NP (NNP Scroll) (NN bar)) (VP (VBZ does) (RB not) (VP (VB work) (NP (DT the) (JJS best)) (ADVP (RB either))))(. .)))"
    

    Create the graph by calling

    library(igraph)
    library(NLP)
    
    parse2graph(ptext,  # plus optional graphing parameters
                title = sprintf("'%s'", x), margin=-0.05,
                vertex.color=NA, vertex.frame.color=NA,
                vertex.label.font=2, vertex.label.cex=1.5, asp=0.5,
                edge.width=1.5, edge.color='black', edge.arrow.size=0)
    
