I am using the NLP package to parse sentences. How can I extract an element from the `Tree` output that is created? For example, I'd like to grab the noun phrases (`NP`) from the example below:
```r
library(NLP)
library(openNLP)
s <- c(
    "Really, I like chocolate because it is good.",
    "Robots are rather evil and most are devoid of decency"
)
s <- as.String(s)
# Sentence and word token annotations are needed before parsing
sent_token_annotator <- Maxent_Sent_Token_Annotator()
word_token_annotator <- Maxent_Word_Token_Annotator()
a2 <- annotate(s, list(sent_token_annotator, word_token_annotator))
# Parse_Annotator() uses the English parser model from openNLPmodels.en
parse_annotator <- Parse_Annotator()
p <- parse_annotator(s, a2)
# Each sentence annotation carries its bracketed parse string in the "parse" feature
ptexts <- sapply(p$features, `[[`, "parse")
ptexts
ptrees <- lapply(ptexts, Tree_parse)
ptrees
## [[1]]
## (TOP
##   (S
##     (S
##       (S
##         (ADVP (RB Really))
##         (, ,)
##         (NP (PRP I))
##         (VP
##           (VBP like)
##           (NP (NN chocolate))
##           (SBAR (IN because) (S (NP (PRP it)) (VP (VBZ is) (ADJP (JJ good)))))))
##       (. .)
##       (, ,)
##       (NP (NNP Robots))
##       (VP (VBP are) (ADJP (RB rather) (JJ evil))))
##     (CC and)
##     (S (NP (RBS most)) (VP (VBP are) (ADJP (JJ devoid) (PP (IN of) (NP (NN decency))))))))
```
I'd like to grab pieces from the `Tree`, but I can't figure out how from the documentation for `Tree_parse`. Using `str` suggests it should be easy to do, but I can't achieve it.
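A quick way to see the structure that `str` hints at is to parse a small, hand-written tree. This sketch assumes `Tree_parse` behaves here as it does on the `ptexts` above: each internal node comes back as a `Tree` object (a list with the label in `$value` and the child nodes in `$children`), and the words at the leaves are plain character strings. The parse string is a made-up example, not output from the annotators.

```r
library(NLP)

# Made-up parse string standing in for one element of `ptexts`
tr <- Tree_parse("(S (NP (PRP I)) (VP (VBP like) (NP (NN chocolate))))")

# Internal nodes: a label in $value plus a list of children in $children
tr$value                   # "S"
length(tr$children)        # 2 (the NP and the VP)

np <- tr$children[[1]]
np$value                   # "NP"

# Leaves are plain strings rather than Tree objects
leaf <- np$children[[1]]$children[[1]]
leaf                       # "I"
inherits(leaf, "Tree")     # FALSE
```

Because every node stores its children the same way, walking a whole parse amounts to recursing over `$children` until a non-`Tree` leaf is reached.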
I'd like it to return something like:

```
[1] "I" "Robots"
```

or the same result as a list rather than a vector.
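A minimal recursive sketch for pulling out noun phrases, assuming the node layout described above (labels in `$value`, children in `$children`, words as character leaves). `get_phrases()` and `leaf_words()` are hypothetical helpers written for this illustration, not functions from NLP or openNLP; each matching node is returned as its words pasted together with spaces.

```r
library(NLP)

# Hypothetical helper: all words (leaves) under a node, left to right
leaf_words <- function(node) {
  if (!inherits(node, "Tree")) return(as.character(node))
  unlist(lapply(node$children, leaf_words), use.names = FALSE)
}

# Hypothetical helper: depth-first walk returning the text of every node
# whose label equals `label` (outer phrases come before nested ones)
get_phrases <- function(node, label = "NP") {
  if (!inherits(node, "Tree")) return(character(0))
  here <- if (identical(node$value, label)) {
    paste(leaf_words(node), collapse = " ")
  } else {
    character(0)
  }
  below <- unlist(lapply(node$children, get_phrases, label = label),
                  use.names = FALSE)
  c(here, below)
}

tr <- Tree_parse("(S (NP (PRP I)) (VP (VBP like) (NP (NN chocolate))))")
nps <- get_phrases(tr)        # c("I", "chocolate")

nested <- Tree_parse("(NP (NP (DT the) (NN box)) (PP (IN of) (NP (NN chocolate))))")
get_phrases(nested)           # c("the box of chocolate", "the box", "chocolate")
```

With the `ptrees` from the question, `lapply(ptrees, get_phrases)` would give one character vector per parse (the list form), and wrapping that in `unlist()` gives a single vector. This collects every `NP`, nested ones included, so on the tree printed above it would return "I", "chocolate", "it", "Robots", "most" and "decency" rather than only "I" and "Robots"; narrowing the result to those two would need an extra rule that the question does not spell out.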
This likely requires the openNLPmodels.en package, which is available from http://datacube.wu.ac.at/src/contrib/. Download it and run:

```r
install.packages(
    "http://datacube.wu.ac.at/src/contrib/openNLPmodels.en_1.5-1.tar.gz",
    repos = NULL,
    type = "source"
)
```
If it's helpful, folks can load the `Tree` objects directly from my Dropbox using the curl package:

```r
library(curl)
ptrees <- source(curl("https://dl.dropboxusercontent.com/u/61803503/Errors/tree.R"))[[1]]
```
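If the Dropbox file cannot be reached, a comparable object can probably be rebuilt without Java or the model package by feeding the printed parse back to `Tree_parse`, which only needs NLP. The string below is the tree printed above written out on a few joined lines; it assumes that printout is complete (only `[[1]]` is shown) and wraps the result in a one-element list so it has the same shape as `ptrees`.

```r
library(NLP)

# The printed parse from the question, joined with spaces into one string
txt <- paste(
  "(TOP (S (S (S (ADVP (RB Really)) (, ,) (NP (PRP I))",
  "(VP (VBP like) (NP (NN chocolate))",
  "(SBAR (IN because) (S (NP (PRP it)) (VP (VBZ is) (ADJP (JJ good)))))))",
  "(. .) (, ,) (NP (NNP Robots)) (VP (VBP are) (ADJP (RB rather) (JJ evil))))",
  "(CC and)",
  "(S (NP (RBS most)) (VP (VBP are) (ADJP (JJ devoid) (PP (IN of) (NP (NN decency))))))))"
)

# One-element list, mirroring the [[1]] shown in the printout
ptrees <- list(Tree_parse(txt))

ptrees[[1]]$value                           # "TOP"
length(ptrees[[1]]$children[[1]]$children)  # 3: the inner S, (CC and), the final S
```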
Source: https://stackoverflow.com/questions/28133394/how-to-extract-elements-from-nlp-tree