I would like to query if it is possible to extract noun+noun or (adj|noun)+noun in R package openNLP?That is, I would like to use linguistic filtering to extract candidate n
I don't have an open console on which to test this, but have your tried to tokenize with tagPOS and then grep for "noun", "noun" or perhaps paste(tagPOS(acq), collapse=".") and search for "noun.noun". Then gregexpr could be used to extract positions.
EDIT: The format of the tagged output was a bit different than I remembered. I think this method of read.table()-ing after substituting "\n"s for spaces is much more efficient than what I see above:
acqdf <- read.table(textConnection(gsub(" ", "\n", acqTag)), sep="/", stringsAsFactors=FALSE)
acqdf$nnadj <- grepl("NN|JJ", acqdf$V2)
acqdf$nnadj
# [1] TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE TRUE FALSE FALSE TRUE FALSE TRUE TRUE
#[16] FALSE FALSE TRUE FALSE FALSE TRUE FALSE FALSE TRUE FALSE TRUE FALSE TRUE TRUE TRUE
#[31] TRUE FALSE FALSE FALSE FALSE TRUE FALSE
acqdf$nnadj[1:(nrow(acqdf)-1)] & acqdf$nnadj[2:nrow(acqdf)]
# [1] TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
#[16] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE
#[31] FALSE FALSE FALSE FALSE FALSE FALSE
acqdf$pair <- c(NA, acqdf$nnadj[1:(nrow(acqdf)-1)] & acqdf$nnadj[2:nrow(acqdf)])
acqdf[1:7, ]
V1 V2 nnadj pair
1 Gulf NNP TRUE NA
2 Applied NNP TRUE TRUE
3 Technologies NNP TRUE TRUE
4 Inc NNP TRUE TRUE
5 said VBD FALSE FALSE
6 it PRP FALSE FALSE
7 sold VBD FALSE FALSE