Extracting noun+noun or (adj|noun)+noun from Text

星月不相逢 2020-12-09 00:17

I would like to ask whether it is possible to extract noun+noun or (adj|noun)+noun sequences with the R package openNLP. That is, I would like to use linguistic filtering to extract candidate n

2 answers
  •  时光说笑
    2020-12-09 00:28

    I don't have an open console on which to test this, but have you tried tokenizing with tagPOS and then grepping for "noun", "noun"? Alternatively, try paste(tagPOS(acq), collapse=".") and search for "noun.noun"; gregexpr could then be used to extract the positions.
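
As a rough sketch of the "collapse the tags and search the string" idea: the example below assumes the tagger output is a single string of word/TAG tokens with Penn Treebank tags. The sample string `tagged` is hypothetical, and encoding each tag as a single letter (N, A, x) is an illustrative choice so that match positions line up with token positions; it is not necessarily what the answer had in mind.

```r
# Hypothetical tagged sentence in "word/TAG" form (Penn Treebank tags),
# standing in for real POS-tagger output.
tagged <- "Gulf/NNP Applied/NNP Technologies/NNP Inc/NNP said/VBD it/PRP sold/VBD its/PRP$ pipeline/NN subsidiaries/NNS"

# Split into tokens, then separate each token at its last "/".
tokens <- strsplit(tagged, " ", fixed = TRUE)[[1]]
words  <- sub("/[^/]*$", "", tokens)
tags   <- sub("^.*/", "", tokens)

# Encode every tag as one character so that position i in the string
# corresponds to token i: N = noun, A = adjective, x = anything else.
code   <- ifelse(grepl("^NN", tags), "N",
          ifelse(grepl("^JJ", tags), "A", "x"))
tagstr <- paste(code, collapse = "")

# A noun or adjective followed by a noun. The lookahead does not consume
# the second token, so overlapping pairs are all reported.
starts <- as.integer(gregexpr("[NA](?=N)", tagstr, perl = TRUE)[[1]])

# gregexpr() returns -1 when nothing matches.
pairs <- if (starts[1] == -1) character(0) else
  paste(words[starts], words[starts + 1])
print(pairs)
```

Searching the raw tag names joined with "." would also work, but match offsets would then be character positions that need mapping back to tokens; the one-letter encoding avoids that step.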

    EDIT: The format of the tagged output was a bit different from what I remembered. I think this method of read.table()-ing after substituting "\n"s for spaces is much more efficient than what I see above:

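     # acqTag is the tagged string from the question ("word/TAG" tokens separated by spaces).
     # Put one token per line, then split each line on "/" into word (V1) and tag (V2).
     acqdf <- read.table(textConnection(gsub(" ", "\n", acqTag)), sep="/", stringsAsFactors=FALSE)
     # Flag every token whose tag is a noun (NN*) or an adjective (JJ*).
     acqdf$nnadj <- grepl("NN|JJ", acqdf$V2)
     acqdf$nnadj
     # [1]  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE  TRUE  TRUE
     #[16] FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE  TRUE FALSE  TRUE  TRUE  TRUE
     #[31]  TRUE FALSE FALSE FALSE FALSE  TRUE FALSE
     # TRUE where a flagged token is immediately followed by another flagged token.
     acqdf$nnadj[1:(nrow(acqdf)-1)] & acqdf$nnadj[2:nrow(acqdf)]
     # [1]  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE
     #[16] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE
     #[31] FALSE FALSE FALSE FALSE FALSE FALSE
     # Store the result aligned to the second token of each pair; the first row has no predecessor.
     acqdf$pair <- c(NA, acqdf$nnadj[1:(nrow(acqdf)-1)] & acqdf$nnadj[2:nrow(acqdf)])
     acqdf[1:7, ]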
    
                V1  V2 nnadj  pair
    1         Gulf NNP  TRUE    NA
    2      Applied NNP  TRUE  TRUE
    3 Technologies NNP  TRUE  TRUE
    4          Inc NNP  TRUE  TRUE
    5         said VBD FALSE FALSE
    6           it PRP FALSE FALSE
    7         sold VBD FALSE FALSE
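
The pair flag above is TRUE for any two adjacent noun-or-adjective tokens, which also admits adj+adj. A minimal sketch of a stricter filter, in which the second token must be a noun, might look like the following. The data frame is rebuilt by hand from the seven rows printed above, and the names `first_ok`, `second_ok`, `hits`, and `candidates` are illustrative rather than part of the original answer.

```r
# The first seven rows shown above, rebuilt by hand so the example is
# self-contained (V1 = word, V2 = Penn Treebank tag).
acqdf <- data.frame(
  V1 = c("Gulf", "Applied", "Technologies", "Inc", "said", "it", "sold"),
  V2 = c("NNP", "NNP", "NNP", "NNP", "VBD", "PRP", "VBD"),
  stringsAsFactors = FALSE
)

n <- nrow(acqdf)

# First token of a pair: noun or adjective. Second token: noun only.
first_ok  <- grepl("^(NN|JJ)", acqdf$V2[-n])
second_ok <- grepl("^NN", acqdf$V2[-1])

# Row indices at which a qualifying pair starts.
hits <- which(first_ok & second_ok)

# Paste each starting word together with the word that follows it.
candidates <- paste(acqdf$V1[hits], acqdf$V1[hits + 1])
print(candidates)
```

Anchoring the patterns with "^" keeps the match to tags that begin with NN or JJ, which is slightly tighter than the unanchored "NN|JJ" used in the answer.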
    
