Extracting Nouns and Verbs from Text

无人及你 2020-12-15 01:39

I was wondering whether it is possible to extract nouns and verbs separately with the R package openNLP. I use the tagPOS function, which tags the sentence, but what to do in case I wa

1 answer
  • 2020-12-15 02:24

    Here is an example that extracts the words tagged /VB or /VBx, where x is any single character:

    library("openNLP")
    
    acq <- "Gulf Applied Technologies Inc said it sold its subsidiaries engaged in pipeline and terminal operations for 12.2 mln dlrs. The company said the sale is subject to certain post closing adjustments, which it did not explain. Reuter."
    
    # Tag every word with its part of speech, giving word/TAG pairs
    acqTag <- tagPOS(acq)
    
    # Split the tagged text at each /VB tag, then keep only the last word of each piece (the verb)
    sapply(strsplit(acqTag,"[[:punct:]]*/VB.?"),function(x) sub("(^.*\\s)(\\w+$)", "\\2", x))
    
         [,1]                           
    [1,] "said"                         
    [2,] "sold"                         
    [3,] "engaged"                      
    [4,] "said"                         
    [5,] "is"                           
    [6,] "did"                          
    [7,] " not/RB explain./NN Reuter./."
    

    The regular expression still needs some improvement: the last row of the result is the text left over after the final /VB tag, not a verb.
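One way to avoid the leftover row is to match the tagged verbs directly instead of splitting around them. The following is only a minimal sketch in base R: the `tagged` string is a hand-written stand-in for `tagPOS()` output (the exact tags are an assumption, not real tagger output), and `hits` and `verbs` are names chosen for illustration.

```r
# Hand-written stand-in for tagPOS() output (word/TAG pairs).
# The tags are assumed for illustration, not produced by the tagger.
tagged <- paste(
  "The/DT company/NN said/VBD the/DT sale/NN is/VBZ subject/JJ to/TO",
  "certain/JJ post/NN closing/NN adjustments,/NNS which/WDT it/PRP",
  "did/VBD not/RB explain./NN Reuter./."
)

# Find every whole token whose tag is VB or VB plus one capital letter,
# for example "said/VBD". The lookahead makes sure the tag ends there.
hits <- regmatches(
  tagged,
  gregexpr("[^[:space:]]+/VB[A-Z]?(?=\\s|$)", tagged, perl = TRUE)
)[[1]]

# Drop the "/TAG" suffix, then strip punctuation stuck to the end of the word.
verbs <- gsub("[[:punct:]]+$", "", sub("/[^/]*$", "", hits))

print(verbs)
```

Because only matching tokens are returned, text after the last verb never shows up in the result.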

    EDIT

    An alternative is to drop any row that contains a space character:

    sapply(strsplit(acqTag,"[[:punct:]]*/VB.?"),function(x) {res = sub("(^.*\\s)(\\w+$)", "\\2", x); res[!grepl("\\s",res)]} )
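To get nouns and verbs separately, as the question asks, one possible approach is to split the tagged text into tokens and filter on the tag prefix. This is a rough sketch in base R, not the answer's own method: `tagged` is a hand-written example of word/TAG text (the tags are assumed), and `tokens`, `words`, `tags`, `nouns` and `verbs` are illustrative names.

```r
# Hand-written example of word/TAG text; the tags are assumed for illustration.
tagged <- "The/DT company/NN said/VBD the/DT sale/NN is/VBZ subject/JJ to/TO adjustments./NNS"

# One token per whitespace-separated word/TAG pair.
tokens <- strsplit(tagged, "[[:space:]]+")[[1]]

# The word is everything before the last slash, the tag everything after it.
words <- sub("/[^/]*$", "", tokens)
tags  <- sub("^.*/", "", tokens)

# Strip punctuation that the tagger left attached to the end of a word.
words <- gsub("[[:punct:]]+$", "", words)

# Noun tags start with NN (NN, NNS, NNP, ...), verb tags start with VB.
nouns <- words[nzchar(words) & grepl("^NN", tags)]
verbs <- words[nzchar(words) & grepl("^VB", tags)]

print(nouns)
print(verbs)
```

With real data, `tagged` would be replaced by the string returned from `tagPOS()`, assuming it has the same word/TAG layout as the output shown above.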
    