Remove a verb as a stopword

前端未结

谎友^

Some words are used sometimes as a verb and sometimes as another part of speech.

Example

A sentence with the meaning of the w

Intro: tagging with TreeTagger (called from R through the koRpus package)

library(koRpus)

your_sentences <- c("I blame myself for what happened", 
                    "For what happened the blame is yours")

text.tagged <- treetag(file=your_sentences[1],
                  format="obj", treetagger="manual", lang="en",
                  TT.options=list(path="C:\\Treetagger", preset="en"))
text.tagged@TT.res[, c("token", "tag")]
#       token tag    
#1         I  PP
#2     blame VVP 
#3    myself  PP 
#4       for  IN
#5      what  WP
#6  happened VVD

The sentence has now been tagged, and the only thing left is to remove the occurrences of "blame" that are tagged as a verb.
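The filtering step itself is just the combination of two conditions on the tagged tokens: the token is the bad word, and its tag marks a verb. As a minimal sketch in Python (the tagged pairs are copied by hand from the TreeTagger output above, so no tagger is needed to run it), the check might look like this:

```python
# Tagged output for "I blame myself for what happened", copied by hand from
# the TreeTagger result above as (token, tag) pairs.
tagged = [
    ("I", "PP"),
    ("blame", "VVP"),
    ("myself", "PP"),
    ("for", "IN"),
    ("what", "WP"),
    ("happened", "VVD"),
]

badword = "blame"

# Condition 1: the token is the bad word (case-insensitive).
# Condition 2: the tag starts with "V", as TreeTagger's English verb tags
# (VB*, VH*, VV*) do.
redflag = [
    i for i, (token, tag) in enumerate(tagged)
    if token.lower() == badword and tag.startswith("V")
]

# Keep every token whose position was not flagged.
kept = [token for i, (token, _) in enumerate(tagged) if i not in redflag]
result = " ".join(kept)
print(result)  # I myself for what happened
```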

Solution

I'll do this sentence by sentence with a function that first tags the sentence, then checks for "bad words" such as "blame" that are tagged as a verb, and finally removes them from the sentence:

remove_words <- function(sentence, badword="blame"){
  tagged.text <- treetag(file=sentence, format="obj", treetagger="manual", lang="en",
                         TT.options=list(path="C:\\Treetagger", preset="en"))
  tokens <- as.character(tagged.text@TT.res$token)
  tags   <- as.character(tagged.text@TT.res$tag)

  # Check for bad words AND verb (TreeTagger verb tags start with "V"):
  cond1 <- (tolower(tokens) == badword)
  cond2 <- (substr(tags, 1, 1) == "V")
  redflag <- which(cond1 & cond2)

  # If no such case, return the sentence as is:
  if (length(redflag) == 0) return(sentence)

  # Otherwise drop the flagged tokens. Indexing the tagger's own tokens keeps
  # positions aligned even when punctuation is split off as a separate token;
  # note that the rebuilt sentence puts a space before punctuation.
  paste(tokens[-redflag], collapse=" ")
}

lapply(your_sentences, remove_words)
# [[1]]
# [1] "I myself for what happened"
# [[2]]
# [1] "For what happened the blame is yours"
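
For comparison, a rough Python counterpart of applying `remove_words` to every sentence could look like the following. It is only a sketch under stated assumptions: the sentences are taken as already tagged, the tags for the second sentence are hypothetical (written by hand in TreeTagger style, not real tagger output), and `remove_verb_uses` and its `badwords` parameter are illustrative names, not part of any library.

```python
from typing import Iterable, List, Tuple

Tagged = List[Tuple[str, str]]


def remove_verb_uses(tagged: Tagged, badwords: Iterable[str]) -> str:
    """Drop tokens that are in `badwords` AND carry a verb tag (prefix "V")."""
    bad = {w.lower() for w in badwords}
    kept = [
        token
        for token, tag in tagged
        if not (token.lower() in bad and tag.startswith("V"))
    ]
    # Rebuilding from tokens means spacing around punctuation is not preserved.
    return " ".join(kept)


sentences = [
    # Copied from the TreeTagger output shown earlier.
    [("I", "PP"), ("blame", "VVP"), ("myself", "PP"),
     ("for", "IN"), ("what", "WP"), ("happened", "VVD")],
    # Hypothetical hand-written tags for the second sentence.
    [("For", "IN"), ("what", "WP"), ("happened", "VVD"), ("the", "DT"),
     ("blame", "NN"), ("is", "VBZ"), ("yours", "PP")],
]

results = [remove_verb_uses(s, {"blame"}) for s in sentences]
for r in results:
    print(r)
# I myself for what happened
# For what happened the blame is yours
```

Accepting a set of words makes it easy to handle several verb/noun ambiguous words at once, and lowercasing both sides means a sentence-initial "Blame" is caught too.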

渐次进展

2021-01-07 08:01

You can do something similar in Python with NLTK:

>>> import nltk
>>> text = nltk.word_tokenize("And now for something completely different")
>>> nltk.pos_tag(text)
[('And', 'CC'), ('now', 'RB'), ('for', 'IN'), ('something', 'NN'),
('completely', 'RB'), ('different', 'JJ')]

Then add your own filter on the tags, for instance to eliminate verbs (Penn Treebank verb tags start with "VB").
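A minimal sketch of such a filter, assuming `pos_tag`-style output, i.e. a list of `(word, tag)` tuples with Penn Treebank tags, where the verb tags are VB, VBD, VBG, VBN, VBP and VBZ. The tagged list below is written by hand for illustration rather than produced by NLTK, so it runs without NLTK installed; `drop_verbs` is a hypothetical helper name.

```python
def drop_verbs(tagged):
    """Return the words whose Penn Treebank tag does not start with 'VB'."""
    return [word for word, tag in tagged if not tag.startswith("VB")]


# Hand-written Penn Treebank tags for illustration (not real NLTK output).
tagged = [("I", "PRP"), ("blame", "VBP"), ("myself", "PRP"),
          ("for", "IN"), ("what", "WP"), ("happened", "VBD")]

print(drop_verbs(tagged))  # ['I', 'myself', 'for', 'what']
```

This removes every verb, including "happened"; to target a single word such as "blame", also compare the word itself, as the R solution does.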

Hope this helps!

情话喂你

2021-01-07 08:11
In Python it can be done like this:
```
from nltk import pos_tag
s1 = "I blame myself for what happened"
pos_tag(s1.split())
```
It will give you the words with their tags.