Some words are used sometimes as a verb and sometimes as another part of speech.
Example
A sentence with the meaning of the w
You can install TreeTagger and then use the koRpus package to call TreeTagger from R. Install TreeTagger in a location such as C:\Treetagger.
I'll first show how TreeTagger works, so you understand what is going on in the actual solution further down in this answer:
library(koRpus)
your_sentences <- c("I blame myself for what happened",
"For what happened the blame is yours")
text.tagged <- treetag(file="I blame myself for what happened",
format="obj", treetagger="manual", lang="en",
TT.options = list(path="C:\\Treetagger", preset="en") )
text.tagged@TT.res[, 1:2]
# token tag
#1 I PP
#2 blame VVP
#3 myself PP
#4 for IN
#5 what WP
#6 happened VVD
The sentence has now been analysed, and the only thing left is to remove the occurrences of "blame" that are tagged as a verb.
I'll do this sentence by sentence with a function that first tags the sentence, then checks for "bad words" like "blame" that are tagged as a verb, and finally removes them from the sentence:
remove_words <- function(sentence, badword="blame"){
  tagged.text <- treetag(file=sentence, format="obj", treetagger="manual", lang="en",
                         TT.options=list(path="C:\\Treetagger", preset="en"))
  # Check for bad words AND verb (TreeTagger's English verb tags all start with "V"):
  cond1 <- (tagged.text@TT.res$token == badword)
  cond2 <- (substring(tagged.text@TT.res$tag, 1, 1) == "V")
  redflag <- which(cond1 & cond2)
  # If no such case, return sentence as is. If so, then remove that word.
  # Note: this assumes TreeTagger's tokens line up with a split on spaces
  # (true for the example sentences, which contain no punctuation).
  if(length(redflag) == 0) return(sentence)
  else{
    splitsent <- strsplit(sentence, " ")[[1]]
    splitsent <- splitsent[-redflag]
    return(paste0(splitsent, collapse=" "))
  }
}
lapply(your_sentences, remove_words)
# [[1]]
# [1] "I myself for what happened"
# [[2]]
# [1] "For what happened the blame is yours"
You can do something similar in Python:
>>> import nltk
>>> from nltk import word_tokenize
>>> text = word_tokenize("And now for something completely different")
>>> nltk.pos_tag(text)
[('And', 'CC'), ('now', 'RB'), ('for', 'IN'), ('something', 'NN'),
('completely', 'RB'), ('different', 'JJ')]
Then add your own filter to eliminate, for instance, verbs.
Hope this is helpful!
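A minimal sketch of such a filter, assuming the list of (word, tag) pairs that nltk.pos_tag returns for the example sentence above. drop_tags and is_verb are hypothetical helper names, not part of NLTK. Verbs are recognised by their Penn Treebank tags, which all start with "VB" (VB, VBD, VBG, VBN, VBP, VBZ).

```python
# Output of nltk.pos_tag(...) for "And now for something completely different".
tagged = [('And', 'CC'), ('now', 'RB'), ('for', 'IN'), ('something', 'NN'),
          ('completely', 'RB'), ('different', 'JJ')]


def drop_tags(tagged, is_unwanted):
    """Hypothetical helper: keep the words whose tag is not unwanted, joined with spaces."""
    return " ".join(word for word, tag in tagged if not is_unwanted(tag))


def is_verb(tag):
    """Penn Treebank verb tags (VB, VBD, VBG, VBN, VBP, VBZ) all start with "VB"."""
    return tag.startswith("VB")


# This sentence contains no verbs, so the verb filter leaves it unchanged.
print(drop_tags(tagged, is_verb))
# The same helper with a different predicate, dropping adverbs (tag "RB"):
print(drop_tags(tagged, lambda tag: tag == "RB"))  # And for something different
```

Passing the predicate as a function keeps the filter reusable: the same helper can drop verbs, adverbs, or any other set of tags.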
In Python it can be done like this:
from nltk import pos_tag
s1 = "I blame myself for what happened"
pos_tag(s1.split())
This gives you the words with their tags.
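Building on that, here is a minimal sketch that removes "blame" only where it is tagged as a verb, in the spirit of the R remove_words function. The function remove_verb_uses and its tagger parameter are hypothetical. By default it uses nltk.pos_tag, which assumes NLTK is installed and its tagger data has been fetched with nltk.download. Splitting on whitespace keeps the example simple but leaves punctuation attached to words.

```python
def remove_verb_uses(sentence, badword="blame", tagger=None):
    """Remove `badword` from `sentence` only where it is tagged as a verb.

    `tagger` (hypothetical parameter) is any callable mapping a list of tokens
    to (token, tag) pairs with Penn Treebank tags. If omitted, nltk.pos_tag is
    used (assumes NLTK and its tagger data are installed).
    """
    if tagger is None:
        # Imported lazily so NLTK is only required when no tagger is passed in.
        from nltk import pos_tag as tagger
    tokens = sentence.split()
    tagged = tagger(tokens)
    # Drop a token only if it is the bad word (case-insensitive; badword is
    # assumed to be lowercase) AND its tag is a verb tag (starts with "VB").
    kept = [word for word, tag in tagged
            if not (word.lower() == badword and tag.startswith("VB"))]
    return " ".join(kept)


# Usage with NLTK (requires the tagger data):
# sentences = ["I blame myself for what happened",
#              "For what happened the blame is yours"]
# print([remove_verb_uses(s) for s in sentences])
```

Whether "blame" is removed depends entirely on the tag the tagger assigns, so it is worth printing pos_tag's output for your own sentences before relying on the filter.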