How can I detect the presence of more than two consecutive characters in a word and remove that word?
I seem to be able to do it like this:
# example
You can use `grepl` instead:
```r
mystring <- c(1, 2, 3, "toot", "tooooot", "good", "apple", "banana")
mystring[!grepl("(.)\\1{2,}", mystring)]
## [1] "1" "2" "3" "toot" "good" "apple" "banana"
```
**Explanation**

`\\1` matches whatever the first capture group matched (in this case `(.)`), and `{2,}` requires the preceding token to be matched at least 2 times. Since we want to match any character repeated 3 or more times, `(.)` captures the first occurrence and `\\1` must then match 2 or more further times.
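To see the counting in action, here is a small sketch that compares `{2,}` with `{1,}` on a few sample words (the `words` vector is made up for illustration) and uses `regexpr()` with `regmatches()` to show which run triggered each match:

```r
words <- c("toot", "tooooot", "aaa", "bookkeeper")

# (.) captures one character; \\1{2,} then needs at least two MORE copies of it,
# so the pattern only fires on runs of three or more identical characters.
three_plus <- grepl("(.)\\1{2,}", words)

# With {1,} a single extra copy is enough, so doubled letters like "oo" match too.
two_plus <- grepl("(.)\\1{1,}", words)

# regexpr() locates the first matching run in each element; regmatches()
# extracts those runs (elements without a match are left out of the result).
runs <- regmatches(words, regexpr("(.)\\1{2,}", words))

print(three_plus)  # FALSE TRUE TRUE FALSE
print(two_plus)    # TRUE TRUE TRUE TRUE
print(runs)        # "ooooo" "aaa"
```

"bookkeeper" has three doubled letters but no tripled one, so only the `{1,}` version flags it.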
Combine the expressions like so:

```r
gsub("^[[:alpha:]]*([[:alpha:]])\\1\\1[[:alpha:]]*$", "", mystring)
```
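Because `gsub` leaves an empty string in place of each matching element, one way to actually remove those words is to filter the result with `nzchar()` afterwards. A minimal sketch, assuming the same `mystring` as in the `grepl` answer:

```r
mystring <- c(1, 2, 3, "toot", "tooooot", "good", "apple", "banana")

# The anchored pattern matches a whole element made only of letters that
# contains some letter three times in a row, and replaces it with "".
blanked <- gsub("^[[:alpha:]]*([[:alpha:]])\\1\\1[[:alpha:]]*$", "", mystring)

# nzchar() is FALSE for empty strings, so this drops the blanked words.
# Caveat: it would also drop any elements that were already "" beforehand.
kept <- blanked[nzchar(blanked)]
print(kept)
```

Note that `[[:alpha:]]` restricts this pattern to letters, so an element such as `"111"` would be left alone here, whereas the `(.)` pattern in the `grepl` answer would drop it.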
Another possibility:

```r
# Overwrite (rather than drop) elements containing a character repeated 3+ times
mystring[grepl("(.{1})\\1{2,}", mystring, perl = TRUE)] <- ""
```
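If the run length needs to vary, the pattern can be built programmatically. The sketch below uses a hypothetical helper, `drop_runs()` (not part of base R or any package), that inserts `n - 1` into the `{m,}` quantifier, following the counting explained for the `grepl` answer:

```r
# Hypothetical helper (not from any package): drop elements of `x` that contain
# any character repeated `n` or more times in a row.
drop_runs <- function(x, n = 3) {
  stopifnot(n >= 2)
  # (.) plus at least n - 1 backreferences gives a run of at least n characters
  pattern <- sprintf("(.)\\1{%d,}", n - 1)
  x[!grepl(pattern, x, perl = TRUE)]
}

mystring <- c(1, 2, 3, "toot", "tooooot", "good", "apple", "banana")
three <- drop_runs(mystring)         # same result as the grepl answer
two   <- drop_runs(mystring, n = 2)  # also drops words with doubled letters
print(three)
print(two)
```

With `n = 2`, doubled letters such as the `oo` in `"good"` and the `pp` in `"apple"` are enough to drop a word, leaving only the digits and `"banana"`.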