I have some data which I want to clean up using a regular expression in R.
It is easy to find how to get elements that contain certain patterns, or do not contain ce
Edit: From the comments, and with a little testing, one would find that my suggestion wasn't correct.
Here are two correct solutions:
vector[!grepl("['pyfgcrl]", vector)] ## kohske
grep("['pyfgcrl]", vector, value = TRUE, invert = TRUE) ## flodel
If either of them wants to re-post and accept credit for their answer, I'm more than happy to delete mine here.
The general function that you are looking for is grepl. From the help file for grepl:
grepl returns a logical vector (match or not for each element of x).
Additionally, you should read the help page for regex, which describes what character classes are. In this case, you create a character class ['pyfgcrl], which matches any single character listed inside the square brackets. You can then negate the resulting logical vector with !.
So, up to this point, we have something that looks like:
!grepl("['pyfgcrl]", vector)
To get what you are looking for, you subset as usual.
vector[!grepl("['pyfgcrl]", vector)]
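As a minimal sketch of these steps, here is the same approach on some made-up data. The contents of vector below are an assumption for illustration only, since the original data is not shown, and has_match and kept are just illustrative names.

```r
# Hypothetical sample data; the real vector from the question is not shown.
vector <- c("apple", "moon", "don't", "sun", "tree")

# TRUE for each element containing at least one of: ' p y f g c r l
has_match <- grepl("['pyfgcrl]", vector)
print(has_match)

# Negate the logical vector and subset to keep only the non-matching elements.
kept <- vector[!has_match]
print(kept)
```

With this sample, only "moon" and "sun" contain none of the listed characters, so they are the two elements kept.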
For the second solution, offered by @flodel, grep by default returns the position where a match is made, and the value = TRUE argument lets you return the actual string value instead. invert = TRUE means to return the values that were not matched.
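To make the difference between these arguments concrete, here is a small sketch on made-up data. Again, the contents of vector are an assumption, and positions, matched, and unmatched are just illustrative names.

```r
# Hypothetical sample data; the real vector from the question is not shown.
vector <- c("apple", "moon", "don't", "sun", "tree")
pattern <- "['pyfgcrl]"

# Default: integer positions of the elements that match.
positions <- grep(pattern, vector)

# value = TRUE: the matching strings themselves instead of their positions.
matched <- grep(pattern, vector, value = TRUE)

# value = TRUE with invert = TRUE: the strings that do NOT match.
unmatched <- grep(pattern, vector, value = TRUE, invert = TRUE)

print(positions)
print(matched)
print(unmatched)
```

For this sample, unmatched holds the same elements as vector[!grepl(pattern, vector)]. Both forms also behave sensibly when nothing matches, whereas subsetting with a negative index such as vector[-grep(pattern, vector)] returns an empty vector in that case.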