How can I remove the letters between two specific patterns in R?
For instance
a = "a#g abcdefgtdkfef_jpg>pple"
I would like to
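Adding to the previous replies: if your string contains more than one marker, for example "a#g abcdefgtdkfef_jpg>pple ; #__something_else___jpg>", a greedy pattern such as "#.*jpg>" matches from the first # all the way to the last jpg>. Everything in between is removed, and here you are left with just "a". To avoid that, use a pattern that cannot run past the nearest closing marker, such as "#[^>]*jpg>" or the lazy "#.*?jpg>", together with gsub so that every occurrence is replaced. Note that a character class like [^jpg>] excludes the single characters j, p, g and >, not the sequence jpg>, so it will not reliably stop at the marker.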
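As a rough sketch of the difference, the snippet below runs a greedy pattern and a more selective one against the two-marker string from this answer. The variable names and the pattern "#[^>]*jpg>" are illustrative choices, and the selective pattern assumes no ">" appears inside a marker before its closing jpg>.

```r
x <- "a#g abcdefgtdkfef_jpg>pple ; #__something_else___jpg>"

# Greedy: .* runs from the first "#" to the last "jpg>"
greedy <- sub("#.*jpg>", "", x)
greedy
# [1] "a"

# Selective: [^>]* cannot cross a ">", so each match ends at its own "jpg>"
selective <- gsub("#[^>]*jpg>", "", x)
selective
# [1] "apple ; "
```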
There's no need to load a package for this operation. You can use the base R function sub, which replaces the first match of a regular expression.
a <- "a#g abcdefgtdkfef_jpg>pple"
sub("#g.*jpg>", "", a)
# [1] "apple"
Regular expression explained:
#g matches "#g"
.* matches any character, zero or more times (greedy, so as many as possible)
jpg> matches "jpg>"
So here we're removing everything starting at #g up to and including jpg>.
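To check what a pattern actually captures before deleting it, one option is to extract the match instead of replacing it. This is a small sketch using base R's regexpr and regmatches; the variable names m and matched are arbitrary.

```r
a <- "a#g abcdefgtdkfef_jpg>pple"

# regexpr locates the first match; regmatches returns the matched text
m <- regexpr("#g.*jpg>", a)
matched <- regmatches(a, m)
matched
# [1] "#g abcdefgtdkfef_jpg>"
```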
In regards to your comment
"I tried to find some function in stringR but I couldn't"
the package name is actually spelled stringr (package names are case-sensitive). You could use its str_replace function, which, like sub, replaces only the first match.
library(stringr)
str_replace(a, "#g.*jpg>", "")
# [1] "apple"
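If you do go the stringr route, the package also provides str_remove, which is shorthand for replacing a match with an empty string. A minimal sketch, assuming a reasonably recent version of stringr is installed; the variable name first_only is just for illustration.

```r
library(stringr)

a <- "a#g abcdefgtdkfef_jpg>pple"

# str_remove drops the first match; str_remove_all would drop every match
first_only <- str_remove(a, "#g.*jpg>")
first_only
# [1] "apple"
```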
I wanted to add to Rich's answer because it does not work when several replacements need to be made in the same text.
To remove every occurrence in the same string, use gsub instead of sub and make the quantifier lazy (.*?) so that each match stops at the nearest jpg>:
a <- "a#g abcdefgtdkfef_jpg>pple or#g abcdefgtdkfef_jpg>ange ma#g abcdefgtdkfef_jpg>ngo"
# gsub replaces every match; the lazy .*? stops at the nearest jpg>
gsub("#g.*?jpg>", "", a)
# Output
# [1] "apple orange mango"
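To see why the "?" matters, here is a short sketch that runs gsub with the greedy and the lazy quantifier on the same kind of input. The variable names fruits, greedy and lazy are made up for this illustration.

```r
fruits <- "a#g abcdefgtdkfef_jpg>pple or#g abcdefgtdkfef_jpg>ange ma#g abcdefgtdkfef_jpg>ngo"

# Greedy .* spans from the first "#g" to the last "jpg>"
greedy <- gsub("#g.*jpg>", "", fruits)
greedy
# [1] "ango"

# Lazy .*? stops at the nearest "jpg>", so each marker is removed separately
lazy <- gsub("#g.*?jpg>", "", fruits)
lazy
# [1] "apple orange mango"
```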