Conventionally in R one can escape metacharacters in a regex with two backslashes, e.g. ( becomes \\(, but I find the same isn't true for square brackets.
mystring
You should enable perl = TRUE; then you can use Perl-like syntax, which is more straightforward (IMHO):
gsub("[\\[\\]$]","",mystring, perl = TRUE)
Or, you may use "smart placement": put ] at the start of the bracket expression ([ is not special inside it, so there is no need to escape [ there):
gsub("[][$]","",mystring)
Result:
[1] "abcde"
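As a quick check of the two patterns above, here is a minimal sketch. The sample string x is an assumption made for illustration; it is not the original mystring.

```r
# Hypothetical sample input (an assumption, not from the question).
x <- "abc[d]e$"

# PCRE: backslash escapes are honoured inside the character class.
with_perl <- gsub("[\\[\\]$]", "", x, perl = TRUE)

# TRE (default engine): ] placed first is literal; [ and $ need no escaping.
with_tre <- gsub("[][$]", "", x)

print(with_perl)  # [1] "abcde"
print(with_tre)   # [1] "abcde"
```

Both calls remove the same three characters; they differ only in which engine interprets the bracket contents.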
More details
The [...] construct is considered a bracket expression by the TRE regex engine (used by default in base R regex functions - (g)sub, grep(l), (g)regexpr - when used without perl=TRUE), which is a POSIX regex construct. Bracket expressions, unlike character classes in NFA regex engines, do not support escape sequences, i.e. the \ char is treated as a literal backslash char inside them.
Thus, [\[\]] in a TRE regex matches a \ or [ char (the [\[\] part is actually equal to [\[]) and then a ]. So, it matches the \] or [] substrings. Just have a look at the gsub("[\\[\\]]", "", "[]\\]ab]") demo: it outputs ab] because [] and \] are matched and eventually removed.
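To make the difference concrete, the same pattern can be run through both engines. This is only a sketch: the variable names are illustrative, and the TRE result is the one reported above.

```r
# The input holds the characters: [ ] \ ] a b ]
x <- "[]\\]ab]"

# TRE (default): the pattern is the set {\, [} followed by a literal ],
# so only the two-character substrings [] and \] are removed.
tre_result <- gsub("[\\[\\]]", "", x)

# PCRE: the pattern is a class holding [ and ], so every single
# bracket is removed and the backslash is left in place.
pcre_result <- gsub("[\\[\\]]", "", x, perl = TRUE)

writeLines(tre_result)   # ab]
writeLines(pcre_result)  # \ab
```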
Note that the terms POSIX bracket expressions and NFA character classes are used here in the sense given at https://www.regular-expressions.info; this is not quite standard terminology, but there is a need to differentiate between the two.
I would sidestep the [ab] syntax and use (a|b). Besides working, it may also be more readable:
gsub("(\\[|\\]|\\$)","",mystring)
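One reason the alternation form is convenient is that the escapes sit outside any brackets, so the default engine and perl = TRUE should read the pattern the same way. A small sketch, using a hypothetical sample string:

```r
# Hypothetical sample input (an assumption for illustration).
x <- "a,bc[d]e$"

pattern <- "(\\[|\\]|\\$)"

# Same pattern under the default TRE engine and under PCRE.
alt_tre  <- gsub(pattern, "", x)
alt_perl <- gsub(pattern, "", x, perl = TRUE)

print(alt_tre)   # [1] "a,bcde"
print(alt_perl)  # [1] "a,bcde"
```

The comma stays because it is not one of the three alternatives.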
You can just use \\[ as the thing to match; you don't need additional square brackets unless you are matching multiple options:
> mystring <- 'abc[de'
> gsub("\\[", "", mystring)
[1] "abcde"
You can make this even simpler and faster for single characters by taking away the special meaning using fixed=TRUE:
> mystring <- 'abc[de'
> gsub("[", "", mystring, fixed=TRUE)
[1] "abcde"
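Keep in mind that with fixed=TRUE the whole pattern is a literal string, so "[]" would only match that exact two-character sequence. If several different literal characters have to go, one possible sketch is to apply gsub once per character. The helper name strip_literals is hypothetical and not part of base R:

```r
# Hypothetical helper: remove each literal string in `chars` from `x`,
# one fixed-string gsub() call per entry.
strip_literals <- function(x, chars) {
  Reduce(function(acc, ch) gsub(ch, "", acc, fixed = TRUE), chars, x)
}

# "[]" as a fixed pattern matches only the adjacent pair, not each bracket.
pair_only <- gsub("[]", "", "a[]b[c", fixed = TRUE)

# Removing four different literal characters one at a time.
cleaned <- strip_literals("a,bc[d]e$", c("[", "]", ",", "$"))

print(pair_only)  # [1] "ab[c"
print(cleaned)    # [1] "abcde"
```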
Or, if a closing square bracket is the first thing inside the square brackets (unescaped), it is taken as a literal character rather than having its usual special meaning; an opening square bracket is literal anywhere inside them:
> mystring <- 'a,bc[d]e$'
> gsub("[][,$]", "", mystring)
[1] "abcde"