I\'ve got the following vector:
words <- c(\"5lang\",\"kasverschil2\",\"b2b\")
I want to remove \"5\"
in \"5lang\"
gsub("^\\d+|\\d+$", "", words)
#[1] "lang" "kasverschil" "b2b"
Another option would be to use stringi
library(stringi)
stri_replace_all_regex(words, "^\\d+|\\d+$", "")
#[1] "lang" "kasverschil" "b2b"
Using a variant of the data set provided by the OP here are benchmarks for 3 three main solutions (note that these strings are very short and contrived; results may differ on a larger, real data set):
words <- rep(c("5lang","kasverschil2","b2b"), 100000)
library(stringi)
library(microbenchmark)
GSUB <- function() gsub("^\\d+|\\d+$", "", words)
STRINGI <- function() stri_replace_all_regex(words, "^\\d+|\\d+$", "")
GREGEXPR <- function() {
gregexpr(pattern='(^[0-9]+|[0-9]+$)', text = words) -> mm
sapply(regmatches(words, mm, invert=TRUE), paste, collapse="")
}
microbenchmark(
GSUB(),
STRINGI(),
GREGEXPR(),
times=100L
)
## Unit: milliseconds
## expr min lq median uq max neval
## GSUB() 301.0988 349.9952 396.3647 431.6493 632.7568 100
## STRINGI() 465.9099 513.1570 569.1972 629.4176 738.4414 100
## GREGEXPR() 5073.1960 5706.8160 6194.1070 6742.1552 7647.8904 100
You can use gsub
which uses regular expressions:
gsub("^[0-9]|[0-9]$", "", words)
# [1] "lang" "kasverschil" "b2b"
Explanation:
The pattern ^[0-9]
matches any number at the beginning of a string, while the pattern [0-9]$
matches any number at the end of the string. by separating these two patterns by |
you want to match either the first or the second pattern. Then, you replace the matched pattern with an empty string.
Get instances where numbers appear at the beginning or end of a word and match everything else. You need to collapse results because of possible multiple matches:
gregexpr(pattern='(^[0-9]+|[0-9]+$)', text = words) -> mm
sapply(regmatches(words, mm, invert=TRUE), paste, collapse="")