I have a string:
str1 <- "This is a string, that I've written
to ask about a question, or at least tried to."
How would I count the number of commas and the number of occurrences of "ion" in it?
The general problem of matching text requires regular expressions. In this case you just want to match specific characters, but the functions to call are the same. You want gregexpr.
matched_commas <- gregexpr(",", str1, fixed = TRUE)
n_commas <- length(matched_commas[[1]])
matched_ion <- gregexpr("ion", str1, fixed = TRUE)
n_ion <- length(matched_ion[[1]])
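One caveat with the length() approach: when nothing matches, gregexpr returns a single -1 rather than an empty vector, so length() reports 1 instead of 0. A minimal sketch of a guard follows; count_fixed is a hypothetical helper name used for illustration, not something from base R.

```r
# count_fixed() is a hypothetical helper, not part of base R.
# It counts non-overlapping literal occurrences of `pattern` in one string.
count_fixed <- function(pattern, text) {
  m <- gregexpr(pattern, text, fixed = TRUE)[[1]]
  # gregexpr() signals "no match" with a single -1, so only count real positions
  sum(m > 0)
}

str1 <- "This is a string, that I've written to ask about a question, or at least tried to."
n_commas <- count_fixed(",", str1)
n_ion <- count_fixed("ion", str1)
n_z <- count_fixed("z", str1)  # "z" does not occur, so this is 0 rather than 1
```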
If you want to only match "ion" at the end of words, then you do need regular expressions. \b represents a word boundary, and you need to escape the backslash.
gregexpr(
"ion\\b",
"ionisation should only be matched at the end of the word",
perl = TRUE
)
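To turn the word-boundary match into a count, one option is to extract the matches with regmatches and take the length, which also gives 0 when nothing matches. A small sketch in base R; the variable names are only illustrative.

```r
txt <- "ionisation should only be matched at the end of the word"
m <- gregexpr("ion\\b", txt, perl = TRUE)

# regmatches() returns a list with one character vector per input string;
# that vector is empty when there is no match
hits <- regmatches(txt, m)[[1]]
n_end_ion <- length(hits)

# Start position of each match, with the match.length attribute dropped
start_pos <- as.integer(m[[1]])
```

Only the "ion" that ends "ionisation" is matched here; the leading "ion" is followed by another letter, so there is no word boundary after it.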
Another option is the stringi package:
library(stringi)
stri_count(str1,fixed=',')
#[1] 2
stri_count(str1,fixed='ion')
#[1] 1
library(stringr)         # provides str_count(), used by f1()
library(microbenchmark)
# One long string of random letters separated by spaces
vec <- paste(sample(letters, 1e6, replace = TRUE), collapse = ' ')
f1 <- function() str_count(vec, 'a')
f2 <- function() stri_count(vec, fixed = 'a')
f3 <- function() length(gregexpr('a', vec)[[1]])
microbenchmark(f1(), f2(), f3(), unit = 'relative', times = 20L)
#Unit: relative
#expr min lq mean median uq max neval cld
# f1() 18.41423 18.43579 18.37623 18.36428 18.46115 17.79397 20 b
# f2() 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 20 a
# f3() 18.35381 18.42019 18.30015 18.35580 18.20973 18.21109 20 b
This really is an adaptation of Richie Cotton's answer. I hate having to repeat the same function over and over. This approach allows you to feed a vector of terms to match within the string:
str1 <- "This is a string, that I've written to ask about a question,
or at least tried to."
matches <- c(",", "ion")
sapply(matches, function(x) length(gregexpr(x, str1, fixed = TRUE)[[1]]))
# , ion
# 2 1
The stringr package has a function str_count that does this for you nicely.
library(stringr)
str_count(str1, ',')
[1] 2
str_count(str1, 'ion')
[1] 1
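Note that str_count treats the pattern as a regular expression by default, so metacharacters such as "." need wrapping in fixed() to be counted literally. A brief sketch, assuming stringr is installed; the example string is made up for illustration.

```r
library(stringr)

x <- "a.b.c"  # made-up example string

# As a regex, "." matches any character, so every character is counted
n_regex <- str_count(x, ".")

# fixed() makes the pattern a literal dot
n_literal <- str_count(x, fixed("."))
```

For the comma and "ion" patterns above this makes no difference to the result, since neither contains a regex metacharacter.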
EDIT:
Cause I was curious:
vec <- paste(sample(letters, 1e6, replace=T), collapse=' ')
system.time(str_count(vec, 'a'))
user system elapsed
0.052 0.000 0.054
system.time(length(gregexpr('a', vec, fixed=T)[[1]]))
user system elapsed
2.124 0.016 2.146
system.time(length(gregexpr('a', vec, fixed=F)[[1]]))
user system elapsed
0.052 0.000 0.052