I have this example data
d <- "30,3"
class(d)
I have these character objects in one column of my working data frame, and I need to be able to
You could also try the stringi package's stri_count_* functions (they should be very efficient):
library(stringi)
stri_count_regex(d, "\\d+")
## [1] 2
stri_count_fixed(d, ",") + 1
## [1] 2
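Note that the two stringi calls count different things: stri_count_regex(d, "\\d+") counts runs of digits, while stri_count_fixed(d, ",") + 1 counts commas and adds one. They agree on well-formed input such as "30,3" but can diverge on edge cases. The sketch below mirrors both strategies in base R so it runs without stringi; the input vector is hypothetical and only meant to show where the results differ.

```r
# Hypothetical inputs: one well-formed value plus three edge cases
x <- c("30,3", "30", "", "30,,3")

# Strategy 1 (analogous to stri_count_regex(x, "\\d+")): count runs of digits
count_numbers <- lengths(regmatches(x, gregexpr("[0-9]+", x)))

# Strategy 2 (analogous to stri_count_fixed(x, ",") + 1): count commas, add one
count_fields <- nchar(x) - nchar(gsub(",", "", x, fixed = TRUE)) + 1

count_numbers
## [1] 2 1 0 2
count_fields
## [1] 2 1 1 3
```

So pick the strategy that matches what "length" means for your data: the number of values actually present, or the number of comma-separated slots (including empty ones).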
The stringr package has similar functionality:
library(stringr)
str_count(d, "\\d+")
## [1] 2
Update:
If you want to subset your data set to the rows that hold exactly two numbers, you could try
df[stri_count_regex(df$d, "\\d+") == 2,, drop = FALSE]
# d
# 2 30,5
Or, more simply:
subset(df, stri_count_regex(d, "\\d+") == 2)
# d
# 2 30,5
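If you'd rather not add a dependency just for the subsetting step, here is a base R sketch that splits each value on commas and keeps the rows that yield exactly two pieces. The data frame is a hypothetical stand-in for df, since the original data isn't shown. Be aware that this counts comma-separated fields rather than digit runs, so a value like "30,,5" (three fields, two numbers) is treated differently than with stri_count_regex.

```r
# Hypothetical stand-in for df; the column name d follows the answer above
df <- data.frame(d = c("30", "30,5", "1,2,3"), stringsAsFactors = FALSE)

# Split each value on literal commas and flag rows with exactly two pieces
two_fields <- lengths(strsplit(df$d, ",", fixed = TRUE)) == 2

# Keeps only row 2 ("30,5")
res <- subset(df, two_fields)
res
```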
Update #2
Here's a benchmark that illustrates why one should consider using external packages (@rengis's answer wasn't included because it doesn't answer the question):
library(microbenchmark)
library(stringi)
d <- rep("30,3", 1e4)
microbenchmark(akrun = nchar(gsub('[^,]', '', d)) + 1,
               GG1   = count.fields(textConnection(d), sep = ","),
               GG2   = sapply(gregexpr(",", d), length) + 1,
               DA1   = stri_count_regex(d, "\\d+"),
               DA2   = stri_count_fixed(d, ",") + 1)
# Unit: microseconds
#  expr       min         lq       mean     median        uq       max neval
# akrun  8817.950  9479.9485 11489.7282 10642.4895 12480.845  46538.39   100
#   GG1 55451.474 61906.2460 72324.0820 68783.9935 78980.216 150673.72   100
#   GG2 33026.455 43349.5900 60960.8762 51825.6845 72293.923 203126.27   100
#   DA1  4730.302  5120.5145  6206.8297  5550.7930  7179.536  10507.09   100
#   DA2   380.147   418.2395   534.6911   448.2405   597.259   2278.11   100
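One caveat about the GG2 expression, unrelated to speed: gregexpr() marks a string with no match by returning -1, which still has length one, so sapply(gregexpr(",", d), length) + 1 reports 2 even for a value with no comma at all. On the benchmark input every value is "30,3", so the result there is correct. A small base R sketch of the issue and one possible fix, counting only positive match positions (the input vector and the fixed = TRUE argument are additions for illustration, not part of the original expression):

```r
# Hypothetical inputs: one value with a comma, one without
x <- c("30,3", "30")

# gregexpr() returns -1 (a length-one vector) when there is no match,
# so length() + 1 over-counts strings that contain no comma
naive <- sapply(gregexpr(",", x, fixed = TRUE), length) + 1

# Counting only real match positions (> 0) avoids that
safe <- sapply(gregexpr(",", x, fixed = TRUE), function(m) sum(m > 0)) + 1

naive
## [1] 2 2
safe
## [1] 2 1
```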