Count values separated by a comma in a character string

小蘑菇 2020-11-30 12:39

I have this example data

d <- "30,3"
class(d)

I have character strings like this in one column of my working data frame and I need to be able to

5 answers

有刺的猬 (original poster)

2020-11-30 12:49

You could also try the stringi package's stri_count_* functions (they should be very efficient)

library(stringi)
stri_count_regex(d, "\\d+")
## [1] 2
stri_count_fixed(d, ",") + 1
## [1] 2

The stringr package has similar functionality

library(stringr)
str_count(d, "\\d+")
## [1] 2
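The two strategies above (counting runs of digits versus counting commas and adding one) agree on well-formed input but can differ on edge cases such as an empty string. Here is a minimal base R sketch of that difference, written without stringi or stringr so it runs on a plain R install. The helper names `count_digit_runs` and `count_by_commas` and the input vector `x` are made up for illustration.

```r
# Hypothetical helpers (illustrative names, base R only).

# Count runs of digits, analogous to stri_count_regex(x, "\\d+").
count_digit_runs <- function(x) {
  lengths(regmatches(x, gregexpr("[0-9]+", x)))
}

# Count commas and add one, analogous to stri_count_fixed(x, ",") + 1.
count_by_commas <- function(x) {
  nchar(gsub("[^,]", "", x)) + 1
}

# Made-up input, including a string without commas and an empty string.
x <- c("30,3", "7", "1,2,3", "")
count_digit_runs(x)  # 2 1 3 0
count_by_commas(x)   # 2 1 3 1
```

The comma-based count reports 1 for the empty string because it always adds one, so the regex-based count may be the safer choice if empty cells can occur.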

Update:

If you want to subset your data set to the rows whose strings contain exactly two values, you could try

df[stri_count_regex(df$d, "\\d+") == 2, , drop = FALSE]
#      d
# 2 30,5

Or simpler

subset(df, stri_count_regex(d, "\\d+") == 2)
#      d
# 2 30,5
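The answer does not show how df was built, so the following is only a sketch with a hypothetical data frame, chosen so that row 2 matches the printed output above. It uses a base R stand-in (the made-up helper `n_values`) in place of stri_count_regex so that it runs without extra packages.

```r
# Hypothetical data; the original df is not shown in the answer.
df <- data.frame(d = c("30", "30,5", "1,2,3"), stringsAsFactors = FALSE)

# Base R stand-in for stri_count_regex(x, "\\d+") (illustrative name).
n_values <- function(x) lengths(regmatches(x, gregexpr("[0-9]+", x)))

# Keep rows with exactly two values; drop = FALSE keeps the result a
# data frame even though df has a single column.
two_vals <- df[n_values(df$d) == 2, , drop = FALSE]
print(two_vals)
#      d
# 2 30,5
```

Without drop = FALSE, indexing a one-column data frame this way would return a plain character vector rather than a data frame.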

Update #2

Here's a benchmark that illustrates why one should consider using external packages (@rengis's answer wasn't included because it doesn't answer the question)

library(microbenchmark)
library(stringi)
d <- rep("30,3", 1e4)

microbenchmark(akrun = nchar(gsub('[^,]', '', d)) + 1,
               GG1 = count.fields(textConnection(d), sep = ","),
               GG2 = sapply(gregexpr(",", d), length) + 1,
               DA1 = stri_count_regex(d, "\\d+"),
               DA2 = stri_count_fixed(d, ",") + 1)

# Unit: microseconds
#  expr       min         lq       mean     median        uq       max neval
# akrun  8817.950  9479.9485 11489.7282 10642.4895 12480.845  46538.39   100
#   GG1 55451.474 61906.2460 72324.0820 68783.9935 78980.216 150673.72   100
#   GG2 33026.455 43349.5900 60960.8762 51825.6845 72293.923 203126.27   100
#   DA1  4730.302  5120.5145  6206.8297  5550.7930  7179.536  10507.09   100
#   DA2   380.147   418.2395   534.6911   448.2405   597.259   2278.11   100
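The benchmark input is 1e4 copies of the same string, so the timings alone do not show whether the expressions return the same counts on mixed input. Here is a small sketch, using made-up input and base R only, that compares two of the benchmarked expressions. The `gg2_fixed` variant is an illustrative adjustment, not part of any original answer.

```r
# Made-up mixed input (not from the original benchmark).
x <- c("30,3", "7", "1,2,3")

# 'akrun' expression: strip everything except commas, count them, add one.
akrun <- nchar(gsub("[^,]", "", x)) + 1

# 'GG2' expression: gregexpr() returns a single -1 when there is no match,
# so length() is 1 for "7" and the count is off by one there.
gg2 <- sapply(gregexpr(",", x), length) + 1

# Illustrative variant that counts only real match positions (> 0).
gg2_fixed <- sapply(gregexpr(",", x, fixed = TRUE),
                    function(m) sum(m > 0)) + 1

akrun      # 2 1 3
gg2        # 2 2 3
gg2_fixed  # 2 1 3
```

On input where every string contains at least one comma, as in the benchmark, the expressions agree. It may still be worth checking results on representative data before comparing speed.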
