Count values separated by a comma in a character string

小蘑菇 2020-11-30 12:39

I have this example data

d <- "30,3"
class(d)

I have character objects like this in one column of my working data frame, and I need to be able to

5 answers
  •  有刺的猬
    2020-11-30 12:49

    You could also try the stringi package's stri_count_* functions (they should be very efficient)

    library(stringi)
    stri_count_regex(d, "\\d+")
    ## [1] 2
    stri_count_fixed(d, ",") + 1
    ## [1] 2
    

    The stringr package has similar functionality

    library(stringr)
    str_count(d, "\\d+")
    ## [1] 2
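
    Note that the two styles of call above count different things: the regex version counts runs of digits, while the fixed version counts separators and adds one. They agree on well-formed input such as "30,3" but can differ on edge cases. Here is a rough base R sketch of that difference; count_digit_runs and count_by_commas are hypothetical helper names made up for illustration, not part of stringi or stringr.

    ```r
    # Hypothetical helpers for illustration only (base R, no packages needed).
    count_digit_runs <- function(x) {
      # regmatches() yields character(0) where nothing matches, so lengths() gives 0
      lengths(regmatches(x, gregexpr("[0-9]+", x)))
    }

    count_by_commas <- function(x) {
      # deleting every non-comma character leaves only the separators
      nchar(gsub("[^,]", "", x)) + 1L
    }

    x <- c("30,3", "30", "1,2,3", "", "30,")
    count_digit_runs(x)
    ## [1] 2 1 3 0 1
    count_by_commas(x)
    ## [1] 2 1 3 1 2
    ```

    So if your column can contain empty strings or trailing commas, pick the variant whose behaviour matches what you mean by a "value".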
    

    Update:

    If you want to subset your data set to the rows that hold exactly two values, you could try

    df[stri_count_regex(df$d, "\\d+") == 2,, drop = FALSE]
    #      d
    # 2 30,5
    

    Or simpler

    subset(df, stri_count_regex(d, "\\d+") == 2)
    #      d
    # 2 30,5
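
    The df used in the subsetting examples above isn't shown, so here is a self-contained sketch with a made-up data frame (the values are an assumption chosen so that only row 2 has two values). It swaps in a base R count for stri_count_regex so it runs without extra packages; with stringi loaded, the calls above should work the same way on this df.

    ```r
    # Hypothetical data; the original df is not shown in this thread.
    df <- data.frame(d = c("30", "30,5", "30,5,7"), stringsAsFactors = FALSE)

    # Base R stand-in for stri_count_regex(df$d, "\\d+")
    n_values <- lengths(regmatches(df$d, gregexpr("[0-9]+", df$d)))

    # Keep rows with exactly two values; drop = FALSE keeps the result a data frame
    two_values <- df[n_values == 2, , drop = FALSE]
    print(two_values)
    #      d
    # 2 30,5
    ```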
    

    Update #2

    Here's a benchmark that illustrates why one should consider using external packages (@rengis's answer wasn't included because it doesn't answer the question):

    library(microbenchmark)
    library(stringi)
    d <- rep("30,3", 1e4)
    
    microbenchmark( akrun = nchar(gsub('[^,]', '', d))+1,
                    GG1 = count.fields(textConnection(d), sep = ","),
                    GG2 = sapply(gregexpr(",", d), length) + 1,
                    DA1 = stri_count_regex(d, "\\d+"),
                    DA2 = stri_count_fixed(d, ",") + 1)
    
    # Unit: microseconds
    #  expr       min         lq       mean     median        uq       max neval
    # akrun  8817.950  9479.9485 11489.7282 10642.4895 12480.845  46538.39   100
    #   GG1 55451.474 61906.2460 72324.0820 68783.9935 78980.216 150673.72   100
    #   GG2 33026.455 43349.5900 60960.8762 51825.6845 72293.923 203126.27   100
    #   DA1  4730.302  5120.5145  6206.8297  5550.7930  7179.536  10507.09   100
    #   DA2   380.147   418.2395   534.6911   448.2405   597.259   2278.11   100
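
    One caveat when reading the timings: the benchmark input is always "30,3", so every expression returns 2 there. On mixed input the base R variants don't all agree, because gregexpr() returns a single -1 when there is no match, which GG2 then counts as one comma. A small check, using only base R and a made-up input vector:

    ```r
    # Made-up input with a value that contains no comma
    x <- c("30,3", "30", "1,2,3")

    akrun <- nchar(gsub("[^,]", "", x)) + 1

    con <- textConnection(x)
    GG1 <- utils::count.fields(con, sep = ",")
    close(con)

    # gregexpr() gives -1 (length 1) for "30", so this reports 2 instead of 1
    GG2 <- sapply(gregexpr(",", x), length) + 1

    print(rbind(akrun, GG1, GG2))
    #       [,1] [,2] [,3]
    # akrun    2    1    3
    # GG1      2    1    3
    # GG2      2    2    3
    ```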
    
