Count values separated by a comma in a character string

小蘑菇 2020-11-30 12:39

I have this example data

d <- "30,3"
class(d)

I have character objects like this in one column of my work data frame, and I need to be able to

5 Answers
  • 2020-11-30 12:49

    You could also try the stringi package's stri_count_* functions (they should be very efficient)

    library(stringi)
    stri_count_regex(d, "\\d+")
    ## [1] 2
    stri_count_fixed(d, ",") + 1
    ## [1] 2
    

    The stringr package has similar functionality

    library(stringr)
    str_count(d, "\\d+")
    ## [1] 2
    

    Update:

    If you want to subset your data set to the rows that contain exactly two values, you could try

    df[stri_count_regex(df$d, "\\d+") == 2,, drop = FALSE]
    #      d
    # 2 30,5
    

    Or simpler

    subset(df, stri_count_regex(d, "\\d+") == 2)
    #      d
    # 2 30,5
    

    Update #2

    Here's a benchmark that illustrates why one should consider using external packages (@rengis's answer wasn't included because it doesn't answer the question):

    library(microbenchmark)
    library(stringi)
    d <- rep("30,3", 1e4)
    
    microbenchmark( akrun = nchar(gsub('[^,]', '', d))+1,
                    GG1 = count.fields(textConnection(d), sep = ","),
                    GG2 = sapply(gregexpr(",", d), length) + 1,
                    DA1 = stri_count_regex(d, "\\d+"),
                    DA2 = stri_count_fixed(d, ",") + 1)
    
    # Unit: microseconds
    #  expr       min         lq       mean     median        uq       max neval
    # akrun  8817.950  9479.9485 11489.7282 10642.4895 12480.845  46538.39   100
    #   GG1 55451.474 61906.2460 72324.0820 68783.9935 78980.216 150673.72   100
    #   GG2 33026.455 43349.5900 60960.8762 51825.6845 72293.923 203126.27   100
    #   DA1  4730.302  5120.5145  6206.8297  5550.7930  7179.536  10507.09   100
    #   DA2   380.147   418.2395   534.6911   448.2405   597.259   2278.11   100
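
    Timings are only comparable if the candidates return the same counts. Below is a minimal sanity check of the base R expressions from the benchmark, run on a small made-up vector x (an assumption for illustration). The stringi calls are left out so the snippet runs without extra packages.

    ```r
    # Hypothetical test vector; every string contains at least one comma
    x <- c("30,3", "1,2,3", "7,8,9,10")

    # akrun: delete everything except commas, then count them
    akrun <- nchar(gsub("[^,]", "", x)) + 1

    # GG1: count.fields() does not close a connection that was already open,
    # so close it explicitly
    con <- textConnection(x)
    GG1 <- count.fields(con, sep = ",")
    close(con)

    # GG2: one match position per comma
    GG2 <- sapply(gregexpr(",", x), length) + 1

    # All three should give 2 3 4
    stopifnot(all(akrun == GG1), all(akrun == GG2))
    ```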
    
  • These two approaches are each short, work on vectors of strings, do not involve the expense of explicitly constructing the split string and do not use any packages. Here d is a vector of strings such as d <- c("1,2,3", "5,2") :

    1) count.fields

    count.fields(textConnection(d), sep = ",")
    

    2) gregexpr

    lengths(gregexpr(",", d)) + 1
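
    One caveat with the gregexpr approach: when a string contains no comma, gregexpr returns a single -1 "no match" marker, which still has length 1, so the count comes out as 2 instead of 1. A possible guard is sketched below, assuming single-value strings can occur in your data (the vector x is made up for illustration).

    ```r
    # Hypothetical input: the last string has no comma
    x <- c("1,2,3", "5,2", "30")

    # Naive version: "30" is miscounted because the -1 marker has length 1
    naive <- lengths(gregexpr(",", x, fixed = TRUE)) + 1
    # 3 2 2

    # Guarded version: regmatches() yields character(0) when nothing matched
    guarded <- lengths(regmatches(x, gregexpr(",", x, fixed = TRUE))) + 1
    # 3 2 1
    ```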
    
  • 2020-11-30 13:02

    You could use scan.

     v1 <- scan(text=d, sep=',', what=numeric(), quiet=TRUE)
     v1
     #[1] 30  3
    

    Or use stri_split from stringi. This should accept both character and factor input without an explicit as.character conversion

    library(stringi)
    v2 <- as.numeric(unlist(stri_split(d,fixed=',')))
    v2
    #[1] 30  3
    

    You can do the count using base R by

    length(v1)
    #[1] 2
    

    Or

    nchar(gsub('[^,]', '', d))+1
    #[1] 2
    

    Here the regex [^,] matches any single character that is not a comma, so gsub() deletes everything except the commas and nchar() counts what is left.

    Update

    If d is a column in a dataset df and you want to subset the rows where the number of values equals 2

      d<-c("30,3,5","30,5") 
      df <- data.frame(d,stringsAsFactors=FALSE)
      df[nchar(gsub('[^,]', '',df$d))+1==2,,drop=FALSE]
      #    d
      #2 30,5
    

    Just to test

      df[nchar(gsub('[^,]', '',df$d))+1==10,,drop=FALSE]
      #[1] d
      #<0 rows> (or 0-length row.names)
    
  • 2020-11-30 13:09

    Here is a possibility

    as.numeric(unlist(strsplit("30,3", ",")))
    # [1] 30  3
    
  • 2020-11-30 13:10

    A slight variation on the accepted answer that requires no packages. Using the example d <- c("1,2,3", "5,2"):

    lengths(strsplit(d, ","))
    
    # [1] 3 2
    

    Or as a data.frame

    df <- data.frame(d = d, stringsAsFactors = FALSE)  # keep d as character; the default was TRUE before R 4.0.0
    
    df$counts <- lengths(strsplit(df$d, ","))
    
    df
    
    #       d counts
    # 1 1,2,3      3
    # 2   5,2      2
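
    Splitting and counting separators can disagree on edge cases: strsplit drops a trailing empty field, so a string that ends in a comma gets a smaller count from the split-based version than from the comma-counting versions. Here is a small sketch with a made-up vector x, in case your data can contain trailing commas.

    ```r
    # Hypothetical input: the second string ends with a comma
    x <- c("1,2,3", "5,2,")

    # Split-based count: the empty field after the last comma is dropped
    by_split <- lengths(strsplit(x, ",", fixed = TRUE))
    # 3 2

    # Comma-based count: every comma adds one, including the trailing one
    by_commas <- nchar(gsub("[^,]", "", x)) + 1
    # 3 3
    ```

    Which result is correct depends on whether a trailing comma should count as an empty value.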
    