R: count the number of commas and occurrences of a substring in a string

遇见更好的自我 2021-01-17 19:25

I have a string:

    str1 <- "This is a string, that I've written 
        to ask about a question, or at least tried to."

How would

4 Answers
  • 2021-01-17 19:39

    The general problem of matching text requires regular expressions. In this case you just want to match specific characters, but the functions to call are the same. You want gregexpr.

    matched_commas <- gregexpr(",", str1, fixed = TRUE)
    n_commas <- length(matched_commas[[1]])
    
    matched_ion <- gregexpr("ion", str1, fixed = TRUE)
    n_ion <- length(matched_ion[[1]])
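
    One caveat with the length() approach: when the pattern does not occur at all, gregexpr returns a single -1, so length() reports 1 rather than 0. A minimal sketch of a zero-safe variant follows; count_fixed is a hypothetical helper name, and str1 is the question's string written on one line.

    ```r
    # Hypothetical helper (not from the original answer): count literal matches,
    # returning 0 when the pattern is absent.
    count_fixed <- function(pattern, text) {
      m <- gregexpr(pattern, text, fixed = TRUE)[[1]]
      sum(m > 0)  # real match positions are >= 1; the "no match" marker is -1
    }

    str1 <- "This is a string, that I've written to ask about a question, or at least tried to."
    n_commas <- count_fixed(",", str1)
    n_ion <- count_fixed("ion", str1)
    n_z <- count_fixed("z", str1)  # a pattern that does not occur in str1
    ```

    For patterns that are known to occur, this gives the same counts as the length() version.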
    

    If you want to only match "ion" at the end of words, then you do need regular expressions. \b represents a word boundary, and you need to escape the backslash.

    gregexpr(
      "ion\\b", 
      "ionisation should only be matched at the end of the word", 
      perl = TRUE
    )
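
    To turn a call like this into a count, one option is to extract the matches with regmatches and count them with lengths, which also yields 0 when nothing matches. A small sketch; the variable names x, at_end and anywhere are illustrative only.

    ```r
    x <- "ionisation should only be matched at the end of the word"

    # "ion" only where a word boundary follows it
    m_end <- gregexpr("ion\\b", x, perl = TRUE)
    at_end <- lengths(regmatches(x, m_end))

    # "ion" anywhere in the string, for comparison
    m_any <- gregexpr("ion", x, fixed = TRUE)
    anywhere <- lengths(regmatches(x, m_any))
    ```

    Here at_end should be 1 (the match starting at character 8 of "ionisation") and anywhere should be 2.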
    
  • 2021-01-17 19:47

    Another option is stringi

    library(stringi)
    stri_count(str1,fixed=',')
    #[1] 2
    stri_count(str1,fixed='ion')
    #[1] 1
    

    Benchmarks

    library(stringr)  # needed for str_count() in f1
    vec <- paste(sample(letters, 1e6, replace=TRUE), collapse=' ')
    f1 <- function() str_count(vec, 'a')
    f2 <- function() stri_count(vec, fixed='a')
    f3 <- function() length(gregexpr('a', vec)[[1]])
    
    library(microbenchmark)
    microbenchmark(f1(), f2(), f3(), unit='relative', times=20L)
    #Unit: relative
    #expr      min       lq     mean   median       uq      max neval cld
    # f1() 18.41423 18.43579 18.37623 18.36428 18.46115 17.79397    20   b
    # f2()  1.00000  1.00000  1.00000  1.00000  1.00000  1.00000    20  a 
    # f3() 18.35381 18.42019 18.30015 18.35580 18.20973 18.21109    20   b
    
  • 2021-01-17 19:49

    This really is an adaptation of Richie Cotton's answer. I hate having to repeat the same function over and over. This approach allows you to feed a vector of terms to match within the string:

    str1 <- "This is a string, that I've written to ask about a question, 
        or at least tried to."
    matches <- c(",", "ion") 
    sapply(matches,  function(x) length(gregexpr(x, str1, fixed = TRUE)[[1]]))
    #  , ion 
    #  2   1 
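
    If an extra package is acceptable, the same vector-of-terms idea can be sketched with stringi, whose counting functions are vectorised over the pattern, so no explicit loop is needed. This assumes stringi is installed; setNames is used only to label the result, and str1 is written on one line here.

    ```r
    library(stringi)

    str1 <- "This is a string, that I've written to ask about a question, or at least tried to."
    matches <- c(",", "ion")

    # One call counts every term; the result is labelled with the terms
    counts <- setNames(stri_count_fixed(str1, matches), matches)
    ```

    Unlike length(gregexpr(...)), this returns 0 for a term that does not occur.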
    
  • 2021-01-17 19:54

    The stringr package has a function str_count that does this for you nicely.

    library(stringr)
    
    str_count(str1, ',')
    #[1] 2
    str_count(str1, 'ion')
    #[1] 1
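
    Note that str_count treats its pattern as a regular expression by default, so characters with a special meaning (such as .) need stringr's fixed() wrapper to be counted literally. A short illustration with a made-up input string:

    ```r
    library(stringr)

    x <- "a.b.c"  # made-up example input

    n_regex <- str_count(x, ".")           # "." as a regex matches any character
    n_literal <- str_count(x, fixed("."))  # "." counted as a literal dot
    ```

    For a comma or "ion" the two forms agree, since neither contains a regex metacharacter.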
    

    EDIT:

    Because I was curious:

    vec <- paste(sample(letters, 1e6, replace=TRUE), collapse=' ')
    
    system.time(str_count(vec, 'a'))
    #   user  system elapsed 
    #  0.052   0.000   0.054 
    
    system.time(length(gregexpr('a', vec, fixed=TRUE)[[1]]))
    #   user  system elapsed 
    #  2.124   0.016   2.146 
    
    system.time(length(gregexpr('a', vec, fixed=FALSE)[[1]]))
    #   user  system elapsed 
    #  0.052   0.000   0.052 
    