R: removing numbers at begin and end of a string

小鲜肉 2020-12-19 09:56

I've got the following vector:

words <- c("5lang","kasverschil2","b2b")

I want to remove "5" in "5lang"

3 Answers
  • 2020-12-19 10:27
     gsub("^\\d+|\\d+$", "", words)    
     #[1] "lang"        "kasverschil" "b2b"
    

    Another option would be to use stringi

     library(stringi)
     stri_replace_all_regex(words, "^\\d+|\\d+$", "")
      #[1] "lang"        "kasverschil" "b2b"        
    

    Using a variant of the data set provided by the OP, here are benchmarks for the three main solutions (note that these strings are very short and contrived; results may differ on a larger, real data set):

    words <- rep(c("5lang","kasverschil2","b2b"), 100000)
    
    library(stringi)
    library(microbenchmark)
    
    GSUB <- function() gsub("^\\d+|\\d+$", "", words)
    STRINGI <- function() stri_replace_all_regex(words, "^\\d+|\\d+$", "")
    GREGEXPR <- function() {
        gregexpr(pattern='(^[0-9]+|[0-9]+$)', text = words) -> mm
        sapply(regmatches(words, mm, invert=TRUE), paste, collapse="") 
    }
    
    microbenchmark( 
        GSUB(),
        STRINGI(),
        GREGEXPR(),
        times=100L
    )
    
    ## Unit: milliseconds
    ##        expr       min        lq    median        uq       max neval
    ##      GSUB()  301.0988  349.9952  396.3647  431.6493  632.7568   100
    ##   STRINGI()  465.9099  513.1570  569.1972  629.4176  738.4414   100
    ##  GREGEXPR() 5073.1960 5706.8160 6194.1070 6742.1552 7647.8904   100
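
Before comparing timings, it can help to confirm that the candidates actually return the same result. The following is a minimal sketch of such a check, not part of the original benchmark: the vector is shortened to keep it fast, `same` is a name introduced here for illustration, and the stringi comparison only runs if that package happens to be installed.

```r
# Smaller variant of the benchmark data, for a quick equivalence check
words <- rep(c("5lang", "kasverschil2", "b2b"), 1000)

GSUB <- function() gsub("^\\d+|\\d+$", "", words)
GREGEXPR <- function() {
    mm <- gregexpr("(^[0-9]+|[0-9]+$)", words)
    # Paste the non-matching pieces of each string back together
    vapply(regmatches(words, mm, invert = TRUE), paste, character(1), collapse = "")
}

# TRUE only if both base R approaches produce identical vectors
same <- identical(GSUB(), GREGEXPR())

# Optional: include stringi in the comparison when it is available
if (requireNamespace("stringi", quietly = TRUE)) {
    same <- same &&
        identical(GSUB(), stringi::stri_replace_all_regex(words, "^\\d+|\\d+$", ""))
}
same
```

If `same` is FALSE, the timings compare functions that do different work and should not be read as a like-for-like benchmark.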
    
  • 2020-12-19 10:27

    You can use gsub which uses regular expressions:

    gsub("^[0-9]|[0-9]$", "", words)
    # [1] "lang"        "kasverschil" "b2b"
    

    Explanation:

    The pattern ^[0-9] matches a single digit at the beginning of a string, while the pattern [0-9]$ matches a single digit at the end of the string. Separating the two patterns with | matches either the first or the second pattern. Each match is then replaced with an empty string. To remove a run of several digits, add a quantifier: ^[0-9]+|[0-9]+$.
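
The difference between matching one digit and matching a run of digits only shows up on inputs with more than one digit at either end. A small sketch, using a hypothetical vector `words2` that is not part of the question:

```r
# Hypothetical inputs with two digits at the start or end
words2 <- c("55lang", "kasverschil22", "b2b")

# Without a quantifier, only one digit is stripped from each end
one_digit <- gsub("^[0-9]|[0-9]$", "", words2)
one_digit
# [1] "5lang"        "kasverschil2" "b2b"

# With +, the whole leading or trailing run of digits is stripped
all_digits <- gsub("^[0-9]+|[0-9]+$", "", words2)
all_digits
# [1] "lang"        "kasverschil" "b2b"
```

In both cases the digit inside "b2b" is left alone, because it touches neither the start nor the end of the string.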

  • 2020-12-19 10:41

    Find the runs of digits at the beginning or end of each string, then keep everything that was not matched (invert=TRUE). A string can be split into several pieces this way, so the pieces have to be pasted back together with collapse:

    gregexpr(pattern='(^[0-9]+|[0-9]+$)', text = words) -> mm
    sapply(regmatches(words, mm, invert=TRUE), paste, collapse="") 
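
To see why the collapse step is needed, it may help to look at the intermediate result. This sketch reruns the same two calls on the original three strings; `pieces` and `cleaned` are names introduced here for illustration.

```r
words <- c("5lang", "kasverschil2", "b2b")

# Locate digit runs anchored at the start or end of each string
mm <- gregexpr("(^[0-9]+|[0-9]+$)", words)

# invert = TRUE returns the text around each match, one character vector per string
pieces <- regmatches(words, mm, invert = TRUE)
pieces[[1]]  # "" and "lang": an empty piece before the match, the rest after it
pieces[[2]]  # "kasverschil" and "": the empty piece follows the trailing match
pieces[[3]]  # "b2b": no match, so the string comes back whole

# Glue each string's pieces back together
cleaned <- vapply(pieces, paste, character(1), collapse = "")
cleaned
# [1] "lang"        "kasverschil" "b2b"
```

Using vapply instead of sapply is a design choice of this sketch: it guarantees a character vector even for empty input.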
    