R: removing numbers at begin and end of a string

后端未结

关注

 3  1786

I\'ve got the following vector:

words <- c(\"5lang\",\"kasverschil2\",\"b2b\")

I want to remove \"5\" in \"5lang\"


                      
              相关标签:


      
      
        
          3条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  被撕碎了的回忆        
                
              
                            
                2020-12-19 10:27
              
            
            
                                                                       
 gsub("^\\d+|\\d+$", "", words)    
 #[1] "lang"        "kasverschil" "b2b"


Another option would be to use stringi

 library(stringi)
 stri_replace_all_regex(words, "^\\d+|\\d+$", "")
  #[1] "lang"        "kasverschil" "b2b"        


Using a variant of the data set provided by the OP here are benchmarks for 3 three main solutions (note that these strings are very short and contrived; results may differ on a larger, real data set):

words <- rep(c("5lang","kasverschil2","b2b"), 100000)

library(stringi)
library(microbenchmark)

GSUB <- function() gsub("^\\d+|\\d+$", "", words)
STRINGI <- function() stri_replace_all_regex(words, "^\\d+|\\d+$", "")
GREGEXPR <- function() {
    gregexpr(pattern='(^[0-9]+|[0-9]+$)', text = words) -> mm
    sapply(regmatches(words, mm, invert=TRUE), paste, collapse="") 
}

microbenchmark( 
    GSUB(),
    STRINGI(),
    GREGEXPR(),
    times=100L
)

## Unit: milliseconds
##        expr       min        lq    median        uq       max neval
##      GSUB()  301.0988  349.9952  396.3647  431.6493  632.7568   100
##   STRINGI()  465.9099  513.1570  569.1972  629.4176  738.4414   100
##  GREGEXPR() 5073.1960 5706.8160 6194.1070 6742.1552 7647.8904   100

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  被撕碎了的回忆        
                
              
                            
                2020-12-19 10:27
              
            
            
                                                                       
You can use gsub which uses regular expressions:

gsub("^[0-9]|[0-9]$", "", words)
# [1] "lang"        "kasverschil" "b2b"


Explanation:

The pattern ^[0-9] matches any number at the beginning of a string, while the pattern [0-9]$ matches any number at the end of the string. by separating these two patterns by | you want to match either the first or the second pattern. Then, you replace the matched pattern with an empty string.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  有刺的猬        
                
              
                            
                2020-12-19 10:41
              
            
            
                                                                       
Get instances where numbers appear at the beginning or end of a word and match everything else. You need to collapse results because of possible multiple matches:

gregexpr(pattern='(^[0-9]+|[0-9]+$)', text = words) -> mm
sapply(regmatches(words, mm, invert=TRUE), paste, collapse="") 

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
                             
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复