Split a string vector at whitespace

后端未结

关注

 9  1942

I have the following vector:

tmp3 <- c(\"1500 2\", \"1500 1\", \"1510 2\", \"1510 1\", \"1520 2\", \"1520 1\", \"1530 2\", 
\"1530 1\", \"1540 2\", \"1540


                      
              相关标签:


      
      
        
          9条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  别那么骄傲        
                
              
                            
                2020-12-08 10:17
              
            
            
                                                                       
One could use read.table on textConnection:

X <- read.table(textConnection(tmp3))


then

> str(X)
'data.frame':   10 obs. of  2 variables:
 $ V1: int  1500 1500 1510 1510 1520 1520 1530 1530 1540 1540
 $ V2: int  2 1 2 1 2 1 2 1 2 1


so X$V2 is what you need.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  -上瘾入骨i        
                
              
                            
                2020-12-08 10:22
              
            
            
                                                                       
This should do it:

library(plyr)
ldply(strsplit(tmp3, split = " "))[[2]]


If you need a numeric vector, use

as.numeric(ldply(strsplit(tmp3, split = " "))[[2]])

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  無奈伤痛        
                
              
                            
                2020-12-08 10:22
              
            
            
                                                                       
Just to add two more options - using stringr::str_split() or data.table::tstrsplit()

1) using stringr::str_split()

# data posted above by the asker
tmp3 <- c("1500 2", "1500 1", "1510 2", "1510 1", "1520 2", "1520 1", "1530 2", 
          "1530 1", "1540 2", "1540 1")

library(stringr)

as.integer(
  str_split(string = tmp3, 
            pattern = "[[:space:]]", 
            simplify = TRUE)[, 2] 
)
#>  [1] 2 1 2 1 2 1 2 1 2 1


simplify = TRUE tells str_split to return a matrix, then we can index the matrix for the desired column, therefore, the [, 2] part

2) Using  data.table::tstrsplit()

library(data.table)

as.data.table(tmp3)[, tstrsplit(tmp3, split = "[[:space:]]", type.convert = TRUE)][, V2]
#>  [1] 2 1 2 1 2 1 2 1 2 1


type.convert = TRUE is responsible for the conversion to integer here, but use this with care for other datasets.
The indexing [, V2] part has a similar reason as explained above for [, 2]. Here it selects the second column of the returned data table object, which contains the values desired by the asker as integers.



sessionInfo()
#> R version 4.0.0 (2020-04-24)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 18362)
#> 
#> Matrix products: default
#> 
#> locale:
#> [1] LC_COLLATE=English_United States.1252 
#> [2] LC_CTYPE=English_United States.1252   
#> [3] LC_MONETARY=English_United States.1252
#> [4] LC_NUMERIC=C                          
#> [5] LC_TIME=English_United States.1252    
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> loaded via a namespace (and not attached):
#>  [1] compiler_4.0.0  magrittr_1.5    tools_4.0.0     htmltools_0.4.0
#>  [5] yaml_2.2.1      Rcpp_1.0.4.6    stringi_1.4.6   rmarkdown_2.1  
#>  [9] highr_0.8       knitr_1.28      stringr_1.4.0   xfun_0.13      
#> [13] digest_0.6.25   rlang_0.4.6     evaluate_0.14


^{Created on 2020-05-06 by the reprex package (v0.3.0)}
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  误落风尘        
                
              
                            
                2020-12-08 10:24
              
            
            
                                                                       
It depends a little bit on how closely your actual data matches the example data you've given. I you're just trying to get everything after the space, you can use gsub:

gsub(".+\\s+", "", tmp3)
[1] "2" "1" "2" "1" "2" "1" "2" "1" "2" "1"


If you're trying to implement a rule more complicated than "take everything after the space", you'll need a more complicated regular expresion.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  终归单人心        
                
              
                            
                2020-12-08 10:24
              
            
            
                                                                       
What I think is the most elegant way to do this

>     res <- sapply(strsplit(tmp3, " "), "[[", 2)


If you need it to be an integer

>     storage.mode(res) <- "integer"

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  醉梦人生        
                
              
                            
                2020-12-08 10:30
              
            
            
                                                                       
There's probably a better way, but here are two approaches with strsplit():

as.numeric(data.frame(strsplit(tmp3, " "))[2,])
as.numeric(lapply(strsplit(tmp3," "), function(x) x[2]))


The as.numeric() may not be necessary if you can use characters...
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
   
          
     1
2
下一页
           
           
        
                                  
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复

Split a string vector at whitespace

1) using `stringr::str_split()`

2) Using `data.table::tstrsplit()`