Split a string vector at whitespace

独厮守ぢ 2020-12-08 09:35

I have the following vector:

tmp3 <- c("1500 2", "1500 1", "1510 2", "1510 1", "1520 2", "1520 1", "1530 2",
          "1530 1", "1540 2", "1540 1")


        
9 answers
  • 2020-12-08 10:17

    One could use read.table on textConnection:

    X <- read.table(textConnection(tmp3))
    

    then

    > str(X)
    'data.frame':   10 obs. of  2 variables:
     $ V1: int  1500 1500 1510 1510 1520 1520 1530 1530 1540 1540
     $ V2: int  2 1 2 1 2 1 2 1 2 1
    

    so X$V2 is what you need.
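
    As a hedged variant of the same idea, read.table() also accepts the lines directly through its text argument, which opens the text connection internally. A minimal sketch, assuming every element has exactly two whitespace-separated fields; the column names "time" and "id" are made up for illustration and are not from the question:

    ```r
    tmp3 <- c("1500 2", "1500 1", "1510 2", "1510 1", "1520 2",
              "1520 1", "1530 2", "1530 1", "1540 2", "1540 1")

    # Each element of tmp3 is read as one line of input;
    # the default separator is any run of whitespace.
    # The column names are hypothetical, chosen only for readability.
    X <- read.table(text = tmp3, col.names = c("time", "id"))

    # Second field, already converted to integer by read.table()
    X$id
    ```

    This avoids keeping an open connection object around, at the cost of being slightly less explicit about where the text comes from.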

  • 2020-12-08 10:22

    This should do it:

    library(plyr)
    ldply(strsplit(tmp3, split = " "))[[2]]
    

    If you need a numeric vector, use

    as.numeric(ldply(strsplit(tmp3, split = " "))[[2]])
    
  • 2020-12-08 10:22

    Just to add two more options: stringr::str_split() and data.table::tstrsplit().

    1) using stringr::str_split()

    # data posted above by the asker
    tmp3 <- c("1500 2", "1500 1", "1510 2", "1510 1", "1520 2", "1520 1", "1530 2", 
              "1530 1", "1540 2", "1540 1")
    
    library(stringr)
    
    as.integer(
      str_split(string = tmp3, 
                pattern = "[[:space:]]", 
                simplify = TRUE)[, 2] 
    )
    #>  [1] 2 1 2 1 2 1 2 1 2 1
    

    simplify = TRUE tells str_split() to return a character matrix instead of a list, so [, 2] selects the second column, which holds the desired values.

    2) Using data.table::tstrsplit()

    library(data.table)
    
    as.data.table(tmp3)[, tstrsplit(tmp3, split = "[[:space:]]", type.convert = TRUE)][, V2]
    #>  [1] 2 1 2 1 2 1 2 1 2 1
    

    type.convert = TRUE is responsible for the conversion to integer here, but use this with care for other datasets. The indexing [, V2] part has a similar reason as explained above for [, 2]. Here it selects the second column of the returned data table object, which contains the values desired by the asker as integers.
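
    To illustrate why care is needed: as far as I know, type.convert = TRUE applies the same kind of type guessing as base R's utils::type.convert(), so fields that merely look numeric change value. A small base-R sketch with made-up data (the vector codes is hypothetical and not from the question):

    ```r
    # Hypothetical identifiers with leading zeros
    codes <- c("007 a", "010 b")

    # Take the first whitespace-separated field of each element
    first <- vapply(strsplit(codes, "[[:space:]]+"), `[`, character(1), 1)
    first      # still character, leading zeros intact

    # Automatic conversion guesses "integer" and drops the leading zeros
    converted <- utils::type.convert(first, as.is = TRUE)
    converted  # integer values 7 and 10
    ```

    If such fields are identifiers rather than quantities, leaving them as character and converting only the columns you need is the safer choice.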

    sessionInfo()
    #> R version 4.0.0 (2020-04-24)
    #> Platform: x86_64-w64-mingw32/x64 (64-bit)
    #> Running under: Windows 10 x64 (build 18362)
    #> 
    #> Matrix products: default
    #> 
    #> locale:
    #> [1] LC_COLLATE=English_United States.1252 
    #> [2] LC_CTYPE=English_United States.1252   
    #> [3] LC_MONETARY=English_United States.1252
    #> [4] LC_NUMERIC=C                          
    #> [5] LC_TIME=English_United States.1252    
    #> 
    #> attached base packages:
    #> [1] stats     graphics  grDevices utils     datasets  methods   base     
    #> 
    #> loaded via a namespace (and not attached):
    #>  [1] compiler_4.0.0  magrittr_1.5    tools_4.0.0     htmltools_0.4.0
    #>  [5] yaml_2.2.1      Rcpp_1.0.4.6    stringi_1.4.6   rmarkdown_2.1  
    #>  [9] highr_0.8       knitr_1.28      stringr_1.4.0   xfun_0.13      
    #> [13] digest_0.6.25   rlang_0.4.6     evaluate_0.14
    

    Created on 2020-05-06 by the reprex package (v0.3.0)

  • 2020-12-08 10:24

    It depends a little bit on how closely your actual data matches the example data you've given. If you're just trying to get everything after the space, you can use gsub:

    gsub(".+\\s+", "", tmp3)
    [1] "2" "1" "2" "1" "2" "1" "2" "1" "2" "1"
    

    If you're trying to implement a rule more complicated than "take everything after the space", you'll need a more complicated regular expression.
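
    As one hedged example of such a rule, suppose the strings had three fields and you wanted the second one. The data below is made up for illustration. Because ".+" is greedy, the gsub() call above keeps only the last field, so a capture group is one way to pick the second field instead:

    ```r
    x <- c("1500 2 7", "1510 1 9")  # hypothetical three-field strings

    # Greedy ".+" consumes up to the last run of whitespace,
    # so only the final field is left
    last <- gsub(".+\\s+", "", x)
    last

    # Anchor at the start, skip the first field and the whitespace after it,
    # capture the second field, and discard the rest
    second <- sub("^[^[:space:]]+[[:space:]]+([^[:space:]]+).*$", "\\1", x)
    second
    ```

    The same capture-group pattern also works on the two-field strings in tmp3, since the trailing ".*" may match nothing.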

  • 2020-12-08 10:24

    This is what I think is the most elegant way to do it:

    res <- sapply(strsplit(tmp3, " "), "[[", 2)
    

    If you need it to be an integer:

    storage.mode(res) <- "integer"
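
    A hedged variation on the same approach, in case the real data contains tabs or repeated spaces: splitting on a single " " then yields empty strings or unsplit elements. The sketch below uses made-up input, splits on any run of whitespace, and swaps sapply() for vapply() so that the result type is checked:

    ```r
    tmp <- c("1500 2", "1510\t1", "1520   2")  # hypothetical messy input

    # Splitting on one space leaves empty pieces between repeated spaces
    pieces <- strsplit(tmp, " ")[[3]]
    pieces

    # "[[:space:]]+" treats any run of whitespace as a single separator;
    # vapply() insists that each result is exactly one character string
    res <- vapply(strsplit(tmp, "[[:space:]]+"), `[`, character(1), 2)
    storage.mode(res) <- "integer"
    res
    ```

    Using `[` rather than "[[" means an element with fewer than two fields gives NA instead of an error, which may or may not be what you want.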
    
  • 2020-12-08 10:30

    There's probably a better way, but here are two approaches with strsplit():

    as.numeric(data.frame(strsplit(tmp3, " "), stringsAsFactors = FALSE)[2, ])
    as.numeric(lapply(strsplit(tmp3," "), function(x) x[2]))
    

    The as.numeric() call may not be necessary if character values are acceptable.
