Extracting nth element from a nested list following strsplit - R

醉梦人生 2021-01-02 17:09

I've been trying to understand how to deal with the output of strsplit a bit better. I often have data such as this that I wish to split:

myd         


        
4 Answers
  • 2021-01-02 17:27

    Try this:

    > read.table(text = mydata, sep = "/", as.is = TRUE, fill = TRUE)
       V1 V2 V3
    1 144  4  5
    2 154  2 NA
    3 146  3  5
    4 142 NA NA
    5 143  4 NA
    6 DNB NA NA
    7  90 NA NA
    

    If you want to treat DNB as an NA, then add the argument na.strings = "DNB".
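
    As a rough sketch of that suggestion: the original definition of mydata is cut off in the question, so the vector below is an assumption reconstructed from the output printed in this answer. With na.strings = "DNB" the first column no longer contains text and can be read as a number.

    ```r
    # Assumed input, reconstructed from the output shown in this answer
    mydata <- c("144/4/5", "154/2", "146/3/5", "142", "143/4", "DNB", "90")

    # na.strings = "DNB" turns the text marker into NA instead of keeping it as a string
    df <- read.table(text = mydata, sep = "/", as.is = TRUE, fill = TRUE,
                     na.strings = "DNB")

    str(df)   # inspect the column types
    df$V2     # second element of every record, NA where it is missing
    ```

    Selecting a column of the resulting data frame is then the same as asking for the nth element of each split string.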

    If you really want to use strsplit then try this:

    > do.call(rbind, lapply(strsplit(mydata, "/"), function(x) head(c(x,NA,NA), 3)))
         [,1]  [,2] [,3]
    [1,] "144" "4"  "5" 
    [2,] "154" "2"  NA  
    [3,] "146" "3"  "5" 
    [4,] "142" NA   NA  
    [5,] "143" "4"  NA  
    [6,] "DNB" NA   NA  
    [7,] "90"  NA   NA  
    

    Note: Using alexis_laz's observation that x[i] returns NA if i is not in 1:length(x) the last line of code above could be simplified to:

    t(sapply(strsplit(mydata, "/"), "[", 1:3))
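
    The same observation gives a direct way to pull out only the nth piece. The helper below is a minimal sketch: nth_piece is a hypothetical name, and mydata is an assumed reconstruction of the truncated input from the question.

    ```r
    # Assumed input, reconstructed from the output shown in this answer
    mydata <- c("144/4/5", "154/2", "146/3/5", "142", "143/4", "DNB", "90")

    # Hypothetical helper: return the n-th piece of each string after splitting
    nth_piece <- function(x, n, sep = "/") {
      parts <- strsplit(x, sep, fixed = TRUE)
      # `[` yields NA for out-of-range positions; vapply enforces one string per element
      vapply(parts, `[`, character(1), n)
    }

    nth_piece(mydata, 2)
    # [1] "4" "2" "3" NA  "4" NA  NA
    ```

    vapply is used instead of sapply here so that the result is always a character vector, even for empty input.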
    
  • 2021-01-02 17:31

    You can assign the length inside sapply, resulting in NA where the current length is shorter than the assigned length.

    s <- strsplit(mydata, "/")
    sapply(s, function(x) { length(x) <- 3; x[2] })
    # [1] "4" "2" "3" NA  "4" NA  NA 
    

    Then you can pass the index and the target length as extra arguments with mapply:

    m <- max(sapply(s, length))
    mapply(function(x, y, z) { length(x) <- z; x[y] }, s, 2, m)
    # [1] "4" "2" "3" NA  "4" NA  NA 
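
    The padding step that both calls rely on can be seen in isolation. This is only an illustrative sketch; mydata is an assumed reconstruction of the truncated input from the question.

    ```r
    # Assumed input, reconstructed from the output shown in the answers
    mydata <- c("144/4/5", "154/2", "146/3/5", "142", "143/4", "DNB", "90")
    s <- strsplit(mydata, "/")

    x <- s[[2]]      # c("154", "2")
    length(x) <- 3   # extending a vector pads the new positions with NA
    x
    # [1] "154" "2"   NA

    # Same idea for every element; vapply checks that each result is one string
    second <- vapply(s, function(x) { length(x) <- 3; x[2] }, character(1))
    second
    # [1] "4" "2" "3" NA  "4" NA  NA
    ```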
    
  • 2021-01-02 17:38

    You could use a regular expression (if that is allowed):

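    library(stringr)
    # perl() was deprecated in stringr 1.0.0; regex() is the current wrapper
    # second field: digits preceded by "<digit>/"
    str_extract(mydata, regex("(?<=\\d/)\\d+"))
    #[1] "4" "2" "3" NA  "4" NA  NA 
    # third field: this lookbehind assumes a single-digit second field
    str_extract(mydata, regex("(?<=/\\d/)\\d+"))
    #[1] "5" NA  "5" NA  NA  NA  NA 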
    
  • 2021-01-02 17:40

    At least for 1D vectors, [ returns NA when i > length(x), whereas [[ raises an error:

    x = runif(5)
    x[6]
    #[1] NA
    x[[6]]
    #Error in x[[6]] : subscript out of bounds
    

    Digging a bit, do_subset_dflt (i.e. [) calls ExtractSubset where we notice that when a wanted index ("ii") is "> length(x)" NA is returned (a bit modified to be clean):

    if(0 <= ii && ii < nx && ii != NA_INTEGER)
        result[i] = x[ii];
    else
        result[i] = NA_INTEGER;
    

    On the other hand do_subset2_dflt (i.e. [[) returns an error if the wanted index ("offset") is "> length(x)" (modified a bit to be clean):

    if(offset < 0 || offset >= xlength(x)) {
        if(offset < 0 && (isNewList(x)) ...
        else errorcall(call, R_MSG_subs_o_b);
    }
    

    where #define R_MSG_subs_o_b _("subscript out of bounds")

    (I'm not sure about the above code snippets but they do seem relevant based on their returns)
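
    The difference can also be checked from R itself without reading the C source. The snippet below is a small sketch with made-up values; tryCatch is used only so that the error from [[ can be captured rather than stopping the script.

    ```r
    x <- c(10, 20, 30)

    x[5]       # NA: single-bracket indexing past the end pads with NA
    lst <- list("a", "b")
    lst[3]     # a length-one list holding NULL, the list analogue of NA

    # Double-bracket indexing past the end signals an error; capture its message
    res <- tryCatch(x[[5]], error = function(e) conditionMessage(e))
    res
    ```

    This is why sapply(..., "[", n) is safe on ragged strsplit output, while the same call with "[[" would stop at the first element that is too short.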
