strsplit by row and distribute results by column in data.frame

不思量自难忘° 2021-02-20 15:38

So I have the data.frame

dat = data.frame(x = c('Sir Lancelot the Brave', 'King Arthur',
                       'The Black Knight', 'The Rabbit'), stri


        
5 Answers
  • 2021-02-20 16:16

    Here is a nice and simple approach with tidyr.

    library(tidyr)
    
    # Number of columns needed = largest word count in any row
    ncol <- max(lengths(strsplit(as.character(dat$x), " ")))
    
    dat %>%
      separate(x, paste0("V", seq_len(ncol)))
    

    Note: separate() will emit a warning, but it only reports that rows with fewer pieces are being padded with NA, so it can safely be ignored here.
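
    If the warning is unwanted, one possible variation is to request the padding explicitly through the fill argument of separate(). The sketch below rebuilds the sample data itself; stringsAsFactors = FALSE and the explicit sep = " " are assumptions, not part of the answer above.

    ```r
    library(tidyr)

    # Sample data as in the question (stringsAsFactors = FALSE is an assumption)
    dat <- data.frame(x = c("Sir Lancelot the Brave", "King Arthur",
                            "The Black Knight", "The Rabbit"),
                      stringsAsFactors = FALSE)

    # Largest number of space-separated words in any row
    n <- max(lengths(strsplit(dat$x, " ")))

    # fill = "right" pads short rows with NA on the right without warning
    out <- separate(dat, x, into = paste0("V", seq_len(n)),
                    sep = " ", fill = "right")
    ```

    Here out has columns V1 to V4, with NA wherever a row has fewer than four words.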

  • 2021-02-20 16:17

    Here's one option. The single complication is that you need to first convert each vector to a data.frame with one row, as data.frames are what rbind.fill() expects.

    library(plyr)
    # sbt is the list returned by strsplit(dat$x, " ")
    rbind.fill(lapply(sbt, function(X) data.frame(t(X))))
    #     X1       X2     X3    X4
    # 1  Sir Lancelot    the Brave
    # 2 King   Arthur   <NA>  <NA>
    # 3  The    Black Knight  <NA>
    # 4  The   Rabbit   <NA>  <NA>
    

    My own inclination, though, would be to just use base R, like this:

    n <- max(sapply(sbt, length))
    l <- lapply(sbt, function(X) c(X, rep(NA, n - length(X))))
    data.frame(t(do.call(cbind, l)))
    #     X1       X2     X3    X4
    # 1  Sir Lancelot    the Brave
    # 2 King   Arthur   <NA>  <NA>
    # 3  The    Black Knight  <NA>
    # 4  The   Rabbit   <NA>  <NA>
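
    For anyone who wants to run the base R idea end to end, here is a minimal self-contained sketch. The helper name split_to_columns is hypothetical, the V1..Vn column names are a choice made for this example, and it assumes the separator is a literal string. It pads with `length<-` instead of rep(NA, ...), which is a swapped-in but equivalent way to extend each vector.

    ```r
    # Sample data as in the question (stringsAsFactors = FALSE is an assumption)
    dat <- data.frame(x = c("Sir Lancelot the Brave", "King Arthur",
                            "The Black Knight", "The Rabbit"),
                      stringsAsFactors = FALSE)

    # Hypothetical helper: split each string and spread the pieces over columns
    split_to_columns <- function(x, sep = " ") {
      parts <- strsplit(as.character(x), sep, fixed = TRUE)
      n <- max(lengths(parts))
      # `length<-` extends each vector to n elements, padding with NA
      padded <- lapply(parts, `length<-`, n)
      # One row per input string, one column per word position
      out <- as.data.frame(do.call(rbind, padded), stringsAsFactors = FALSE)
      names(out) <- paste0("V", seq_len(n))
      out
    }

    res <- split_to_columns(dat$x)
    ```

    Binding the padded vectors as rows avoids the transpose step, at the cost of going through a character matrix.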
    
  • 2021-02-20 16:33
    sbt = strsplit(dat$x, " ")
    sbt
    #[[1]]
    #[1] "Sir"      "Lancelot" "the"      "Brave"   
    #[[2]]
    #[1] "King"   "Arthur"
    #[[3]]
    #[1] "The"    "Black"  "Knight"
    #[[4]]
    #[1] "The"    "Rabbit"
    
    ncol = max(sapply(sbt,length))
    ncol
    # [1] 4
    
    library(data.table)
    as.data.table(lapply(1:ncol, function(i) sapply(sbt, "[", i)))
    #      V1       V2     V3    V4
    # 1:  Sir Lancelot    the Brave
    # 2: King   Arthur     NA    NA
    # 3:  The    Black Knight    NA
    # 4:  The   Rabbit     NA    NA
    
  • 2021-02-20 16:39

    Using data.table as it appears you are trying to use it.

    library(data.table)
    DT <- data.table(dat)
    DTB <- DT[, list(y = unlist(strsplit(x, ' '))), by = x]
    
    new <- rep(NA_character_,  DTB[,.N,by =x][which.max(N), N])
    names(new) <- paste0('V', seq_along(new))
    DTB[,{.new <- new 
          .new[seq_len(.N)] <- y 
           as.list(.new)} ,by= x]
    

    Or use dcast from reshape2 to reshape the long result:

    library(reshape2)
    
    dcast(DTB[,list(id = seq_len(.N),y),by= x ], x ~id, value.var = 'y')
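
    As a possible shortcut, later data.table releases also ship tstrsplit(), which splits and transposes in one call. This is a different technique from the two above, sketched under the assumption that a data.table version providing tstrsplit() is installed; the sample data is rebuilt here so the block runs on its own.

    ```r
    library(data.table)

    # Sample data as in the question
    DT <- data.table(x = c("Sir Lancelot the Brave", "King Arthur",
                           "The Black Knight", "The Rabbit"))

    # tstrsplit() returns one vector per word position, padded with NA,
    # so using it in j yields columns V1, V2, ...
    res <- DT[, tstrsplit(x, " ", fixed = TRUE)]
    ```

    The result has one column per word position and keeps the original row order.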
    
  • 2021-02-20 16:43

    This is an old question, I know, but I thought I would share two additional options.

    Option 1

    concat.split from my "splitstackshape" package was designed exactly for this type of thing.

    library(splitstackshape)
    concat.split(dat, "x", " ")
    #                        x  x_1      x_2    x_3   x_4
    # 1 Sir Lancelot the Brave  Sir Lancelot    the Brave
    # 2            King Arthur King   Arthur             
    # 3       The Black Knight  The    Black Knight      
    # 4             The Rabbit  The   Rabbit        
    

    Option 2

    data.table has recently (as of version 1.8.11, I believe) had some additions to its arsenal, notably in this case dcast.data.table. To use it, unlist the split data (as was done in @mnel's answer), create a "time" variable using .N (how many new values per row), and use dcast.data.table to transform the data into the form you are looking for.

    library(data.table)
    library(reshape2)
    packageVersion("data.table")
    # [1] ‘1.8.11’
    
    DT <- data.table(dat)
    S1 <- DT[, list(X = unlist(strsplit(x, " "))), by = seq_len(nrow(DT))]
    S1[, Time := sequence(.N), by = seq_len]
    dcast.data.table(S1, seq_len ~ Time, value.var="X")
    #    seq_len    1        2      3     4
    # 1:       1  Sir Lancelot    the Brave
    # 2:       2 King   Arthur     NA    NA
    # 3:       3  The    Black Knight    NA
    # 4:       4  The   Rabbit     NA    NA
    