Create a variable that identifies the original data.frame after rbind command in R

后悔当初 2020-12-11 06:07

I am relatively new to R and would like to know how I can create a variable (a number sequence) that identifies each of the original data.frames before they were joined with rbind.

6 answers
  • 2020-12-11 06:26

    A fairly extensible solution:

    # test data:
    df1 <- data.frame(id=letters[1:2])
    df2 <- data.frame(id=letters[1:2])
    

    Collect your data into a list then rbind all at once:

    dfs <- c("df1","df2")
    do.call(rbind, Map("[<-", mget(dfs), TRUE, "source", dfs) )
    
    #      id source
    #df1.1  a    df1
    #df1.2  b    df1
    #df2.1  a    df2
    #df2.2  b    df2
    

    Also, note in this example that when you rbind using a named list, your rownames reference the source data. This means you can nearly get what you want using just:

    dfs <- c("df1","df2")
    do.call(rbind, mget(dfs) )
    
    #      id
    #df1.1  a
    #df1.2  b
    #df2.1  a
    #df2.2  b
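
    Since the question asks for a number sequence rather than a name, the same idea can be sketched in base R with a numeric id. This is only an illustration: `dfs_list` is a made-up name for the list of data frames, and each list position is repeated once per row of that data frame.

    ```r
    # Test data, same shape as above
    df1 <- data.frame(id = letters[1:2])
    df2 <- data.frame(id = letters[1:2])

    # dfs_list is a hypothetical name for the list of data frames to combine
    dfs_list <- list(df1, df2)

    # Stack the rows, then repeat each list position once per row of its data frame
    combined <- do.call(rbind, dfs_list)
    combined$source <- rep(seq_along(dfs_list),
                           times = vapply(dfs_list, nrow, integer(1)))

    combined
    ```

    Because the id comes from the position in the list, adding a third data frame only means adding it to `dfs_list`.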
    
  • 2020-12-11 06:32

    Why not just:

        rbind( cbind(df1, origin="df1"),
               cbind(df2,  origin='df2') )
    

    Or if you want to preserve rownames:

      rbind( cbind(df1, origin=paste("df1",rownames(df1), sep="_") ),
             cbind(df2, origin=paste("df2",rownames(df2), sep="_") ) )
    
  • 2020-12-11 06:33

    There's a function in the gdata package called combine that does just that.

    df1 <- data.frame(a = seq(1, 5, by = 1),
                      b = seq(21, 25, by = 1))
    
    df2 <- data.frame(a = seq(6, 10, by = 1),
                      b = seq(26, 30, by = 1))
    
    library(gdata)
    combine(df1, df2)
    
        a  b source
    1   1 21    df1
    2   2 22    df1
    3   3 23    df1
    4   4 24    df1
    5   5 25    df1
    6   6 26    df2
    7   7 27    df2
    8   8 28    df2
    9   9 29    df2
    10 10 30    df2
    
  • 2020-12-11 06:33

    You can use

    transform(dat, newCol = cumsum(ID == 1))
    

    where dat is the name of your combined data frame and ID is the name of the ID column. This only works if the ID column restarts at 1 at the top of each original data frame.
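
    To see why this works, here is a small sketch on made-up data. It assumes that the ID column restarts at 1 at the top of each original data frame and that the value 1 appears nowhere else in it.

    ```r
    # Made-up combined data: two original data frames whose ID columns each start at 1
    dat <- data.frame(ID = c(1, 2, 3, 1, 2),
                      value = c(10, 20, 30, 40, 50))

    # ID == 1 is TRUE on the first row of each original data frame,
    # so the running sum goes up by one at every restart
    res <- transform(dat, newCol = cumsum(ID == 1))

    res$newCol
    ```

    The first three rows get 1 and the last two get 2, one value per original data frame.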

  • 2020-12-11 06:45

    It looks like bind_rows from the dplyr package will do this too. Using maloneypatr's example:

    df1 <- data.frame(a = seq(1, 5, by = 1),
                      b = seq(21, 25, by = 1))
    
    df2 <- data.frame(a = seq(6, 10, by = 1),
                      b = seq(26, 30, by = 1))
    
    dplyr::bind_rows(df1, df2, .id = "source")
    
    # Source: local data frame [10 x 3]
    
    #    source     a     b
    #     (chr) (dbl) (dbl)
    # 1       1     1    21
    # 2       1     2    22
    # 3       1     3    23
    # 4       1     4    24
    # 5       1     5    25
    # 6       2     6    26
    # 7       2     7    27
    # 8       2     8    28
    # 9       2     9    29
    # 10      2    10    30
    
  • 2020-12-11 06:45

    Thanks all! I ended up with a simple solution working with a friend of mine by creating an index, like this:

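    # dat is the combined data frame (renamed so it no longer masks data.frame())
    index <- rep(1, times = nrow(dat))

    # Start a new index value whenever ID fails to increase from one row to the next
    for (i in seq_len(nrow(dat) - 1)) {
      if (dat$ID[i + 1] <= dat$ID[i]) {
        index[i + 1] <- index[i] + 1
      } else {
        index[i + 1] <- index[i]
      }
    }

    new.data.frame <- cbind(index, dat)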
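
    The same index can probably be built without a loop. The sketch below is not the poster's code: `dat` and `new_dat` are made-up names, and it assumes, like the loop, that a new original data frame starts wherever ID fails to increase.

    ```r
    # Made-up combined data: ID restarts where a new original data frame begins
    dat <- data.frame(ID = c(1, 2, 3, 1, 2, 1))

    # diff() compares each ID with the previous one; a value <= 0 marks a restart.
    # The leading TRUE makes the first block start at 1.
    index <- cumsum(c(TRUE, diff(dat$ID) <= 0))

    new_dat <- cbind(index, dat)
    ```

    Here the three blocks of rows are numbered 1, 2 and 3, which matches what the loop produces on the same input.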
    