I am relatively new to R and I would like to know how I can create a variable (a number sequence) that identifies each of the original data.frames before they are joined together.
A fairly extensible solution:
# test data:
df1 <- data.frame(id=letters[1:2])
df2 <- data.frame(id=letters[1:2])
Collect your data into a list, then rbind all at once:
dfs <- c("df1","df2")
do.call(rbind, Map("[<-", mget(dfs), TRUE, "source", dfs) )
# id source
#df1.1 a df1
#df1.2 b df1
#df2.1 a df2
#df2.2 b df2
Also, note in this example that when you rbind using a named list, your rownames reference the source data. This means you can nearly get what you want using just:
dfs <- c("df1","df2")
do.call(rbind, mget(dfs) )
# id
#df1.1 a
#df1.2 b
#df2.1 a
#df2.2 b
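Since the question asks for a number sequence rather than a name, the same list-then-rbind idea can be adapted. What follows is only a minimal sketch in base R: `frames`, `n_rows`, and `combined` are names made up for illustration, and it assumes the data frames are stacked in the order they appear in the list.

```r
# Toy data frames, same shape as the test data above
df1 <- data.frame(id = letters[1:2])
df2 <- data.frame(id = letters[1:2])

# Collect the frames in a list first (hypothetical variable name)
frames <- list(df1, df2)

# Number of rows contributed by each frame
n_rows <- vapply(frames, nrow, integer(1))

# Stack the frames, then label every row with the position of its source
combined <- do.call(rbind, frames)
combined$source <- rep(seq_along(frames), times = n_rows)

print(combined)
```

Because the labels are built from the row counts, this should keep working when the frames have different numbers of rows.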
Why not just:
rbind( cbind(df1, origin="df1"),
cbind(df2, origin='df2') )
Or if you want to preserve rownames:
rbind( cbind(df1, origin=paste("df1",rownames(df1), sep="_") ),
cbind(df2, origin=paste("df2",rownames(df2), sep="_") ) )
There's a function in the gdata package called combine that does just that.
df1 <- data.frame(a = seq(1, 5, by = 1),
b = seq(21, 25, by = 1))
df2 <- data.frame(a = seq(6, 10, by = 1),
b = seq(26, 30, by = 1))
library(gdata)
combine(df1, df2)
# a b source
# 1 1 21 df1
# 2 2 22 df1
# 3 3 23 df1
# 4 4 24 df1
# 5 5 25 df1
# 6 6 26 df2
# 7 7 27 df2
# 8 8 28 df2
# 9 9 29 df2
# 10 10 30 df2
You can use
transform(dat, newCol = cumsum(ID == 1))
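where dat is the name of your data frame and ID is the name of the ID column. This relies on the ID values restarting at 1 on the first row of each original data frame, with 1 appearing only once per frame.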
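To see what the cumsum call is doing, here is a small worked sketch. The data are hypothetical: an already-combined data frame whose ID column restarts at 1 wherever one of the original data frames began.

```r
# Hypothetical combined data: ID restarts at 1 where each original frame begins
dat <- data.frame(ID = c(1, 2, 3, 1, 2, 1, 2, 3, 4))

# ID == 1 is TRUE on the first row of each original frame;
# the running total of those TRUEs numbers the frames 1, 2, 3, ...
result <- transform(dat, newCol = cumsum(ID == 1))

print(result$newCol)
# 1 1 1 2 2 3 3 3 3
```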
It looks like bind_rows from the dplyr package will do this too. Using maloneypatr's example:
df1 <- data.frame(a = seq(1, 5, by = 1),
b = seq(21, 25, by = 1))
df2 <- data.frame(a = seq(6, 10, by = 1),
b = seq(26, 30, by = 1))
dplyr::bind_rows(df1, df2, .id = "source")
# Source: local data frame [10 x 3]
# source a b
# (chr) (dbl) (dbl)
# 1 1 1 21
# 2 1 2 22
# 3 1 3 23
# 4 1 4 24
# 5 1 5 25
# 6 2 6 26
# 7 2 7 27
# 8 2 8 28
# 9 2 9 29
# 10 2 10 30
Thanks all! I ended up with a simple solution working with a friend of mine by creating an index, like this:
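# dat is the combined data frame; its ID column stops increasing where a new source begins
index <- rep(1, times = nrow(dat))
for (i in seq_len(nrow(dat) - 1)) {
  if (dat$ID[i + 1] <= dat$ID[i]) {
    # ID did not increase, so row i + 1 starts a new source data frame
    index[i + 1] <- index[i] + 1
  } else {
    index[i + 1] <- index[i]
  }
}
new.data.frame <- cbind(index, dat)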
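The loop above can probably also be written without explicit iteration. This is a minimal sketch, not the poster's code: `dat`, `starts_new`, and `new_dat` are hypothetical names, and it makes the same assumption as the loop, namely that ID fails to increase exactly where a new source data frame begins.

```r
# Hypothetical combined data: ID drops back wherever a new source frame begins
dat <- data.frame(ID = c(1, 2, 3, 1, 2, 1, 2, 3, 4))

# diff(ID) <= 0 flags rows whose ID did not increase over the previous row;
# the first row has no predecessor, so it gets FALSE
starts_new <- c(FALSE, diff(dat$ID) <= 0)

# Start at 1 and add 1 at every flagged row
index <- 1 + cumsum(starts_new)

new_dat <- cbind(index, dat)
print(new_dat)
```

Unlike the cumsum(ID == 1) approach, this one does not need the IDs to restart at exactly 1, only to stop increasing at each boundary.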