Question
This just came up in an answer to another question here. When you rbind two data frames, it matches columns by name rather than by index, which can lead to unexpected behavior:
> df<-data.frame(x=1:2,y=3:4)
> df
x y
1 1 3
2 2 4
> rbind(df,df[,2:1])
x y
1 1 3
2 2 4
3 1 3
4 2 4
Of course, there are workarounds. For example:
rbind(df,rename(df[,2:1],names(df)))
data.frame(rbind(as.matrix(df),as.matrix(df[,2:1])))
On edit: rename from the plyr package doesn't actually work this way (although I thought I had it working when I originally wrote this...). The way to do this by renaming is to use SimonO101's solution:
rbind(df,setNames(df[,2:1],names(df)))
Also, maybe surprisingly, rbindlist from the data.table package
data.frame(rbindlist(list(df,df[,2:1])))
works by index (and if we don't mind a data table, it's pretty concise), so this is a difference between do.call(rbind) and rbindlist.
The question is: what is the most concise way to rbind two data frames whose names don't match? I know this seems trivial, but this kind of thing can end up cluttering code, and I don't want to have to write a new function called rbindByIndex. Ideally it would be something like rbind(df,df[,2:1],byIndex=T).
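The question would rather avoid a helper, but for reference, here is a minimal sketch of what such a function could look like if the pattern recurs across many data frames. It simply applies the setNames idea from the edit above to every frame before calling rbind. The name rbind_by_index is hypothetical (it is not part of base R or any package), and the sketch assumes all frames have the same number of columns.

```r
# Hypothetical helper, not part of base R: stack data frames by column
# position by giving every frame the column names of the first one.
# Assumes every frame has the same number of columns.
rbind_by_index <- function(...) {
  frames <- list(...)
  target_names <- names(frames[[1]])
  # setNames() overwrites the names positionally, so rbind() then lines up
  # columns by their position instead of by their old names.
  renamed <- lapply(frames, setNames, target_names)
  do.call(rbind, renamed)
}

df <- data.frame(x = 1:2, y = 3:4)
result <- rbind_by_index(df, df[, 2:1])
print(result)
```

Because the renaming happens before rbind ever sees the frames, rbind's name matching becomes harmless: the names already agree, so the columns stay in positional order.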
Answer 1:
You might find setNames handy here...
rbind(df, setNames(rev(df), names(df)))
# x y
#1 1 3
#2 2 4
#3 3 1
#4 4 2
I suspect your real use case is somewhat more complex. You can of course reorder the columns in the first argument of setNames however you wish; just pass names(df) as the second argument so that the reordered columns get the original names.
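As a small illustration of that last point, here is a sketch with a three-column frame (df3 is a made-up example, not from the question) where the second block uses an arbitrary column order picked by name:

```r
df3 <- data.frame(x = 1:2, y = 3:4, z = 5:6)

# Pick any column order for the second block (here z, x, y) ...
reordered <- df3[, c("z", "x", "y")]

# ... then relabel it with the original names so rbind() stacks by position.
result <- rbind(df3, setNames(reordered, names(df3)))
print(result)
```

Rows 3 and 4 of result hold the values of z, x and y under the headings x, y and z respectively.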
Answer 2:
This seems pretty easy:
mapply(c,df,df[,2:1])
x y
[1,] 1 3
[2,] 2 4
[3,] 3 1
[4,] 4 2
For this simple case, though, you have to turn the result back into a data frame, because mapply simplifies it to a matrix:
as.data.frame(mapply(c,df,df[,2:1]))
x y
1 1 3
2 2 4
3 3 1
4 4 2
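Important note 1: There is a downside: because mapply simplifies everything into a single matrix, columns of different types get coerced to a common type. (In R 4.0.0 and later, data.frame() no longer converts strings to factors by default, so in this example the whole result becomes a character matrix.)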
df<-data.frame(x=1:2,y=3:4,z=c('a','b'))
mapply(c,df,df[,c(2:1,3)])
x y z
[1,] 1 3 2
[2,] 2 4 1
[3,] 3 1 2
[4,] 4 2 1
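One way around the matrix coercion, swapped in here and not part of the original answer, is Map(), which is mapply with SIMPLIFY = FALSE. It returns a named list with one combined vector per column, so each column keeps its own type. A minimal sketch, assuming the mixed-type df above with z kept as character:

```r
df <- data.frame(x = 1:2, y = 3:4, z = c("a", "b"), stringsAsFactors = FALSE)
other <- df[, c(2:1, 3)]

# Map() pairs up columns by position and concatenates each pair with c(),
# but never builds a matrix, so integer and character columns stay separate.
# The names (x, y, z) come from the first argument, df.
result <- as.data.frame(Map(c, df, other), stringsAsFactors = FALSE)
str(result)
```

This only addresses note 1; factor columns still go through c(), whose handling of factors depends on the R version.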
Important note 2: It also breaks factors. Before R 4.1.0, c() combined factors by their underlying integer codes, so the labels (3 and 4 for y) are lost:
df<-data.frame(x=factor(1:2),y=factor(3:4))
mapply(c,df[,1:2],df[,2:1])
x y
[1,] 1 1
[2,] 2 2
[3,] 1 1
[4,] 2 2
So this works fine as long as all your columns are numeric.
Source: https://stackoverflow.com/questions/19297475/simplest-way-to-get-rbind-to-ignore-column-names