R programming: Combining Two Data Frames

Submitted by 偶尔善良 on 2019-12-12 04:02:08

Question


Folks,

I would like to concatenate (or merge, if you will) two data frames, df1 and df2. My goal is simply to make a new data frame whose columns are the union of those of df1 and df2.

Example

product=c("p1","p1","p1","p1","p1","p1","p1","p1","p2","p2","p2","p2","p2","p2","p2","p2","p3","p3","p3","p3","p3","p3","p3","p3","p4","p4","p4","p4","p4","p4","p4","p4")
skew=c("b","b","b","b","a","a","a","a","b","b","b","b","a","a","a","a","b","b","b","b","a","a","a","a","b","b","b","b","a","a","a","a")
version=c(0.1,0.1,0.2,0.2,0.1,0.1,0.2,0.2,0.1,0.1,0.2,0.2,0.1,0.1,0.2,0.2,0.1,0.1,0.2,0.2,0.1,0.1,0.2,0.2,0.1,0.1,0.2,0.2,0.1,0.1,0.2,0.2)
color=c("C1","C2","C1","C2","C1","C2","C1","C2","C1","C2","C1","C2","C1","C2","C1","C2","C1","C2","C1","C2","C1","C2","C1","C2","C1","C2","C1","C2","C1","C2","C1","C2")
price=c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32)

df1 = data.frame(product, skew, version)
df2 = data.frame(product, skew, color, price)

The result I want is shown at the end of this post.

I have tried a few options:

# Option 1: cbind
df <- cbind(df1,df2)

This returns a data frame with duplicated columns "product" and "skew".
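A minimal sketch of why this happens: cbind() on data frames places all columns side by side and does not deduplicate names. The sketch rebuilds the question's data with rep() as an editorial shorthand (same values as the literal vectors above, except that price is stored as an integer):

```r
# Shorthand rebuild of the question's data using rep()
product <- rep(c("p1", "p2", "p3", "p4"), each = 8)
skew    <- rep(rep(c("b", "a"), each = 4), times = 4)
version <- rep(c(0.1, 0.1, 0.2, 0.2), times = 8)
color   <- rep(c("C1", "C2"), times = 16)
price   <- 1:32

df1 <- data.frame(product, skew, version)
df2 <- data.frame(product, skew, color, price)

# cbind() keeps every column of both inputs, shared key columns included
df <- cbind(df1, df2)
print(names(df))                         # product skew version product skew color price
print(names(df)[duplicated(names(df))])  # product skew
```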

# Option 2: data.frame
df <- data.frame(df1,df2)

This gave me pretty much what I want, except that it has extra copies of "product" and "skew". They are suffixed with ".1", though, so the column names are not duplicated.
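The ".1" suffixes come from data.frame()'s default check.names = TRUE, which makes the column names unique via make.names(). A sketch (same rep() shorthand for the data) that confirms the suffixed copies match the originals before dropping them:

```r
# Shorthand rebuild of the question's data using rep()
product <- rep(c("p1", "p2", "p3", "p4"), each = 8)
skew    <- rep(rep(c("b", "a"), each = 4), times = 4)
version <- rep(c(0.1, 0.1, 0.2, 0.2), times = 8)
color   <- rep(c("C1", "C2"), times = 16)
price   <- 1:32

df1 <- data.frame(product, skew, version)
df2 <- data.frame(product, skew, color, price)

df <- data.frame(df1, df2)
print(names(df))  # product skew version product.1 skew.1 color price

# Only drop the copies if they really are identical to the originals
stopifnot(identical(df$product, df$product.1),
          identical(df$skew, df$skew.1))
df <- df[, !(names(df) %in% c("product.1", "skew.1"))]
print(names(df))  # product skew version color price
```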

# Option 3: merge, which seems to be the way to go
df <- merge(df1,df2) 

I think I am missing something with merge, because this actually paired every row of df1 with every row of df2 that has the same "product" and "skew", producing a total of 128 observations out of the 32 provided. I guess that's how merge works. I have read ?merge and tried a few options but could not get it to produce what I want.
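Where the 128 comes from: with no `by` argument, merge() joins on all columns the two data frames share, here "product" and "skew". Each of the 8 product/skew combinations occurs 4 times in each data frame, and merge() keeps every matching pair of rows, so the result has 8 × 4 × 4 = 128 rows. A small check, again using the rep() shorthand for the data:

```r
# Shorthand rebuild of the question's data using rep()
product <- rep(c("p1", "p2", "p3", "p4"), each = 8)
skew    <- rep(rep(c("b", "a"), each = 4), times = 4)
version <- rep(c(0.1, 0.1, 0.2, 0.2), times = 8)
color   <- rep(c("C1", "C2"), times = 16)
price   <- 1:32

df1 <- data.frame(product, skew, version)
df2 <- data.frame(product, skew, color, price)

# merge() joins on the columns the two data frames share
print(intersect(names(df1), names(df2)))  # product skew

# Each of the 8 product/skew keys occurs 4 times in df1 (and likewise in df2)
key_counts <- table(df1$product, df1$skew)
print(key_counts)

# Every matching pair of rows is kept: 8 keys * 4 * 4 = 128 rows
m <- merge(df1, df2)
print(nrow(m))  # 128
```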

So my question is:

What is the best way to get my desired data frame from df1 and df2 above?

Thanks in advance for your help! Riad.

     product skew  version color price
1       p1    b     0.1    C1     1
2       p1    b     0.1    C2     2
3       p1    b     0.2    C1     3
4       p1    b     0.2    C2     4
5       p1    a     0.1    C1     5
6       p1    a     0.1    C2     6
7       p1    a     0.2    C1     7
8       p1    a     0.2    C2     8
9       p2    b     0.1    C1     9
10      p2    b     0.1    C2    10
11      p2    b     0.2    C1    11
12      p2    b     0.2    C2    12
13      p2    a     0.1    C1    13
14      p2    a     0.1    C2    14
15      p2    a     0.2    C1    15
16      p2    a     0.2    C2    16
17      p3    b     0.1    C1    17
18      p3    b     0.1    C2    18
19      p3    b     0.2    C1    19
20      p3    b     0.2    C2    20
21      p3    a     0.1    C1    21
22      p3    a     0.1    C2    22
23      p3    a     0.2    C1    23
24      p3    a     0.2    C2    24
25      p4    b     0.1    C1    25
26      p4    b     0.1    C2    26
27      p4    b     0.2    C1    27
28      p4    b     0.2    C2    28
29      p4    a     0.1    C1    29
30      p4    a     0.1    C2    30
31      p4    a     0.2    C1    31
32      p4    a     0.2    C2    32

Answer 1:


merge() does not work the way you want because your columns "product" and "skew" are not unique identifiers: each combination occurs multiple times (four times in each data frame), so merge() pairs every matching row of df1 with every matching row of df2. You can either include a third column as an id:

product=c("p1","p1","p1","p1","p1","p1","p1","p1","p2","p2","p2","p2","p2","p2","p2","p2","p3","p3","p3","p3","p3","p3","p3","p3","p4","p4","p4","p4","p4","p4","p4","p4")
skew=c("b","b","b","b","a","a","a","a","b","b","b","b","a","a","a","a","b","b","b","b","a","a","a","a","b","b","b","b","a","a","a","a")
version=c(0.1,0.1,0.2,0.2,0.1,0.1,0.2,0.2,0.1,0.1,0.2,0.2,0.1,0.1,0.2,0.2,0.1,0.1,0.2,0.2,0.1,0.1,0.2,0.2,0.1,0.1,0.2,0.2,0.1,0.1,0.2,0.2)
color=c("C1","C2","C1","C2","C1","C2","C1","C2","C1","C2","C1","C2","C1","C2","C1","C2","C1","C2","C1","C2","C1","C2","C1","C2","C1","C2","C1","C2","C1","C2","C1","C2")
price=c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32)
id = 1:32

df1 = data.frame(product, skew, id, version)
df2 = data.frame(product, skew, id, color, price)
merge(df1, df2)
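With the id column, every (product, skew, id) key is unique, so the merge is one-to-one and returns 32 rows. By default merge() sorts the result by the key columns, which puts skew "a" before "b". The sketch below (rep() shorthand for the data, plus id) restores the original order via id and drops the helper column to match the desired layout:

```r
# Shorthand rebuild of the data, with the id column from this answer
product <- rep(c("p1", "p2", "p3", "p4"), each = 8)
skew    <- rep(rep(c("b", "a"), each = 4), times = 4)
version <- rep(c(0.1, 0.1, 0.2, 0.2), times = 8)
color   <- rep(c("C1", "C2"), times = 16)
price   <- 1:32
id      <- 1:32

df1 <- data.frame(product, skew, id, version)
df2 <- data.frame(product, skew, id, color, price)

m <- merge(df1, df2)  # joins on product, skew and id
print(nrow(m))        # 32

# Restore the original row order, then keep only the desired columns
m <- m[order(m$id), c("product", "skew", "version", "color", "price")]
rownames(m) <- NULL
print(head(m, 4))
```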

Or you can combine the data frames manually. This only works because the rows of df1 and df2 are already in the same order:

cbind(df1, df2[, c("color", "price")])
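cbind() does no matching at all: it pairs rows purely by position. A minimal sketch that checks the shared columns line up before binding, and selects df2's extra columns by name rather than by position (rep() shorthand for the question's data):

```r
# Shorthand rebuild of the question's data using rep()
product <- rep(c("p1", "p2", "p3", "p4"), each = 8)
skew    <- rep(rep(c("b", "a"), each = 4), times = 4)
version <- rep(c(0.1, 0.1, 0.2, 0.2), times = 8)
color   <- rep(c("C1", "C2"), times = 16)
price   <- 1:32

df1 <- data.frame(product, skew, version)
df2 <- data.frame(product, skew, color, price)

# cbind() pairs rows by position, so confirm the shared columns match row by row
shared <- intersect(names(df1), names(df2))
stopifnot(identical(df1[shared], df2[shared]))

# Append only the columns that df2 has and df1 lacks
extra <- setdiff(names(df2), names(df1))
df <- cbind(df1, df2[extra])
print(names(df))  # product skew version color price
```

If the check fails, the rows are not aligned and a keyed merge() is the safer choice.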



Answer 2:


You can use union(), but it will mess up the column names, so you have to set them again:

df_c <- union(df1, df2)
names(df_c) <- union(names(df1), names(df2))
df_c <- as.data.frame(df_c)


Source: https://stackoverflow.com/questions/20344414/r-programming-combining-two-data-frames
