Question
What I would like to do is combine two data frames, keeping all columns (which the example below does not do), and fill in zeros where a row from one data frame has no match in the other.
This seems like a job for plyr or dplyr. However, a full join in plyr does not keep all of the columns, while a left or right join does not keep all the rows I want. According to the dplyr cheatsheet (http://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf), full_join seems to be the function I need, but R does not recognise it even after successfully loading the package.
As an example:
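col1 <- c("ab","bc","cd","de")
col2 <- c(1,2,3,4)
# Note: cbind() builds a character matrix, so col2 ends up character (factor in R < 4.0), not numeric
df1 <- as.data.frame(cbind(col1,col2))
col1 <- c("ab","ef","fg","gh")
col3 <- c(5,6,7,8)
# Same coercion here: col3 is not numeric in df2
df2 <- as.data.frame(cbind(col1,col3))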
library(plyr)
Example <- join(df1,df2,by = "col1", type = "full") #Does not keep col3
library(dplyr)
Example <- full_join(df1,df2,by = "col1") #Function not recognised
I would like the output to be:
col1 col2 col3
ab 1 5
bc 2 0
cd 3 0
de 4 0
ef 0 6
fg 0 7
gh 0 8
Answer 1:
The solutions
Example <- merge(df1, df2, by = "col1", all = TRUE)
and
Example <- join(df1,df2,by = "col1", type = "full")
give the same result, both containing NAs:
#> Example
# col1 col2 col3
#1 ab 1 5
#2 bc 2 <NA>
#3 cd 3 <NA>
#4 de 4 <NA>
#5 ef <NA> 6
#6 fg <NA> 7
#7 gh <NA> 8
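A minimal, self-contained sketch of the same merge, assuming the data frames are built with data.frame() instead of as.data.frame(cbind(...)) so that col2 and col3 stay numeric. It also contrasts the full join (all = TRUE) with a left join (all.x = TRUE), which keeps only the keys found in df1 and therefore drops ef, fg and gh:

```r
# Build the inputs with data.frame() so col2 and col3 stay numeric
# (cbind() would first coerce everything to a character matrix)
df1 <- data.frame(col1 = c("ab", "bc", "cd", "de"), col2 = c(1, 2, 3, 4))
df2 <- data.frame(col1 = c("ab", "ef", "fg", "gh"), col3 = c(5, 6, 7, 8))

# Full outer join: every key from either data frame, NA where there is no match
full <- merge(df1, df2, by = "col1", all = TRUE)

# Left join: only the keys present in df1, so the rows unique to df2 are lost
left <- merge(df1, df2, by = "col1", all.x = TRUE)

nrow(full)  # 7
nrow(left)  # 4
```

Here all = TRUE is shorthand for all.x = TRUE together with all.y = TRUE, which is what makes it a full join.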
One way to replace those entries with zeros is to convert the data frame into a matrix, replace the NAs, and convert back to a data frame (note that this round trip leaves col2 and col3 as character or factor columns rather than numeric):
Example <- as.matrix(Example)
Example[is.na(Example)] <- 0
Example <- as.data.frame(Example)
#> Example
# col1 col2 col3
#1 ab 1 5
#2 bc 2 0
#3 cd 3 0
#4 de 4 0
#5 ef 0 6
#6 fg 0 7
#7 gh 0 8
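If the inputs are built with data.frame() so that col2 and col3 are numeric (an assumption; with the question's cbind() construction they are not), the NAs can also be replaced directly on the merged data frame, skipping the matrix round trip and keeping the value columns numeric:

```r
df1 <- data.frame(col1 = c("ab", "bc", "cd", "de"), col2 = c(1, 2, 3, 4))
df2 <- data.frame(col1 = c("ab", "ef", "fg", "gh"), col3 = c(5, 6, 7, 8))

Example <- merge(df1, df2, by = "col1", all = TRUE)

# is.na() on a data frame returns a logical matrix, and `[<-` on a data
# frame accepts such a matrix, so the gaps are filled in place
Example[is.na(Example)] <- 0

Example$col2  # 1 2 3 4 0 0 0
Example$col3  # 5 0 0 0 6 7 8
```

With factor columns (as produced by the question's code in older R) this direct assignment would instead give "invalid factor level" warnings, which is one reason the matrix route above is used.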
PS: I'm almost certain that @akrun knows another way to achieve this in a single line ;)
Answer 2:
Following David Arenburg's comment on the question:
Example <- merge(df1, df2, by = "col1", all = TRUE)
Source: https://stackoverflow.com/questions/31025026/combining-two-dataframes-keeping-all-columns