How can I combine two dataframes with different lengths in R?

ε祈祈猫儿з 提交于 2020-01-04 05:36:12

问题


I have two dataframes, like those:

and

I want to get something like that:

Here are the data frames for reproducibility:

df1 <- data.frame(descripcion_cuenta_N2 = c("Consumos", "Costes Personal", "Fungible Equipamiento", "Servicios"), anualidad = rep(2014, 4), valor = c(10, 11, 12, 13))
df2 <- data.frame(descripcion_cuenta_N2 = c("Consumos", "Costes Personal", "Fungible Equipamiento","Prestaciones", "Servicios"), anualidad = rep(2014, 5), valor = c(11, 20, 8, 9))

A dataframe in which missed positions are filled with 0, because in some cases I am not getting data frames with the same amount of rows, and in those cases rbind fails, and I get an error.

Which instruccion should use to combine those dataframes?

Thanks

PS: I know I can erase the repeated rows once the data frames are together.


回答1:


Try using left_join in the dplyr package.

library(dplyr)

# make fake data
df1 <- data.frame(id = c("A", "B", "C", "D", "E"), val = rpois(5, 5))
df2 <- data.frame(id = c("A", "B", "C", "E"), val = rpois(4, 20))

# use left_join
df3 <- left_join(df1, df2, by = "id")

# rename and set NAs to 0
names(df3) <- c("id", "val", "val")
df3[is.na(df3)] <- 0



回答2:


First of all it is not a good practice to have 2 variables with the same name or the same variable in two columns. It is best to have duplicate observations (e.g. Consmos twice in this case).

Based on that it is as simple as row binding or merging the two data frames:

df1 <- data.frame(descripcion_cuenta_N2 = c("Consumos", "Costes Personal", "Fungible Equipamiento", "Servicios"), anualidad = rep(2014, 4), valor = c(10, 11, 12, 13))
df2 <- data.frame(descripcion_cuenta_N2 = c("Consumos", "Costes Personal", "Fungible Equipamiento", "Servicios"), anualidad = rep(2014, 4), valor = c(11, 20, 8, 9))
df <- merge(df1, df2, all = TRUE)

which gives:

  descripcion_cuenta_N2 anualidad valor
1              Consumos      2014    10
2              Consumos      2014    11
3       Costes Personal      2014    11
4       Costes Personal      2014    20
5 Fungible Equipamiento      2014     8
6 Fungible Equipamiento      2014    12
7             Servicios      2014     9
8             Servicios      2014    13

This way is better as stated above.

If you insist on what you're asking you simply specify the variables to use in the merging:

df <- merge(df1, df2, by = c("descripcion_cuenta_N2", "anualidad"))

which results in:

  descripcion_cuenta_N2 anualidad valor.x valor.y
1              Consumos      2014      10      11
2       Costes Personal      2014      11      20
3 Fungible Equipamiento      2014      12       8
4             Servicios      2014      13       9

PS: It would make it easier to answer if your data frames where given in R. That is make your question reproducible and easy to answer. See How to make a great R reproducible example?




回答3:


You probably want to use something like merge():

merge(df1, df2, by=c("descripcion_cuenta_N2", "anualidad"))

In SQL lingo, you are trying to join together the two tables on the descripcion_cuenta_N2 and anualidad columns (presumably both of them).



来源:https://stackoverflow.com/questions/37257572/how-can-i-combine-two-dataframes-with-different-lengths-in-r

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!