问题
I am having 2 data frames in the below format:
dt1
id col1 col2 col3 col4
___ ____ ____ _____ _____
1 2 3 1 2
2 3 4 1 1
3 1 1 1 1
4 1 2 1 2
5 1 1 1 1
6 1 2 1 2
dt2
id col1 col2 col3 col4
___ ____ ____ _____ _____
1 1 3 1 2
2 3 4 1 0
4 1 1 1 1
6 1 2 1 2
9 2 1 1 1
12 1 2 1 2
and I want to aggregate and combine these two data frames by the id and the resulting dataframe like
dt3
id col1 col2 col3 col4
___ ____ ____ _____ _____
1 3 6 2 4
2 6 8 2 1
3 1 1 1 1
4 2 3 2 3
5 1 1 1 1
6 2 4 2 4
9 2 1 1 1
12 1 2 1 2
I tried with dt3=merge(dt1,dt2,all=TRUE)
but did not work.Also tried dt3=merge(dt1,dt2,by=id)
too did not work.Any help is appreciated.
回答1:
We can use rbindlist
in data.table
and get the sum
of each column after grouping by 'id'
library(data.table)
rbindlist(mget(paste0('dt', 1:2)))[, lapply(.SD, sum), by = id]
# id col1 col2 col3 col4
#1: 1 3 6 2 4
#2: 2 6 8 2 1
#3: 3 1 1 1 1
#4: 4 2 3 2 3
#5: 5 1 1 1 1
#6: 6 2 4 2 4
#7: 9 2 1 1 1
#8: 12 1 2 1 2
Or using bind_rows
with group_by
and summarise_each
from tidyverse
librarydplyr)
bind_rows(dt1, dt2) %>%
group_by(id) %>%
summarise_each(funs(sum))
回答2:
The magic word you're looking for is rbind
:
dt3 = rbind(dt1, dt2)
回答3:
Since they have the same format and the columns match up put them in row by row.
dt3 <- data.frame(dt1)
dt3 <- rbind(dt2) # rbind lines up your observations row by row.
You could probably put that all in one line
dt3 <- data.frame(rbind(dt1, dt2))
回答4:
Here is a dplyr
solution:
library(dplyr)
bind_rows(dt1, dt2) %>% group_by(id) %>%
summarise_all(sum)
Data
dt1 <- structure(
list(id = 1:6, col1 = c(2L, 3L, 1L, 1L, 1L, 1L),
col2 = c(3L, 4L, 1L, 2L, 1L, 2L),
col3 = c(1L, 1L, 1L, 1L, 1L, 1L),
col4 = c(2L, 1L, 1L, 2L, 1L, 2L)),
.Names = c("id", "col1", "col2", "col3", "col4"),
class = "data.frame", row.names = c(NA, -6L))
dt2 <- structure(
list(id = c(1L, 2L, 4L, 6L, 9L, 12L),
col1 = c(1L, 3L, 1L, 1L, 2L, 1L),
col2 = c(3L, 4L, 1L, 2L, 1L, 2L),
col3 = c(1L, 1L, 1L, 1L, 1L, 1L),
col4 = c(2L, 0L, 1L, 2L, 1L, 2L)),
.Names = c("id", "col1", "col2", "col3", "col4"),
class = "data.frame", row.names = c(NA, -6L))
来源:https://stackoverflow.com/questions/42966535/combine-two-data-frames-and-aggregate