I would like to merge two data frames, but do not want to duplicate rows if there is more than one match. Instead I would like to sum the observations on that day.
I'd suggest you merge them and then aggregate the result (essentially perform a SUM for each unique Date).
df <- merge(z.days,obs.days, by.x="Date", by.y="Date", all.x=TRUE)
Date Count
1 2012-01-01 NA
2 2012-01-02 1
3 2012-01-03 1
4 2012-01-03 1
5 2012-01-04 NA
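The question does not show the input data frames, so here is a minimal reproducible sketch. The contents of z.days and obs.days below are assumptions, chosen only so that the merge produces the five rows shown above:

```r
# Hypothetical inputs (not from the question): one row per calendar day,
# plus an observation table in which 2012-01-03 appears twice.
z.days <- data.frame(Date = as.Date("2012-01-01") + 0:3)
obs.days <- data.frame(
  Date  = as.Date(c("2012-01-02", "2012-01-03", "2012-01-03")),
  Count = c(1, 1, 1)
)

# all.x = TRUE keeps every day from z.days; days with no observation get NA.
df <- merge(z.days, obs.days, by = "Date", all.x = TRUE)
print(df)
```

Since both key columns are called Date, by = "Date" is equivalent to spelling out by.x and by.y.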
Now, to do the aggregation you could use aggregate:
df2 <- aggregate(df$Count,list(df$Date),sum)
Group.1 x
1 2012-01-01 NA
2 2012-01-02 1
3 2012-01-03 2
4 2012-01-04 NA
names(df2)<-names(df)
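If you would rather see 0 than NA for days without observations, one option (my assumption about what you want, not something stated in the question) is to pass na.rm = TRUE through to sum. Indexing with df["Count"] and df["Date"] also keeps the column names, so the renaming step is not needed. The df below is rebuilt by hand so that the snippet runs on its own:

```r
# Merged data as shown above, rebuilt by hand so the snippet is self-contained.
df <- data.frame(
  Date  = as.Date(c("2012-01-01", "2012-01-02", "2012-01-03",
                    "2012-01-03", "2012-01-04")),
  Count = c(NA, 1, 1, 1, NA)
)

# Extra arguments to aggregate() are passed on to FUN, so na.rm = TRUE
# reaches sum(); a group containing only NA then sums to 0.
df2 <- aggregate(df["Count"], by = df["Date"], FUN = sum, na.rm = TRUE)
print(df2)
```

Whether 0 is the right value for a day without data depends on what the counts mean; keep the NA if "no observation" and "zero observed" should stay distinct.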
BUT I'd recommend the plyr package, which is awesome! In particular, the function ddply.
library(plyr)
ddply(df,.(Date),function(x) data.frame(Date=x$Date[1],Count=sum(x$Count)))
Date Count
1 2012-01-01 NA
2 2012-01-02 1
3 2012-01-03 2
4 2012-01-04 NA
The command ddply(df,.(Date),FUN) essentially does:
for each date in unique(df$Date):
add to output dataframe FUN( df[df$Date==date,] )
So the function I've provided creates a one-row data frame with columns Date and Count, where Count is the sum of all counts for that date.
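To make the split/apply/combine idea concrete, here is a rough base-R equivalent of that loop. It is only a sketch of the concept, not how plyr is actually implemented, and the df is again rebuilt by hand:

```r
# Merged data as shown above, rebuilt by hand so the snippet is self-contained.
df <- data.frame(
  Date  = as.Date(c("2012-01-01", "2012-01-02", "2012-01-03",
                    "2012-01-03", "2012-01-04")),
  Count = c(NA, 1, 1, 1, NA)
)

# Split: one sub-data-frame per unique Date.
pieces <- split(df, df$Date)

# Apply: reduce each piece to a single row holding the date and the summed count.
rows <- lapply(pieces, function(x) {
  data.frame(Date = x$Date[1], Count = sum(x$Count))
})

# Combine: stack the one-row data frames back together.
out <- do.call(rbind, rows)
rownames(out) <- NULL
print(out)
```

This should give the same four rows as the ddply call, with NA kept for the days that have no observations.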