I have the following dataframe (df1):
ID   someText  PSM  OtherValues
ABC  c         2    qwe
CCC  v         3    wer
DDD  b         56   ert
EEE  m         78   yu
FFF
The aggregate function seems to be better than dplyr if you just want to keep the original column names and operate on one column at a time, since it avoids the summarize function.
Note from the summarize function documentation:
Be careful when using existing variable names; the corresponding columns will be immediately updated with the new data and this can affect subsequent operations referring to those variables.
For instance
## modified example from aggregate documentation with character variables and NAs
testDF <- data.frame(v1 = c(1,3,5,7,8,3,5,NA,4,5,7,9),
v2 = c(11,33,55,77,88,33,55,NA,44,55,77,99) )
by1 <- c("red", "blue", 1, 2, NA, "big", 1, 2, "red", 1, NA, 12)
aggregate(x = testDF, by = list(by1), FUN = "sum")
  Group.1 v1  v2
1       1 15 165
2      12  9  99
3       2 NA  NA
4     big  3  33
5    blue  3  33
6     red  5  55
This gives you what you want. With summarise and ddply, by contrast, you have to name every output column explicitly, so aggregate is more convenient when you have many columns.
library(plyr)  # provides ddply and summarize
testDF$ID <- by1
ddply(testDF, .(ID), summarize, v1 = sum(v1), v2 = sum(v2))
    ID v1  v2
1    1 15 165
2   12  9  99
3    2 NA  NA
4  big  3  33
5 blue  3  33
6  red  5  55
7 <NA> 15 165
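As a rough illustration of the point about many columns, here is a minimal sketch in base R. The data frame df, its columns a, b, c and the grouping vector grp are made up for this example and are not part of the data above. Naming the element of the by list (ID here) sets the name of the grouping column in the result.

```r
# Hypothetical data: three numeric columns and a grouping vector (names are made up).
df <- data.frame(a = c(1, 2, 3, 4),
                 b = c(10, 20, 30, 40),
                 c = c(100, 200, 300, 400))
grp <- c("x", "y", "x", "y")

# FUN is applied to every column of df, and the output columns
# keep their original names without being listed one by one.
res <- aggregate(x = df, by = list(ID = grp), FUN = sum)
print(res)
```

The same call works unchanged however many columns df has, whereas the summarize form needs one name = expression pair per column.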
To see the effect of summarize immediately updating the columns, compare the following examples:
ddply(testDF, .(ID), summarize, v1=max(v1,v2), v2=min(v1,v2) )
    ID v1 v2
1    1 55 55
2   12 99 99
3    2 NA NA
4  big 33 33
5 blue 33 33
6  red 44 11
7 <NA> 88 77
ddply(testDF, .(ID), summarize, v1=min(v1,v2), v2=min(v1,v2) )
    ID v1 v2
1    1  5  5
2   12  9  9
3    2 NA NA
4  big  3  3
5 blue  3  3
6  red  1  1
7 <NA>  7  7
Note that when v1 uses max, the v1 column has already been updated by the time v2 is calculated. For ID=1, for instance, min(v1, v2) therefore returns 55 in the first example, and we can't get the number 5 that the original data would give.
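One way to sidestep the immediate update is to compute every statistic from the untouched original columns. The following is only a sketch in base R, not the approach used above; the names groups, rows and res are hypothetical. Like aggregate, split drops the rows whose group is NA.

```r
testDF <- data.frame(v1 = c(1, 3, 5, 7, 8, 3, 5, NA, 4, 5, 7, 9),
                     v2 = c(11, 33, 55, 77, 88, 33, 55, NA, 44, 55, 77, 99))
by1 <- c("red", "blue", 1, 2, NA, "big", 1, 2, "red", 1, NA, 12)

# Split the row indices by group; rows with an NA group are dropped.
groups <- split(seq_len(nrow(testDF)), by1)

# Both statistics read the original columns of the group's rows,
# so neither one sees the other's result.
res <- do.call(rbind, lapply(names(groups), function(g) {
  rows <- testDF[groups[[g]], ]
  data.frame(ID = g,
             v1 = max(rows$v1, rows$v2),
             v2 = min(rows$v1, rows$v2))
}))
print(res)
```

Here ID=1 gives v1 = 55 and v2 = 5, since min no longer sees an overwritten v1. With ddply, writing the results to new column names (say vmax and vmin, names assumed here) should avoid the problem in the same way.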