Question
I'm fairly new to R and I'm trying to sum columns by groups based on their names. I have a data frame like this one:
DT <- data.frame(a011=c(0,10,20,0),a012=c(10,10,0,0),a013=c(10,30,0,10),
a021=c(10,20,20,10),a022=c(0,0,0,10),a023=c(20,0,0,0),a031=c(30,0,10,0),
a032=c(0,0,10,0),a033=c(20,0,0,0))
I would like to obtain the sum of all the columns starting with "a01", of all the columns starting with "a02" and all the columns starting with "a03":
a01tot a02tot a03tot
20 30 50
50 20 0
20 20 20
10 20 0
So far I have used
DT$a01tot <- rowSums(DT[,grep("a01", names(DT))])
and so on, but my real data frame has many more groups and I would like to avoid having to write a line of code for each group. I was wondering if it is possible to include "a01","a02","a03"... in a vector or list and have something that adds the columns "a01tot","a02tot","a03tot"... to the data frame automatically.
I know that my question is very similar to this one: "R sum of rows for different group of columns that start with similar string", but the solution pointed out there,
cbind(df, t(rowsum(t(df), sub("_.*", "_t", names(df)))))
does not work in my case because there isn't a common element (like "_") to replace (I cannot change the names of the variables to a01_1, a02_2 etc.).
Switching to the "long" format is not a viable solution in my case either.
Any help will be greatly appreciated.
Answer 1:
You can store the patterns in a vector and loop through them. With your example you can use something like this:
patterns <- unique(substr(names(DT), 1, 3)) # store patterns in a vector
new <- sapply(patterns, function(xx) rowSums(DT[,grep(paste0("^", xx), names(DT)), drop=FALSE])) # loop through; "^" anchors each pattern at the start of the name
# a01 a02 a03
#[1,] 20 30 50
#[2,] 50 20 0
#[3,] 20 20 20
#[4,] 10 20 0
You can adjust the names like this:
colnames(new) <- paste0(colnames(new), "tot") # rename
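To get the totals into the data frame itself, as the question asks, the matrix can be bound back onto DT. The following is a minimal sketch, assuming every group prefix is exactly the first three characters of the column name; the names prefixes and totals are only illustrative, and startsWith() (base R 3.3.0 or later) is swapped in for grep() so that only the start of each name is matched.

```r
# Example data from the question
DT <- data.frame(a011 = c(0, 10, 20, 0),  a012 = c(10, 10, 0, 0), a013 = c(10, 30, 0, 10),
                 a021 = c(10, 20, 20, 10), a022 = c(0, 0, 0, 10), a023 = c(20, 0, 0, 0),
                 a031 = c(30, 0, 10, 0),   a032 = c(0, 0, 10, 0), a033 = c(20, 0, 0, 0))

# Assumption: the group is identified by the first three characters of each name
prefixes <- unique(substr(names(DT), 1, 3))

# One row-sum vector per prefix; sapply() collects them into a matrix
totals <- sapply(prefixes, function(p) {
  rowSums(DT[, startsWith(names(DT), p), drop = FALSE])
})

# Rename the columns to a01tot, a02tot, ... and attach them to DT
colnames(totals) <- paste0(prefixes, "tot")
DT <- cbind(DT, totals)
```

After this, DT$a01tot holds 20, 50, 20, 10, which matches the expected table in the question. The totals are computed before cbind() is called, so the new "tot" columns are never picked up by their own prefix.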
Answer 2:
Another possible solution
library(dplyr)
library(reshape2)
library(tidyr)
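DT %>%
  mutate(id = 1:n()) %>%                     # row identifier, needed to reshape back
  melt(id.vars = c('id')) %>%                # wide to long
  mutate(Group = substr(variable, 1, 3)) %>% # group = first three characters of the name
  group_by(id, Group) %>%
  summarise(tot = sum(value)) %>%
  ungroup() %>%                              # drop the remaining grouping so id can be removed
  spread(Group, tot) %>%                     # long back to wide
  select(-id)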
Results
Source: local data frame [4 x 3]
a01 a02 a03
1 20 30 50
2 50 20 0
3 20 20 20
4 10 20 0
Then, as @Jota suggests, rename the columns: assign the result of the pipeline to an object (for example new) and run colnames(new) <- paste0(colnames(new), "tot")
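The rowsum() idea quoted in the question can also be adapted to names that have no separator, by grouping on the first three characters instead of on a sub() replacement. This is a minimal base R sketch, again assuming that every prefix is exactly three characters long; the object name totals is only illustrative.

```r
# Example data from the question
DT <- data.frame(a011 = c(0, 10, 20, 0),  a012 = c(10, 10, 0, 0), a013 = c(10, 30, 0, 10),
                 a021 = c(10, 20, 20, 10), a022 = c(0, 0, 0, 10), a023 = c(20, 0, 0, 0),
                 a031 = c(30, 0, 10, 0),   a032 = c(0, 0, 10, 0), a033 = c(20, 0, 0, 0))

# t(DT) has one row per original column; rowsum() adds up the rows that share
# a group label, here the first three characters of each column name
totals <- t(rowsum(t(DT), group = substr(names(DT), 1, 3)))

# Rename to a01tot, a02tot, ... and attach to the original data frame
colnames(totals) <- paste0(colnames(totals), "tot")
DT <- cbind(DT, totals)
```

This stays in the wide format throughout and needs no extra packages. Note that rowsum() sorts the group labels by default, so the new columns come out in alphabetical order of their prefixes.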
Source: https://stackoverflow.com/questions/32052723/sum-all-columns-whose-names-start-with-a-pattern-by-group