Sum Values of Every Column in Data Frame with Conditional For Loop

Question

I want to go through a data set and sum the values in each column based on a condition on my first column. The data and my code so far look like this:

x    v1    v2    v3
1    0     1     5
2    4     2     10 
3    5     3     15
4    1     4     20

for(i in colnames(data)){
    if(data$x>2){
        x1 <-sum(data[[i]])
        }
    else{
        x2 <-sum(data[[i]])
        }
      }

My assumption was that the for loop would call each column by name from the data and then sum the values in each column based on whether they matched the condition of column x.

I want to sum half the values from each column and assign them to a value x1 and do the same for the remainder, assigning it to x2. I keep getting an error saying the following:

the condition has length > 1 and only the first element will be used

What am I doing wrong and is there a better way to go about this? Ideally I want a table that looks like this:

       v1    v2    v3
x1     6     7     35
x2     4     3     15

Answer 1:

Here's a dplyr solution. First, I define the data frame.

df <- read.table(text = "x    v1    v2    v3
1    0     1     5
2    4     2     10 
3    5     3     15
4    1     4     20", header = TRUE)  

#   x v1 v2 v3
# 1 1  0  1  5
# 2 2  4  2 10
# 3 3  5  3 15
# 4 4  1  4 20

Then, I create a label (x_check) to indicate which group each row belongs to based on your criterion (x > 2), group by this label, and summarise each column with a v in its name using sum.

# Load library
library(dplyr)

df %>% 
  mutate(x_check = ifelse(x>2, "x1", "x2")) %>% 
  group_by(x_check) %>% 
  summarise_at(vars(contains("v")), sum)  # pass sum directly; funs() is deprecated in current dplyr

# # A tibble: 2 x 4
#   x_check    v1    v2    v3
#   <chr>   <int> <int> <int>
# 1 x1          6     7    35
# 2 x2          4     3    15

Answer 2:

Not sure if I understood your intention correctly, but here is how you would reproduce your results with base R:

df <- data.frame(
  x = c(1:4),
  v1 = c(0, 4, 5, 1),
  v2 = 1:4,
  v3 = (1:4)*5
)

x1 <- colSums(df[df$x > 2, 2:4, drop = FALSE])
x2 <- colSums(df[df$x <= 2, 2:4, drop = FALSE])

Here:

df[df$x > 2, 2:4, drop = FALSE] creates a subset of df whose rows satisfy df$x > 2 and whose columns are 2:4 (the second, third and fourth columns). drop = FALSE keeps the result a data frame even if only one column is selected; without it R would simplify that case to a plain vector, which colSums cannot handle.
colSums computes the sum of each column of the subsetted data frame.
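
To get from the two vectors to the table shown in the question, one option is to stack them with rbind. This is only a sketch on the example data; the name result is introduced here for illustration.

```r
# Same example data as above
df <- data.frame(
  x = 1:4,
  v1 = c(0, 4, 5, 1),
  v2 = 1:4,
  v3 = (1:4) * 5
)

# Column sums for the rows with x > 2 and for the remaining rows
x1 <- colSums(df[df$x > 2, 2:4, drop = FALSE])
x2 <- colSums(df[df$x <= 2, 2:4, drop = FALSE])

# Stack the two named vectors as rows; row names come from the argument names
result <- rbind(x1 = x1, x2 = x2)
result
#    v1 v2 v3
# x1  6  7 35
# x2  4  3 15
```

The result is a numeric matrix; wrap it in as.data.frame() if a data frame is needed.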

If your x column were really a condition (e.g. a logical vector), you could just do

x1 <- colSums(df[df$x, 2:4, drop = FALSE])
x2 <- colSums(df[!df$x, 2:4, drop = FALSE])
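
As a quick illustration of the logical-column case, here is a sketch where x already holds TRUE/FALSE. The data frame df_lgl is made up for this example and simply mirrors the condition x > 2 from the question.

```r
# Hypothetical variant of the example data: x is already a logical vector
df_lgl <- data.frame(
  x = c(FALSE, FALSE, TRUE, TRUE),
  v1 = c(0, 4, 5, 1),
  v2 = 1:4,
  v3 = (1:4) * 5
)

# A logical vector can be used directly as a row index
x1 <- colSums(df_lgl[df_lgl$x, 2:4, drop = FALSE])   # rows where x is TRUE
x2 <- colSums(df_lgl[!df_lgl$x, 2:4, drop = FALSE])  # rows where x is FALSE

x1
# v1 v2 v3
#  6  7 35
x2
# v1 v2 v3
#  4  3 15
```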

Note that no loop is needed to get the results. In R you should use vectorized functions as much as possible.
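
For comparison, the message in the question appears because if() expects a single TRUE or FALSE, while data$x > 2 is a logical vector with one element per row. If you would rather keep a loop, a minimal sketch is to move the condition into the row index instead. The name value_cols is introduced here for illustration.

```r
df <- data.frame(
  x = 1:4,
  v1 = c(0, 4, 5, 1),
  v2 = 1:4,
  v3 = (1:4) * 5
)

# The comparison returns one value per row, not a single TRUE/FALSE
df$x > 2
# [1] FALSE FALSE  TRUE  TRUE

# Loop over the value columns only, leaving out the condition column x
value_cols <- setdiff(colnames(df), "x")
x1 <- numeric(0)
x2 <- numeric(0)
for (i in value_cols) {
  x1[i] <- sum(df[[i]][df$x > 2])   # rows where x > 2
  x2[i] <- sum(df[[i]][df$x <= 2])  # remaining rows
}

x1
# v1 v2 v3
#  6  7 35
x2
# v1 v2 v3
#  4  3 15
```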

More generally, you could do such aggregation with aggregate:

aggregate(df[, 2:4], by = list(condition = df$x <= 2), FUN = sum)
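
The call above labels the two groups FALSE and TRUE. To get the x1/x2 labels from the question, one possibility is to group by a label built with ifelse. This is a sketch on the example data; the names group and res are chosen here for illustration.

```r
df <- data.frame(
  x = 1:4,
  v1 = c(0, 4, 5, 1),
  v2 = 1:4,
  v3 = (1:4) * 5
)

# Group by a text label instead of the raw TRUE/FALSE condition
res <- aggregate(
  df[, 2:4],
  by = list(group = ifelse(df$x > 2, "x1", "x2")),
  FUN = sum
)
res
#   group v1 v2 v3
# 1    x1  6  7 35
# 2    x2  4  3 15
```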

Source: https://stackoverflow.com/questions/53618329/sum-values-of-every-column-in-data-frame-with-conditional-for-loop

Tags

for-loop