Sum Values of Every Column in Data Frame with Conditional For Loop

Submitted by 喜你入骨 on 2021-02-10 23:27:32

Question


I want to go through a data set and sum the values in each column, split by a condition on my first column. My data and code so far look like this:

x    v1    v2    v3
1    0     1     5
2    4     2     10 
3    5     3     15
4    1     4     20

for(i in colnames(data)){
    if(data$x>2){
        x1 <-sum(data[[i]])
        }
    else{
        x2 <-sum(data[[i]])
        }
      }

My assumption was that the for loop would call each column by name from the data and then sum the values in each column based on whether they matched the condition of column x.

I want to sum half the values from each column and assign them to a value x1 and do the same for the remainder, assigning it to x2. I keep getting an error saying the following:

the condition has length > 1 and only the first element will be used

What am I doing wrong and is there a better way to go about this? Ideally I want a table that looks like this:

       v1    v2    v3
x1     6     7     35
x2     4     3     15

Answer 1:


Here's a dplyr solution. First, I define the data frame.

df <- read.table(text = "x    v1    v2    v3
1    0     1     5
2    4     2     10 
3    5     3     15
4    1     4     20", header = TRUE)  

#   x v1 v2 v3
# 1 1  0  1  5
# 2 2  4  2 10
# 3 3  5  3 15
# 4 4  1  4 20

Then, I create a label (x_check) to indicate which group each row belongs to based on your criterion (x > 2), group by this label, and summarise each column with a v in its name using sum.

# Load library
library(dplyr)

df %>% 
  mutate(x_check = ifelse(x>2, "x1", "x2")) %>% 
  group_by(x_check) %>% 
  summarise(across(contains("v"), sum))  # funs() and summarise_at() are deprecated/superseded in current dplyr

# # A tibble: 2 x 4
#   x_check    v1    v2    v3
#   <chr>   <int> <int> <int>
# 1 x1          6     7    35
# 2 x2          4     3    15



Answer 2:


Not sure if I understood your intention correctly, but here is how you would reproduce your results with base R:

df <- data.frame(
  x = c(1:4),
  v1 = c(0, 4, 5, 1),
  v2 = 1:4,
  v3 = (1:4)*5
)

x1 <- colSums(df[df$x > 2, 2:4, drop = FALSE])
x2 <- colSums(df[df$x <= 2, 2:4, drop = FALSE])

Where

  • df[df$x > 2, 2:4, drop = FALSE] creates a subset of df containing the rows that satisfy df$x > 2 and columns 2:4 (the second, third and fourth columns). drop = FALSE keeps the result a data frame even if only one column is selected; without it, R would simplify a single column to a plain vector and colSums would fail.
  • colSums does a by-column sum on the subsetted data.frame
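
To get the table layout the question asks for, the two named vectors can be stacked with rbind. This is only a minimal sketch that rebuilds df as above; the name result is chosen here for illustration and is not part of the original answer.

```r
# Same data as in the answer
df <- data.frame(
  x  = 1:4,
  v1 = c(0, 4, 5, 1),
  v2 = 1:4,
  v3 = (1:4) * 5
)

x1 <- colSums(df[df$x > 2, 2:4, drop = FALSE])   # rows where x > 2
x2 <- colSums(df[df$x <= 2, 2:4, drop = FALSE])  # remaining rows

# Stack the two named vectors; the argument names become the row names
result <- rbind(x1 = x1, x2 = x2)
result
```

The result is a numeric matrix with rows x1 and x2 and columns v1, v2, v3; wrap it in as.data.frame() if a data frame is needed.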

If your x column was really a condition (e.g. a logical vector) you could just do

x1 <- colSums(df[df$x, 2:4, drop = FALSE])
x2 <- colSums(df[!df$x, 2:4, drop = FALSE])

Note that no loop is needed to get these results. In R, you should use vectorized functions wherever possible.
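
For comparison, the loop in the question fails because if expects a single TRUE or FALSE, while data$x > 2 is a logical vector with one element per row. If a loop is still preferred, one possible sketch is to compute the row mask once and index each column with it. The names keep, value_cols and res are illustrative assumptions, not from the original post.

```r
df <- data.frame(x = 1:4, v1 = c(0, 4, 5, 1), v2 = 1:4, v3 = (1:4) * 5)

keep <- df$x > 2                          # logical vector, one entry per row
value_cols <- setdiff(colnames(df), "x")  # every column except the condition column

# Pre-allocate a matrix with one row per group and one column per value column
res <- matrix(NA_real_, nrow = 2, ncol = length(value_cols),
              dimnames = list(c("x1", "x2"), value_cols))

for (col in value_cols) {
  res["x1", col] <- sum(df[[col]][keep])   # rows where x > 2
  res["x2", col] <- sum(df[[col]][!keep])  # all other rows
}
res
```

The condition is applied by subsetting each column rather than by branching with if, which is what lets it work on a whole vector at once.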

More generally, you could do such aggregation with aggregate:

aggregate(df[, 2:4], by = list(condition = df$x <= 2), FUN = sum)
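
The call above labels the groups FALSE and TRUE. If the x1/x2 labels from the question are wanted, one option (a sketch, with the column name group chosen here for illustration) is to build the labels with ifelse before aggregating:

```r
df <- data.frame(x = 1:4, v1 = c(0, 4, 5, 1), v2 = 1:4, v3 = (1:4) * 5)

# Label each row by group, then sum columns 2:4 within each label
agg <- aggregate(df[, 2:4],
                 by = list(group = ifelse(df$x > 2, "x1", "x2")),
                 FUN = sum)
agg
```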


Source: https://stackoverflow.com/questions/53618329/sum-values-of-every-column-in-data-frame-with-conditional-for-loop
