For loop in R that calculates for the group and then the components

Submitted by 笑着哭i on 2020-06-18 05:31:27

Question


I have a data set and a loop containing numerous calculations on it. Inside the loop, the data set is split by component into subsets, which are processed one at a time. However, I first need to run the same calculations on the original data set as a whole.

Consider a fictional data set called masterdata with 3 components (column D1) and numerous variables (X2-X10), as follows:

# masterdata
#   D1 X2 X3 X4 X5 X6 X7 X8 X9 X10
#   A  NA NA NA NA NA NA NA NA  NA
#   B  NA NA NA NA NA NA NA NA  NA
#   C  NA NA NA NA NA NA NA NA  NA
#   B  NA NA NA NA NA NA NA NA  NA
#   B  NA NA NA NA NA NA NA NA  NA
#   C  NA NA NA NA NA NA NA NA  NA
#   C  NA NA NA NA NA NA NA NA  NA
#   A  NA NA NA NA NA NA NA NA  NA
#   B  NA NA NA NA NA NA NA NA  NA
#   A  NA NA NA NA NA NA NA NA  NA

A loop is in place to split off a subset for component A, perform the calculations, output the results and then repeat this for B and C:

Component.List = c("A", "B", "C")

for(k in 1:length(Component.List)) {        
      subdata = subset(masterdata, D1 == Component.List[k])
      # Numerous calculations performed on "subdata" within the loop
}
# End of loop

What I am trying to do is initially perform the same numerous calculations against the whole of masterdata and then start looping through the individual components.

The calculations produce two vectors, which are placed into the kth columns of two data frames created just before the loop is executed:

# Prior to the start of the loop two frames below created
Components = 3 # In this example 3 components in column D1 - "A", "B", "C"

Result.Frame.V1 = as.data.frame(matrix(0, nrow = 200, ncol = Components))
Result.Frame.V2 = as.data.frame(matrix(0, nrow = 200, ncol = Components))

# Loop runs and contains all of the calculations and within the calculations the last two  
# lines below place two vectors generated into the kth columns of the frames.

Result.Frame.V1[,k] = V1.Result
Result.Frame.V2[,k] = V2.Result

# First run of the loop for "A" will place the outputs in the 1st columns 
# Second run of the loop for "B" will place the outputs in the 2nd columns, etc.
# With the expansion to also calculate against the whole group, the above data frames
# would be expanded by an extra column that would hold the result vector for the whole 
# of masterdata run through the calculations 

My initial idea is to write every calculation once for masterdata ahead of the loop and then keep the loop above as it is; however, the calculations are hundreds of lines of code!

Is it possible to make the for loop calculate for the original data first and then continue cycling through the components?


Answer 1:


It seems like dplyr would solve this elegantly, among other options.

For the whole data:

library(dplyr)  
masterdata %>%
  summarise(result = your_function(arg1 = X1, arg2 = X2, ...))

For each component, just add group_by

masterdata %>%
  group_by(D1) %>%
  summarise(result = your_function(arg1 = X1, arg2 = X2, ...))
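As a minimal sketch of how the two calls above can be combined into one result, assume a toy masterdata with made-up values and mean(X2) standing in for your_function; the "All" label for the whole-data row is also just an assumption for illustration:

```r
library(dplyr)

# Hypothetical toy data standing in for masterdata (values are made up)
masterdata <- data.frame(
  D1 = c("A", "B", "C", "B", "B", "C", "C", "A", "B", "A"),
  X2 = 1:10,
  stringsAsFactors = FALSE
)

# Whole data first; "All" is a hypothetical label for that row
whole <- masterdata %>%
  summarise(result = mean(X2)) %>%
  mutate(D1 = "All")

# Then the same summary per component
by_component <- masterdata %>%
  group_by(D1) %>%
  summarise(result = mean(X2))

# One table: the whole-data row followed by one row per component
combined <- bind_rows(whole, by_component)
```

This works as written only when the function returns a single value per group, as mean() does here.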



Answer 2:


If you are outputting dataframes, the key is to create a function that takes a dataframe, performs your calculations on it, and returns a dataframe. In the example below the function is called your_function().

For simplicity a three-stage process is used: first create the output dataframe for the overall dataset, then use lapply to perform the same calculations on the sub datasets. The sub dataset outputs are bound together into a single dataframe before finally being combined with the output of the full dataset.

Note: I created a new variable called "Subset" so that each output row can be identified as belonging to its distinct set.

library(dplyr)
FullSet <- your_function(masterdata) %>% mutate(Subset = "Full")

SubSets <- lapply(unique(masterdata$D1), function(n){
    masterdata %>% filter(D1 == n) %>%
      your_function(.) %>% mutate(Subset = n)
  }) %>% bind_rows()

FinalSet <- bind_rows(FullSet, SubSets)

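If you want to run the process in parallel for speed, use mclapply from the parallel package (it relies on forking, so more than one core is not supported on Windows):

library(parallel)
mclapply(unique(masterdata$D1), function..., mc.cores = detectCores())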
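The same three-stage idea can also be sketched in base R, without dplyr, by putting the full dataset and the split pieces into one list. This is an alternative to the code above rather than the answer's own method; the toy data, the body of your_function and the "Full" label are assumptions for illustration:

```r
# Hypothetical toy data standing in for masterdata (values are made up)
masterdata <- data.frame(
  D1 = c("A", "B", "C", "B", "B", "C", "C", "A", "B", "A"),
  X2 = 1:10,
  X3 = (1:10)^2
)

# Hypothetical stand-in for the real calculations: dataframe in, dataframe out
your_function <- function(df) {
  data.frame(n = nrow(df), mean_X2 = mean(df$X2), max_X3 = max(df$X3))
}

# One list holding the full dataset first, then one piece per component
pieces <- c(list(Full = masterdata), split(masterdata, masterdata$D1))

# The same function runs on every element, so the calculations are written once
results <- lapply(pieces, your_function)

# Bind the one-row outputs together and label each row with its set
FinalSet <- do.call(rbind, results)
FinalSet$Subset <- names(pieces)
```

Because the full dataset is just another list element, no separate code path is needed for it.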




Answer 3:


As @Ossan suggests:

  1. Wrap your code in a function

  2. Call it in a super simple for loop (or lapply, as suggested by @Maurits Evers)

How to:

humongous_function = function(data) {
  # All the code you have written, operating on 'data'
  result
}

Result.List = list()

for(k in c("A", "B", "C")) {        
  subdata = subset(masterdata, D1 == k)
  Result.List[[k]] = humongous_function(subdata)
}
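To cover the whole of masterdata as well, one option is to add an extra label to the list being looped over and skip the subsetting for it. The following is a minimal sketch under assumptions: the toy data, the "All" label, and a humongous_function that returns two fixed-length vectors (quartiles here) are all hypothetical stand-ins for the real calculations:

```r
# Hypothetical toy data standing in for masterdata (values are made up)
set.seed(1)
masterdata <- data.frame(
  D1 = c("A", "B", "C", "B", "B", "C", "C", "A", "B", "A"),
  X2 = rnorm(10),
  X3 = rnorm(10)
)

# Hypothetical stand-in for the long calculation; returns two vectors of length 3
humongous_function <- function(data) {
  probs <- c(0.25, 0.5, 0.75)
  list(V1 = unname(quantile(data$X2, probs)),
       V2 = unname(quantile(data$X3, probs)))
}

Component.List <- c("A", "B", "C")
Run.List <- c("All", Component.List)  # "All" is a hypothetical label for the whole data

# One column per run: the whole data first, then each component
Result.Frame.V1 <- as.data.frame(matrix(0, nrow = 3, ncol = length(Run.List)))
names(Result.Frame.V1) <- Run.List
Result.Frame.V2 <- Result.Frame.V1

for (k in seq_along(Run.List)) {
  # Use the full data for "All", otherwise the subset for that component
  subdata <- if (Run.List[k] == "All") {
    masterdata
  } else {
    subset(masterdata, D1 == Run.List[k])
  }
  res <- humongous_function(subdata)
  Result.Frame.V1[, k] <- res$V1
  Result.Frame.V2[, k] <- res$V2
}
```

The first column of each result frame then holds the whole-data vector, and the remaining columns hold the vectors for A, B and C.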


Source: https://stackoverflow.com/questions/39873800/for-loop-in-r-that-calculates-for-the-group-and-then-the-components
