Question
I have a data set and a loop that runs numerous calculations on it: the set is split by component into subsets, which are processed one at a time. However, I first need to run the same calculations on the original data set as a whole.
For a fictional data set called masterdata with 3 components (column D1) and numerous variables (X2-X10), such as:
# masterdata
# D1 X2 X3 X4 X5 X6 X7 X8 X9 X10
# A NA NA NA NA NA NA NA NA NA
# B NA NA NA NA NA NA NA NA NA
# C NA NA NA NA NA NA NA NA NA
# B NA NA NA NA NA NA NA NA NA
# B NA NA NA NA NA NA NA NA NA
# C NA NA NA NA NA NA NA NA NA
# C NA NA NA NA NA NA NA NA NA
# A NA NA NA NA NA NA NA NA NA
# B NA NA NA NA NA NA NA NA NA
# A NA NA NA NA NA NA NA NA NA
A loop is in place to split off a subset for component A, perform the calculations, output the results and then repeat this for B and C:
Component.List = c("A", "B", "C")
for(k in 1:length(Component.List)) {
subdata = subset(masterdata, D1 == Component.List[k])
# Numerous calculations performed on "subdata" within the loop
}
# End of loop
What I am trying to do is first perform the same calculations against the whole of masterdata and then start looping through the individual components.
Part of the output of the calculations is two vectors, each of which is placed into a column of one of two data frames created just before the loop runs:
# Prior to the start of the loop two frames below created
Components = 3 # In this example 3 components in column D1 - "A", "B", "C"
Result.Frame.V1 = as.data.frame(matrix(0, nrow = 200, ncol = Components))
Result.Frame.V2 = as.data.frame(matrix(0, nrow = 200, ncol = Components))
# The loop runs all of the calculations; the last two lines below
# place the two generated vectors into the kth columns of the frames.
Result.Frame.V1[,k] = V1.Result
Result.Frame.V2[,k] = V2.Result
# First run of the loop for "A" will place the outputs in the 1st columns
# Second run of the loop for "B" will place the outputs in the 2nd columns, etc.
# With the expansion to also calculate against the whole group, the above data frames
# would be given an extra column to hold the result vector from running the whole of
# masterdata through the calculations
My initial idea is to write out every calculation once for masterdata ahead of the loop above, but the calculations run to hundreds of lines of code!
Is it possible to have the for loop first calculate for the original data and then continue cycling through the components?
Answer 1:
It seems like dplyr would solve this elegantly, among other options.
For the whole data:
library(dplyr)
masterdata %>%
summarise(result = your_function(arg1 = X2, arg2 = X3, ...))
For each component, just add group_by:
masterdata %>%
group_by(D1) %>%
summarise(result = your_function(arg1 = X2, arg2 = X3, ...))
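As a rough illustration of the two calls above, here is a minimal sketch on toy data. The values in masterdata, the body of your_function and the "All" label are all hypothetical stand-ins, not part of the original question; only columns X2 and X3 are included.

```r
library(dplyr)

# Hypothetical toy data standing in for masterdata (only X2 and X3 shown)
masterdata <- data.frame(
  D1 = c("A", "B", "C", "B", "B", "C", "C", "A", "B", "A"),
  X2 = 1:10,
  X3 = seq(10, 100, by = 10),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for the real calculations; returns a single value
your_function <- function(arg1, arg2) sum(arg1 * arg2)

# Whole data set, labelled "All" so it can sit next to the per-component rows
whole <- masterdata %>%
  summarise(result = your_function(arg1 = X2, arg2 = X3)) %>%
  mutate(D1 = "All")

# One row per component
by_component <- masterdata %>%
  group_by(D1) %>%
  summarise(result = your_function(arg1 = X2, arg2 = X3))

# Stack the overall row on top of the component rows
combined <- bind_rows(whole, by_component)
```

This pattern fits best when the calculation reduces each group to a single value; result vectors of length 200, as in the question, would probably need a different output shape.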
Answer 2:
If you are outputting data frames, the key is to create a function that performs your calculations on a data frame passed to it and returns a data frame. In the example below, that function is called your_function().
For simplicity, a three-stage process is used: first, create the output data frame for the overall data set; then use lapply to perform the same calculations on each sub data set. The sub data set outputs are bound together into a single data frame and finally combined with the output for the full data set.
Note: I created a new variable called "Subset" so that each output can be identified as belonging to its set.
library(dplyr)
FullSet <- your_function(masterdata) %>% mutate(Subset = "Full")
SubSets <- lapply(unique(masterdata$D1), function(n){
masterdata %>% filter(D1 == n) %>%
your_function(.) %>% mutate(Subset = n)
}) %>% bind_rows()
FinalSet <- bind_rows(FullSet, SubSets)
If you want to run the process in parallel for speed, use mclapply from the parallel package:
mclapply(unique(masterdata$D1), function..., mc.cores = detectCores())
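The three-stage process can be tried end to end with the sketch below. The toy masterdata and the body of your_function are hypothetical; the real function would hold the hundreds of lines of calculations. The loop variable is named comp rather than n so that it cannot be confused with a column of the returned data frame inside mutate.

```r
library(dplyr)

# Hypothetical toy data standing in for masterdata
masterdata <- data.frame(
  D1 = c("A", "B", "C", "B", "B", "C", "C", "A", "B", "A"),
  X2 = 1:10,
  stringsAsFactors = FALSE
)

# Hypothetical stand-in: takes a data frame, returns a one-row data frame
your_function <- function(df) {
  data.frame(n_rows = nrow(df), mean_X2 = mean(df$X2))
}

# Stage 1: the full data set
FullSet <- your_function(masterdata) %>% mutate(Subset = "Full")

# Stage 2: each component in turn, bound into one data frame
SubSets <- lapply(unique(masterdata$D1), function(comp) {
  masterdata %>%
    filter(D1 == comp) %>%
    your_function() %>%
    mutate(Subset = comp)
}) %>% bind_rows()

# Stage 3: full-set output followed by the component outputs
FinalSet <- bind_rows(FullSet, SubSets)
```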
Answer 3:
As @Ossan suggests:
Wrap your code in a function.
Then call it from a very simple for loop (or lapply, as suggested by @Maurits Evers).
How to:
humongous_function = function(data) {
# All the code you have written to run on 'data'
result
}
Result.List = list()
for(k in c("A", "B", "C")) {
subdata = subset(masterdata, D1 == k)
Result.List[[k]] = humongous_function(subdata)
}
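To also cover the whole data set and the two result frames from the question, the loop could run over one extra label. This is only a sketch under assumptions: the "All" label, the toy masterdata, the 5-row frames (the question uses 200) and the body of humongous_function are hypothetical.

```r
# Hypothetical toy data standing in for masterdata
masterdata <- data.frame(
  D1 = c("A", "B", "C", "B", "B", "C", "C", "A", "B", "A"),
  X2 = 1:10,
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for the real calculations: returns two vectors
humongous_function <- function(data, n_out = 5) {
  list(V1 = rep(mean(data$X2), n_out),
       V2 = rep(nrow(data), n_out))
}

# "All" is an assumed label for the whole data set; it must not clash
# with a real value in D1
Run.List <- c("All", "A", "B", "C")

# One column per run: the whole data set plus each component
Result.Frame.V1 <- as.data.frame(matrix(0, nrow = 5, ncol = length(Run.List)))
Result.Frame.V2 <- as.data.frame(matrix(0, nrow = 5, ncol = length(Run.List)))
names(Result.Frame.V1) <- Run.List
names(Result.Frame.V2) <- Run.List

for (k in seq_along(Run.List)) {
  # First pass uses everything; later passes use one component each
  subdata <- if (Run.List[k] == "All") {
    masterdata
  } else {
    subset(masterdata, D1 == Run.List[k])
  }
  res <- humongous_function(subdata)
  Result.Frame.V1[, k] <- res$V1
  Result.Frame.V2[, k] <- res$V2
}
```

The calculations live in one place, so the whole-data run costs only one extra loop iteration.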
Source: https://stackoverflow.com/questions/39873800/for-loop-in-r-that-calculates-for-the-group-and-then-the-components