Essentially, I have constructed a sizable predictive model in R with about 10-15 separate script files for collecting, sorting, analyzing, and presenting my data. Rather than putting everything into one gigantic script file, I would like to maintain some level of modularity and run each piece from a control script, or some comparable control mechanism, as I've done in MATLAB before. Is this possible in R?
I have read this thread as well as its related threads, but couldn't find this exact answer: Organizing R Source Code
I think you're simply looking for the source() function. See ?source. I usually have a master script which sources the other .R files.
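For instance, a minimal master script might look like this (the stage file names here are hypothetical placeholders for your own scripts):

# master.R -- control script that runs each stage in order
source("01_collect.R")   # data collection
source("02_sort.R")      # sorting / cleaning
source("03_analyze.R")   # analysis
source("04_present.R")   # figures and reports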
I am a new developer, and since no one has given a concrete example, here is one that worked for me. Using source() to call another R script, "myscript_A.R" or "myscript_B.R", looks like this:
if (condition == X) {   # condition and X are placeholders for your own test
  source("myscript_A.R")
} else {
  source("myscript_B.R")
}
Although I understand your need for modularity, why not simply create a single script for the run of interest? Sourcing multiple scripts can make it awkward to pass variables between pieces unless you write intermediate results to files (which wastes cycles on disk I/O). You could even build a master script that reads the text contents of each script, concatenates them into a single combined script, and then runs that.
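A rough sketch of that concatenation idea, assuming the stage scripts named in the vector exist in the working directory (the file names are hypothetical):

# combine several stage scripts into one file, then run it
scripts  <- c("01_collect.R", "02_sort.R", "03_analyze.R")
combined <- unlist(lapply(scripts, readLines))   # read each script's lines
writeLines(combined, "combined_run.R")           # write them out as one script
source("combined_run.R")                         # run the combined script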
I've done what you described, splitting chunks of code into separate R files and running source(this) and source(that), but I've been painfully learning that sourcing functions (rather than subroutines/script files) is the better way to go.
Here are three possible reasons why we might have developed our scripts this way and stuck with it, and three reasons why switching to functions makes sense:
1) We wanted to debug directly when a script went wrong (to be able to track all variables and their status in the single global environment).
- I've now realized that RStudio's debugger / traceback is a much better way to do true debugging.
2a) We didn't know what variables needed to be kept for later (didn't want to keep track of which variables to put into functions and which variables to output from functions)
- Functions force us to be explicit about what gets used in one part of a script and what doesn't, and about what is essential to keep from each part, since there is no need to output everything. Variables are better kept only in the environments where they are needed, rather than having everything passed in and out of the global environment.
- Also, environments can act somewhat like lists, so I think it's possible to pass a whole environment into a function and back out; I still need to do more reading/learning about this.
2b) We have a large number of variables for everything (parameters, settings, different parts of the data), so it seems impractical to pass everything in and out of functions.
- With structures like lists, we can lump categories of variables together and send them into functions. Functions can also return lists rather than single variables (see the sketch after this list).
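Here is a minimal sketch of that list-in / list-out pattern; the function name, parameter names, and example data are hypothetical, not from the original post:

# bundle settings into one list, return several related results as one list
run_analysis <- function(params) {
  fit <- lm(params$formula, data = params$data)  # params carries everything this stage needs
  list(model = fit,
       coefs = coef(fit),
       n     = nrow(params$data))
}

results <- run_analysis(list(formula = mpg ~ wt, data = mtcars))
results$coefs   # pull out just the piece you need afterwards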
Source: https://stackoverflow.com/questions/25273166/combine-and-run-multiple-r-scripts-from-another-script