I\'ve heard it said that it is bad practice to use setwd()
in a script.
I can't think of any particular issues with using setwd()
in a script run on a server I manage as it does return an error which can be trapped with try(), and you can manage it. I have used setwd()
when being lazy about paths - see below!
I use file.path()
extensively in scripts production or otherwise. Working across the files in an input directory and putting the output graphics and reports elsewhere. So something along the lines of... (untested) This would be a bit tedious using setwd()
.
kInDir <- '~/Indir'
kOutDir <- '~/Outdir'
flist <- dir(path=kInDir, pattern='^[a-z]{2,5}\\.csv$')
# note I could have used full.names=T - but it's easier not to...
for (fnam in flist) {
# full path to the report file created
sfnam <- file.path(kOutDir, gsub('.csv', '_report.txt', fnam))
# full path to the csv file that will be created
ofnam <- file.path(kOutDir, gsub('.csv', '_b.csv', fnam))
#
# ok... we're going to process this CSV file...
r1 <- read.csv(file.path(kInDir, fnam))
#
# we''ll put the output from the analysis into this report file
sink(sfnam, split=TRUE)
# processs it... into a new data.frame k1
# blah blah blah...
#
write.csv(k1, file=ofnam, row.names=FALSE)
sink() # turn off this particular report file
}
To make things a bit more portable where I work we all put this in a Rprofile
hdrive=
switch(Sys.info()[[1]],
'Linux'="/mnt/hdrive",
'Windows'="H:/",
"Darwin"="/Volumes/hdrive/mnt/hdrive"
)
So i always have that variable to get me to our shared drive. Then in my script we can write
setwd(paste(hdrive,"/relative/path/",sep="/"))
So that gets us around some of the problems that others are talking about.
Though problems with setwd() have been targeted, I would like to add one more to the what are the alternatives part of the question. We often work with git where the relative path is very convenient
setrelwd <- function(rel_path){
curr_dir <- getwd()
abs_path <- file.path(curr_dir,rel_path)
if(dir.exists(abs_path)){
setwd(abs_path)
}
else
{
warning('Directory does not exist. Please create it first.')
}
}
> setrelwd("Summer2016")
Warning message:
In setrelwd("Summer2016") : Directory does not exist. Please create it first.
Also if you don't want to see the warning message but create a folder right away see Check existence of directory and create if doesn't exist
It's an issue of reproducible code. If you specify a directory that doesn't exist on someone else's computer, then they can't use your code. This is particularly bad with absolute file paths, and particularly bad with Windows file paths (which are absolutely impossible to replicate on a Unix system).
My preferred solution is to specify that the user should be in the relevant directory on their own system before starting to run the code. If for your own convenience you want to put a setwd(...)
right at the top of your code, where other people can notice it and comment it out as appropriate, but the rest of your code assumes only relative paths from that starting directory, that's OK with me.
Yihui Xie (author of knitr
) feels particularly strongly about this:
https://groups.google.com/forum/?fromgroups=#!topic/knitr/knM0VWoexT0
Whenever you want to manipulate files, they are assumed to be under the same directory of your source (e.g. Rnw documents). Then you can always use relative paths and you will never need to setwd(). Using setwd() contradicts with the principle of reproducibility, e.g. you use setwd('foo/bar/') and the directory may not exist in other people's computers. See FAQ 7: https://github.com/yihui/knitr/blob/master/FAQ.md
And from the aforementioned FAQ 7:
You'd better not do this [change working directory inside knitr code chunks]. Your working directory is always getwd() (all output files will be written here), but the code chunks are evaluated under the directory where your input document comes from. Changing working directories while running R code is a bad practice in general. See #38 for a discussion. You should also try to avoid absolute directories whenever possible (use relative directories instead), because it makes things less reproducible.
See also: https://github.com/yihui/knitr/issues/38
Toward the better alternatives question:
I mainly use R for individual projects (meaning I'm the primary analyst). However, we do use these in projects which sometimes need to be shared with others.
I have found RStudio's Projects functionality goes a long way to keeping your files organized. If other users also adopt RStudio, they will have the nice feeling of being able to open a single file ("*.Rproj") and have the project load in the same state you last saved it to.
On top of this, I've found a new tool, ProjectTemplate that goes a step further! The technique the author developed is used to provide structure to what you are doing. Please go over to the website for more detail.
I personally added the following code. I use Sys.info() and any() with unique information.
First step is to use Sys.info() and find the unique identifier for your computer.
if(any(Sys.info() == "COMPUTER1")) {
setwd("c:/Users/user1/repos/project/")
}
if(any(Sys.info() == "COMPUTER2")) {
setwd("home/user1/repos/project/")
}
and just add the name of the computer to the if statement and add the correct path. Just add a new if for each machine.
For reproduction it does not change anyone's working directory unless they are that specific user.