Goal:
- Import the newest file (.csv) from a local directory into R
Goal Details:
- A csv file is uploaded to a folder daily
The following function uses a timestamp file to keep track of which files have already been processed. It can be run either continually in an R instance (as you first suggested) or as single-run instances, which suits @andrew's suggestion of a cron job. (The cat() command is included primarily for testing; feel free to remove it.)
processDir <- function(directory = '.', pattern = '\\.csv$', loop = FALSE, delay = 600,
                       stampFile = file.path(directory, '.csvProcessor')) {
    ## Create the stamp file on first use; its mtime marks the last run.
    if (! file.exists(stampFile))
        file.create(stampFile)
    firstRun <- TRUE
    while (firstRun || loop) {
        firstRun <- FALSE
        stampTime <- file.info(stampFile)$mtime
        ## list.files() takes a regular expression, not a shell glob.
        allFilesDF <- file.info(list.files(path = directory, pattern = pattern,
                                           full.names = TRUE, no.. = TRUE))
        ## Keep only regular files modified since the last run.
        unprocessedFiles <- allFilesDF[(! allFilesDF$isdir) &
                                       (allFilesDF$mtime > stampTime), ]
        if (nrow(unprocessedFiles)) {
            ## We need to update the timestamp on stampFile quickly so
            ## that files added while this is running will be found in the
            ## next loop.
            ## WARNING: this blindly truncates the stampFile.
            file.create(stampFile, showWarnings = FALSE)
            for (fn in rownames(unprocessedFiles)) {
                cat('Processing ', fn, '\n')
                ## read.csv(fn)
                ## ...
            }
        }
        if (loop) Sys.sleep(delay)
    }
}
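The warning in the function notes that file.create() truncates the stamp file. If the stamp file ever needs to hold content, one alternative is to update only its modification time with base R's Sys.setFileTime(). Below is a minimal sketch of that idea; touchStamp is a hypothetical helper name and is not part of the original answer.

```r
## Hypothetical helper (not from the original answer): create the stamp
## file if it is missing, otherwise update only its modification time
## and leave its contents untouched.
touchStamp <- function(stampFile, time = Sys.time()) {
    if (!file.exists(stampFile))
        file.create(stampFile)
    ## Sys.setFileTime() invisibly returns TRUE on success.
    Sys.setFileTime(stampFile, time)
}
```

Inside processDir, a call to touchStamp(stampFile) could stand in for file.create(stampFile, showWarnings = FALSE).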
As you initially suggested, running it in a continually-running R instance would simply be:
processDir(loop = TRUE)
To use @andrew's suggestion of a cron job, append the following line after the function definition:
processDir()
... and use a crontab file similar to the following:
# crontab
0 8 * * * path/to/Rscript path/to/processDir.R
Hope this helps.
-- readfile.R --
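directory <- "."  # assumption: replace with the folder the csv files are uploaded to
files <- file.info(list.files(directory, pattern = "\\.csv$", full.names = TRUE))
read.csv(rownames(files)[order(files$mtime)][nrow(files)])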
I'd put the above script in a cron job that runs every morning at a time when the file for the day will have been written. The below crontab runs it every morning at 8am.
-- in crontab --
0 8 * * * Rscript readfile.R
Read more about cron in the crontab manual page (man 5 crontab).
A more efficient solution using dplyr/magrittr:
pacman::p_load(magrittr)
path <- list.files(path = directory,
                   pattern = "csv$",
                   full.names = TRUE) %>%
    extract(which.max(file.mtime(.)))
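For comparison, the same selection can be written in base R without loading any packages. The sketch below also guards against an empty directory, where which.max() would otherwise yield a zero-length path. newestCsv is a hypothetical name introduced here for illustration, and the stricter "\\.csv$" pattern is a choice of this sketch rather than the original pipeline.

```r
## Hypothetical helper (for illustration only): return the path of the
## most recently modified .csv file in `directory`, or NA if none exists.
newestCsv <- function(directory = ".") {
    paths <- list.files(path = directory, pattern = "\\.csv$",
                        full.names = TRUE)
    if (length(paths) == 0)
        return(NA_character_)
    ## file.mtime() is vectorised, so which.max() picks the newest file.
    paths[which.max(file.mtime(paths))]
}
```

A caller would then check the result before reading, for example: path <- newestCsv(directory); if (!is.na(path)) data <- read.csv(path).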