Import newest csv file in directory

慢半拍i

Goal:
- Import the newest file (.csv) from a local directory into R

Goal Details:
- A csv file is uploaded to a folder dail

3 Answers

深忆病人

2021-01-14 17:17

The following function uses a timestamp file to keep track of which files have already been processed. It can either run continually in a single R instance (as you first suggested) or be invoked as single-run instances, which fits @andrew's suggestion of a cron job. (The cat() call is included primarily for testing; feel free to remove it.)

processDir <- function(directory = '.', pattern = '\\.csv$', loop = FALSE, delay = 600,
                       stampFile = file.path(directory, '.csvProcessor')) {
    if (! file.exists(stampFile))
        file.create(stampFile)
    firstRun <- TRUE
    while (firstRun || loop) {
        firstRun <- FALSE
        stampTime <- file.info(stampFile)$mtime
        allFilesDF <- file.info(list.files(path = directory, pattern = pattern,
                                           full.names = TRUE, no.. = TRUE))
        unprocessedFiles <- allFilesDF[(! allFilesDF$isdir) &
                                       (allFilesDF$mtime > stampTime), ]
        if (nrow(unprocessedFiles)) {
            ## We need to update the timestamp on stampFile quickly so
            ## that files added while this is running will be found in the
            ## next loop.
            ## WARNING: this blindly truncates the stampFile.
            file.create(stampFile, showWarnings = FALSE)
            for (fn in rownames(unprocessedFiles)) {
                cat('Processing ', fn, '\n')
                ## read.csv(fn)
                ## ...
            }
        }
        if (loop) Sys.sleep(delay)
    }
}
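
The core of the timestamp approach is comparing each file's modification time with that of the stamp file. A minimal sketch of that step in isolation follows. The helper names `newerThanStamp` and `touchStamp` are hypothetical and not part of the function above. As an alternative to truncating the stamp file with file.create(), the sketch uses `Sys.setFileTime()`, which updates only the timestamp.

```r
# Hypothetical helper: return the CSV files in `directory` that were
# modified after `stampFile` was last touched.
newerThanStamp <- function(directory, stampFile) {
    stampTime <- file.mtime(stampFile)
    csvFiles <- list.files(directory, pattern = "\\.csv$", full.names = TRUE)
    csvFiles[file.mtime(csvFiles) > stampTime]
}

# Hypothetical helper: record "now" as the last processing time.
# Sys.setFileTime() changes only the timestamp, so the file's contents are kept.
touchStamp <- function(stampFile) {
    if (!file.exists(stampFile)) file.create(stampFile)
    invisible(Sys.setFileTime(stampFile, Sys.time()))
}
```

As in the function above, `touchStamp()` would be called before processing starts, so that files arriving during processing are picked up on the next run.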

To run it in a continuously running R instance, as you initially suggested, simply call:

processDir(loop = TRUE)

To use @andrew's suggestion of a cron job, append the following line after the function definition:

processDir()

... and use a crontab file similar to the following:

# crontab
0 8 * * * path/to/Rscript path/to/processDir.R

Hope this helps.

离开以前

2021-01-14 17:26
-- readfile.R --
```
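directory <- "."  # placeholder: set this to the folder that receives the daily file
files <- file.info(list.files(directory, pattern = "\\.csv$", full.names = TRUE))
# full.names = TRUE keeps the paths valid regardless of the working directory
newest <- read.csv(rownames(files)[which.max(files$mtime)])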
```
I'd put the above script in a cron job that runs every morning, at a time when the file for the day will already have been written. The crontab entry below runs it every morning at 8 a.m.

-- in crontab --
```
0 8 * * *  Rscript readfile.R
```
For more about cron, see its manual pages (for example, man 5 crontab).

[愿得一人]

2021-01-14 17:32

A more compact solution using magrittr (the pipe and extract() both come from magrittr, so dplyr is not needed):

pacman::p_load(magrittr)

path <- list.files(path = directory,
                   pattern = "csv$",
                   full.names = TRUE) %>%
  extract(which.max(file.mtime(.)))
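
For comparison, the same selection can be done in base R without extra packages. The sketch below wraps it in a hypothetical function, `readNewestCsv`, and reads the selected file. The function name and the empty-directory check are assumptions, not part of the answer above.

```r
# Hypothetical helper: read the most recently modified CSV file in `directory`.
readNewestCsv <- function(directory = ".") {
  files <- list.files(path = directory, pattern = "\\.csv$", full.names = TRUE)
  if (length(files) == 0) {
    stop("no CSV files found in ", directory)
  }
  # which.max() returns the index of the latest modification time
  newest <- files[which.max(file.mtime(files))]
  read.csv(newest)
}
```

Calling `readNewestCsv("path/to/folder")` would then return the newest file's contents as a data frame, or stop with an error if the folder contains no CSV files.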