Import newest csv file in directory

Asked by 慢半拍i on 2021-01-14 16:47

Goal:
- Import the newest file (.csv) from a local directory into R

Goal Details:
- A csv file is uploaded to a folder daily

3 Answers
  • 2021-01-14 17:17

    The following function uses a timestamp file to keep track of which files have already been processed. It can be run either continually in a single R instance (as you first suggested) or as single-run invocations, which fits @andrew's suggestion of a cron job. (The cat() call is included primarily for testing; feel free to remove it.)

    processDir <- function(directory = '.', pattern = '\\.csv$', loop = FALSE, delay = 600,
                           stampFile = file.path(directory, '.csvProcessor')) {
        if (! file.exists(stampFile))
            file.create(stampFile)
        firstRun <- TRUE
        while (firstRun || loop) {
            firstRun <- FALSE
            stampTime <- file.info(stampFile)$mtime
            allFilesDF <- file.info(list.files(path = directory, pattern = pattern,
                                               full.names = TRUE, no.. = TRUE))
            unprocessedFiles <- allFilesDF[(! allFilesDF$isdir) &
                                           (allFilesDF$mtime > stampTime), ]
            if (nrow(unprocessedFiles)) {
                ## We need to update the timestamp on stampFile quickly so
                ## that files added while this is running will be found in the
                ## next loop.
                ## WARNING: this blindly truncates the stampFile.
                file.create(stampFile, showWarnings = FALSE)
                for (fn in rownames(unprocessedFiles)) {
                    cat('Processing ', fn, '\n')
                    ## read.csv(fn)
                    ## ...
                }
            }
            if (loop) Sys.sleep(delay)
        }
    }
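
    To convince yourself of the mtime comparison the function relies on, here is a small self-contained check (purely illustrative; the temporary directory and file names are mine, not part of the answer):

    ## Illustrative only: a file written after the stamp file compares as newer.
    dir.create(tmp <- tempfile("csvdemo"))
    stamp <- file.path(tmp, ".csvProcessor")
    file.create(stamp)
    Sys.sleep(1)                                 # ensure a strictly later mtime
    write.csv(data.frame(x = 1), file.path(tmp, "day1.csv"), row.names = FALSE)
    info <- file.info(list.files(tmp, pattern = "\\.csv$", full.names = TRUE))
    info$mtime > file.info(stamp)$mtime          # TRUE: day1.csv would be picked up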
    

    As you initially suggested, running it in a continually-running R instance would simply be:

    processDir(loop = TRUE)
    

    To use @andrew's suggestion of a cron job, append the following line after the function definition:

    processDir()
    

    ... and use a crontab file similar to the following:

    # crontab
    0 8 * * * path/to/Rscript path/to/processDir.R
    

    Hope this helps.

  • 2021-01-14 17:26

    -- readfile.R --

    directory <- "."   # placeholder: the folder the daily csv lands in
    files <- file.info(list.files(directory, pattern = "\\.csv$", full.names = TRUE))
    dat <- read.csv(rownames(files)[order(files$mtime)][nrow(files)])   # newest csv
    

    I'd put the above script in a cron job that runs every morning, at a time by which the day's file will have been written. The crontab entry below runs it every morning at 8 am.

    -- in crontab --

    0 8 * * *  Rscript readfile.R
    

    Read more about cron here.
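
    One practical detail left implicit here: an Rscript launched by cron exits as soon as it finishes, so if other sessions need the imported table, the script has to write it somewhere. A minimal sketch of readfile.R with that extra step (the saveRDS() target is a placeholder of mine, not part of the answer):

    ## readfile.R (sketch): import the newest csv, then persist it for later sessions.
    directory <- "."                      # placeholder: the upload folder
    files <- file.info(list.files(directory, pattern = "\\.csv$", full.names = TRUE))
    dat <- read.csv(rownames(files)[order(files$mtime)][nrow(files)])
    saveRDS(dat, file.path(directory, "latest.rds"))   # placeholder output path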

  • 2021-01-14 17:32

    A more concise solution using magrittr (extract() is magrittr's alias for [):

    pacman::p_load(magrittr)   # loads (and installs if needed) magrittr

    directory <- "."           # placeholder: the upload folder
    path <- list.files(path = directory,
                       pattern = "\\.csv$",
                       full.names = TRUE) %>%
      extract(which.max(file.mtime(.)))   # path of the most recently modified csv

    dat <- read.csv(path)
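
    For comparison, the same selection in base R, without the pipe (a sketch under the same directory assumption):

    csvs <- list.files(directory, pattern = "\\.csv$", full.names = TRUE)
    newest <- csvs[which.max(file.mtime(csvs))]   # most recently modified csv
    dat <- read.csv(newest)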
    