Using R to download zipped data file, extract, and import data

Happy的楠姐 2020-11-22 14:12

@EZGraphs on Twitter writes: "Lots of online csvs are zipped. Is there a way to download, unzip the archive, and load the data to a data.frame using R? #Rstats"


8 answers
  • 2020-11-22 14:25

    Here is an example that works for files which cannot be read with the read.table function, such as Excel files. This example reads an .xls file with read_xls() from the readxl package.

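    library(readxl)  # provides read_xls()

    url <- "https://www1.toronto.ca/City_Of_Toronto/Information_Technology/Open_Data/Data_Sets/Assets/Files/fire_stns.zip"
    
    temp <- tempfile()   # path for the downloaded archive
    temp2 <- tempfile()  # directory to extract into
    
    download.file(url, temp, mode = "wb")  # binary mode matters on Windows
    unzip(zipfile = temp, exdir = temp2)
    data <- read_xls(file.path(temp2, "fire station x_y.xls"))
    
    unlink(c(temp, temp2), recursive = TRUE)  # temp2 is a directory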
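    If you do not know the exact file name inside the archive, unzip(list = TRUE) returns a data frame describing its contents, which you can search before extracting. The sketch below wraps that in a hypothetical helper, read_first_xls(); it assumes the archive holds at least one .xls file and that readxl is installed. Registering the cleanup with on.exit() removes the temporary files even if reading fails.

```r
# Hypothetical helper: download a zip archive and read the first .xls file in it.
# Assumes the readxl package is installed.
read_first_xls <- function(url) {
  temp <- tempfile(fileext = ".zip")  # downloaded archive
  exdir <- tempfile()                 # extraction directory (unzip creates it)
  on.exit(unlink(c(temp, exdir), recursive = TRUE), add = TRUE)

  download.file(url, temp, mode = "wb")
  members <- unzip(temp, list = TRUE)$Name  # file names inside the archive
  xls <- grep("\\.xls$", members, value = TRUE, ignore.case = TRUE)
  if (length(xls) == 0) stop("no .xls file found in ", url)

  unzip(temp, files = xls[1], exdir = exdir)
  readxl::read_xls(file.path(exdir, xls[1]))
}
```

    Usage would be data <- read_first_xls(url) with the Toronto URL above.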
    
  • 2020-11-22 14:26

    To do this using data.table, I found that the following works. Unfortunately, the link in the question no longer works, so I used a link to another data set.

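    library(data.table)
    temp <- tempfile()
    download.file("https://www.bls.gov/tus/special.requests/atusact_0315.zip", temp, mode = "wb")
    # unzip() returns the path of the extracted file, which fread() then reads
    timeUse <- fread(unzip(temp, files = "atusact_0315.dat", exdir = tempdir()))
    unlink(temp)  # delete the downloaded archive (rm() would only drop the variable)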
    

    I know this is possible in a single line, since you can pass shell commands to fread, but I am not sure how to download a .zip file, extract it, and pass a single file from it to fread.
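    One way to get closer to a one-liner, offered as an untested sketch: data.table's fread() has a cmd argument that reads the output of a shell command, and the Info-ZIP unzip program's -p flag writes an archive member to standard output. The helper name fread_zip_member() is hypothetical, and the sketch assumes a data.table version with cmd and an unzip executable on the PATH.

```r
# Hypothetical helper: stream one member of a local zip archive into fread().
# Assumes data.table (with the `cmd` argument) and the `unzip` command-line tool.
fread_zip_member <- function(zipfile, member) {
  # `unzip -p` prints the member's contents to stdout; shQuote() protects spaces
  data.table::fread(cmd = paste("unzip -p", shQuote(zipfile), shQuote(member)))
}
```

    The archive still has to be downloaded first, because a zip file's central directory sits at its end. For example: download.file(url, temp, mode = "wb"), then timeUse <- fread_zip_member(temp, "atusact_0315.dat").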

  • 2020-11-22 14:32

    I used the CRAN package "downloader", found at http://cran.r-project.org/web/packages/downloader/index.html. It's much easier:

    library(downloader)
    download(url, destfile = "dataset.zip", mode = "wb")
    unzip("dataset.zip", exdir = "./")
    
  • 2020-11-22 14:35

    Zip archives are really more of a 'filesystem', with content metadata etc. See help(unzip) for details. So to do what you sketch out above, you need to

    1. Create a temporary file name (e.g. with tempfile())
    2. Use download.file() to fetch the file into the temporary file
    3. Use unz() to open a connection to the target file inside the temporary file
    4. Remove the temporary file via unlink()

    which in code (thanks for the basic example, but this is simpler) looks like

    temp <- tempfile()
    download.file("http://www.newcl.org/data/zipfiles/a1.zip", temp, mode = "wb")
    data <- read.table(unz(temp, "a1.dat"))
    unlink(temp)
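    The same four steps can be wrapped in a reusable function. This is a sketch with a hypothetical name, read_zipped_table(); registering unlink() with on.exit() means step 4 runs even if read.table() fails, and any extra arguments (e.g. header = TRUE) are passed on to read.table().

```r
# Hypothetical helper wrapping the four steps; extra arguments go to read.table().
read_zipped_table <- function(url, member, ...) {
  temp <- tempfile(fileext = ".zip")     # 1. temporary file name
  on.exit(unlink(temp), add = TRUE)      # 4. removed on exit, even after an error
  download.file(url, temp, mode = "wb")  # 2. fetch the archive
  read.table(unz(temp, member), ...)     # 3. read the member through a connection
}
```

    With the example above this would be data <- read_zipped_table("http://www.newcl.org/data/zipfiles/a1.zip", "a1.dat").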
    

    Compressed (.z), gzipped (.gz) or bzip2ed (.bz2) files contain just the one file, and those you can read directly from a connection. So get the data provider to use that instead :)
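    As a small self-contained illustration with made-up data, the sketch below writes a gzip-compressed CSV to a temporary file and reads it back. read.csv() accepts the .gz path directly, because file() detects gzip compression when opening for reading, or you can pass an explicit gzfile() connection.

```r
# Write a small gzip-compressed CSV (made-up data) to a temporary file.
path <- tempfile(fileext = ".csv.gz")
con <- gzfile(path, "w")
write.csv(data.frame(x = 1:3, y = c("a", "b", "c")), con, row.names = FALSE)
close(con)

# read.csv() on the path: the compression is detected automatically
df1 <- read.csv(path)

# ...or read through an explicit gzfile() connection
df2 <- read.csv(gzfile(path))

unlink(path)  # clean up the temporary file
```

    For a remote .gz file, wrapping the URL connection in gzcon(), as in read.csv(gzcon(url(...))), should work the same way.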

  • 2020-11-22 14:37

    I found that the following worked for me. These steps come from BTD's YouTube video, Managing Zipfile's in R:

    zip.url <- "url_address.zip"
    
    dir <- getwd()
    
    zip.file <- "file_name.zip"
    
    zip.combine <- file.path(dir, zip.file)  # full path for the downloaded archive
    
    download.file(zip.url, destfile = zip.combine, mode = "wb")
    
    unzip(zip.file)  # extracts into the working directory
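    A variation on the same steps, sketched as a hypothetical helper download_and_unzip(): it builds the path with file.path(), extracts into a sub-folder named after the archive rather than into the working directory itself, and returns the paths of the extracted files, which unzip() returns invisibly. The URL and file names remain placeholders.

```r
# Hypothetical helper: download an archive into the working directory, extract it
# into a sub-folder, and return the extracted file paths (invisibly).
download_and_unzip <- function(zip.url,
                               zip.file = basename(zip.url),
                               exdir = tools::file_path_sans_ext(zip.file)) {
  zip.combine <- file.path(getwd(), zip.file)  # where the archive is saved
  download.file(zip.url, destfile = zip.combine, mode = "wb")
  unzip(zip.combine, exdir = exdir)            # creates exdir if necessary
}
```

    For example, paths <- download_and_unzip("url_address.zip") would save file_name-style archives next to your scripts and list what was extracted.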
    
  • 2020-11-22 14:44

    Just for the record, I tried translating Dirk's answer into code :-P

    temp <- tempfile()
    download.file("http://www.newcl.org/data/zipfiles/a1.zip",temp)
    con <- unz(temp, "a1.dat")
    data <- matrix(scan(con),ncol=4,byrow=TRUE)
    unlink(temp)
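    The matrix(..., byrow = TRUE) step works because scan() returns all values as one flat vector; byrow = TRUE fills the matrix row by row, so each line of four numbers becomes one row. In this sketch a textConnection() with made-up data stands in for the unz() connection, so the reshaping can be checked without downloading anything.

```r
# Made-up data standing in for a1.dat: two lines of four numbers each.
con <- textConnection(c("1 2 3 4", "5 6 7 8"))
m <- matrix(scan(con, quiet = TRUE), ncol = 4, byrow = TRUE)
close(con)
print(m)  # a 2 x 4 matrix: row 1 is 1 2 3 4, row 2 is 5 6 7 8
```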
    