Decompress gz file using R

逝去的感伤 2020-11-27 04:52

I have used ?unzip in the past to get at contents of a zipped file using R. This time around, I am having a hard time extracting the files from a .gz file whic

5 answers
  • 2020-11-27 05:01
    library(vroom)
    columns3 <- c('A', 'B',...)  ## define column names
    Data1 <- vroom(".../XXX.tsv", col_names = columns3)
    

    The same call works fine with a .tsv.gz file.

  • 2020-11-27 05:02

    Here is a worked example that may help illustrate what gzfile() and gzcon() are for:

    foo <- data.frame(a=LETTERS[1:3], b=rnorm(3))
    foo
    #  a        b
    #1 A 0.586882
    #2 B 0.218608
    #3 C 1.290776
    write.table(foo, file="/tmp/foo.csv")
    system("gzip /tmp/foo.csv")             # being very explicit
    

    Now that the file is written, instead of implicit use of file(), use gzfile():

    read.table(gzfile("/tmp/foo.csv.gz"))   
    #  a        b
    #1 A 0.586882
    #2 B 0.218608
    #3 C 1.290776
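
    The example above shells out to gzip and only exercises gzfile(). As a rough sketch of the same round trip in pure R, the snippet below writes through a gzfile() connection and then reads the result back twice, once with gzfile() and once with gzcon() wrapped around a binary file() connection. The data frame values and the temporary path are made up for illustration.

    ```r
    # Hypothetical data and temp path, used only for illustration.
    foo <- data.frame(a = LETTERS[1:3], b = c(0.5, 1.5, 2.5))
    gz_path <- tempfile(fileext = ".csv.gz")

    # Write through a gzfile() connection; output is gzip-compressed on the fly.
    con <- gzfile(gz_path, "w")
    write.table(foo, con)
    close(con)

    # Read it back by handing read.table() a gzfile() connection.
    from_gzfile <- read.table(gzfile(gz_path))

    # gzcon() wraps an existing binary connection and decompresses its stream.
    raw_con <- gzcon(file(gz_path, "rb"))
    lines <- readLines(raw_con)
    close(raw_con)
    from_gzcon <- read.table(text = lines)
    ```

    gzcon() is mostly useful when the compressed bytes come from a connection you already have (a url() connection, for instance), whereas gzfile() is the simpler choice for a file on disk.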
    

    The file you point to is a compressed tar archive, and as far as I know, R itself has no interface to tar archives. These are commonly used to distribute source code, for example R packages and the R sources.

  • 2020-11-27 05:09

    To un-gz a file in R you can do:

    library(R.utils)
    gunzip("file.gz", remove=FALSE)
    

    or

    gunzip("file.gz")
    

    But then you get the default (remove=TRUE) behavior, in which the input file is removed once the output file has been fully created and closed.
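
    If installing R.utils is not an option, something similar can be sketched with base R connections. The helper below, gunzip_base(), is a hypothetical name of my own and not part of R.utils; it streams the decompressed bytes to a new file in chunks and always keeps the input, roughly like remove=FALSE.

    ```r
    # Hypothetical helper (not from R.utils): stream-decompress a .gz file to disk.
    gunzip_base <- function(src, dest = sub("\\.gz$", "", src), chunk = 1e6) {
      stopifnot(src != dest)               # refuse to overwrite the input
      inp <- gzfile(src, "rb")             # decompressing binary reader
      on.exit(close(inp), add = TRUE)
      out <- file(dest, "wb")              # plain binary writer
      on.exit(close(out), add = TRUE)
      repeat {
        buf <- readBin(inp, what = "raw", n = chunk)
        if (length(buf) == 0L) break       # end of the compressed stream
        writeBin(buf, out)
      }
      invisible(dest)
    }

    # Usage on a throwaway file created just for this example.
    src <- tempfile(fileext = ".txt.gz")
    con <- gzfile(src, "w")
    writeLines(c("alpha", "beta"), con)
    close(con)

    dest <- gunzip_base(src)
    restored <- readLines(dest)
    ```

    Reading in chunks keeps memory use bounded for large files; the chunk size here is an arbitrary choice.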

  • 2020-11-27 05:14

    If you really want to uncompress the file, just use the untar function which does support gzip. E.g.:

    untar('chadwick-0.5.3.tar.gz')
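
    It can help to look inside the archive before unpacking it, and to unpack into a directory of your choosing. A minimal sketch, using a tiny throwaway archive built with tar() so that it runs anywhere; the file and directory names are invented for the example.

    ```r
    # Build a small throwaway .tar.gz so the example is self-contained.
    work <- tempfile("tar_demo_")
    dir.create(work)
    old_wd <- setwd(work)
    writeLines("hello", "note.txt")
    tar("demo.tar.gz", files = "note.txt", compression = "gzip")
    setwd(old_wd)

    tarball <- file.path(work, "demo.tar.gz")

    # List the archive members without extracting anything.
    contents <- untar(tarball, list = TRUE)

    # Extract into a separate directory instead of the working directory.
    out_dir <- file.path(work, "extracted")
    dir.create(out_dir)
    untar(tarball, exdir = out_dir)
    extracted <- readLines(file.path(out_dir, "note.txt"))
    ```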
    
  • 2020-11-27 05:20

    http://blog.revolutionanalytics.com/2009/12/r-tip-save-time-and-space-by-compressing-data-files.html

    R added transparent decompression for certain kinds of compressed files in version 2.10. If your files are compressed with bzip2, xz, or gzip, they can be read into R as if they were plain text files. You should have the proper filename extensions.

    The command

    myData <- read.table('myFile.gz')  # gzip-compressed files have a "gz" extension

    will work just as if 'myFile.gz' were the raw text file.
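
    To check this for yourself, here is a small sketch that writes the same made-up table in all three formats and reads each one back by passing only the file path to read.table(), with no explicit decompression step. The data and temporary file names are illustrative.

    ```r
    # Made-up data for the check.
    df <- data.frame(x = 1:3, y = c("a", "b", "c"))

    # One writer connection per compression format.
    openers <- list(gz = gzfile, bz2 = bzfile, xz = xzfile)

    results <- lapply(names(openers), function(ext) {
      path <- tempfile(fileext = paste0(".txt.", ext))
      con <- openers[[ext]](path, "w")
      write.table(df, con)
      close(con)
      read.table(path)  # plain path; R decompresses while reading
    })
    names(results) <- names(openers)
    ```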
