R Reading in a zip data file without unzipping it

后端未结


I have a very large zip file, and I am trying to read it into R without unzipping it, like so:

temp <- tempfile("Sales", fileext=c("zip"))
data <- re


                      


      
      
        
7 Answers

        
                         				            
            
           
            
                              
                
              
              
                
无人及你, 2020-12-04 15:36
              
            
            
                                                                       
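The gzfile function, combined with read_csv or read.table, can read gzip-compressed files such as file.csv.gz. Note that gzfile handles gzip (and, for reading, bzip2 and xz) compression, not zip archives; for a .zip file, use unz instead.

library(readr)
df <- read_csv(gzfile("file.csv.gz"))

# read.table is part of base R (the utils package), so no library call is needed
df <- read.table(gzfile("file.csv.gz"), header = TRUE, sep = ",")


read_csv from the readr package can also read compressed files directly, without the gzfile function.

library(readr)
df <- read_csv("file.csv.gz")


read_csv is recommended because it is generally faster than read.table on large files.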
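
To make the gzfile approach concrete, here is a minimal base R sketch that writes a small gzip-compressed CSV and reads it back through a gzfile connection. The file name and the data are invented purely for illustration.

```r
# Write a small data frame to a gzip-compressed CSV, then read it back.
# The file name and contents are made up for illustration.
path <- tempfile("example", fileext = ".csv.gz")

con <- gzfile(path, open = "wt")   # gzip connection opened for writing text
write.csv(data.frame(id = 1:3, amount = c(10.5, 20, 7.25)), con, row.names = FALSE)
close(con)

# read.csv opens an unopened connection itself and closes it when done
df <- read.csv(gzfile(path))
print(df)
```

Nothing is decompressed to disk here: the data is decompressed on the fly as it is read from the connection.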
                                                                        
                                                        
            
            
              
                
野性不改, 2020-12-04 15:38
              
            
            
                                                                       
This expression is missing the dot in the file extension:

temp <- tempfile("Sales", fileext=c("zip"))


It should be:

temp <- tempfile("Sales", fileext=c(".zip"))
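
The dot matters because tempfile appends fileext to the generated name verbatim. A quick check, as a sketch (the variable names are only illustrative):

```r
# tempfile() appends fileext exactly as given, so the dot must be included.
no_dot   <- tempfile("Sales", fileext = "zip")
with_dot <- tempfile("Sales", fileext = ".zip")

# Only the second path actually ends in ".zip"
print(endsWith(no_dot, ".zip"))
print(endsWith(with_dot, ".zip"))
```

Functions that pick a decompression method from the file suffix would not recognise the first path as a zip file.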

                                                                        
                                                        
            
            
              
                
花落未央, 2020-12-04 15:48
              
            
            
                                                                       
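This should work if the archive Sales.zip contains a file named Sales.csv. Note that unzip extracts that file into the working directory and returns its path, which read_csv then reads.

data <- readr::read_csv(unzip("Sales.zip", "Sales.csv"))


To check the file names inside the archive without extracting anything, use:

unzip("Sales.zip", list = TRUE)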
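
The two calls can be combined so that the entry name does not need to be known in advance. The sketch below uses only base R; read_first_csv is a hypothetical helper name, and it assumes the archive contains at least one entry ending in .csv. It extracts into a temporary directory rather than the working directory and removes that directory afterwards.

```r
# Hypothetical helper: read the first CSV entry found inside a zip archive.
read_first_csv <- function(zip_path) {
  entries <- unzip(zip_path, list = TRUE)   # data frame with Name, Length, Date
  csv_names <- entries$Name[grepl("\\.csv$", entries$Name, ignore.case = TRUE)]
  if (length(csv_names) == 0) {
    stop("no .csv entry found in ", zip_path)
  }
  # Extract into a throwaway directory and clean it up when the function exits
  out_dir <- tempfile("unzipped")
  on.exit(unlink(out_dir, recursive = TRUE))
  extracted <- unzip(zip_path, files = csv_names[1], exdir = out_dir)
  read.csv(extracted)
}

# Usage, assuming Sales.zip exists in the working directory:
# data <- read_first_csv("Sales.zip")
```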

                                                                        
                                                        
            
            
              
                
时光说笑, 2020-12-04 15:49
              
            
            
                                                                       
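If you have zcat installed on your system (usually the case on Linux, macOS, and Cygwin), you could also use:

zipfile <- "test.zip"
# shQuote protects paths that contain spaces; read.delim expects tab-separated data
myData <- read.delim(pipe(paste("zcat", shQuote(zipfile))))


This solution also has the advantage that no temporary files are created. Note that zcat is a gzip tool: GNU gzip can read a zip archive only when it contains a single member compressed with the deflate method.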
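
As an alternative to zcat, a similar pipe can be built with unzip -p, which writes a named archive member to standard output. This is only a sketch: it assumes the Info-ZIP unzip command-line tool is on the PATH and that the data is comma-separated, and read_zip_via_pipe is a hypothetical helper name.

```r
# Hypothetical helper: stream one member of a zip archive through a pipe.
# Assumes the Info-ZIP `unzip` command-line tool is installed.
read_zip_via_pipe <- function(zip_path, member, ...) {
  # shQuote guards against spaces and shell metacharacters in the names
  cmd <- paste("unzip -p", shQuote(zip_path), shQuote(member))
  read.csv(pipe(cmd), ...)
}

# Usage, assuming test.zip contains a member called test.csv:
# myData <- read_zip_via_pipe("test.zip", "test.csv")
```

Unlike zcat, this lets you choose which member to read when the archive holds more than one file.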
                                                                        
                                                        
            
            
              
                
不要未来只要你来, 2020-12-04 15:52
              
            
            
                                                                       
The readr functions also support compressed files when the file suffix indicates the compression type: files ending in .gz, .bz2, .xz, or .zip are decompressed automatically.

library(readr)
myData <- read_csv("foo.txt.gz")

                                                                        
                                                        
            
            
              
                
青春惊慌失措, 2020-12-04 15:57
              
            
            
                                                                       
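If your zip file is called Sales.zip and contains only a file called Sales.dat, I think you can simply do the following (assuming the zip file is in your working directory):

# unz opens a single member of the archive as a connection, without extracting it to disk;
# nrows=10 reads only the first 10 rows, so drop it to read the whole file
data <- read.table(unz("Sales.zip", "Sales.dat"), nrows=10, header=TRUE, quote="\"", sep=",")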
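
If the member name is not known ahead of time, it can be looked up first and then passed to unz. The following is a minimal sketch using only base R; read_zip_member is a hypothetical helper name, and it assumes the data is comma-separated with a header row.

```r
# Hypothetical helper: read one member of a zip archive through unz(),
# defaulting to the first entry when no member name is given.
read_zip_member <- function(zip_path, member = NULL, ...) {
  if (is.null(member)) {
    member <- unzip(zip_path, list = TRUE)$Name[1]   # list only, no extraction
  }
  # read.csv opens the unz connection and closes it when finished
  read.csv(unz(zip_path, member), ...)
}

# Usage: peek at the first 10 rows of the first file in the archive
# head_rows <- read_zip_member("Sales.zip", nrows = 10)
```

Passing nrows through is a cheap way to inspect a very large file before committing to a full read.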

                                                                        
                                                        
            
            
              
                