I receive JSON files with data to be analyzed in R, for which I use the RJSONIO package:
library(RJSONIO)
filename <- "Indata.json"
jFile <- fromJSON(filename)
Although your question doesn't specify this detail, you may want to make sure that loading the entire JSON into memory is actually what you want. It looks like RJSONIO is a DOM-based API.
What computation do you need to do? Can you use a streaming parser? An example of a SAX-like streaming parser for JSON is yajl.
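yajl itself is a C library, but the same chunk-wise idea can be approximated in plain R. A minimal sketch, assuming the input can be written as line-delimited JSON (one record per line) and reusing the file name from the question:

library(RJSONIO)

con <- file("Indata.json", open = "r")
while (length(lines <- readLines(con, n = 1000)) > 0) {
  records <- lapply(lines, fromJSON)  # parse one chunk of records
  # process `records` here, then discard them before reading the next chunk
}
close(con)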
Even though the question is very old, this might be of use for someone with a similar problem.
The function jsonlite::stream_in() lets you set pagesize to control the number of lines read at a time, and a custom function that is applied to each such chunk can be supplied as handler. This makes it possible to work with very large JSON files without reading everything into memory at once.
library(jsonlite)

con <- file("Indata.json")  # connection to a file in line-delimited JSON format
stream_in(con, pagesize = 5000, handler = function(x){
  # Do something with the chunk of records in x here
})
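For instance, the handler can aggregate results so that only a small summary object stays in memory. A minimal sketch, assuming the file is line-delimited JSON (the format stream_in() expects) and reusing the file name from the question:

library(jsonlite)

n_rows <- 0
stream_in(file("Indata.json"), pagesize = 5000, handler = function(x){
  # x is a data frame holding at most `pagesize` records
  n_rows <<- n_rows + nrow(x)  # keep only a running total in memory
})
n_rows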
This is not about memory size but about speed: even for the quite small iris dataset (only 7088 bytes), the RJSONIO package is an order of magnitude slower than rjson. Don't use method = 'R' unless you really have to! Note the different units in the two sets of results below.
library(rjson) # library(RJSONIO)
library(plyr)
library(microbenchmark)
x <- toJSON(iris)
(op <- microbenchmark(CJ=toJSON(iris), RJ=toJSON(iris, method='R'),
JC=fromJSON(x), JR=fromJSON(x, method='R') ) )
# for rjson on this machine...
Unit: microseconds
  expr        min          lq     median          uq        max
1   CJ    491.470    496.5215    501.467    537.6295    561.437
2   JC    242.079    249.8860    259.562    274.5550    325.885
3   JR 167673.237 170963.4895 171784.270 172132.7540 190310.582
4   RJ    912.666    925.3390    957.250   1014.2075   1153.494
# for RJSONIO on the same machine...
Unit: milliseconds
  expr      min       lq   median       uq      max
1   CJ 7.338376 7.467097 7.563563 7.639456 8.591748
2   JC 1.186369 1.234235 1.247235 1.265922 2.165260
3   JR 1.196690 1.238406 1.259552 1.278455 2.325789
4   RJ 7.353977 7.481313 7.586960 7.947347 9.364393