How to read big json?

执念已碎 2020-12-31 06:18

I receive json-files with data to be analyzed in R, for which I use the RJSONIO-package:

library(RJSONIO)
filename <- "Indata.json"
jFile <- fromJSON(filename)


        
3 Answers
  • 2020-12-31 06:45

    Although your question doesn't specify this detail, you may want to make sure that loading the entire JSON into memory is actually what you want. It looks like RJSONIO is a DOM-based API, meaning it builds the whole document as an in-memory object before you can use it.

    What computation do you need to do? Can you use a streaming parser? An example of a SAX-like streaming parser for JSON is yajl.

  • 2020-12-31 06:48

    Even though the question is very old, this might be of use for someone with a similar problem.

    The function jsonlite::stream_in() takes a pagesize argument that sets the number of lines read at a time, and a handler argument: a custom function that is applied to each page of records in turn. This makes it possible to work with very large JSON files without reading everything into memory at once.

    stream_in(con, pagesize = 5000, handler = function(x){
        # Do something with the data here
    })
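
A slightly fuller sketch of the same idea, assuming the jsonlite package is installed. Note that stream_in() expects line-delimited JSON (NDJSON, one record per line), not a single large JSON array. The temporary file, the use of iris as sample data, and the running-sum logic are illustrative assumptions, not part of the original question.

```r
library(jsonlite)

# Hypothetical sample file: write iris as NDJSON (one JSON record per line),
# which is the format stream_in() reads.
path <- tempfile(fileext = ".json")
stream_out(iris, file(path), verbose = FALSE)

# Keep only running totals, so no page has to stay in memory.
total_rows <- 0
sum_sepal  <- 0

stream_in(file(path), pagesize = 50, verbose = FALSE, handler = function(page) {
  # `page` is a data frame holding at most `pagesize` records
  total_rows <<- total_rows + nrow(page)
  sum_sepal  <<- sum_sepal + sum(page$Sepal.Length)
})

mean_sepal <- sum_sepal / total_rows
```

Because the handler only updates two counters, memory use is bounded by pagesize rather than by the size of the file.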
    
  • 2020-12-31 06:50

    This concerns speed rather than memory size: for the quite small iris dataset (only 7088 bytes), the RJSONIO package is about an order of magnitude slower than rjson. Don't use method = 'R' unless you really have to: in the rjson results below, parsing with it (JR) is far slower than the default (JC). Note that the two sets of results use different units (microseconds versus milliseconds).

    library(rjson) # for the second run, load library(RJSONIO) instead
    library(plyr)
    library(microbenchmark)
    x <- toJSON(iris)
    (op <- microbenchmark(CJ=toJSON(iris), RJ=toJSON(iris, method='R'),
      JC=fromJSON(x), JR=fromJSON(x, method='R') ) )
    
    # for rjson on this machine...
    Unit: microseconds
      expr        min          lq     median          uq        max
    1   CJ    491.470    496.5215    501.467    537.6295    561.437
    2   JC    242.079    249.8860    259.562    274.5550    325.885
    3   JR 167673.237 170963.4895 171784.270 172132.7540 190310.582
    4   RJ    912.666    925.3390    957.250   1014.2075   1153.494
    
    # for RJSONIO on the same machine...
    Unit: milliseconds
      expr      min       lq   median       uq      max
    1   CJ 7.338376 7.467097 7.563563 7.639456 8.591748
    2   JC 1.186369 1.234235 1.247235 1.265922 2.165260
    3   JR 1.196690 1.238406 1.259552 1.278455 2.325789
    4   RJ 7.353977 7.481313 7.586960 7.947347 9.364393
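
Both packages export functions named toJSON and fromJSON, so loading them in the same session masks one set with the other. A minimal sketch of a side-by-side comparison that avoids this with namespace-qualified calls, assuming rjson and RJSONIO are both installed; the time_it helper and the repetition count are hypothetical, and base system.time() stands in for microbenchmark:

```r
# Convert factor columns to character so both packages see plain vectors.
dat <- lapply(iris, function(col) if (is.factor(col)) as.character(col) else col)
json <- rjson::toJSON(dat)

# Hypothetical helper: elapsed seconds for n repeated calls of f.
time_it <- function(f, n = 100) {
  system.time(for (i in seq_len(n)) f())[["elapsed"]]
}

timings <- c(
  rjson_parse   = time_it(function() rjson::fromJSON(json)),
  RJSONIO_parse = time_it(function() RJSONIO::fromJSON(json)),
  rjson_write   = time_it(function() rjson::toJSON(dat)),
  RJSONIO_write = time_it(function() RJSONIO::toJSON(dat))
)
print(timings)
```

Timings depend on the machine and package versions, so treat the numbers as relative rather than absolute.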
    