I receive JSON files with data to be analyzed in R, for which I use the RJSONIO package:
library(RJSONIO)
filename <- "Indata.json"
jFile <- fromJSON(filename)
Although your question doesn't specify this detail, you may want to make sure that loading the entire JSON into memory is actually what you want. It looks like RJSONIO is a DOM-based API.
What computation do you need to do? Can you use a streaming parser? An example of a SAX-like streaming parser for JSON is yajl.
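yajl itself is a C library, but the same chunk-wise idea can be approximated in plain R. A minimal sketch, assuming the input can be written as line-delimited JSON (one record per line) and reusing the file name from the question:

library(RJSONIO)

con <- file("Indata.json", open = "r")
while (length(lines <- readLines(con, n = 1000)) > 0) {
  records <- lapply(lines, fromJSON)  # parse one chunk of records
  # process `records` here, then discard them before reading the next chunk
}
close(con)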
Even though the question is very old, this might be of use for someone with a similar problem.
The function jsonlite::stream_in() lets you set pagesize to control the number of lines read at a time, and a custom function that is applied to each such chunk can be supplied as handler. This makes it possible to work with very large JSON files without reading everything into memory at once.
library(jsonlite)

con <- file("Indata.json")  # connection to a file in line-delimited JSON format
stream_in(con, pagesize = 5000, handler = function(x){
  # Do something with the chunk of records in x here
})
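For instance, the handler can aggregate results so that only a small summary object stays in memory. A minimal sketch, assuming the file is line-delimited JSON (the format stream_in() expects) and reusing the file name from the question:

library(jsonlite)

n_rows <- 0
stream_in(file("Indata.json"), pagesize = 5000, handler = function(x){
  # x is a data frame holding at most `pagesize` records
  n_rows <<- n_rows + nrow(x)  # keep only a running total in memory
})
n_rows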
This is not about memory size but about speed: even for the quite small iris dataset (only 7088 bytes), the RJSONIO package is an order of magnitude slower than rjson. Don't use method = 'R' unless you really have to! Note the different units in the two sets of results below.
library(rjson) # library(RJSONIO)
library(plyr)
library(microbenchmark)
x <- toJSON(iris)
(op <- microbenchmark(CJ=toJSON(iris), RJ=toJSON(iris, method='R'),
JC=fromJSON(x), JR=fromJSON(x, method='R') ) )
# for rjson on this machine...
Unit: microseconds
  expr        min          lq     median          uq        max
1   CJ    491.470    496.5215    501.467    537.6295    561.437
2   JC    242.079    249.8860    259.562    274.5550    325.885
3   JR 167673.237 170963.4895 171784.270 172132.7540 190310.582
4   RJ    912.666    925.3390    957.250   1014.2075   1153.494
# for RJSONIO on the same machine...
Unit: milliseconds
  expr      min       lq   median       uq      max
1   CJ 7.338376 7.467097 7.563563 7.639456 8.591748
2   JC 1.186369 1.234235 1.247235 1.265922 2.165260
3   JR 1.196690 1.238406 1.259552 1.278455 2.325789
4   RJ 7.353977 7.481313 7.586960 7.947347 9.364393