How to read a stream of JSON objects one object at a time

非 Y 不嫁゛ submitted on 2020-01-24 15:48:08

Question


I have a binary application that generates a continuous stream of JSON objects (not an array of JSON objects). An object can sometimes span multiple lines (it is still a valid JSON object, just pretty-printed).

I can connect to this stream and read it without problems, like this:

var child = require('child_process').spawn('binary', ['arg','arg']);

child.stdout.on('data', data => {
  console.log(data);
});

The stream delivers Buffer chunks and emits 'data' events at arbitrary boundaries, so I tried the readline module to split the buffers into lines. That works (I can JSON.parse() each line) for JSON objects that don't span multiple lines.
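For illustration, here is a minimal sketch of what splitting a chunked stream into lines involves, and why it only helps with single-line objects. `makeLineSplitter` is a hypothetical helper (readline does this job for you), and the chunks are assumed to be plain strings:

```javascript
// Hypothetical helper showing what line splitting (as readline does) involves:
// a 'data' chunk can end anywhere, so the unfinished tail is kept until '\n' arrives.
// Chunks are assumed to be strings, e.g. after child.stdout.setEncoding('utf8').
function makeLineSplitter() {
  let pending = '';
  return function push(chunk) {
    pending += chunk;
    const lines = pending.split('\n');
    pending = lines.pop(); // text after the last '\n' (possibly '')
    return lines;
  };
}

// The sample stream, delivered in two chunks that break mid-object.
const push = makeLineSplitter();
const lines = [
  ...push('{ "a": "a", "b"'),
  ...push(':"b" }\n{ "a": "x",\n  "b": "y", "c": "z"\n}\n'),
];

// Parsing each line on its own works only for objects that fit on one line.
const results = lines.map((line) => {
  try {
    return JSON.parse(line);
  } catch (_) {
    return null; // a fragment of a multi-line object
  }
});
console.log(results); // the first entry is an object; the next three are null
```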

The ideal solution would be to listen for an event that delivers a single JSON object, something like:

child.on('json', object => {

});

I noticed the objectMode option in the Node.js streams documentation, but I'm getting the stream as Buffers, so I believe I can't use it.
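For what it's worth, objectMode does not have to apply to both ends of a stream: `stream.Transform` accepts a `readableObjectMode` option, so a transform can consume Buffer chunks and push parsed objects. A minimal sketch, assuming the stream contains only objects and each one starts on a new line; `JsonObjectStream` is a hypothetical name, and it uses the "accumulate lines and retry JSON.parse" idea:

```javascript
const { Transform } = require('stream');
const { StringDecoder } = require('string_decoder');

// Hypothetical class name. Bytes go in (the writable side stays in normal
// Buffer mode); parsed JSON objects come out (readableObjectMode: true).
// Assumes the stream only contains objects, each starting on a new line.
class JsonObjectStream extends Transform {
  constructor() {
    super({ readableObjectMode: true });
    this.decoder = new StringDecoder('utf8'); // keeps multi-byte chars intact across chunks
    this.partialLine = ''; // text after the last '\n' received
    this.pendingJson = ''; // complete lines of an object that is not finished yet
  }

  _transform(chunk, encoding, callback) {
    const lines = (this.partialLine + this.decoder.write(chunk)).split('\n');
    this.partialLine = lines.pop(); // may be an incomplete line
    for (const line of lines) this._addLine(line);
    callback();
  }

  _flush(callback) {
    const rest = this.partialLine + this.decoder.end();
    if (rest) this._addLine(rest);
    callback();
  }

  _addLine(line) {
    this.pendingJson += line + '\n';
    let obj;
    try {
      obj = JSON.parse(this.pendingJson);
    } catch (_) {
      return; // object not complete yet; wait for more lines
    }
    this.pendingJson = '';
    this.push(obj);
  }
}

// Synchronous demo with the sample stream split mid-object. With the real
// process: child.stdout.pipe(new JsonObjectStream()).on('data', (obj) => { ... })
const demo = new JsonObjectStream();
demo.write(Buffer.from('{ "a": "a", "b":"b" }\n{ "a": "x",\n'));
demo.write(Buffer.from('  "b": "y", "c": "z"\n}\n{ "a": "a", "b":"b" }\n'));
const objects = [demo.read(), demo.read(), demo.read()];
console.log(objects);
```

Piping child.stdout into such a transform gives one 'data' event per object, which is close to the child.on('json', ...) API asked for. Because push(null) means end-of-stream, this sketch is only suitable when every top-level value is an object.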

I looked at pixl-json-stream and json-stream on npm, but in my opinion neither fits the purpose. There is also clarinet-object-stream, but it would require building each JSON object from the ground up based on the parser events.

I don't control the JSON object stream. Most of the time each object is on one line, but 10-20% of the time an object spans multiple lines (\n as EOL), with no separator between objects. Each new object always starts on a new line.

Sample stream:

{ "a": "a", "b":"b" }
{ "a": "x",
  "b": "y", "c": "z"
}
{ "a": "a", "b":"b" }
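Since the only top-level values in this stream are objects, another option is to find where each object ends by counting braces while skipping string literals, so that `{`, `}` and escaped quotes inside strings are ignored. This does not depend on newlines and calls JSON.parse once per object. A minimal sketch, assuming no top-level arrays or scalars; `makeObjectScanner` is a hypothetical name:

```javascript
// Hypothetical scanner: feed it text chunks (strings, e.g. after
// child.stdout.setEncoding('utf8')) and get back each complete top-level
// {...} object. Braces inside string literals are ignored.
function makeObjectScanner() {
  let buffer = '';      // text of the object currently being collected
  let depth = 0;        // current { } nesting level
  let inString = false; // inside a "..." literal
  let escaped = false;  // previous char was a backslash inside a string

  return function feed(text) {
    const objects = [];
    for (const ch of text) {
      if (depth === 0 && ch !== '{') continue; // skip whitespace between objects
      buffer += ch;
      if (inString) {
        if (escaped) escaped = false;
        else if (ch === '\\') escaped = true;
        else if (ch === '"') inString = false;
      } else if (ch === '"') {
        inString = true;
      } else if (ch === '{') {
        depth++;
      } else if (ch === '}') {
        depth--;
        if (depth === 0) {
          objects.push(JSON.parse(buffer)); // throws if the object is malformed
          buffer = '';
        }
      }
    }
    return objects;
  };
}

// The sample stream, delivered in two chunks that break mid-object.
const feed = makeObjectScanner();
const objects = [
  ...feed('{ "a": "a", "b":"b" }\n{ "a": "x",\n  "b": "y",'),
  ...feed(' "c": "z"\n}\n{ "a": "a", "b":"b" }\n'),
];
console.log(objects);
```

The scanner only locates boundaries; JSON.parse still validates each object, so a malformed object throws and the caller has to decide how to recover.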

There must already be a solution and I'm just missing something obvious. I would rather find an appropriate module than hack the stream parser with regular expressions to handle this scenario.


Answer 1:


I'd recommend accumulating lines and trying to parse after each one:

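const readline = require('readline');

// child is the ChildProcess spawned in the question
const rl = readline.createInterface({
  input: child.stdout
});

let tmp = '';
rl.on('line', function(line) {
  tmp += line + '\n';
  let obj;
  try {
    obj = JSON.parse(tmp);
  } catch (_) {
    // JSON.parse fails while the object is still incomplete; wait for more lines
    return;
  }
  tmp = '';
  // Emit outside the try block so an exception thrown by a 'json' listener
  // is not mistaken for incomplete JSON (which would leave tmp uncleared)
  child.emit('json', obj);
});

child.on('json', function(obj) {
  console.log(obj);
});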

Since child (a ChildProcess) is an EventEmitter, you can simply call child.emit('json', obj) to get the event-based API the question asks for.
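One weakness of the accumulate-and-retry approach: if a malformed object ever arrives, the buffer never parses again and every later object is lost. A minimal sketch of a variant that resynchronizes, assuming (as in the question's sample, but not guaranteed by JSON) that only the first line of a top-level object starts with `{` in column 0, because continuation lines are indented or start with `}`. `makeJsonLineHandler` and the 'discarded' event are hypothetical names, and a plain EventEmitter stands in for `child`:

```javascript
const { EventEmitter } = require('events');

// Hypothetical variant of the accumulate-and-retry line handler.
// Assumption: a line starting with '{' in column 0 always begins a new
// top-level object (continuation lines are indented or start with '}').
function makeJsonLineHandler(emitter) {
  let pending = ''; // lines of the object currently being collected
  return function onLine(line) {
    if (line.startsWith('{') && pending !== '') {
      // The buffered text can no longer complete: report it and start over.
      if (pending.trim() !== '') emitter.emit('discarded', pending);
      pending = '';
    }
    pending += line + '\n';
    let obj;
    try {
      obj = JSON.parse(pending);
    } catch (_) {
      return; // incomplete so far; wait for the next line
    }
    pending = '';
    emitter.emit('json', obj); // outside try: listener errors are not swallowed
  };
}

// Demo: the second line starts an object that is never closed, so it is
// discarded when the next object begins.
const source = new EventEmitter();
const received = [];
const dropped = [];
source.on('json', (obj) => received.push(obj));
source.on('discarded', (text) => dropped.push(text));

const onLine = makeJsonLineHandler(source);
[
  '{ "a": "a", "b":"b" }',
  '{ "a": "x",',
  '{ "a": "a", "b":"b" }',
  '{ "a": "x",',
  '  "b": "y", "c": "z"',
  '}',
].forEach((line) => onLine(line));
console.log(received, dropped);
```

With readline this would be wired up as rl.on('line', makeJsonLineHandler(child)).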




Answer 2:


I had the same requirement, but I was uncomfortable relying on newlines just so readline could be used, I needed to handle starting the read in the middle of a stream (possibly in the middle of a JSON document), and I didn't like constantly parsing and catching errors (it seemed inefficient).

So I preferred to use the clarinet SAX parser, collecting documents as the parser events arrive and emitting a doc event once a whole JSON document has been parsed.
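The document-assembly part of this SAX approach can be sketched without clarinet itself. The event names below (`openObject`, `key`, `value`, `closeObject`, `openArray`, `closeArray`) are hypothetical and are not clarinet's callback names, so a real version would need a small adapter; the point is the stack of open containers, which turns a flat event sequence back into nested values and reports a document each time the outermost container closes:

```javascript
// Hypothetical SAX-style event consumer (event names are illustrative only).
// A stack of open containers rebuilds nested values from flat events.
function makeDocumentBuilder(onDoc) {
  const stack = []; // frames { container, key }, innermost last

  // Attach a finished value to the innermost container, or emit it as a document.
  function attach(value) {
    const top = stack[stack.length - 1];
    if (!top) onDoc(value);
    else if (Array.isArray(top.container)) top.container.push(value);
    else top.container[top.key] = value;
  }

  return function onEvent(type, payload) {
    switch (type) {
      case 'openObject': stack.push({ container: {}, key: null }); break;
      case 'openArray': stack.push({ container: [], key: null }); break;
      case 'key': stack[stack.length - 1].key = payload; break;
      case 'value': attach(payload); break;
      case 'closeObject':
      case 'closeArray': attach(stack.pop().container); break;
      default: throw new Error(`unknown event: ${type}`);
    }
  };
}

// Events for { "a": "x", "b": [1, 2] } followed by { "a": "a" }
const docs = [];
const onEvent = makeDocumentBuilder((doc) => docs.push(doc));
[
  ['openObject'], ['key', 'a'], ['value', 'x'], ['key', 'b'],
  ['openArray'], ['value', 1], ['value', 2], ['closeArray'], ['closeObject'],
  ['openObject'], ['key', 'a'], ['value', 'a'], ['closeObject'],
].forEach(([type, payload]) => onEvent(type, payload));
console.log(docs);
```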

I just published this class to npm:

https://www.npmjs.com/package/json-doc-stream



Source: https://stackoverflow.com/questions/36813649/how-to-read-stream-of-json-objects-per-object
