Efficient parsing of first four elements of large JSON arrays

旧时模样 submitted on 2019-12-20 04:13:09

Question


I am using Jackson to parse JSON from an input stream that looks like the following:

[
      [ 36,
        100,
        "The 3n + 1 problem",
         56717,
         0,
         1000000000,
         0,
         6316,
         0,
         0,
         88834,
         0,
         45930,
         0,
         46527,
         5209,
         200860,
         3597,
         149256,
         3000,
         1
      ],
      [
         ........
      ],
      [
         ........
      ],
         .....// and almost 5000 arrays like above
]

This is the original feed link: http://uhunt.felix-halim.net/api/p

I want to parse it and keep only the first 4 elements of every array, skipping the remaining elements.

36
100
The 3n + 1 problem
56717

Code structure I have tried so far:

while (jsonParser.nextToken() != JsonToken.END_ARRAY) {
    jsonParser.nextToken(); // '['
    while (jsonParser.nextToken() != JsonToken.END_ARRAY) {
        // I tried many approaches here but have not found an appropriate one
    }
}

As this feed is pretty big, I need to do this efficiently, with little overhead and memory use. Also, there are three models for processing JSON: Streaming API, Data Binding, and Tree Model. Which one is appropriate for my purpose?

How can I parse this JSON efficiently with Jackson? How can I skip the remaining elements and jump to the next array for better performance?

Edit: (Solution)

Jackson and Gson both work in almost the same way (incremental mode, since content is read and written incrementally). I am switching to Gson because it has a skipValue() method, which is aptly named. Although Jackson's nextToken() can be used much like skipValue(), Gson seems more flexible to me. Thanks to @Kowser for the recommendation; I had come across Gson before but somehow ignored it. This is my working code:

reader.beginArray();                 // outer '['
while (reader.hasNext()) {
   reader.beginArray();              // inner '['
   int a = reader.nextInt();
   int b = reader.nextInt();
   String c = reader.nextString();
   int d = reader.nextInt();
   System.out.println(a + " " + b + " " + c + " " + d);
   while (reader.hasNext()) {
      reader.skipValue();            // discard the remaining elements
   }
   reader.endArray();                // inner ']'
}
reader.endArray();                   // outer ']'
reader.close();
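
The snippet above does not show where reader comes from. As a minimal, self-contained sketch, the same loop can be wired to an InputStream roughly as follows. The class name FirstFourGson and the helper readFirstFour are hypothetical names introduced only for illustration. The sketch assumes every inner array starts with int, int, string, int, as in the sample feed, and that the stream is UTF-8.

```java
import com.google.gson.stream.JsonReader;

import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class FirstFourGson {

    // Hypothetical helper: returns "a b title d" for every inner array in the stream.
    // Assumes each inner array begins with int, int, string, int.
    static List<String> readFirstFour(InputStream in) throws IOException {
        List<String> rows = new ArrayList<>();
        // try-with-resources closes the reader (and the underlying stream) when done.
        try (JsonReader reader =
                 new JsonReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            reader.beginArray();                     // outer '['
            while (reader.hasNext()) {
                reader.beginArray();                 // inner '['
                int a = reader.nextInt();
                int b = reader.nextInt();
                String title = reader.nextString();
                int d = reader.nextInt();
                rows.add(a + " " + b + " " + title + " " + d);
                while (reader.hasNext()) {
                    reader.skipValue();              // discard the remaining elements
                }
                reader.endArray();                   // inner ']'
            }
            reader.endArray();                       // outer ']'
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        // Small in-memory stand-in for the real feed.
        String json = "[[36,100,\"The 3n + 1 problem\",56717,0,1000000000],[1,2,\"x\",3]]";
        InputStream in = new ByteArrayInputStream(json.getBytes(StandardCharsets.UTF_8));
        for (String row : readFirstFour(in)) {
            System.out.println(row);
        }
    }
}
```

Only one token is held at a time and only the four kept values per array are stored, so memory use should not depend on how many elements are skipped.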

Answer 1:


This is for Jackson

Follow this tutorial.

Judicious use of jsonParser.nextToken() should help you.

while (jsonParser.nextToken() != JsonToken.END_ARRAY) { // might be JsonToken.START_ARRAY?

The pseudo-code is

  1. find next array
    1. read values
    2. skip other values
    3. skip next end token
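
The steps above can be sketched with Jackson's streaming API roughly as follows. This is a minimal sketch, not a definitive implementation. It assumes Jackson 2.x (package com.fasterxml.jackson.core) and that every inner array starts with int, int, string, int, as in the sample feed. The class FirstFourJackson, the holder Row, and the helper readFirstFour are hypothetical names introduced only for illustration.

```java
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class FirstFourJackson {

    // Hypothetical holder for the four values kept from each inner array.
    static final class Row {
        final int a;
        final int b;
        final String title;
        final int d;

        Row(int a, int b, String title, int d) {
            this.a = a;
            this.b = b;
            this.title = title;
            this.d = d;
        }

        @Override
        public String toString() {
            return a + " " + b + " " + title + " " + d;
        }
    }

    // Assumes each inner array has at least four elements: int, int, string, int.
    static List<Row> readFirstFour(JsonParser p) throws IOException {
        List<Row> rows = new ArrayList<>();
        if (p.nextToken() != JsonToken.START_ARRAY) {
            throw new IOException("expected the outer '['");
        }
        // 1. find next array: the loop ends when the outer ']' is reached.
        while (p.nextToken() == JsonToken.START_ARRAY) {
            // 1.1 read values
            p.nextToken(); int a = p.getIntValue();
            p.nextToken(); int b = p.getIntValue();
            p.nextToken(); String title = p.getText();
            p.nextToken(); int d = p.getIntValue();
            rows.add(new Row(a, b, title, d));
            // 1.2 skip other values, 1.3 consume the inner ']'
            while (p.nextToken() != JsonToken.END_ARRAY) {
                p.skipChildren(); // does nothing for scalars; jumps over nested arrays/objects
            }
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        // Small in-memory stand-in for the real feed.
        String json = "[[36,100,\"The 3n + 1 problem\",56717,0,1000000000],[1,2,\"x\",3]]";
        try (JsonParser p = new JsonFactory().createParser(json)) {
            for (Row r : readFirstFour(p)) {
                System.out.println(r);
            }
        }
    }
}
```

For a real feed, createParser also accepts an InputStream. Skipped scalar values are tokenized but never converted to Java numbers or strings, which keeps the per-element cost low.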

This is for Gson. Take a look at this tutorial, in particular its second example.

Judicious use of reader.begin*, reader.end*, and reader.skipValue() should do the job for you.

And here is the documentation for JsonReader



Source: https://stackoverflow.com/questions/18040368/efficient-parsing-of-first-four-elements-of-large-json-arrays
