Reading JSON objects from a large file

轻奢々 2020-12-21 19:12

I am looking for a JSON parser that lets me iterate through the JSON objects in a large JSON file (a few hundred MB in size). I tried JsonTextReader from Json.NET.

3 answers
  • 2020-12-21 19:16

    Let's assume you have a json array similar to this:

    [{"text":"0"},{"text":"1"}......]
    

    I'll declare a class for the object type

    public class TempClass
    {
        public string text;
    }
    

    Now, the deserialization part:

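    // Requires: System, System.Collections.Generic, System.IO, Newtonsoft.Json
    JsonSerializer ser = new JsonSerializer();
    ser.Converters.Add(new DummyConverter<TempClass>(t =>
        {
            // Callback invoked once for each deserialized item
            Console.WriteLine(t.text);
        }));

    // Disposing the JsonTextReader also closes the underlying stream
    using (var reader = new JsonTextReader(new StreamReader(File.OpenRead(fName))))
    {
        ser.Deserialize(reader, typeof(List<TempClass>));
    }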
    

    And a dummy JsonConverter class to intercept the deserialization

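    public class DummyConverter<T> : JsonConverter
    {
        private readonly Action<T> _action;

        public DummyConverter(Action<T> action)
        {
            _action = action;
        }

        public override bool CanConvert(Type objectType)
        {
            // Intercept the item type T rather than a hard-coded class
            return objectType == typeof(T);
        }

        public override object ReadJson(JsonReader reader, Type objectType, object existingValue, JsonSerializer serializer)
        {
            // Remove this converter while the item is deserialized, otherwise
            // the nested Deserialize call would re-enter ReadJson
            serializer.Converters.Remove(this);
            T item = serializer.Deserialize<T>(reader);
            // Put it back so that later items are intercepted as well
            serializer.Converters.Add(this);
            _action(item);
            // The list receives null in place of the item, so items are not retained
            return null;
        }

        public override void WriteJson(JsonWriter writer, object value, JsonSerializer serializer)
        {
            throw new NotImplementedException();
        }
    }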
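
    For comparison, the array can also be walked directly with JsonTextReader, without a converter. The following is a minimal sketch of that alternative, not the approach used in this answer. StreamItems is a hypothetical helper name, and the sketch assumes the root of the input is an array of objects shaped like TempClass.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.IO;
    using Newtonsoft.Json;

    public class TempClass
    {
        public string text;
    }

    public static class JsonStreaming
    {
        // Hypothetical helper: lazily yields each element of a root-level JSON array.
        public static IEnumerable<T> StreamItems<T>(TextReader input)
        {
            var serializer = new JsonSerializer();
            using (var reader = new JsonTextReader(input))
            {
                while (reader.Read())
                {
                    // Every StartObject seen by this loop opens one array item,
                    // because Deserialize consumes tokens up to the matching EndObject.
                    if (reader.TokenType == JsonToken.StartObject)
                    {
                        yield return serializer.Deserialize<T>(reader);
                    }
                }
            }
        }
    }
    ```

    A caller would enumerate JsonStreaming.StreamItems<TempClass>(new StreamReader(fName)) in a foreach loop, so only one item is materialized at a time.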
    
  • 2020-12-21 19:19

    I would use the Json.NET library. To install it from the NuGet Package Manager Console, run: Install-Package Newtonsoft.Json

  • 2020-12-21 19:38

    This is one of the use cases I contemplated for my own parser/deserializer.

    I recently wrote a simple example (feeding the parser JSON text read through a StreamReader) that deserializes this JSON shape:

    { 
    "fathers" : [ 
    { 
      "id" : 0,
      "married" : true,
      "name" : "John Lee",
      "sons" : [ 
        { 
          "age" : 15,
          "name" : "Ronald"
          }
        ],
      "daughters" : [ 
        { 
          "age" : 7,
          "name" : "Amy"
          },
        { 
          "age" : 29,
          "name" : "Carol"
          },
        { 
          "age" : 14,
          "name" : "Barbara"
          }
        ]
      },
    { 
      "id" : 1,
      "married" : false,
      "name" : "Kenneth Gonzalez",
      "sons" : [
        ],
      "daughters" : [
        ]
      },
    { 
      "id" : 2,
      "married" : false,
      "name" : "Larry Lee",
      "sons" : [ 
        { 
          "age" : 4,
          "name" : "Anthony"
          },
        { 
          "age" : 2,
          "name" : "Donald"
          }
        ],
      "daughters" : [ 
        { 
          "age" : 7,
          "name" : "Elizabeth"
          },
        { 
          "age" : 15,
          "name" : "Betty"
          }
        ]
      },
    
      //(... etc)
      ]
    }
    

    ... into these POCOs:

    https://github.com/ysharplanguage/FastJsonParser#POCOs

    (i.e., specifically: "FathersData", "Father", "Son", "Daughter")

    That sample also presents:

    (1) a sample filter on the relative item index in the Father[] array (e.g., to fetch only the first 10), and

    (2) how to dynamically populate a property of each father's daughters as soon as the deserialization of that father returns, by means of a delegate that the caller passes to the parser's Parse method as a callback.
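
    The index filter in (1) can be illustrated independently of FastJsonParser. The sketch below uses Json.NET's JsonTextReader instead, so it is not that parser's own API. The Father class here is a cut-down, hypothetical stand-in for the linked POCOs, TakeFathers is an assumed helper name, and the code assumes "fathers" is the first property with that name in the input.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.IO;
    using Newtonsoft.Json;

    // Hypothetical, reduced version of the "Father" POCO (scalar properties only).
    public class Father
    {
        public int id { get; set; }
        public bool married { get; set; }
        public string name { get; set; }
    }

    public static class FatherFilter
    {
        // Reads at most maxCount items from the "fathers" array, then stops,
        // so the remainder of the input is never parsed.
        public static List<Father> TakeFathers(TextReader input, int maxCount)
        {
            var result = new List<Father>();
            var serializer = new JsonSerializer();
            using (var reader = new JsonTextReader(input))
            {
                // Skip ahead to the "fathers" property name.
                while (reader.Read())
                {
                    if (reader.TokenType == JsonToken.PropertyName &&
                        (string)reader.Value == "fathers")
                    {
                        break;
                    }
                }
                // Deserialize consumes each item's nested arrays and objects,
                // so the EndArray seen here closes the "fathers" array itself.
                while (result.Count < maxCount && reader.Read())
                {
                    if (reader.TokenType == JsonToken.StartObject)
                        result.Add(serializer.Deserialize<Father>(reader));
                    else if (reader.TokenType == JsonToken.EndArray)
                        break;
                }
            }
            return result;
        }
    }
    ```

    Calling FatherFilter.TakeFathers(new StreamReader(path), 10), with path being whatever file holds the data, would fetch only the first 10 fathers. The "sons" and "daughters" arrays are skipped here because the reduced class has no matching members.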

    For the rest of the bits, see:

    ParserTests.cs : static void FilteredFatherStreamTestDaughterMaidenNamesFixup()

    (lines #829 to #904)

    The performance I observe on my humble laptop (*) when parsing JSON files of ~12 MB to ~180 MB and deserializing an arbitrary subset of their content into POCOs (or into loosely-typed dictionaries of (string, object) key/value pairs, which are also supported) is in the range of ~20 MB/sec to ~40 MB/sec (**).

    (e.g., ~300 milliseconds for the 12 MB JSON file, into POCOs)
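
    The millisecond figure and the MB/sec range are related by simple division. A small sketch of that arithmetic follows, assuming 1 MB is counted as 1,000,000 bytes (the answer does not say which convention it uses); MegabytesPerSecond is a hypothetical helper name.

    ```csharp
    using System;

    public static class Throughput
    {
        // Hypothetical helper: megabytes per second, with 1 MB taken as 1,000,000 bytes.
        public static double MegabytesPerSecond(long bytes, TimeSpan elapsed)
        {
            return (bytes / 1000000.0) / elapsed.TotalSeconds;
        }
    }
    ```

    Under that assumption, 12 MB parsed in 300 milliseconds works out to about 40 MB/sec, the upper end of the quoted range.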

    More detailed info available here:

    https://github.com/ysharplanguage/FastJsonParser#Performance

    HTH,

    (*) (running Win7 64-bit @ 2.5 GHz)

    (**) (the throughput is quite dependent on the input JSON shape/complexity, e.g., sub-objects nesting depth, and other factors)
