Way to read (or edit) big JSON from / to stream

一生所求 2020-12-12 08:28

(Already answered - at least three solutions were left there in place of the original question.)
I have been trying to parse and split big JSON, but did not want to modify its content.

1 Answer
  • 2020-12-12 08:48

    Gason translated to C# is probably the fastest JSON parser in C# right now; its speed is similar to the C++ version when comparing Debug builds (about 2x slower than C++ in a Release build), with roughly 2x higher memory consumption: https://github.com/eltomjan/gason

    (Disclaimer: I am affiliated with this C# fork of Gason.)

    The parser has an experimental feature - it can exit after parsing a predefined number of items in the last array, and on the next call continue with the next batch, starting after the last parsed item:

    using System;
    using System.IO;
    using System.Text;
    using Gason;
    
    int endPos = -1;
    JsonValue jsn;
    Byte[] raw;
    
    String json = @"{""id"":""0001"",""type"":""donut"",""name"":""Cake"",""ppu"":0.55, 
      ""batters"": [ { ""id"": ""1001"", ""type"": ""Regular"" },
                     { ""id"": ""1002"", ""type"": ""Chocolate"" },
                     { ""id"": ""1003"", ""type"": ""Blueberry"" }, 
                     { ""id"": ""1004"", ""type"": ""Devil's Food"" } ]
      }"
    raw = Encoding.UTF8.GetBytes(json);
    ByteString[] keys = new ByteString[]
    {
        new ByteString("batters"),
        null
    };
    Parser jsonParser = new Parser(true); // true = parse floats as decimal (optional 2nd arg: JSON stack array size, default 32)
    jsonParser.Parse(raw, ref endPos, out jsn, keys, 2, 0, 2); // follow the "batters"/null key path, read only the first 2 items
    ValueWriter wr = new ValueWriter();
    using (StreamWriter sw = new StreamWriter(Console.OpenStandardOutput()))
    {
        sw.AutoFlush = true;
        wr.DumpValueIterative(sw, jsn, raw);
    }
    jsonParser.Parse(raw, ref endPos, out jsn, keys, 2, endPos, 2); // continue from endPos: the next 2 items
    using (StreamWriter sw = new StreamWriter(Console.OpenStandardOutput()))
    {
        sw.AutoFlush = true;
        wr.DumpValueIterative(sw, jsn, raw);
    }
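
    Building on the snippet above, a minimal sketch of how the batch continuation might be driven in a loop - assuming (not confirmed by the library docs) that each Parse call advances endPos past the last item it consumed and stops advancing once the main array is exhausted:

    // Hypothetical batch loop, reusing raw, keys, jsonParser and wr from above.
    // Assumes endPos stops advancing once the last array has been fully read.
    int prevPos;
    endPos = -1;
    do
    {
        prevPos = endPos;
        jsonParser.Parse(raw, ref endPos, out JsonValue batch, keys, 2,
                         prevPos < 0 ? 0 : prevPos, 2); // next 2 items per call
        using (StreamWriter sw = new StreamWriter(Console.OpenStandardOutput()))
        {
            sw.AutoFlush = true;
            wr.DumpValueIterative(sw, batch, raw);
        }
    } while (endPos > prevPos && endPos < raw.Length);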
    

    It is a quick and simple option for splitting long JSONs now - a whole 0.25 GB file with <18 million rows in the main array parsed in <5.3 s on a fast machine (Debug build) using <950 MB RAM, where Newtonsoft.Json consumed >30 s and 5.36 GB. Parsing only the first 100 rows took <330 ms and >250 MB RAM.
    A Release build is even better: <3.2 s where Newtonsoft spent >29.3 s (>10.8x better performance).
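
    For reference, a minimal sketch of how the same file could be read item by item with Newtonsoft.Json's streaming reader (the file name and array depth are assumptions for illustration; this is not the code behind the benchmark above):

    using System;
    using System.IO;
    using Newtonsoft.Json;
    using Newtonsoft.Json.Linq;

    // Stream the document and materialize one array item at a time,
    // rather than loading the whole file into a single JToken.
    using (var file = new StreamReader("big.json"))  // assumed input file
    using (var reader = new JsonTextReader(file))
    {
        while (reader.Read())
        {
            // Depth 2 = objects directly inside the main array (assumed layout).
            if (reader.TokenType == JsonToken.StartObject && reader.Depth == 2)
            {
                JObject row = JObject.Load(reader); // consumes exactly one item
                // ... process row ...
            }
        }
    }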

    1st Parse:
    {
      "id": "0001",
      "type": "donut",
      "name": "Cake",
      "ppu": 0.55,
      "batters": [
        {
          "id": "1001",
          "type": "Regular"
        },
        {
          "id": "1002",
          "type": "Chocolate"
        }
      ]
    }
    2nd Parse:
    {
      "id": "0001",
      "type": "donut",
      "name": "Cake",
      "ppu": 0.55,
      "batters": [
        {
          "id": "1003",
          "type": "Blueberry"
        },
        {
          "id": "1004",
          "type": "Devil's Food"
        }
      ]
    }
    