Parse huge OData JSON by streaming certain sections of the JSON to avoid the LOH

Asked by 花落未央 on 2021-01-21 10:30

I have an OData response as JSON (a few MB in size), and the requirement is to stream "certain parts of JSON" without even loading them into memory.

1 Answer
  • 2021-01-21 10:59

    If you are copying portions of JSON from one stream to another, you can do this more efficiently with JsonWriter.WriteToken(JsonReader), which avoids the intermediate representations created by Current = JToken.Load(jsonTextReader) and Encoding.ASCII.GetBytes(Current.ToString()), along with their memory overhead:

    using (var textWriter = new StreamWriter(destinationStream, new UTF8Encoding(false, true), 1024, true))
    using (var jsonWriter = new JsonTextWriter(textWriter) { Formatting = Formatting.Indented, CloseOutput = false })
    {
        // Use Formatting.Indented or Formatting.None as required.
        jsonWriter.WriteToken(jsonTextReader);
    }
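
    The snippet above copies whatever token jsonTextReader is currently positioned on, together with its children. As a minimal sketch of how the reader might be positioned for an OData payload, the hypothetical helper CopyValueItems below walks a JsonTextReader, copies every object in the top-level "value" array with WriteToken(), and wraps the copies in a single JSON array. It assumes the Newtonsoft.Json package is referenced and that each item is a JSON object; large string values inside an item are still fully materialized by JsonTextReader.

```csharp
using System;
using System.IO;
using System.Text;
using Newtonsoft.Json;

public static class ODataItemCopier
{
    // Hypothetical helper: copies every object in the top-level "value" array of
    // an OData payload from `source` to `destination` as one JSON array.
    // Returns the number of items copied. Requires the Newtonsoft.Json package.
    public static int CopyValueItems(Stream source, Stream destination)
    {
        int count = 0;
        using (var textReader = new StreamReader(source, Encoding.UTF8, true, 1024, leaveOpen: true))
        using (var jsonReader = new JsonTextReader(textReader) { CloseInput = false })
        using (var textWriter = new StreamWriter(destination, new UTF8Encoding(false, true), 1024, leaveOpen: true))
        using (var jsonWriter = new JsonTextWriter(textWriter) { Formatting = Formatting.None, CloseOutput = false })
        {
            jsonWriter.WriteStartArray();
            while (jsonReader.Read())
            {
                // Root object is depth 0, the "value" array is depth 1, its items are depth 2.
                if (jsonReader.TokenType == JsonToken.StartObject
                    && jsonReader.Depth == 2
                    && jsonReader.Path.StartsWith("value[", StringComparison.Ordinal))
                {
                    // Copies the item and all of its children; afterwards the reader
                    // sits on the item's EndObject, so the next Read() moves past it.
                    jsonWriter.WriteToken(jsonReader);
                    count++;
                }
            }
            jsonWriter.WriteEndArray();
        }
        return count;
    }
}
```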
    

    However, Json.NET's JsonTextReader cannot read a single string value in "chunks" the way XmlReader.ReadValueChunk() can. It always fully materializes each atomic string value. If your string values are large enough to be allocated on the large object heap, even JsonWriter.WriteToken() will not prevent them from being loaded into memory in full.

    As an alternative, you might consider the readers and writers returned by JsonReaderWriterFactory. These readers and writers are used by DataContractJsonSerializer and translate JSON to XML on-the-fly as it is being read and written. Since the base classes for these readers and writers are XmlReader and XmlWriter, they do support reading and writing string values in chunks. Using them appropriately will avoid allocation of strings in the large object heap.
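
    To illustrate chunked reading, here is a minimal sketch (the helper name CopyStringValueInChunks is hypothetical) that finds the first element with a given local name in the XML view produced by CreateJsonReader() and copies its text into a TextWriter with XmlReader.ReadValueChunk(), one fixed-size buffer at a time. It checks CanReadValueChunk and falls back to Value, which does materialize the whole string, if chunked reading is not supported. It assumes the JSON property name is a valid XML name, so that it maps to an element of the same name. It only avoids the large string allocation in your own code; whether the reader buffers the raw value internally is an implementation detail this sketch does not check.

```csharp
using System.IO;
using System.Runtime.Serialization.Json;
using System.Xml;

public static class JsonChunkReader
{
    // Copies the text of the first element named `elementName` (as mapped by
    // JsonReaderWriterFactory) to `output`, at most bufferSize chars at a time.
    // Returns false if the element is missing or has no text content.
    public static bool CopyStringValueInChunks(Stream json, string elementName, TextWriter output, int bufferSize = 4096)
    {
        using (var reader = JsonReaderWriterFactory.CreateJsonReader(json, XmlDictionaryReaderQuotas.Max))
        {
            while (reader.Read())
            {
                if (reader.NodeType != XmlNodeType.Element || reader.LocalName != elementName)
                    continue;
                // Move from the element onto its text node; objects, arrays and
                // empty strings have no text node here.
                if (reader.IsEmptyElement || !reader.Read() || reader.NodeType != XmlNodeType.Text)
                    return false;
                if (!reader.CanReadValueChunk)
                {
                    output.Write(reader.Value); // fallback: whole string in memory
                    return true;
                }
                var buffer = new char[bufferSize];
                int read;
                while ((read = reader.ReadValueChunk(buffer, 0, buffer.Length)) > 0)
                    output.Write(buffer, 0, read); // only one buffer's worth held here
                return true;
            }
        }
        return false;
    }
}
```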

    To stream selected values this way, first define the following helper methods, which copy a selected subset of JSON values from an input stream to an output stream, as specified by a path to the data to be streamed:

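    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Linq;
    using System.Runtime.Serialization.Json;
    using System.Text;
    using System.Xml;
    using System.Xml.Linq;
    
    public static class JsonExtensions
    {
        // Copies every JSON value whose XML element path (root first, e.g. root/value/item)
        // equals `path` from `from` to `to`, streaming string values through XmlWriter.WriteNode().
        public static void StreamNested(Stream from, Stream to, string[] path)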
        {
            var reversed = path.Reverse().ToArray();
    
            using (var xr = JsonReaderWriterFactory.CreateJsonReader(from, XmlDictionaryReaderQuotas.Max))
            {
                foreach (var subReader in xr.ReadSubtrees(s => s.Select(n => n.LocalName).SequenceEqual(reversed)))
                {
                    using (var xw = JsonReaderWriterFactory.CreateJsonWriter(to, Encoding.UTF8, false))
                    {
                        subReader.MoveToContent();
    
                        xw.WriteStartElement("root");
                        xw.WriteAttributes(subReader, true);
    
                        subReader.Read();
    
                        while (!subReader.EOF)
                        {
                            if (subReader.NodeType == XmlNodeType.Element && subReader.Depth == 1)
                                xw.WriteNode(subReader, true);
                            else
                                subReader.Read();
                        }
    
                        xw.WriteEndElement();
                    }
                }
            }
        }
    }
    
    public static class XmlReaderExtensions
    {
        public static IEnumerable<XmlReader> ReadSubtrees(this XmlReader xmlReader, Predicate<Stack<XName>> filter)
        {
            Stack<XName> names = new Stack<XName>();
    
            while (xmlReader.Read())
            {
                if (xmlReader.NodeType == XmlNodeType.Element)
                {
                    names.Push(XName.Get(xmlReader.LocalName, xmlReader.NamespaceURI));
                    if (filter(names))
                    {
                        using (var subReader = xmlReader.ReadSubtree())
                        {
                            yield return subReader;
                        }
                    }
                }
    
                if ((xmlReader.NodeType == XmlNodeType.Element && xmlReader.IsEmptyElement)
                    || xmlReader.NodeType == XmlNodeType.EndElement)
                {
                    names.Pop();
                }
            }
        }
    }
    

    Note that the string[] path argument to StreamNested() is not a JSONPath expression. Instead, it is the path through the hierarchy of XML elements that the XmlReader returned by JsonReaderWriterFactory.CreateJsonReader() produces for the JSON you want to select. The mapping used for this translation is documented by Microsoft in Mapping Between JSON and XML. To select and stream only the JSON values matching value[*], the required XML path is //root/value/item. Thus, you can select and stream your desired nested objects with:

    JsonExtensions.StreamNested(inputStream, destinationStream, new[] { "root", "value", "item" });
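
    When the correct path is not obvious, a minimal streaming sketch like the hypothetical ListElementPaths below can list the distinct element paths that CreateJsonReader() produces for a sample document, using the same stack-of-names idea as ReadSubtrees(). It never builds a DOM, so it can also be run against a full-size response. JSON property names that are not valid XML names are mapped differently (see Mapping Between JSON and XML), so their element names may not match the JSON names.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Runtime.Serialization.Json;
using System.Xml;

public static class JsonXmlPathLister
{
    // Returns each distinct element path (e.g. "root/value/item") in the order
    // it is first encountered, reading the JSON as a forward-only XML stream.
    public static List<string> ListElementPaths(Stream json)
    {
        var paths = new List<string>();
        var seen = new HashSet<string>();
        var names = new Stack<string>();
        using (var reader = JsonReaderWriterFactory.CreateJsonReader(json, XmlDictionaryReaderQuotas.Max))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element)
                {
                    names.Push(reader.LocalName);
                    // A Stack enumerates top-first, so reverse it to get root-first order.
                    var path = string.Join("/", names.Reverse());
                    if (seen.Add(path))
                        paths.Add(path);
                    if (reader.IsEmptyElement)
                        names.Pop(); // no EndElement follows an empty element
                }
                else if (reader.NodeType == XmlNodeType.EndElement)
                {
                    names.Pop();
                }
            }
        }
        return paths;
    }
}
```

    For a payload such as {"value":[{"Id":1},{"Id":2}]}, the list includes root/value/item, which is the path passed to StreamNested() above.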
    

    Notes:

    • Mapping Between JSON and XML is somewhat complex. It's often easier just to load some sample JSON into an XDocument using the following extension method:

      static XDocument ParseJsonAsXDocument(string json)
      {
          using (var xr = JsonReaderWriterFactory.CreateJsonReader(new MemoryStream(Encoding.UTF8.GetBytes(json)), Encoding.UTF8, XmlDictionaryReaderQuotas.Max, null))
          {
              return XDocument.Load(xr);
          }
      }
      

      Then determine the correct XML path by inspecting the resulting XML.

    • For a related question, see JObject.SelectToken Equivalent in .NET.
