Load XDocument asynchronously

Submitted by 佐手 on 2021-02-07 05:51:24

Question


I want to load large XML documents into XDocument objects. The simple synchronous approach using XDocument.Load(path, loadOptions) works great, but blocks for an uncomfortably long time in a GUI context when loading large files (particularly from network storage).

I wrote this async version with the intention of improving responsiveness in document loading, particularly when loading files over the network.

    public static async Task<XDocument> LoadAsync(String path, LoadOptions loadOptions = LoadOptions.PreserveWhitespace)
    {
        String xml;

        using (var stream = File.OpenText(path))
        {
            xml = await stream.ReadToEndAsync();
        }

        return XDocument.Parse(xml, loadOptions);
    }

However, on a 200 MB raw XML file loaded from local disk, the synchronous version completes in a few seconds. The asynchronous version (running in a 32-bit context) instead throws an OutOfMemoryException:

   at System.Text.StringBuilder.ToString()
   at System.IO.StreamReader.<ReadToEndAsyncInternal>d__62.MoveNext()

I imagine this is because of the temporary string variable used to hold the raw XML in memory for parsing by the XDocument. Presumably in the synchronous scenario, XDocument.Load() is able to stream through the source file, and never needs to create a single huge String to hold the entire file.

Is there any way to get the best of both worlds? Load the XDocument with fully asynchronous I/O, and without needing to create a large temporary string?


Answer 1:


First of all, the work is not fully asynchronous: only the file read is awaited, and XDocument.Parse then runs synchronously on the calling thread. You would need to use either a built-in async I/O method or run the load on the thread pool yourself. For example:

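public static Task<XDocument> LoadAsync(
    String path,
    LoadOptions loadOptions = LoadOptions.PreserveWhitespace)
{
    // Run the blocking load on a thread-pool thread so the caller's thread stays free.
    return Task.Run(() =>
    {
        using (var reader = File.OpenText(path))
        {
            // Load parses directly from the reader; no intermediate string is built.
            return XDocument.Load(reader, loadOptions);
        }
    });
}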

Because this uses the XDocument.Load overload that reads from a TextReader, rather than XDocument.Parse, no temporary string is created.




Answer 2:


XDocument.LoadAsync() is available in .NET Core 2.0: https://docs.microsoft.com/en-us/dotnet/api/system.xml.linq.xdocument.loadasync?view=netcore-2.0
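
For the file-based case in the question, a call might look like the sketch below. It assumes a runtime that ships XDocument.LoadAsync (.NET Core 2.0 or later); the class name XDocumentFileLoader, the method name LoadFileAsync, and the buffer size are illustrative choices, not part of any library.

```csharp
using System.IO;
using System.Threading;
using System.Threading.Tasks;
using System.Xml.Linq;

public static class XDocumentFileLoader
{
    // Hypothetical helper: the name and the buffer size are illustrative assumptions.
    public static async Task<XDocument> LoadFileAsync(
        string path,
        LoadOptions loadOptions = LoadOptions.PreserveWhitespace,
        CancellationToken cancellationToken = default(CancellationToken))
    {
        // useAsync: true requests asynchronous file I/O from the operating system.
        using (var stream = new FileStream(
            path, FileMode.Open, FileAccess.Read, FileShare.Read,
            bufferSize: 81920, useAsync: true))
        {
            // The document is built while reading from the stream,
            // so no string holding the whole file is created.
            return await XDocument.LoadAsync(stream, loadOptions, cancellationToken)
                .ConfigureAwait(false);
        }
    }
}
```

From a GUI event handler this would be awaited as `var doc = await XDocumentFileLoader.LoadFileAsync(path);`. The cancellation token is optional and can be wired to a Cancel button if the load takes long.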




Answer 3:


Late answer, but I needed the async read as well on a "legacy" .NET Framework version so I figured out a way to truly read the content in an async way without reverting to buffering the XML data in memory.

The writer returned by XDocument.CreateWriter() does not support async writing, so XmlWriter.WriteNodeAsync() fails. The code therefore performs async reads and passes the results to synchronous writes on the XDocument writer, following the way XmlWriter.WriteNodeAsync() works. Since the writer only builds an in-memory DOM and never waits on I/O, synchronous writes are arguably better here than async writes, which would only add overhead.

public static async Task<XDocument> LoadAsync(Stream stream, LoadOptions loadOptions) {
    using (var reader = XmlReader.Create(stream, new XmlReaderSettings() {
            DtdProcessing = DtdProcessing.Ignore,
            IgnoreWhitespace = (loadOptions&LoadOptions.PreserveWhitespace) == LoadOptions.None,
            XmlResolver = null,
            CloseInput = false,
            Async = true
    })) {
        var result = new XDocument();
        using (var writer = result.CreateWriter()) {
            do {
                switch (reader.NodeType) {
                case XmlNodeType.Element:
                    writer.WriteStartElement(reader.Prefix, reader.LocalName, reader.NamespaceURI);
                    writer.WriteAttributes(reader, true);
                    if (reader.IsEmptyElement) {
                        writer.WriteEndElement();
                    }
                    break;
                case XmlNodeType.Text:
                    writer.WriteString(await reader.GetValueAsync().ConfigureAwait(false));
                    break;
                case XmlNodeType.CDATA:
                    writer.WriteCData(reader.Value);
                    break;
                case XmlNodeType.EntityReference:
                    writer.WriteEntityRef(reader.Name);
                    break;
                case XmlNodeType.ProcessingInstruction:
                case XmlNodeType.XmlDeclaration:
                    writer.WriteProcessingInstruction(reader.Name, reader.Value);
                    break;
                case XmlNodeType.Comment:
                    writer.WriteComment(reader.Value);
                    break;
                case XmlNodeType.DocumentType:
                    writer.WriteDocType(reader.Name, reader.GetAttribute("PUBLIC"), reader.GetAttribute("SYSTEM"), reader.Value);
                    break;
                case XmlNodeType.Whitespace:
                case XmlNodeType.SignificantWhitespace:
                    writer.WriteWhitespace(await reader.GetValueAsync().ConfigureAwait(false));
                    break;
                case XmlNodeType.EndElement:
                    writer.WriteFullEndElement();
                    break;
                }
            } while (await reader.ReadAsync().ConfigureAwait(false));
        }
        return result;
    }
}
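
The method above takes an already open Stream and, because CloseInput is false, leaves closing it to the caller. How the stream is opened matters: as far as I know, on .NET Framework a FileStream opened without FileOptions.Asynchronous serves ReadAsync by running a synchronous read on a thread-pool thread. A minimal sketch of opening the file for asynchronous reads follows; the class name AsyncFileStreams, the method name OpenForAsyncRead, and the buffer size are hypothetical.

```csharp
using System.IO;

public static class AsyncFileStreams
{
    // Hypothetical helper: the name and the buffer size are illustrative assumptions.
    public static FileStream OpenForAsyncRead(string path)
    {
        return new FileStream(
            path, FileMode.Open, FileAccess.Read, FileShare.Read,
            4096,
            // Asynchronous requests async file I/O from the OS;
            // SequentialScan hints that the file is read front to back.
            FileOptions.Asynchronous | FileOptions.SequentialScan);
    }
}
```

A call could then look like `using (var stream = AsyncFileStreams.OpenForAsyncRead(path)) { var doc = await LoadAsync(stream, LoadOptions.PreserveWhitespace); }`, with the using block disposing the stream once the document is loaded.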


Source: https://stackoverflow.com/questions/43590338/load-xdocument-asynchronously
