Question
I am trying to read a 3 GB XML file from a URL and store all the jobs in a DataSet. The XML looks like this:
<?xml version="1.0"?>
<feed total="1621473">
  <job>
    <title><![CDATA[Certified Medical Assistant]]></title>
    <date>2016-03-25 14:19:38</date>
    <referencenumber>2089677765</referencenumber>
    <url><![CDATA[http://www.jobs2careers.com/click.php?id=2089677765.1347]]></url>
    <company><![CDATA[Broadway Medical Clinic]]></company>
    <city>Portland</city>
    <state>OR</state>
    <zip>97213</zip>
  </job>
  <job>
    <title><![CDATA[Certified Medical Assistant]]></title>
    <date>2016-03-25 14:19:38</date>
    <referencenumber>2089677765</referencenumber>
    <url><![CDATA[http://www.jobs2careers.com/click.php?id=2089677765.1347]]></url>
    <company><![CDATA[Broadway Medical Clinic]]></company>
    <city>Portland</city>
    <state>OR</state>
    <zip>97213</zip>
  </job>
</feed>
This is my code:
XmlDocument doc = new XmlDocument();
doc.Load(url);
DataSet ds = new DataSet();
XmlNodeReader xmlReader = new XmlNodeReader(doc);
while (xmlReader.ReadToFollowing("job"))
{
    ds.ReadXml(xmlReader);
}
But I got an OutOfMemoryException. I searched on Google and found this:
DataSet ds = new DataSet();
FileStream filestream = File.OpenRead(url);
BufferedStream buffered = new BufferedStream(filestream);
ds.ReadXml(buffered);
Still the same exception. I also read about XmlTextReader, but I don't know how to make use of it in my case. I know why I am getting the exception, but I don't know how to overcome it. Thanks.
Answer 1:
Instead of trying to load the entire file into the DataSet or another container, how about loading the jobs in batches and writing each batch to the database, so that whatever is holding the batch can be cleared each time?
How to: Perform Streaming Transform of Large XML Documents: https://msdn.microsoft.com/en-us/library/bb387013.aspx
List<XElement> jobs = new List<XElement>();
using (XmlReader reader = XmlReader.Create(filePath))
{
    reader.MoveToContent();
    while (!reader.EOF)
    {
        if ((reader.NodeType == XmlNodeType.Element) && (reader.Name == "job"))
        {
            // ReadFrom consumes the <job> element and leaves the reader on
            // the node that follows it, so don't call reader.Read() again
            // here, or adjacent <job> elements get skipped.
            XElement job = XElement.ReadFrom(reader) as XElement;
            jobs.Add(job);
            if (jobs.Count >= 1000)
            {
                // TODO: write batch to database
                jobs.Clear();
            }
        }
        else
        {
            reader.Read();
        }
    }
    if (jobs.Count > 0)
    {
        // TODO: write remainder to database
        jobs.Clear();
    }
}
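For the TODOs, one option is SqlBulkCopy. The helper below is only a sketch, not part of the original answer: the destination table "Jobs", its columns, and the connStr parameter are assumptions for illustration, and it needs using System.Data.SqlClient; alongside System.Xml.Linq.
static void WriteBatch(List<XElement> jobs, string connStr)
{
    // Shape the batch as a DataTable so SqlBulkCopy can send it in one round trip.
    DataTable table = new DataTable();
    table.Columns.Add("ReferenceNumber", typeof(string));
    table.Columns.Add("Title", typeof(string));
    table.Columns.Add("Company", typeof(string));
    table.Columns.Add("City", typeof(string));
    table.Columns.Add("State", typeof(string));
    table.Columns.Add("Zip", typeof(string));
    foreach (XElement job in jobs)
    {
        // (string) on an XElement yields its text, or null if the element is missing.
        table.Rows.Add(
            (string)job.Element("referencenumber"),
            (string)job.Element("title"),
            (string)job.Element("company"),
            (string)job.Element("city"),
            (string)job.Element("state"),
            (string)job.Element("zip"));
    }
    using (SqlBulkCopy bulk = new SqlBulkCopy(connStr))
    {
        bulk.DestinationTableName = "Jobs"; // hypothetical destination table
        bulk.WriteToServer(table);
    }
}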
An alternative using a DataSet:
DataSet ds = new DataSet();
using (XmlReader reader = XmlReader.Create(filePath))
{
    reader.MoveToContent();
    while (!reader.EOF)
    {
        if ((reader.NodeType == XmlNodeType.Element) && (reader.Name == "job"))
        {
            // ReadXml consumes the current <job> element and appends a row to
            // the inferred "job" table, leaving the reader on the next node,
            // so only call reader.Read() when nothing was consumed.
            ds.ReadXml(reader);
            DataTable dt = ds.Tables["job"];
            if (dt.Rows.Count >= 1000)
            {
                // TODO: write batch to database
                dt.Rows.Clear();
            }
        }
        else
        {
            reader.Read();
        }
    }
    if (ds.Tables["job"] != null && ds.Tables["job"].Rows.Count > 0)
    {
        // TODO: write remainder to database
        ds.Tables["job"].Rows.Clear();
    }
}
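Since this variant already accumulates each batch in a DataTable, the TODOs can hand dt straight to SqlBulkCopy before clearing it; the destination table and connStr are the same assumptions as in the sketch above.
using (SqlBulkCopy bulk = new SqlBulkCopy(connStr))
{
    bulk.DestinationTableName = "Jobs"; // hypothetical destination table
    // In practice, add bulk.ColumnMappings entries to match the
    // column names inferred from the XML (title, city, state, ...).
    bulk.WriteToServer(dt);
}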
Answer 2:
doc.Load() reads the entire file into memory, which is what causes the error, and wrapping the loaded document in an XmlNodeReader does not change that. Try this:
using System;
using System.Data;
using System.Xml;

namespace ConsoleApplication1
{
    class Program
    {
        const string url = @"c:\temp\test.xml";

        static void Main(string[] args)
        {
            int count = 0;
            DataSet ds = new DataSet();
            using (XmlReader xmlReader = XmlReader.Create(url))
            {
                xmlReader.MoveToContent();
                try
                {
                    // ReadToFollowing skips ahead to the next <job> element
                    // without buffering anything in between; ReadXml then
                    // consumes just that one <job> subtree.
                    while (xmlReader.ReadToFollowing("job"))
                    {
                        count++;
                        ds.ReadXml(xmlReader);
                    }
                }
                catch (Exception ex)
                {
                    Console.WriteLine(ex.Message);
                }
            }
            Console.WriteLine("Count : {0}", count);
            Console.ReadLine();
        }
    }
}
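One last note on the original question: both answers read from a local path, but XmlReader.Create also accepts an http URL directly, and for more control you can open the response stream yourself. A minimal sketch with a placeholder URL (needs System.Net and System.IO):
WebRequest request = WebRequest.Create("http://www.example.com/feed.xml"); // placeholder URL
using (WebResponse response = request.GetResponse())
using (Stream stream = response.GetResponseStream())
using (XmlReader xmlReader = XmlReader.Create(stream))
{
    // ... same ReadToFollowing("job") / ds.ReadXml(xmlReader) loop as in the answers,
    // so the 3 GB feed is never held in memory all at once.
}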
Source: https://stackoverflow.com/questions/36292830/loading-large-xml-on-dataset-outofmemory-exception