Question
I am trying to read a 3 GB XML file from a URL and store all the jobs in a DataSet. The XML looks like this:
<?xml version="1.0"?>
<feed total="1621473">
  <job>
    <title><![CDATA[Certified Medical Assistant]]></title>
    <date>2016-03-25 14:19:38</date>
    <referencenumber>2089677765</referencenumber>
    <url><![CDATA[http://www.jobs2careers.com/click.php?id=2089677765.1347]]></url>
    <company><![CDATA[Broadway Medical Clinic]]></company>
    <city>Portland</city>
    <state>OR</state>
    <zip>97213</zip>
  </job>
  <job>
    <title><![CDATA[Certified Medical Assistant]]></title>
    <date>2016-03-25 14:19:38</date>
    <referencenumber>2089677765</referencenumber>
    <url><![CDATA[http://www.jobs2careers.com/click.php?id=2089677765.1347]]></url>
    <company><![CDATA[Broadway Medical Clinic]]></company>
    <city>Portland</city>
    <state>OR</state>
    <zip>97213</zip>
  </job>
</feed>
This is my code:
XmlDocument doc = new XmlDocument();
doc.Load(url);
DataSet ds = new DataSet();
XmlNodeReader xmlReader = new XmlNodeReader(doc);
while (xmlReader.ReadToFollowing("job"))
{
    ds.ReadXml(xmlReader);
}
But I got an OutOfMemoryException. I searched on Google and found this:
DataSet ds = new DataSet();
FileStream filestream = File.OpenRead(url);
BufferedStream buffered = new BufferedStream(filestream);
ds.ReadXml(buffered);
Still the same exception. I also read about XmlTextReader, but I don't know how to make use of it in my case. I know why I am getting the exception, but I don't know how to overcome it. Thanks.
Answer 1:
Instead of trying to load the entire file into the DataSet or another container, how about loading the jobs in batches and writing each batch to the database, so that whatever is holding the batch can be cleared each time?
How to: Perform Streaming Transform of Large XML Documents: https://msdn.microsoft.com/en-us/library/bb387013.aspx
List<XElement> jobs = new List<XElement>();
using (XmlReader reader = XmlReader.Create(filePath))
{
    reader.MoveToContent();
    while (!reader.EOF)
    {
        if ((reader.NodeType == XmlNodeType.Element) && (reader.Name == "job"))
        {
            // ReadFrom consumes the <job> element and leaves the reader on
            // the node that follows it, so don't call reader.Read() again
            // here, or adjacent <job> elements get skipped.
            XElement job = XElement.ReadFrom(reader) as XElement;
            jobs.Add(job);
            if (jobs.Count >= 1000)
            {
                // TODO: write batch to database
                jobs.Clear();
            }
        }
        else
        {
            reader.Read();
        }
    }
    if (jobs.Count > 0)
    {
        // TODO: write remainder to database
        jobs.Clear();
    }
}
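For the TODOs, one option is SqlBulkCopy. The helper below is only a sketch, not part of the original answer: the destination table "Jobs", its columns, and the connStr parameter are assumptions for illustration, and it needs using System.Data.SqlClient; alongside System.Xml.Linq.
static void WriteBatch(List<XElement> jobs, string connStr)
{
    // Shape the batch as a DataTable so SqlBulkCopy can send it in one round trip.
    DataTable table = new DataTable();
    table.Columns.Add("ReferenceNumber", typeof(string));
    table.Columns.Add("Title", typeof(string));
    table.Columns.Add("Company", typeof(string));
    table.Columns.Add("City", typeof(string));
    table.Columns.Add("State", typeof(string));
    table.Columns.Add("Zip", typeof(string));
    foreach (XElement job in jobs)
    {
        // (string) on an XElement yields its text, or null if the element is missing.
        table.Rows.Add(
            (string)job.Element("referencenumber"),
            (string)job.Element("title"),
            (string)job.Element("company"),
            (string)job.Element("city"),
            (string)job.Element("state"),
            (string)job.Element("zip"));
    }
    using (SqlBulkCopy bulk = new SqlBulkCopy(connStr))
    {
        bulk.DestinationTableName = "Jobs"; // hypothetical destination table
        bulk.WriteToServer(table);
    }
}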
An alternative using a DataSet:
DataSet ds = new DataSet();
using (XmlReader reader = XmlReader.Create(filePath))
{
    reader.MoveToContent();
    while (!reader.EOF)
    {
        if ((reader.NodeType == XmlNodeType.Element) && (reader.Name == "job"))
        {
            // ReadXml consumes the current <job> element and appends a row to
            // the inferred "job" table, leaving the reader on the next node,
            // so only call reader.Read() when nothing was consumed.
            ds.ReadXml(reader);
            DataTable dt = ds.Tables["job"];
            if (dt.Rows.Count >= 1000)
            {
                // TODO: write batch to database
                dt.Rows.Clear();
            }
        }
        else
        {
            reader.Read();
        }
    }
    if (ds.Tables["job"] != null && ds.Tables["job"].Rows.Count > 0)
    {
        // TODO: write remainder to database
        ds.Tables["job"].Rows.Clear();
    }
}
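Since this variant already accumulates each batch in a DataTable, the TODOs can hand dt straight to SqlBulkCopy before clearing it; the destination table and connStr are the same assumptions as in the sketch above.
using (SqlBulkCopy bulk = new SqlBulkCopy(connStr))
{
    bulk.DestinationTableName = "Jobs"; // hypothetical destination table
    // In practice, add bulk.ColumnMappings entries to match the
    // column names inferred from the XML (title, city, state, ...).
    bulk.WriteToServer(dt);
}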
Answer 2:
doc.Load() reads the entire file into memory, which is what causes the error, and wrapping the loaded document in an XmlNodeReader does not change that. Try this:
using System;
using System.Data;
using System.Xml;

namespace ConsoleApplication1
{
    class Program
    {
        const string url = @"c:\temp\test.xml";

        static void Main(string[] args)
        {
            int count = 0;
            DataSet ds = new DataSet();
            using (XmlReader xmlReader = XmlReader.Create(url))
            {
                xmlReader.MoveToContent();
                try
                {
                    // ReadToFollowing skips ahead to the next <job> element
                    // without buffering anything in between; ReadXml then
                    // consumes just that one <job> subtree.
                    while (xmlReader.ReadToFollowing("job"))
                    {
                        count++;
                        ds.ReadXml(xmlReader);
                    }
                }
                catch (Exception ex)
                {
                    Console.WriteLine(ex.Message);
                }
            }
            Console.WriteLine("Count : {0}", count);
            Console.ReadLine();
        }
    }
}
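One last note on the original question: both answers read from a local path, but XmlReader.Create also accepts an http URL directly, and for more control you can open the response stream yourself. A minimal sketch with a placeholder URL (needs System.Net and System.IO):
WebRequest request = WebRequest.Create("http://www.example.com/feed.xml"); // placeholder URL
using (WebResponse response = request.GetResponse())
using (Stream stream = response.GetResponseStream())
using (XmlReader xmlReader = XmlReader.Create(stream))
{
    // ... same ReadToFollowing("job") / ds.ReadXml(xmlReader) loop as in the answers,
    // so the 3 GB feed is never held in memory all at once.
}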
Source: https://stackoverflow.com/questions/36292830/loading-large-xml-on-dataset-outofmemory-exception