Question
I have a very large XML file. This is a simplified version of its format:
<?xml version='1.0' encoding='UTF-8'?>
<Sender>
  <SenderID>571099948</SenderID>
  <Sponsors>
    <Sponsor>
      <SponsorID>TEST01</SponsorID>
      <Contracts>
        <Contract>
          <ContractID>000001</ContractID>
          <Member>
            <SSN>1111111111</SSN>
            <Gender>M</Gender>
            <Benefits>
              <Benefit BenefitType="AAA">
              </Benefit>
              <Benefit BenefitType="BBB">
              </Benefit>
            </Benefits>
          </Member>
          <Member>
            <SSN>4444444444</SSN>
            <Gender>F</Gender>
            <Benefits>
              <Benefit BenefitType="AAA">
              </Benefit>
            </Benefits>
          </Member>
        </Contract>
        <Contract>
          <ContractID>0000002</ContractID>
          <Member>
            <SSN>2222222222</SSN>
            <Gender>F</Gender>
            <Benefits>
              <Benefit BenefitType="CCC">
              </Benefit>
              <Benefit BenefitType="DDD">
              </Benefit>
            </Benefits>
          </Member>
        </Contract>
        <Contract>
          <ContractID>0000003</ContractID>
          <Member>
            <SSN>333333333</SSN>
            <Gender>F</Gender>
            <Benefits>
              <Benefit BenefitType="CCC">
              </Benefit>
            </Benefits>
          </Member>
        </Contract>
      </Contracts>
    </Sponsor>
    <Sponsor>
      <SponsorID>TEST02</SponsorID>
      <Contracts>
        <Contract>
          <ContractID>0000011</ContractID>
          <Member>
            <SSN>1111111111</SSN>
            <Gender>M</Gender>
            <Benefits>
            </Benefits>
          </Member>
        </Contract>
        <Contract>
          <ContractID>0000002</ContractID>
          <Member>
            <SSN>2222222222</SSN>
            <Gender>F</Gender>
            <Benefits>
            </Benefits>
          </Member>
        </Contract>
      </Contracts>
    </Sponsor>
  </Sponsors>
</Sender>
I want to get all the information in each Contract node, as well as the SponsorID from its parent Sponsor node. Here is the code I use to read part of the XML file with XmlReader:
static IEnumerable<XElement> SimpleStreamAxis(string inputUrl, string elementName)
{
    using (XmlReader reader = XmlReader.Create(inputUrl))
    {
        reader.MoveToContent();
        while (reader.Read())
        {
            if (reader.NodeType == XmlNodeType.Element)
            {
                if (reader.Name == elementName)
                {
                    XElement el = XNode.ReadFrom(reader) as XElement;
                    if (el != null)
                    {
                        yield return el;
                    }
                }
            }
        }
    }
}
Here is the issue. I cannot use this, because a whole Sponsor subtree may be too large to fit in memory:
var sponsor = SimpleStreamAxis(file, "Sponsor");
I cannot use this either, because I cannot tell the SponsorID from the Contract node alone:
var contract = SimpleStreamAxis(file, "Contract");
Is there a way to read the SponsorID in a Sponsor, move the cursor forward and read all the Contract nodes under that Sponsor, then move on to the next Sponsor, read its SponsorID and Contract nodes, and so on?
Answer 1:
Try this:
using (XmlReader xmlReader = XmlReader.Create("file.xml"))
{
    while (xmlReader.Read())
    {
        if (xmlReader.ReadToFollowing("SponsorID"))
        {
            string sponsorId = xmlReader.ReadElementContentAsString();
            // process SponsorID
            Console.WriteLine(sponsorId);
            if (xmlReader.ReadToFollowing("Contract"))
            {
                do
                {
                    XmlReader contractSubtree = xmlReader.ReadSubtree();
                    XElement contractElement = XElement.Load(contractSubtree);
                    // process Contract
                    Console.WriteLine(contractElement.Element("ContractID"));
                } while (xmlReader.ReadToNextSibling("Contract"));
            }
        }
    }
}
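As a sketch of how the loop above could be packaged for reuse, the iterator below pairs each Contract with the SponsorID read just before it. The class and method names (SiblingContractReader, ReadContracts) are illustrative and not part of the answer, and the tuple syntax assumes C# 7 or later. Like the loop above, it assumes every Sponsor contains at least one Contract; if one had none, ReadToFollowing("Contract") would run on into the next Sponsor and attribute its contracts to the wrong ID.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class SiblingContractReader
{
    // Hypothetical helper: yields (SponsorID, Contract) pairs one Contract at a time,
    // so only a single Contract subtree is held in memory.
    public static IEnumerable<(string SponsorId, XElement Contract)> ReadContracts(XmlReader reader)
    {
        while (reader.ReadToFollowing("SponsorID"))
        {
            string sponsorId = reader.ReadElementContentAsString();

            // Assumption: at least one Contract follows inside the same Sponsor.
            if (!reader.ReadToFollowing("Contract"))
                yield break;

            do
            {
                XElement contract;
                // Disposing the subtree reader leaves the outer reader on </Contract>.
                using (XmlReader subtree = reader.ReadSubtree())
                {
                    contract = XElement.Load(subtree);
                }
                yield return (sponsorId, contract);
            } while (reader.ReadToNextSibling("Contract"));
        }
    }
}
```

A caller would wrap it in the usual using block, for example: foreach (var pair in SiblingContractReader.ReadContracts(reader)) { ... }, where reader comes from XmlReader.Create("file.xml").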
Answer 2:
Yes, this can be done, assuming that SponsorID always precedes the Contract nodes.
The basic idea is to read through the XML file until you find elements with the desired names "SponsorID" or "Contract", then yield them for higher-level processing:
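public static IEnumerable<XElement> StreamNamedElements(XmlReader reader, IEnumerable<XName> names)
{
    var nameSet = new HashSet<XName>(names);
    // XNode.ReadFrom() leaves the reader on the node after the element it consumed,
    // so the next Read() is skipped in that case; otherwise an element that directly
    // follows a match (no whitespace in between) would be missed.
    bool advance = true;
    while (!advance || reader.Read())
    {
        advance = true;
        if (reader.NodeType == XmlNodeType.Element && nameSet.Contains(XName.Get(reader.LocalName, reader.NamespaceURI)))
        {
            XElement el = XNode.ReadFrom(reader) as XElement;
            advance = false;
            if (el != null)
                yield return el;
        }
    }
}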
In cases where SponsorID is always present and precedes Contract, this will enumerate through these elements correctly. However, if a sponsor ID is missing or out of order, the sponsor ID from a previous sponsor might get picked up. This error can be trapped by restricting the scope of each "SponsorID" to the containing "Sponsor" element using ReadSubtree():
public static IEnumerable<XmlReader> StreamNamedSubtrees(XmlReader reader, IEnumerable<XName> names)
{
    var nameSet = new HashSet<XName>(names);
    while (reader.Read())
    {
        if (reader.NodeType == XmlNodeType.Element && nameSet.Contains(XName.Get(reader.LocalName, reader.NamespaceURI)))
        {
            // Disposing the subtree reader advances the outer reader to the end of the
            // subtree, even if the caller did not read it fully or stopped enumerating early.
            using (var subReader = reader.ReadSubtree())
            {
                yield return subReader;
            }
        }
    }
}
Then use it like this:
// Here xml is a string holding the document; for a large file, pass the file path to XmlReader.Create() instead.
using (var sr = new StringReader(xml))
using (var reader = XmlReader.Create(sr))
{
    foreach (var subReader in StreamNamedSubtrees(reader, new[] { (XName)"Sponsor" }))
    {
        XElement sponsorID = null;
        foreach (var el in StreamNamedElements(subReader, new[] { (XName)"SponsorID", (XName)"Contract" }))
        {
            if (el.Name == "SponsorID")
            {
                sponsorID = el;
            }
            else if (el.Name == "Contract")
            {
                if (sponsorID == null)
                    throw new InvalidOperationException("Contract found before any SponsorID in this Sponsor.");
                // Example "higher processing"
                Debug.WriteLine(string.Format("{0}: {1}", sponsorID.Value, el.ToString()));
            }
        }
    }
}
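To illustrate why scoping to the Sponsor subtree matters, here is a minimal self-contained sketch of the same idea in a single method. It is not the answer's code: the names ScopedContractReader and ReadContracts are made up for illustration, the tuple syntax assumes C# 7 or later, and element names are matched without namespaces. A Sponsor with no Contract simply yields nothing, and its ID is never carried over to a neighbouring Sponsor.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class ScopedContractReader
{
    // Hypothetical helper: pairs each Contract with the SponsorID found
    // inside the same Sponsor element.
    public static IEnumerable<(string SponsorId, XElement Contract)> ReadContracts(XmlReader reader)
    {
        while (reader.ReadToFollowing("Sponsor"))
        {
            // The subtree reader cannot move past </Sponsor>.
            using (XmlReader sponsor = reader.ReadSubtree())
            {
                string sponsorId = null;
                bool advance = true;
                while (!advance || sponsor.Read())
                {
                    advance = true;
                    if (sponsor.NodeType != XmlNodeType.Element)
                        continue;

                    if (sponsor.LocalName == "SponsorID")
                    {
                        sponsorId = sponsor.ReadElementContentAsString();
                        advance = false; // reader is already on the node after </SponsorID>
                    }
                    else if (sponsor.LocalName == "Contract")
                    {
                        if (sponsorId == null)
                            throw new InvalidOperationException("Contract found before SponsorID.");
                        var contract = (XElement)XNode.ReadFrom(sponsor);
                        advance = false; // reader is already on the node after </Contract>
                        yield return (sponsorId, contract);
                    }
                }
            }
        }
    }
}
```

Only one Contract element is materialized at a time, so memory use is bounded by the largest single Contract rather than by the size of a Sponsor.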
Source: https://stackoverflow.com/questions/31062289/xmlreader-read-continually