Parsing a large XML file to multiple output xmls, using XmlReader - getting every other element

Submitted by 久未见 on 2019-12-13 08:52:30

Question


I need to take a very large XML file and create multiple output XML files from what could be thousands of repeating nodes in the input file. The source file, "AnimalBatch.xml", contains no whitespace and looks like this:

    <?xml version="1.0" encoding="utf-8" ?><Animals><Animal id="1001"><Quantity>One</Quantity><Adjective>Red</Adjective><Name>Rooster</Name></Animal><Animal id="1002"><Quantity>Two</Quantity><Adjective>Stubborn</Adjective><Name>Donkeys</Name></Animal><Animal id="1003"><Quantity>Three</Quantity><Adjective>Blind</Adjective><Name>Mice</Name></Animal><Animal id="1004"><Quantity>Four</Quantity><Adjective>Purple</Adjective><Name>Horses</Name></Animal><Animal id="1005"><Quantity>Five</Quantity><Adjective>Long</Adjective><Name>Centipedes</Name></Animal><Animal id="1006"><Quantity>Six</Quantity><Adjective>Dark</Adjective><Name>Owls</Name></Animal></Animals>

The program needs to split out each repeating "Animal" element and write one file per element, named Animal_1001.xml, Animal_1002.xml, Animal_1003.xml, etc.:


Animal_1001.xml:

    <?xml version="1.0" encoding="utf-8"?>
    <Animal>
    <Quantity>One</Quantity>
    <Adjective>Red</Adjective>
    <Name>Rooster</Name>
    </Animal>

Animal_1002.xml:

    <?xml version="1.0" encoding="utf-8"?>
    <Animal>
    <Quantity>Two</Quantity>
    <Adjective>Stubborn</Adjective>
    <Name>Donkeys</Name>
    </Animal>

Animal_1003.xml:

    <?xml version="1.0" encoding="utf-8"?>
    <Animal>
    <Quantity>Three</Quantity>
    <Adjective>Blind</Adjective>
    <Name>Mice</Name>
    </Animal>

The code below works, but only if the input file has a CR/LF after the <Animal id="xxxx"> elements. If there is no whitespace (my file has none, and I can't get it produced that way), I only get every other animal (the odd-numbered ones).

    static void SplitXMLReader()
    {
        string strFileName;
        string strSeq = "";

        XmlReader doc = XmlReader.Create("C:\\AnimalBatch.xml");

        while (doc.Read())
        {
            if ( doc.Name == "Animal"  && doc.NodeType == XmlNodeType.Element )
            {
                strSeq = doc.GetAttribute("id"); 

                XmlDocument outdoc = new XmlDocument();
                XmlDeclaration xmlDeclaration = outdoc.CreateXmlDeclaration("1.0", "utf-8", null);                     
                XmlElement rootNode = outdoc.CreateElement(doc.Name);

                rootNode.InnerXml = doc.ReadInnerXml();  
                // This seems to be advancing the cursor in doc too far.

                outdoc.InsertBefore(xmlDeclaration, outdoc.DocumentElement);
                outdoc.AppendChild(rootNode);

                strFileName = "Animal_" + strSeq + ".xml";
                outdoc.Save("C:\\" + strFileName);                    
            }
        }
    }

My understanding is that whitespace or formatting in XML should make no difference to XmlReader, but I've tried it both ways, with and without CR/LFs after the <Animal id="xxxx"> elements, and I can confirm there is a difference. With the CR/LFs (possibly even a single space would do; I'll try that next), every <Animal> node is processed fully and saved under the correct filename taken from its id attribute.

Can someone explain what's going on here, and suggest a possible workaround?


Answer 1:


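Yes, when you use doc.ReadInnerXml(), whitespace matters.

According to the documentation, when the reader is on an element, ReadInnerXml() returns that element's content as a string and leaves the reader positioned on the node after the end tag. Your loop then calls doc.Read() again, which moves past that node without ever checking it. If there is whitespace between the elements, the node that gets passed over is just whitespace; without it, it is the next <Animal>. Instead of reading the content as a string on the main reader, use something like ReadSubtree().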
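The skipping can be reproduced without touching the file system. The sketch below mirrors the question's loop on an in-memory string and records which id values it reaches instead of saving files. The class and method names (ReadInnerXmlSkipDemo, VisitedIds) are hypothetical, chosen only for illustration. On the compact sample it should reach only every other <Animal>. Putting a newline after each </Animal> should make it reach all of them.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

static class ReadInnerXmlSkipDemo
{
    // Hypothetical helper: same loop shape as SplitXMLReader() in the question,
    // but it collects the id of each <Animal> it processes.
    public static List<string> VisitedIds(string xml)
    {
        var ids = new List<string>();
        using (XmlReader doc = XmlReader.Create(new StringReader(xml)))
        {
            while (doc.Read())
            {
                if (doc.Name == "Animal" && doc.NodeType == XmlNodeType.Element)
                {
                    ids.Add(doc.GetAttribute("id"));
                    // Leaves the reader on the node after </Animal>. The loop's
                    // next doc.Read() moves past that node before the if-check sees it:
                    // a whitespace node if there is one, otherwise the next <Animal>.
                    doc.ReadInnerXml();
                }
            }
        }
        return ids;
    }
}
```

With the six animals from the question, the compact input should yield 1001, 1003 and 1005, matching the "odd-numbered animals" symptom.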




Answer 2:


Thanks for the guidance on using the ReadSubtree() method.

This code works for the XML input file with no linefeeds:

    static void SplitXMLReaderSubTree()
    {
        using (XmlReader doc = XmlReader.Create("C:\\AnimalBatch.xml"))
        {
            while (!doc.EOF)
            {
                if (doc.Name == "Animal" && doc.NodeType == XmlNodeType.Element)
                {
                    string strSeq = doc.GetAttribute("id");

                    // A reader scoped to this <Animal> element and its descendants.
                    XmlReader inner = doc.ReadSubtree();
                    inner.Read(); // position on the <Animal> start tag

                    XmlDocument outdoc = new XmlDocument();
                    XmlDeclaration xmlDeclaration = outdoc.CreateXmlDeclaration("1.0", "utf-8", null);
                    // A freshly created element, so the id attribute is not carried over.
                    XmlElement myElement = outdoc.CreateElement(inner.Name);
                    myElement.InnerXml = inner.ReadInnerXml();

                    // Closing the subtree reader leaves doc on </Animal>; the else
                    // branch then advances to the next node without skipping anything.
                    inner.Close();

                    outdoc.InsertBefore(xmlDeclaration, outdoc.DocumentElement);
                    outdoc.AppendChild(myElement);
                    string strFileName = "Animal_" + strSeq + ".xml";
                    outdoc.Save("C:\\" + strFileName);
                }
                else
                {
                    doc.Read();
                }
            }
        }
    }

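The same ReadSubtree() technique can be sketched so that it runs without writing to C:\. The version below reads from any TextReader and returns one XmlDocument per <Animal>, keyed by its id attribute. AnimalSplitter and Split are hypothetical names, and the sketch assumes every <Animal> carries an id. It keeps the question's while (doc.Read()) loop; the only real change is reading each element through a subtree reader instead of calling ReadInnerXml() on the main reader.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

static class AnimalSplitter
{
    // Hypothetical helper: one XmlDocument per <Animal>, keyed by its id
    // attribute (assumed to be present on every <Animal>).
    public static Dictionary<string, XmlDocument> Split(TextReader input)
    {
        var result = new Dictionary<string, XmlDocument>();
        using (XmlReader doc = XmlReader.Create(input))
        {
            while (doc.Read())
            {
                if (doc.NodeType != XmlNodeType.Element || doc.Name != "Animal")
                    continue;

                string id = doc.GetAttribute("id");
                var outdoc = new XmlDocument();
                outdoc.AppendChild(outdoc.CreateXmlDeclaration("1.0", "utf-8", null));
                XmlElement animal = outdoc.CreateElement("Animal");

                // The subtree reader only sees this <Animal> and its descendants.
                using (XmlReader inner = doc.ReadSubtree())
                {
                    inner.Read();                           // move onto <Animal>
                    animal.InnerXml = inner.ReadInnerXml(); // copy the children; id is not copied
                }
                // Disposing the subtree reader leaves doc on </Animal>, so the
                // next doc.Read() lands on the node right after it instead of skipping it.

                outdoc.AppendChild(animal);
                result[id] = outdoc;
            }
        }
        return result;
    }
}
```

To get the files from the question, call Save("Animal_" + id + ".xml") on each returned document, passing a StreamReader over AnimalBatch.xml as the input. For a very large input, saving each document inside the loop instead of collecting them in a dictionary avoids holding every output document in memory at once.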

Source: https://stackoverflow.com/questions/12188383/parsing-a-large-xml-file-to-multiple-output-xmls-using-xmlreader-getting-ever
