Reading XML with unclosed tags in C#

我寻月下人不归 2021-01-13 23:48

I have a program which runs tests and generates a grid-view with all the results in it, and also an XML log file. The program also has the functionality to load logs to repl

3 answers
  • 2021-01-14 00:12

    As a last resort, and depending on what you're doing, you could use an HTML reader such as HtmlAgilityPack (available on NuGet) or SGMLReader. SGMLReader will actually convert the input to an XmlDocument, so that might be closer to what you're looking for.

    Of course, HTML isn't XML so you get what you get when using this method.

  • 2021-01-14 00:16

    There is nothing in the Framework that does this by default, nor is there a good solution available that will somehow parse generic invalid XML.

    The most sensible thing you can do is fix the XML before you start reading it. Since only the end is cut off, you should be able to work out which tags are still open and close them.
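
    One way to sketch this "close the open tags" idea, assuming the file is well-formed up to the truncation point: copy nodes from an XmlReader to an XmlWriter, count the elements that are still open, and close them once the reader throws. The names `TruncatedXml` and `Repair` are made up for illustration. Comments and processing instructions are dropped to keep the sketch short, and whether the partial text of the last element survives depends on how far the reader gets before it fails.

    ```csharp
    using System;
    using System.IO;
    using System.Text;
    using System.Xml;

    static class TruncatedXml
    {
        // Hypothetical helper: returns a well-formed copy of XML whose end was cut off.
        public static string Repair(string truncatedXml)
        {
            var output = new StringBuilder();
            var readerSettings = new XmlReaderSettings { IgnoreWhitespace = true };
            var writerSettings = new XmlWriterSettings { OmitXmlDeclaration = true };
            int openElements = 0; // elements started but not yet ended

            using (XmlReader reader = XmlReader.Create(new StringReader(truncatedXml), readerSettings))
            using (XmlWriter writer = XmlWriter.Create(output, writerSettings))
            {
                try
                {
                    while (reader.Read())
                    {
                        switch (reader.NodeType)
                        {
                            case XmlNodeType.Element:
                                bool isEmpty = reader.IsEmptyElement;
                                writer.WriteStartElement(reader.Prefix, reader.LocalName, reader.NamespaceURI);
                                writer.WriteAttributes(reader, false);
                                if (isEmpty) writer.WriteEndElement();
                                else openElements++;
                                break;
                            case XmlNodeType.Text:
                            case XmlNodeType.CDATA:
                                writer.WriteString(reader.Value);
                                break;
                            case XmlNodeType.EndElement:
                                writer.WriteFullEndElement();
                                openElements--;
                                break;
                            // Other node types (comments, declarations, ...) are skipped in this sketch.
                        }
                    }
                }
                catch (XmlException)
                {
                    // Truncation point reached: everything parsed so far has already been written.
                }

                // Close whatever was still open when the input ran out.
                for (; openElements > 0; openElements--)
                {
                    writer.WriteEndElement();
                }
            }
            return output.ToString();
        }
    }
    ```

    The result of `TruncatedXml.Repair(File.ReadAllText("test.xml"))` can then be handed to XDocument.Parse or XmlDocument.LoadXml as usual. Be aware that the last element may be empty or only partially filled.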

  • 2021-01-14 00:26

    Presumably it's all valid until it's truncated... so using XmlReader could work... just be prepared to handle it going bang when it reaches the truncation point.

    Now the XmlReader API isn't terribly pleasant (IMO) so you might want to move to the start of some interesting data (which would have to be complete in itself) and then call the XNode.ReadFrom(XmlReader) method to get that data in a simple-to-use form. Then move to the start of the next element and do the same, etc.

    Sample code:

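    using System;
    using System.Xml;
    using System.Xml.Linq;
    
    class Program
    {
        static void Main(string[] args)
        {
            using (XmlReader reader = XmlReader.Create("test.xml"))
            {
                while (true)
                {
                    // Skip forward until the reader sits on a <Child> start tag.
                    while (reader.NodeType != XmlNodeType.Element ||
                        reader.LocalName != "Child")
                    {
                        if (!reader.Read())
                        {
                            // Clean end of input: stop instead of looping forever.
                            Console.WriteLine("Finished!");
                            return;
                        }
                    }
                    // Reads the whole element and leaves the reader on the node after it.
                    // Throws XmlException if the element is cut off.
                    XElement element = (XElement) XNode.ReadFrom(reader);
                    Console.WriteLine("Got child: {0}", element.Value);
                }
            }
        }
    }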
    

    Sample XML:

    <Root>
      <Parent>
        <Child>First child</Child>
        <Child>Second child</Child>
        <Child>Broken
    

    Sample output:

    Got child: First child
    Got child: Second child

    Unhandled Exception: System.Xml.XmlException: Unexpected end of file has occurred
    The following elements are not closed: Child, Parent, Root. Line 5, position 18.
       at System.Xml.XmlTextReaderImpl.Throw(String res, String arg)
       at System.Xml.XmlTextReaderImpl.ParseElementContent()
       at System.Xml.Linq.XContainer.ReadContentFrom(XmlReader r)
       at System.Xml.Linq.XContainer.ReadContentFrom(XmlReader r, LoadOptions o)
       at System.Xml.Linq.XElement.ReadElementFrom(XmlReader r, LoadOptions o)
       at System.Xml.Linq.XNode.ReadFrom(XmlReader reader)
       at Program.Main(String[] args)
    

    So obviously you'd want to catch the exception, but you can see that it managed to read the first two elements correctly.
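
    A minimal sketch of what catching the exception could look like, collecting the complete elements into a list and stopping quietly at the truncation point. The names `TruncatedLog` and `ReadCompleteElements` are invented for this example; the element name is passed in, so nothing here assumes a particular log format.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Xml;
    using System.Xml.Linq;

    static class TruncatedLog
    {
        // Hypothetical helper: returns every complete element with the given
        // local name that appears before the input is cut off.
        public static List<XElement> ReadCompleteElements(TextReader input, string elementName)
        {
            var complete = new List<XElement>();
            using (XmlReader reader = XmlReader.Create(input))
            {
                try
                {
                    reader.MoveToContent();
                    while (!reader.EOF)
                    {
                        if (reader.NodeType == XmlNodeType.Element &&
                            reader.LocalName == elementName)
                        {
                            // ReadFrom consumes the element and leaves the reader
                            // on the following node, so no extra Read() is needed.
                            complete.Add((XElement) XNode.ReadFrom(reader));
                        }
                        else
                        {
                            reader.Read();
                        }
                    }
                }
                catch (XmlException)
                {
                    // Truncation reached: keep the elements that were read in full.
                }
            }
            return complete;
        }
    }
    ```

    With the sample XML above, `TruncatedLog.ReadCompleteElements(File.OpenText("test.xml"), "Child")` should give back the first two children and silently drop the broken third one. Swallowing every XmlException also hides genuine syntax errors earlier in the file, so logging the exception's line and position may be worthwhile.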
