HtmlAgilityPack — Does <form> close itself for some reason?

Submitted by 落爺英雄遲暮 on 2019-12-17 04:05:07

Question


I just wrote up this test to see if I was crazy...

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using HtmlAgilityPack;

namespace HtmlAgilityPackFormBug
{
    class Program
    {
        static void Main(string[] args)
        {
            var doc = new HtmlDocument();
            doc.LoadHtml(@"
<!DOCTYPE html>
<html>
    <head>
        <title>Form Test</title>
    </head>
    <body>
        <form>
            <input type=""text"" />
            <input type=""reset"" />
            <input type=""submit"" />
        </form>
    </body>
</html>
");
            var body = doc.DocumentNode.SelectSingleNode("//body");
            foreach (var node in body.ChildNodes.Where(n => n.NodeType == HtmlNodeType.Element))
                Console.WriteLine(node.XPath);
            Console.ReadLine();
        }
    }
}

And it outputs:

/html[1]/body[1]/form[1]
/html[1]/body[1]/input[1]
/html[1]/body[1]/input[2]
/html[1]/body[1]/input[3]

But, if I change <form> to <xxx> it gives me:

/html[1]/body[1]/xxx[1]

(As it should). So... it looks like those input elements are not contained within the form, but directly within the body, as if the <form> just closed itself off immediately. What's up with that? Is this a bug?


Digging through the source, I see:

ElementsFlags.Add("form", HtmlElementFlag.CanOverlap | HtmlElementFlag.Empty);

It has the "empty" flag, like META and IMG. Why?? Forms are most definitely not supposed to be empty.


Answer 1:


This is also reported in this workitem. It contains a suggested workaround from DarthObiwan.

You can change this without recompiling. The ElementsFlags collection is a static property on the HtmlNode class. The entry for form can be removed with

    HtmlNode.ElementsFlags.Remove("form");

before loading the document.
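
To see the workaround in context, here is a minimal sketch that applies it and then repeats the check from the question. The class and method names (FormFlagWorkaround, BodyChildXPaths) are made up for illustration. With the entry removed, the inputs should be parsed as children of the form, so only the form should be listed under body.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper for illustration; not part of HtmlAgilityPack.
static class FormFlagWorkaround
{
    // Returns the XPath of every element that is a direct child of <body>.
    public static List<string> BodyChildXPaths(string html)
    {
        // Drop the special-case entry for <form> so it is parsed like any
        // other container element. ElementsFlags is static, so this affects
        // every document loaded afterwards in the same process.
        HtmlNode.ElementsFlags.Remove("form");

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var body = doc.DocumentNode.SelectSingleNode("//body");
        return body.ChildNodes
                   .Where(n => n.NodeType == HtmlNodeType.Element)
                   .Select(n => n.XPath)
                   .ToList();
    }

    static void Main()
    {
        const string html = @"<html><body>
<form>
    <input type=""text"" />
    <input type=""submit"" />
</form>
</body></html>";

        foreach (var xpath in BodyChildXPaths(html))
            Console.WriteLine(xpath);
    }
}
```

Because the property is static, the removal only needs to happen once, for example at application startup, and it must run before the first LoadHtml call whose result you care about.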




Answer 2:


Since I'm the original HAP author, I can explain why it's marked as empty :)

This is because when HAP was designed, back in 2000, HTML 3.2 was the standard. You're probably aware that tags can overlap in HTML. That is, <b>bold<i>italic and bold</b>italic</i> (rendered as "bold" in bold, "italic and bold" in bold italics, then "italic" in italics) is supported by all browsers (although it's not officially in the HTML specification). The FORM tag can overlap other tags in the same way.

Since HAP was designed to handle any HTML content, rather than break most pages that you could find at that time, we decided to treat overlapping tags as EMPTY (using the ElementsFlags property) so that:

  • you can still load them
  • you can save them back without breaking the original HTML (if you don't need what's inside the form in any programmatic way).

The only thing you cannot do is work with their contents through the API using the tree model, with XSL, or in any other programmatic way. Today, with XHTML/XML almost everywhere, this sounds strange, but that's why I created ElementsFlags :)



Source: https://stackoverflow.com/questions/4218847/htmlagilitypack-does-form-close-itself-for-some-reason
