HtmlAgilityPack: how to create indented HTML?

Submitted by 假装没事ソ on 2019-11-30 20:03:11
Alex

As far as I know, HtmlAgilityPack cannot do this. But you could look into the HTML Tidy packages that are suggested in similar questions.

No, and it's a "by design" choice. There is a big difference between XML (or XHTML, which is XML, not HTML), where whitespace most of the time has no specific meaning, and HTML, where it can.

Indentation is not such a minor improvement, because changing whitespace can change the way some browsers render a given HTML chunk, especially malformed HTML (which the library generally handles well). The Html Agility Pack was designed to preserve the way the HTML is rendered, not to minimize the way the markup is written.

I'm not saying it's infeasible or outright impossible. Obviously you can convert to XML and voilà (and you could write an extension method to make this easier), but in the general case the rendered output may differ.
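For illustration, the XML route could look roughly like the sketch below. It is untested and makes a few assumptions: the extension method name ToIndentedXml is made up, the document has to serialize to well-formed XML (otherwise XDocument.Parse throws an XmlException), and the result is XML-style markup, so a browser may render it differently from the original HTML.

```csharp
using System.IO;
using System.Xml.Linq;
using HtmlAgilityPack;

public static class HtmlDocumentExtensions
{
    // Hypothetical helper (not part of HtmlAgilityPack): serializes the
    // document as XML and lets LINQ to XML re-indent it.
    public static string ToIndentedXml(this HtmlDocument doc)
    {
        bool previous = doc.OptionOutputAsXml;
        doc.OptionOutputAsXml = true;   // ask HtmlAgilityPack to emit XML
        try
        {
            using (var writer = new StringWriter())
            {
                doc.Save(writer);
                // XDocument.ToString() indents by default; it throws an
                // XmlException if the saved text is not well-formed XML.
                return XDocument.Parse(writer.ToString()).ToString();
            }
        }
        finally
        {
            doc.OptionOutputAsXml = previous;   // restore the caller's setting
        }
    }
}
```

With that in place, something like `string pretty = htmlDocument.ToIndentedXml();` would return the indented markup as a string.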

Fast, Reliable, Pure C#, .NET Core compatible AngleSharp

You can parse it with AngleSharp, which provides a way to auto-indent the output:

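using System.IO;
using AngleSharp;               // ToHtml extension method
using AngleSharp.Html;          // PrettyMarkupFormatter
using AngleSharp.Html.Parser;   // HtmlParser

// "text" holds the HTML source to re-indent
var parser = new HtmlParser();
var document = parser.ParseDocument(text);
string indentedText;
using (var writer = new StringWriter())
{
    // serialize with one tab per nesting level and "\n" line endings
    document.ToHtml(writer, new PrettyMarkupFormatter
                            {
                                Indentation = "\t",
                                NewLine = "\n"
                            });
    indentedText = writer.ToString();
}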

I had the same experience: even though HtmlAgilityPack is great for reading and modifying HTML (or, in my case, ASP) files, it cannot produce readable output.

However, I ended up writing a few lines of code that work for me:

Given an HtmlDocument named "m_htmlDocument", I create my HTML file as follows:

// dispose the writer so the output is flushed and the file handle is released
using (var file = new System.IO.StreamWriter(_sFullPath))
{
    if (m_htmlDocument.DocumentNode != null)
        foreach (var node in m_htmlDocument.DocumentNode.ChildNodes)
            WriteNode(file, node, 0);
}

and

void WriteNode(System.IO.StreamWriter _file, HtmlNode _node, int _indentLevel)
    {
        // check parameter
        if (_file == null) return;
        if (_node == null) return;

        // init 
        string INDENT = " ";
        string NEW_LINE = System.Environment.NewLine;

        // case: no children
        if(_node.HasChildNodes == false)
        {
            for (int i = 0; i < _indentLevel; i++)
                _file.Write(INDENT);
            _file.Write(_node.OuterHtml);
            _file.Write(NEW_LINE);
        }

        // case: node has children
        else
        {
            // indent
            for (int i = 0; i < _indentLevel; i++)
                _file.Write(INDENT);

            // open tag: name first, then each attribute preceded by a single space
            _file.Write(string.Format("<{0}", _node.Name));
            if(_node.HasAttributes)
                foreach(var attr in _node.Attributes)
                    _file.Write(string.Format(" {0}=\"{1}\"", attr.Name, attr.Value));
            _file.Write(string.Format(">{0}",NEW_LINE));

            // children
            foreach(var chldNode in _node.ChildNodes)
                WriteNode(_file, chldNode, _indentLevel + 1);

            // close tag
            for (int i = 0; i < _indentLevel; i++)
                _file.Write(INDENT);
            _file.Write(string.Format("</{0}>{1}", _node.Name,NEW_LINE));
        }
    }