Image tag not closing with HTMLAgilityPack

感情败类 2020-12-01 16:59

Using HtmlAgilityPack to write out a new image node seems to remove the closing of the image tag: the output should be a closed tag such as <img />, but when you check the node's OuterHtml it contains an unclosed <img> instead.



        
4 Answers
  • 2020-12-01 17:35

    This seems to be a bug with HtmlAgilityPack. There are many ways to reproduce this, for example:

    Debug.WriteLine(HtmlNode.CreateNode("<img id=\"bla\"></img>").OuterHtml);
    

    This outputs malformed HTML. The fixes suggested in the other answers do not help:

    HtmlDocument doc = new HtmlDocument();
    doc.OptionOutputAsXml = true;
    HtmlNode node = doc.CreateElement("x");
    node.InnerHtml = "<img id=\"bla\"></img>";
    doc.DocumentNode.AppendChild(node);
    Debug.WriteLine(doc.DocumentNode.OuterHtml);
    

    This produces malformed XML / XHTML such as <x><img id="bla"></x>

    I have created an issue on CodePlex for this.

  • 2020-12-01 17:36

    Telling it to output XML as Micky suggests works, but if you have other reasons not to want XML, try this:

    doc.OptionWriteEmptyNodes = true;
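
    For context, here is a minimal sketch of where that option sits in a complete program. It assumes the HtmlAgilityPack NuGet package is referenced; the class name WriteEmptyNodesDemo and the method name Render are made up for illustration. The document is serialized with Save rather than read back through OuterHtml, so the output is always regenerated from the node tree. With the option set, empty elements such as img should be written in self-closed form.

    ```csharp
    using System;
    using System.IO;
    using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is installed

    public static class WriteEmptyNodesDemo // hypothetical name, for illustration only
    {
        // Parses rawHtml and serializes it again with OptionWriteEmptyNodes enabled.
        public static string Render(string rawHtml)
        {
            var doc = new HtmlDocument();
            doc.OptionWriteEmptyNodes = true; // write empty elements in self-closed form
            doc.LoadHtml(rawHtml);

            using (var writer = new StringWriter())
            {
                doc.Save(writer); // serializes the whole document to the writer
                return writer.ToString();
            }
        }

        public static void Main()
        {
            Console.WriteLine(Render("<p><img id=\"bla\"></p>"));
        }
    }
    ```

    Whether this is enough depends on how the node was created; the first answer reports cases where it did not help.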
    
  • 2020-12-01 17:41

    There is an option to turn on XML output that makes this issue go away.

    var htmlDoc = new HtmlDocument();
    htmlDoc.OptionOutputAsXml = true;
    htmlDoc.LoadHtml(rawHtml);
    
  • 2020-12-01 17:49

    Edit 1: Here is how to fix an HTML Agility Pack document so that it outputs image (img) tags correctly:

    // The indexer adds the "img" entry if it is missing and overwrites it otherwise,
    // so no ContainsKey check is needed.
    HtmlNode.ElementsFlags["img"] = HtmlElementFlag.Closed;
    

    Replace "img" with any other tag name to fix those tags as well (input, select, and option come up frequently). Repeat as needed. Keep in mind that this will produce <img></img> rather than <img />, because of the HAP bug that prevents the "closed" and "empty" flags from being set simultaneously. Source: Mike Bridge

    Original answer: Having labored over this issue without finding a sufficient answer (I tried setting the doctype properly, Output as XML, Check Syntax, AutoCloseOnEnd, and Write Empty Nodes), I was able to solve it with a dirty hack. It will certainly not solve the issue outright for everyone, but for anyone returning their generated HTML/XML as a string (e.g. via a web service), the simple solution is to use fake tags that the Agility Pack doesn't know to break.

    Once you have finished everything you need to do on your document, call the following method once for each tag giving you a headache (notable examples being option, input, and img). Immediately afterwards, render your final string, replace each tag carrying the placeholder prefix (in this case "fix_") with the real tag name, and return the string. In my opinion this is only marginally better than the regex solution proposed in another question that I cannot locate at the moment.

    private void fixHAPUnclosedTags(ref HtmlDocument doc, string tagName, bool hasInnerText = false)
    {
        // SelectNodes can return null (not an empty collection) when nothing matches.
        var tags = doc.DocumentNode.SelectNodes("//" + tagName);
        if (tags == null) return;

        foreach (var tag in tags)
        {
            // Placeholder element that HAP has no special handling for.
            HtmlNode tagReplacement = HtmlNode.CreateNode("<fix_" + tagName + "></fix_" + tagName + ">");
            foreach (var attr in tag.Attributes)
            {
                tagReplacement.SetAttributeValue(attr.Name, attr.Value);
            }
            // For option tags and other non-empty nodes, the next (text) node will be its inner HTML.
            if (hasInnerText && tag.NextSibling != null)
            {
                tagReplacement.InnerHtml = tag.InnerHtml + tag.NextSibling.InnerHtml;
                tag.NextSibling.Remove();
            }
            tag.ParentNode.ReplaceChild(tagReplacement, tag);
        }
    }
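
    The final replace step described above is not shown in the method, so here is a minimal sketch of it using plain string replacement. The class name FixTagRestorer and the method name RestoreFixedTags are hypothetical; the "fix_" prefix matches the placeholder that fixHAPUnclosedTags writes.

    ```csharp
    using System;

    public static class FixTagRestorer // hypothetical helper, for illustration only
    {
        // Turns <fix_TAG ...> and </fix_TAG> back into <TAG ...> and </TAG>.
        // Assumes the placeholders were emitted with exactly this lowercase prefix.
        public static string RestoreFixedTags(string html, string tagName, string prefix = "fix_")
        {
            return html
                .Replace("<" + prefix + tagName, "<" + tagName)
                .Replace("</" + prefix + tagName + ">", "</" + tagName + ">");
        }

        public static void Main()
        {
            string rendered = "<select><fix_option value=\"1\">One</fix_option></select>";
            // prints: <select><option value="1">One</option></select>
            Console.WriteLine(RestoreFixedTags(rendered, "option"));
        }
    }
    ```

    Call it once per tag name on the rendered string, after the document has been serialized. Because the opening-tag replacement matches on a prefix, a tag name that is itself the start of another placeholder name would also be rewritten, so choose a prefix that cannot collide with real content.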
    

    As a note, if I were a betting man, I would guess that Mike Bridge's answer above inadvertently identifies the source of this bug in the pack: something is causing the closed and empty flags to be mutually exclusive.

    Additionally, after a bit more digging, I don't appear to be the only one who has taken this approach: HtmlAgilityPack Drops Option End Tags

    Furthermore, in cases where you ONLY need non-empty elements, there is a very simple fix listed in that same question, as well as in the HAP CodePlex discussion. It essentially sets the empty flag option listed in Mike Bridge's answer above permanently everywhere.
