How would I get the inputs from a certain form with HtmlAgilityPack? Language: C# (.NET)

失恋的感觉 2021-01-06 11:50

Code can explain this problem much better than I can. I have also included alternate ways I've tried to do this. If possible, please explain why these other methods didn't

1 answer
  • 2021-01-06 12:44

    I found the answer! Look at the code below; it contains both the solution and the explanation. :)

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using HtmlAgilityPack;
    using System.Collections;
    
    namespace ConsoleApplication1
    {
        class Program
        {
            static void Main(string[] args)
            {
                string source = @"
                    <form name='form1' action='action1' method='method1' id='id1'>
                    <input type='text1.1' name='name1.1' value='value1.1' />
                    <input type='text1.2' name='name1.2' value='value1.2' />
                </form>
                <form name='form2' action='action2' method='method2' id='id2'>
                    <input type='text2.1' name='name2.1' value='value2.1' />
                    <input type='text2.2' name='name2.2' value='value2.2' />
                </form>
                        ";
                List<HtmlAttribute> formAttributes = new List<HtmlAttribute>();
                IEnumerable<HtmlNode> inputs;
                /*
                 * The line below is the main reason this solution works and my earlier attempts didn't.
                 * It has to run before any HtmlDocument is used.
                 * */
                HtmlNode.ElementsFlags.Remove("form");
                /*
                 * I was going through the HtmlAgilityPack forum, and stumbled upon this little tidbit of info:
                 * 
                 * "This is because by default, Forms are parsed as empty nodes - this is because forms are allowed to
                 overlap other elements in the HTML spec.
    
                    In other words, the following is technically legal HTML, even though it gives us developer hives:
    
                    <table>
                    <form>
                    <some input elements>
                    </table>
                    </form>
    
                    Here, the form overlaps the closing of the table and when properly rendered, will be contained inside the table.
                    Since HtmlDocument attempts to allow this as valid without automatically correcting the HTML, HtmlDocument by default
                    makes no attempt to populate the child nodes of the form.
                    Ok. All that is merely an introduction. You can get around this default behavior by adding the following line:
                    HtmlNode.ElementsFlags.Remove("form");
                    before you make ANY use of HtmlDocument. This will allow it to parse the nodes of the form, but it sacrifices 
                    the ability of the form to overlap other nodes. It will force the form to be closed properly."
                 * 
                 * In short: HtmlAgilityPack didn't make the inputs child nodes of each form because of the overlap handling described above,
                 * so all I had to do was remove the element flag. The code below should be fairly self-explanatory.
                 * */
    
                HtmlDocument htmlDoc = new HtmlDocument();
                htmlDoc.OptionOutputAsXml = true;
                htmlDoc.OptionAutoCloseOnEnd = true;
                htmlDoc.LoadHtml(source);
    
                var forms = htmlDoc.DocumentNode.Descendants("form");
                foreach (var form in forms)
                {
                    // LINQ filter: keep only the direct <input> children of this form.
                    // An exact match on Name avoids also picking up other tags whose name merely contains "input".
                    inputs = form.ChildNodes
                        .Where(a => a.Name == "input");
    
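                    // Look the form's name up by attribute name instead of relying on it being the first attribute.
                    Console.WriteLine(form.GetAttributeValue("name", "(unnamed form)") + " attributes:" + Environment.NewLine + "------------------");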
                    foreach (var input in inputs)
                    {
                        IEnumerable<HtmlAttribute> attributes = input.Attributes;
                        foreach (var att in attributes)
                        {
                            Console.WriteLine("Name: " + att.Name + Environment.NewLine
                                   + "Value: " + att.Value + Environment.NewLine);
                            formAttributes.Add(att);
                        }
                    }
                    Console.WriteLine(); // Blank line after each form's attribute listing to keep the output readable.
                }
                Console.Read();
            }
    
        }
    }
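
    If only one particular form is needed, an XPath query may be a shorter route than walking ChildNodes. The sketch below is not part of the original answer: FormInputs.GetInputs is a hypothetical helper, and it assumes the target form can be identified by its id attribute. It relies on the same HtmlNode.ElementsFlags.Remove("form") call, so that the inputs are parsed as descendants of their form.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using HtmlAgilityPack;

    static class FormInputs
    {
        // Hypothetical helper (not an HtmlAgilityPack API): returns the name/value
        // pair of every <input> inside the form with the given id.
        public static List<KeyValuePair<string, string>> GetInputs(string html, string formId)
        {
            // Has to happen before the document is parsed, as explained in the answer above.
            HtmlNode.ElementsFlags.Remove("form");

            var doc = new HtmlDocument();
            doc.LoadHtml(html);

            // Assumption: formId is a fixed, trusted value; it is not escaped for XPath.
            var nodes = doc.DocumentNode.SelectNodes("//form[@id='" + formId + "']//input");

            // SelectNodes returns null when nothing matches.
            if (nodes == null)
                return new List<KeyValuePair<string, string>>();

            return nodes
                .Select(n => new KeyValuePair<string, string>(
                    n.GetAttributeValue("name", ""),
                    n.GetAttributeValue("value", "")))
                .ToList();
        }
    }

    class Program
    {
        static void Main()
        {
            string source = @"
                <form name='form1' id='id1'>
                    <input type='text1.1' name='name1.1' value='value1.1' />
                </form>
                <form name='form2' id='id2'>
                    <input type='text2.1' name='name2.1' value='value2.1' />
                </form>";

            // Print only the inputs that belong to the form with id 'id2'.
            foreach (var pair in FormInputs.GetInputs(source, "id2"))
                Console.WriteLine(pair.Key + " = " + pair.Value);
        }
    }
    ```

    The null check matters because SelectNodes does not return an empty collection when there is no match. Building the XPath by string concatenation is fine for a hard-coded id, but it is not safe for untrusted input.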
    