Retrieve data from HTML table in C#

Backend · Unresolved · 2 answers · 981 views
爱一瞬间的悲伤 2021-01-19 07:15

I want to retrieve data from an HTML document. I am scraping data from a website and I'm almost done, but I ran into a problem when I tried to retrieve data from the table. Here is the HTML code:

2 Answers
  • 2021-01-19 07:39

    I prefer using the dynamic type together with the DomElement property, but this requires .NET 4 or later.

    For tables, the main advantage is that you don't have to loop through everything. If you know the row and column you are looking for, you can target the data directly by its row and column index instead of looping through the whole table.

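    The other big advantage is that you have essentially the entire DOM available, so you can read more than just the contents of the table. Make sure you use the JavaScript DOM property names with their JavaScript casing (rows, cells, length, innerText) rather than .NET-style capitalized names, even though you are writing C#.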

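    // Requires using System.Windows.Forms; on .NET Framework, dynamic also needs a reference to Microsoft.CSharp.
    // Set myTableElement using any GetElement... method; webBrowser1 and the id "myTable" are placeholders.
    // Use a loop or square-bracket index if the method returns an HtmlElementCollection.
    HtmlElement myTableElement = webBrowser1.Document.GetElementById("myTable");
    dynamic myTable = myTableElement.DomElement;
    // Late-bound DOM access: use the JavaScript names rows, cells, length and innerText.
    for (int i = 0; i < myTable.rows.length; i++)
    {
        for (int j = 0; j < myTable.rows[i].cells.length; j++)
        {
            string cellContents = myTable.rows[i].cells[j].innerText;
    
            // You are not limited to innerText; you have the whole DOM available.
    
            // Do something with cellContents.
        }
    }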
    
  • 2021-01-19 07:42

    Do you have any control over the page displayed in the WebBrowser control? If so, it's better to add an id attribute to the status TD; that will make your life much easier.

    Anyway, here's how you could search for a value within a table.

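    HtmlElementCollection tables = this.WB.Document.GetElementsByTagName("table");
    
    foreach (HtmlElement table in tables)
    {
        // GetElementsByTagName("tr") returns only the rows; table.All would return
        // every descendant element (tbody, tr, td, ...), not just the rows.
        foreach (HtmlElement row in table.GetElementsByTagName("tr"))
        {
            foreach (HtmlElement cell in row.GetElementsByTagName("td"))
            {
                // Now you are looping through all cells in each table.
    
                // Here you could use cell.InnerText to search for "Status" or "Approved".
            }
        }
    }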
    

    However, this is not a good approach, because you loop through every table and every cell in each table just to find your text. Keep it as a last resort.
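    If you do end up looping, you can at least stop at the first match instead of visiting every cell. Here is a minimal sketch, assuming the InnerText of each cell has already been copied into a list of rows (for example with loops like the one above) and that the value you want, such as "Approved", sits in the cell just to the right of its label, such as "Status". That layout is a guess, since the question's HTML isn't shown, and TableSearch / FindValueNextToLabel are hypothetical names.

    ```csharp
    using System;
    using System.Collections.Generic;

    static class TableSearch
    {
        // Returns the trimmed text of the cell immediately to the right of the first
        // cell whose trimmed text equals `label` (case-insensitive). Returns null if
        // no cell matches, or if the first matching cell is the last one in its row.
        // ASSUMPTION: label/value pairs sit side by side in one row, e.g. | Status | Approved |.
        public static string FindValueNextToLabel(
            IEnumerable<IReadOnlyList<string>> rows, string label)
        {
            foreach (IReadOnlyList<string> row in rows)
            {
                for (int i = 0; i < row.Count; i++)
                {
                    if (string.Equals(row[i]?.Trim(), label, StringComparison.OrdinalIgnoreCase))
                    {
                        // Stop at the first match instead of scanning the rest of the table.
                        return i + 1 < row.Count ? row[i + 1]?.Trim() : null;
                    }
                }
            }
            return null;
        }
    }
    ```

    For example, FindValueNextToLabel(new[] { new[] { "Status", "Approved" } }, "Status") returns "Approved". Keeping the search on plain strings also lets you test it without a WebBrowser control.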

    Hope this gives you an idea.
