Html Agility Pack how to get dynamically generated content after page loads

Posted by こ雲淡風輕ζ on 2021-01-28 05:50:45

Question


I am trying to get information from "https://www.sideshow.com/collectibles?manufacturer=Hot+Toys", specifically the div with class "c-ProductList row ss-targeted", but the query below retrieves nothing. Any clues?

var test = page.DocumentNode.SelectNodes("//div[@class='c-ProductList row ss-targeted']");

Answer 1:


The content you want is generated after the page loads, by JavaScript and Ajax. HAP only parses the HTML that the server returns, so it cannot see that content unless a browser runs in the background and executes the scripts on the page first.
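To see the problem in isolation, here is a minimal sketch that loads the page with plain `HtmlWeb.Load` and runs the query from the question. The class name `StaticLoadDemo` is illustrative, and the site's markup may have changed since the question was asked, so the result is printed rather than assumed.

```csharp
using System;
using HtmlAgilityPack;

class StaticLoadDemo
{
    static void Main()
    {
        string url = "https://www.sideshow.com/collectibles?manufacturer=Hot+Toys";

        // HtmlWeb.Load downloads the server's HTML only; no scripts are executed.
        var page = new HtmlWeb().Load(url);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var nodes = page.DocumentNode.SelectNodes("//div[@class='c-ProductList row ss-targeted']");

        Console.WriteLine(nodes == null
            ? "No matching nodes in the static HTML."
            : "Found " + nodes.Count + " matching node(s).");
    }
}
```

If the product list is injected by script, as described above, the first message is the one to expect.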

.NET Core 2.0

Prerequisite: the Chrome web browser must be installed on your PC.

  1. Create a console application

  2. Install the NuGet packages:

         Install-Package HtmlAgilityPack
         Install-Package Selenium.WebDriver
         Install-Package Selenium.Chrome.WebDriver

  3. Replace the Main method with the following

Code:

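    // Requires: using System; using HtmlAgilityPack;
    //           using OpenQA.Selenium; using OpenQA.Selenium.Chrome;
    static void Main(string[] args)
    {
        string url = "https://www.sideshow.com/collectibles?manufacturer=Hot+Toys";

        // ChromeDriver looks for the chromedriver executable in the given directory
        var browser = new ChromeDriver(Environment.CurrentDirectory);
        try
        {
            // Element lookups retry for up to 30 seconds before failing
            browser.Manage().Timeouts().ImplicitWait = TimeSpan.FromSeconds(30);
            browser.Navigate().GoToUrl(url);

            // FindElement(By...) works in Selenium 3 and 4;
            // FindElementByClassName was removed in Selenium 4
            var results = browser.FindElement(By.ClassName("ss-results"));
            var doc = new HtmlDocument();
            doc.LoadHtml(results.GetAttribute("innerHTML"));

            // Show results; HAP returns null when an XPath query matches nothing
            var list = doc.DocumentNode.SelectSingleNode("//div[@class='c-ProductList row ss-targeted']");
            var titles = list?.SelectNodes(".//h2[@class='c-ProductListItem__title ng-binding']");
            if (titles != null)
            {
                foreach (var title in titles)
                {
                    Console.WriteLine(title.InnerText);
                }
            }
            Console.ReadLine();
        }
        finally
        {
            browser.Quit(); // close Chrome and stop the chromedriver process
        }
    }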
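The implicit wait above applies to every element lookup. As an alternative, here is a hedged sketch that waits explicitly for the product titles and runs Chrome headless. It assumes the additional `Selenium.Support` NuGet package (which provides `WebDriverWait`) and that the titles still carry the `c-ProductListItem__title` class. The class name `ExplicitWaitDemo` is illustrative.

```csharp
using System;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;
using OpenQA.Selenium.Support.UI;

class ExplicitWaitDemo
{
    static void Main()
    {
        string url = "https://www.sideshow.com/collectibles?manufacturer=Hot+Toys";

        var options = new ChromeOptions();
        options.AddArgument("--headless"); // run Chrome without a visible window

        // Disposing the driver closes the browser and the chromedriver process.
        using (var browser = new ChromeDriver(Environment.CurrentDirectory, options))
        {
            browser.Navigate().GoToUrl(url);

            // Poll for up to 30 seconds until at least one title element exists.
            // Until() keeps retrying while the lambda returns null.
            var wait = new WebDriverWait(browser, TimeSpan.FromSeconds(30));
            var titles = wait.Until(d =>
            {
                var found = d.FindElements(By.CssSelector(".c-ProductListItem__title"));
                return found.Count > 0 ? found : null;
            });

            foreach (var title in titles)
            {
                Console.WriteLine(title.Text);
            }
        }
    }
}
```

This variant reads the text directly from Selenium's elements, so HtmlAgilityPack is not needed. If you prefer HAP's XPath queries, pass `browser.PageSource` to `HtmlDocument.LoadHtml` after the wait.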

.NET Framework 4.6

  1. Create a console application

  2. Install the NuGet package: Install-Package HtmlAgilityPack

  3. In Solution Explorer, add a reference to System.Windows.Forms

  4. Add using statements as required

  5. Replace the Main method with the following

Code:

    // Requires: using System; using System.Windows.Forms; using HtmlAgilityPack;
    [STAThread] // the WebBrowser control needs a single-threaded apartment
    static void Main(string[] args)
    {
        string url = "https://www.sideshow.com/collectibles?manufacturer=Hot+Toys";

        var web = new HtmlWeb();
        web.BrowserTimeout = TimeSpan.FromSeconds(30);

        // The callback is polled until it returns true or the timeout expires
        var doc = web.LoadFromBrowser(url, o =>
        {
            var webBrowser = (WebBrowser)o;

            // Wait until the list shows up
            return webBrowser.Document.Body.InnerHtml.Contains("c-ProductList");
        });

        // Show results; HAP returns null when an XPath query matches nothing
        var list = doc.DocumentNode.SelectSingleNode("//div[@class='c-ProductList row ss-targeted']");
        var titles = list?.SelectNodes(".//h2[@class='c-ProductListItem__title ng-binding']");
        if (titles != null)
        {
            foreach (var title in titles)
            {
                Console.WriteLine(title.InnerText);
            }
        }
        Console.ReadLine();
    }

Displays a list starting with:

Iron Man Mark L

John Wick

The Punisher War Machine Armor

Wonder Woman Deluxe Version
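The XPath queries above compare the whole class attribute against one exact string, so they stop matching if the site reorders the classes or drops `ng-binding`. A more tolerant variant tests for a single class token with the usual XPath 1.0 `contains(concat(...))` idiom. The sketch below runs against hypothetical inline markup; the `ProductTitles` helper and the sample HTML are illustrative assumptions, not taken from the site.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

static class ProductTitles
{
    // Matches any h2 whose class list contains the token, regardless of
    // the order of the classes or which other classes are present.
    const string TitleXPath =
        "//h2[contains(concat(' ', normalize-space(@class), ' '), ' c-ProductListItem__title ')]";

    public static List<string> Extract(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null when nothing matches, so map that to an empty list.
        var nodes = doc.DocumentNode.SelectNodes(TitleXPath);
        return nodes == null
            ? new List<string>()
            : nodes.Select(n => n.InnerText.Trim()).ToList();
    }
}

class Program
{
    static void Main()
    {
        // Hypothetical markup: the same token in different positions, plus a non-match.
        string html =
            "<div class='row c-ProductList'>" +
            "<h2 class='c-ProductListItem__title ng-binding'>Iron Man Mark L</h2>" +
            "<h2 class='ng-binding c-ProductListItem__title'>John Wick</h2>" +
            "<h2 class='c-Other'>Not a product title</h2>" +
            "</div>";

        foreach (var title in ProductTitles.Extract(html))
        {
            Console.WriteLine(title);
        }
    }
}
```

The padding spaces around both the attribute value and the token keep `c-ProductListItem__title` from matching a longer class name that merely starts or ends with it.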



Source: https://stackoverflow.com/questions/62671301/html-agility-pack-how-to-get-dynamically-generated-content-after-page-loads
