Question
I am trying to extract information from "https://www.sideshow.com/collectibles?manufacturer=Hot+Toys", specifically the div with class c-ProductList row ss-targeted, but nothing seems to be retrieved. Any clues?
var test = page.DocumentNode.SelectNodes("//div[@class='c-ProductList row ss-targeted']");
Answer 1:
The content you want is generated after the page loads, using JavaScript and Ajax. HAP (Html Agility Pack) cannot get it unless it runs a browser in the background that executes the scripts on the page.
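To see why the original query comes back empty, here is a minimal sketch that runs the same XPath against two made-up HTML strings: one standing in for the raw server response (before scripts run) and one for the rendered page. The class name StaticHtmlDemo, the helper CountProductLists, and both markup strings are assumptions for illustration, not the real page source.

```csharp
using System;
using HtmlAgilityPack;

static class StaticHtmlDemo
{
    // Hypothetical helper: counts product-list containers found by the question's XPath.
    public static int CountProductLists(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var nodes = doc.DocumentNode.SelectNodes("//div[@class='c-ProductList row ss-targeted']");
        return nodes?.Count ?? 0;
    }

    static void Main()
    {
        // Assumed stand-in for the markup before any script has run.
        string beforeScripts = "<div class='ss-results'></div>";
        // Assumed stand-in for the markup after the scripts have filled in the list.
        string afterScripts =
            "<div class='ss-results'><div class='c-ProductList row ss-targeted'></div></div>";

        Console.WriteLine(CountProductLists(beforeScripts)); // prints 0
        Console.WriteLine(CountProductLists(afterScripts));  // prints 1
    }
}
```

Because HAP only parses the text it is given, the first case is what a plain download of a script-driven page looks like to it, which is why test in the question ends up null.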
.NET Core 2.0
Prerequisites: you need the Chrome web browser installed on your PC.
Create a console application
Install NuGet packages
Install-Package HtmlAgilityPack
Install-Package Selenium.WebDriver
Install-Package Selenium.Chrome.WebDriver
Replace the Main method with the following
Code:
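// Requires: using System; using HtmlAgilityPack; using OpenQA.Selenium; using OpenQA.Selenium.Chrome;
static void Main(string[] args)
{
    string url = "https://www.sideshow.com/collectibles?manufacturer=Hot+Toys";

    // The Selenium.Chrome.WebDriver package copies chromedriver to the output directory
    var browser = new ChromeDriver(Environment.CurrentDirectory);
    browser.Manage().Timeouts().ImplicitWait = TimeSpan.FromSeconds(30);
    browser.Navigate().GoToUrl(url);

    // FindElement(By...) works across Selenium versions; FindElementByClassName was removed in Selenium 4
    var results = browser.FindElement(By.ClassName("ss-results"));

    // Hand the rendered markup to HAP
    var doc = new HtmlDocument();
    doc.LoadHtml(results.GetAttribute("innerHTML"));

    // Show results
    var list = doc.DocumentNode.SelectSingleNode("//div[@class='c-ProductList row ss-targeted']");
    foreach (var title in list.SelectNodes(".//h2[@class='c-ProductListItem__title ng-binding']"))
    {
        Console.WriteLine(title.InnerText);
    }
    Console.ReadLine();

    // Close the browser and stop the chromedriver process
    browser.Quit();
}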
.NET Framework 4.6
Create a console application
Install NuGet package
Install-Package HtmlAgilityPack
In Solution Explorer, add a reference to System.Windows.Forms
Add using statements as required
Replace the Main method with the following
Code:
// Requires: using System; using System.Windows.Forms; using HtmlAgilityPack;
[STAThread]
static void Main(string[] args)
{
    string url = "https://www.sideshow.com/collectibles?manufacturer=Hot+Toys";

    var web = new HtmlWeb();
    web.BrowserTimeout = TimeSpan.FromSeconds(30);

    // Load the page in a WebBrowser control so its scripts run
    var doc = web.LoadFromBrowser(url, o =>
    {
        var webBrowser = (WebBrowser)o;
        // Wait until the list shows up
        return webBrowser.Document.Body.InnerHtml.Contains("c-ProductList");
    });

    // Show results
    var list = doc.DocumentNode.SelectSingleNode("//div[@class='c-ProductList row ss-targeted']");
    foreach (var title in list.SelectNodes(".//h2[@class='c-ProductListItem__title ng-binding']"))
    {
        Console.WriteLine(title.InnerText);
    }
    Console.ReadLine();
}
Displays a list starting with:
Iron Man Mark L
John Wick
The Punisher War Machine Armor
Wonder Woman Deluxe Version
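Both versions above compare the whole class attribute as one string, so they stop matching if the page adds, drops, or reorders a class (for example the ng-binding class). As a possible hardening, and not part of the original answer, here is a sketch that matches a single class token using standard XPath 1.0 functions. ClassTokenDemo, HasClass, GetTitles, and the sample markup are hypothetical names and data chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

static class ClassTokenDemo
{
    // Hypothetical helper: builds an XPath predicate that matches one class token,
    // whatever other classes sit next to it in the attribute.
    static string HasClass(string token) =>
        "contains(concat(' ', normalize-space(@class), ' '), ' " + token + " ')";

    public static List<string> GetTitles(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var nodes = doc.DocumentNode.SelectNodes("//h2[" + HasClass("c-ProductListItem__title") + "]");

        // SelectNodes returns null when nothing matches, so fall back to an empty list.
        return nodes == null
            ? new List<string>()
            : nodes.Select(n => n.InnerText.Trim()).ToList();
    }

    static void Main()
    {
        // Made-up markup; the real page's structure may differ.
        string html =
            "<div class='c-ProductList row ss-targeted extra'>" +
            "<h2 class='c-ProductListItem__title'>Iron Man Mark L</h2>" +
            "<h2 class='ng-binding c-ProductListItem__title'>John Wick</h2>" +
            "</div>";

        foreach (var title in GetTitles(html))
        {
            Console.WriteLine(title);
        }
    }
}
```

With the sample markup, both titles are found even though neither h2 carries exactly the class string used in the answer's XPath.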
Source: https://stackoverflow.com/questions/62671301/html-agility-pack-how-to-get-dynamically-generated-content-after-page-loads