Question
I have a method to get ids and xpaths if given a particular url. How do I pass in the username and password with the request so that I can scrape a url that requires a username and password?
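using System.Collections.Generic;
using HtmlAgilityPack;

// Field on the containing class; reused for every page load.
private HtmlWeb _web = new HtmlWeb();

internal Dictionary<string, string> GetidsAndXPaths(string url)
{
    var webidsAndXPaths = new Dictionary<string, string>();
    var doc = _web.Load(url);
    // Select every element that carries an id attribute.
    var nodes = doc.DocumentNode.SelectNodes("//*[@id]");
    if (nodes == null) return webidsAndXPaths;
    // code to get all the xpaths and ids
    return webidsAndXPaths;
}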
Should I use a web request to get the page source and then pass that file into the method above?
var wc = new WebClient();
wc.Credentials = new NetworkCredential("UserName", "Password");
wc.DownloadFile("http://somewebsite.com/page.aspx", @"C:\localfile.html");
Answer 1:
HtmlWeb.Load has a number of overloads. These accept either an instance of NetworkCredential, or a username and password passed in directly.
Name // Description
Load(String) // Gets an HTML document from an Internet resource.
Load(String, String) // Loads an HTML document from an Internet resource.
Load(String, String, WebProxy, NetworkCredential) // Loads an HTML document from an Internet resource.
Load(String, String, Int32, String, String) // Loads an HTML document from an Internet resource.
You do not need to pass in a WebProxy instance, or you can pass in the system default one.
Alternatively, you can wire up a handler for HtmlWeb.PreRequest and set up the credentials on the request there.
htmlWeb.PreRequest += (request) => {
    request.Credentials = new NetworkCredential(...);
    return true;
};
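As a rough sketch, the PreRequest approach could be combined with the method from the question along these lines. The class name PageScraper, its constructor parameters, and the loop that fills the dictionary (using HtmlNode.Id and HtmlNode.XPath) are illustrative assumptions, not part of the original code. It also assumes the site uses HTTP authentication (Basic, Digest, NTLM); a page behind an HTML login form would likely need a different approach.

```csharp
using System.Collections.Generic;
using System.Net;
using HtmlAgilityPack;

// Hypothetical wrapper class; the name and constructor are assumptions.
internal class PageScraper
{
    private readonly HtmlWeb _web = new HtmlWeb();

    internal PageScraper(string userName, string password)
    {
        // Attach the credentials to every request this HtmlWeb instance issues.
        _web.PreRequest += request =>
        {
            request.Credentials = new NetworkCredential(userName, password);
            return true; // returning true lets the request proceed
        };
    }

    internal Dictionary<string, string> GetidsAndXPaths(string url)
    {
        var webidsAndXPaths = new Dictionary<string, string>();
        var doc = _web.Load(url);
        var nodes = doc.DocumentNode.SelectNodes("//*[@id]");
        if (nodes == null) return webidsAndXPaths;

        foreach (var node in nodes)
        {
            // Illustrative fill-in: map each element id to that node's XPath.
            // Duplicate ids overwrite earlier entries.
            webidsAndXPaths[node.Id] = node.XPath;
        }
        return webidsAndXPaths;
    }
}
```

With this shape, a caller would construct the scraper once, for example new PageScraper("UserName", "Password"), and then call GetidsAndXPaths for each protected URL, so there is no need to download the page to a local file first.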
Source: https://stackoverflow.com/questions/23298532/htmlagilitypack-and-authentication