Question
I have a "small" problem with HtmlAgilityPack (HAP). When I try to get data from a website, I get this error:
An unhandled exception of type 'System.ArgumentException' occurred in mscorlib.dll
Additional information: 'gzip' is not a supported encoding name. For information on defining a custom encoding, see the documentation for the Encoding.RegisterProvider method.
I'm using this piece of code to get the data from the website:
HtmlWeb page = new HtmlWeb();
var url = "https://kat.cr/";
var data = page.Load(url);
After this code runs, I get that error. I tried everything I could find on Google, but nothing helped.
Can someone tell me how to resolve this problem?
Thank you
Answer 1:
HtmlWeb doesn't, by default, handle the gzip-compressed response this site returns; the error is about gzip rather than https itself. Instead, you can use WebClient with a small modification to automatically decompress GZip:
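using System;
using System.Net;

class MyWebClient : WebClient
{
    protected override WebRequest GetWebRequest(Uri address)
    {
        // Enable transparent decompression only when this is an HTTP(S) request.
        WebRequest request = base.GetWebRequest(address);
        var httpRequest = request as HttpWebRequest;
        if (httpRequest != null)
        {
            httpRequest.AutomaticDecompression = DecompressionMethods.Deflate | DecompressionMethods.GZip;
        }
        return request;
    }
}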
Then use HtmlDocument.LoadHtml() to populate your HtmlDocument instance from the HTML string:
var url = "https://kat.cr/";
var data = new MyWebClient().DownloadString(url);
var doc = new HtmlDocument();
doc.LoadHtml(data);
Answer 2:
You can intercept the request when using HtmlWeb to modify it based on your requirements:
var page = new HtmlWeb()
{
    PreRequest = request =>
    {
        // Make any changes to the request object that will be used.
        request.AutomaticDecompression = DecompressionMethods.Deflate | DecompressionMethods.GZip;
        return true;
    }
};
var url = "https://kat.cr/";
var data = page.Load(url);
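Both answers work by turning on automatic decompression. As a rough sketch of why the original call fails, the snippet below reproduces the same kind of exception without any network access. It rests on an assumption about HtmlWeb's internals that the thread does not confirm: that the response's Content-Encoding value ("gzip") ends up being passed to Encoding.GetEncoding. The class name GzipEncodingDemo is made up for illustration.

```csharp
using System;
using System.Text;

class GzipEncodingDemo
{
    static void Main()
    {
        try
        {
            // "gzip" names an HTTP content coding (compression), not a character set,
            // so looking it up as a text encoding throws ArgumentException.
            Encoding.GetEncoding("gzip");
            Console.WriteLine("no exception");
        }
        catch (ArgumentException ex)
        {
            Console.WriteLine(ex.GetType().Name + ": " + ex.Message);
        }
    }
}
```

If that assumption holds, it would explain why letting the request decompress the body itself, as in both answers, avoids the error.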
Source: https://stackoverflow.com/questions/36219685/cant-download-html-data-from-https-url-using-htmlagilitypack