HtmlAgilityPack HtmlWeb.Load returning empty Document

Posted by 笑着哭i on 2019-12-17 16:55:14

Question


I have been using HtmlAgilityPack for the last 2 months in a Web Crawler Application with no issues loading a webpage.

Now, when I try to load this particular webpage, the document's OuterHtml is empty, so this test fails:

var url = "http://www.prettygreen.com/";
var htmlWeb = new HtmlWeb();
var htmlDoc = htmlWeb.Load(url);
var outerHtml = htmlDoc.DocumentNode.OuterHtml;
Assert.AreNotEqual("", outerHtml);

I can load other pages from the same site without problems, for example by setting:

url = "http://www.prettygreen.com/news/";

I once had an issue with encodings, so I tried setting htmlWeb.OverrideEncoding and htmlWeb.AutoDetectEncoding, with no luck. I have no idea what the issue could be with this page.


Answer 1:


It seems this website requires cookies to be enabled. HtmlWeb does not attach a cookie container to its requests by default, so creating one for the web request should solve the issue:

var url = "http://www.prettygreen.com/";
var htmlWeb = new HtmlWeb();
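// PreRequest runs just before each request is sent.
htmlWeb.PreRequest += request =>
{
    // Attach a cookie container so cookies set by the site are kept.
    request.CookieContainer = new System.Net.CookieContainer();
    return true; // true lets the request proceed
};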
var htmlDoc = htmlWeb.Load(url);
var outerHtml = htmlDoc.DocumentNode.OuterHtml;
Assert.AreNotEqual("", outerHtml);
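
As a possible alternative to the PreRequest handler, HtmlWeb also exposes a UseCookies property. The sketch below assumes that enabling it has the same effect for this site, which has not been verified here. It prints the HTTP status code and the length of the returned markup so that an empty result is easier to diagnose. The Program class and the console output are illustrative only and not part of the original answer.

```csharp
using System;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        var url = "http://www.prettygreen.com/";

        // Assumption: UseCookies = true is enough for this site;
        // the original answer sets request.CookieContainer in PreRequest instead.
        var htmlWeb = new HtmlWeb { UseCookies = true };
        var htmlDoc = htmlWeb.Load(url);

        // StatusCode holds the status of the last request made by this HtmlWeb instance.
        Console.WriteLine(htmlWeb.StatusCode);

        // A length of 0 means the document is still empty.
        Console.WriteLine(htmlDoc.DocumentNode.OuterHtml.Length);
    }
}
```

If the length is still 0, the PreRequest approach above remains the fix given in the answer.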


Source: https://stackoverflow.com/questions/13400493/htmlagilitypack-htmlweb-load-returning-empty-document
