WebRequest not returning HTML

Submitted by 一世执手 on 2019-12-14 03:37:36

Question


I want to load the URL http://www.yellowpages.ae/categories-by-alphabet/h.html, but the request returns null.

Other questions suggest adding a CookieContainer, but my code already does that.

var MainUrl = "http://www.yellowpages.ae/categories-by-alphabet/h.html";
HtmlWeb web = new HtmlWeb();
web.PreRequest += request =>
{
    request.CookieContainer = new System.Net.CookieContainer();
    return true;
};
web.CacheOnly = false;
var doc = web.Load(MainUrl);

The website opens perfectly fine in a browser.


Answer 1:


You need a CookieCollection to carry the cookies from one response into the next request, and you need to set UseCookies to true on HtmlWeb.

CookieCollection cookieCollection = null;
var web = new HtmlWeb
{
    //AutoDetectEncoding = true,
    UseCookies = true,
    CacheOnly = false,
    PreRequest = request =>
    {
        if (cookieCollection != null && cookieCollection.Count > 0)
            request.CookieContainer.Add(cookieCollection);

        return true;
    },
    PostResponse = (request, response) => { cookieCollection = response.Cookies; }
};

var doc = web.Load("https://www.google.com");



Answer 2:


I doubt it is a cookie issue. It looks like the response is gzip-compressed, since I got nothing but gibberish when I tried to fetch the page. If it were a cookie issue, I would expect the response to return an error saying so. Anyhow, here is my solution to your problem.

using System;
using System.Net;
using System.Text;
using HtmlAgilityPack;

public static class Program
{
    public static void Main(string[] args)
    {
        var doc = new HtmlDocument();
        try
        {
            var request = (HttpWebRequest)WebRequest.Create("http://www.yellowpages.ae/categories-by-alphabet/h.html");
            request.Method = "GET";
            // Optional: Content-Type describes a request body, which a GET does not have.
            request.ContentType = "text/html;charset=utf-8";
            // Transparently decompress gzip- or deflate-encoded response bodies.
            request.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;

            using (var response = (HttpWebResponse)request.GetResponse())
            using (var stream = response.GetResponseStream())
            {
                // Parse the decompressed stream as UTF-8 HTML.
                doc.Load(stream, Encoding.UTF8);
            }
        }
        catch (WebException ex)
        {
            Console.WriteLine(ex.Message);
        }
        Console.WriteLine(doc.DocumentNode.InnerHtml);
        Console.ReadKey();
    }
}

All it does is decompress the gzip-encoded response that we receive. How did I know it was gzip? In the debugger, the response's ContentEncoding was gzip.
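The same check can be done without a debugger. The following is a minimal sketch, assuming the server still answers with a compressed body when AutomaticDecompression is left at its default; the class name EncodingProbe and the helper IsCompressed are hypothetical names introduced only for illustration.

```csharp
using System;
using System.Net;

public static class EncodingProbe
{
    // Hypothetical helper: true when a Content-Encoding value names gzip or deflate.
    public static bool IsCompressed(string contentEncoding)
    {
        if (string.IsNullOrEmpty(contentEncoding)) return false;
        return contentEncoding.IndexOf("gzip", StringComparison.OrdinalIgnoreCase) >= 0
            || contentEncoding.IndexOf("deflate", StringComparison.OrdinalIgnoreCase) >= 0;
    }

    public static void Main()
    {
        var request = (HttpWebRequest)WebRequest.Create("http://www.yellowpages.ae/categories-by-alphabet/h.html");
        // AutomaticDecompression is deliberately left at its default (None),
        // so the response headers are reported as the server sent them.
        using (var response = (HttpWebResponse)request.GetResponse())
        {
            Console.WriteLine("Content-Encoding: " + response.ContentEncoding);
            Console.WriteLine("Compressed: " + IsCompressed(response.ContentEncoding));
        }
    }
}
```

If the reported encoding is gzip or deflate, the body has to be decompressed before it can be parsed as HTML.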

Basically just add:

request.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;

to your code and you're good.
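For the HtmlWeb-based code in the question, the same setting can probably be applied inside the PreRequest callback, because that callback receives the underlying HttpWebRequest. This is a sketch rather than a verified fix for this particular site; GzipAwareLoader and EnableDecompression are hypothetical names, and it assumes the HtmlAgilityPack package is referenced.

```csharp
using System;
using System.Net;
using HtmlAgilityPack;

public static class GzipAwareLoader
{
    // Hypothetical helper with the same shape as HtmlWeb.PreRequest:
    // it receives the outgoing request and returns a bool.
    public static bool EnableDecompression(HttpWebRequest request)
    {
        // Let the framework decompress gzip- or deflate-encoded bodies.
        request.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;
        return true; // true lets HtmlWeb proceed with the request
    }

    public static HtmlDocument Load(string url)
    {
        var web = new HtmlWeb { CacheOnly = false };
        web.PreRequest += EnableDecompression;
        return web.Load(url);
    }
}
```

Calling GzipAwareLoader.Load("http://www.yellowpages.ae/categories-by-alphabet/h.html") would then replace the web.Load(MainUrl) call from the question.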



Source: https://stackoverflow.com/questions/47299893/webrequest-not-returning-html
