WebClient hangs until timeout

Submitted by 房东的猫 on 2019-12-02 09:04:49

Question


I am trying to download a web page using WebClient, but the call hangs until the WebClient timeout is reached and then fails with an exception.

The following code does not work:

WebClient client = new WebClient();
string url = "https://www.nasdaq.com/de/symbol/aapl/dividend-history";
string page = client.DownloadString(url);

With a different URL, the transfer works fine. For example,

WebClient client = new WebClient();
string url = "https://www.ariva.de/apple-aktie";
string page = client.DownloadString(url);

completes very quickly, and the page variable contains the whole HTML.

Using HttpClient or WebRequest/WebResponse gives the same result with the first URL: the call blocks until a timeout exception is thrown.

Both URLs load fine in a browser, in roughly 2-5 seconds. Any idea what the problem is, and what solution is available?

I noticed that when the first URL is loaded in a WebBrowser control on a Windows Forms dialog, it produces 20+ JavaScript errors, each of which has to be dismissed with a click. The same errors can be seen in a browser's developer tools when accessing the first URL.

However, WebClient does NOT act on the response it receives: it does not run the JavaScript and does not load referenced images, CSS, or other scripts, so this should not be the problem.

Thanks!

Ralf


Answer 1:


The first site, "https://www.nasdaq.com/de/symbol/aapl/dividend-history", requires:

  • ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12
  • ServicePointManager.ServerCertificateValidationCallback
  • A User-Agent header
  • A CookieContainer is apparently not required, but it should be set anyway.

The User-Agent is important here. If a recent User-Agent is specified in HttpWebRequest.UserAgent, the website activates the HTTP/2 protocol and HSTS (HTTP Strict Transport Security), which are supported/understood only by recent browsers (for reference, Firefox 56 or newer).

Specifying an older browser as the User-Agent is therefore necessary; otherwise the website expects (and waits for) a dynamic response. With an older User-Agent, the website falls back to the HTTP/1.1 protocol.
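To make the User-Agent point concrete, here is a minimal sketch that builds (but does not send) an HttpWebRequest presenting the older IE11 User-Agent and explicitly asking for HTTP/1.1 through HttpWebRequest.ProtocolVersion. It assumes, as the answer reports, that the server chooses its behaviour from the User-Agent; the class name LegacyRequestFactory and the exact User-Agent string are illustrative assumptions, not part of the original answer.

```csharp
using System;
using System.Net;

public static class LegacyRequestFactory
{
    // Assumption: an IE11 User-Agent is "old enough" for the server to fall back to HTTP/1.1.
    public const string LegacyUserAgent =
        "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko";

    // Hypothetical helper: creates the request object only; no connection is opened here.
    public static HttpWebRequest Create(Uri uri)
    {
        HttpWebRequest request = WebRequest.CreateHttp(uri);
        request.UserAgent = LegacyUserAgent;              // the header the server inspects
        request.ProtocolVersion = HttpVersion.Version11;  // ask for HTTP/1.1 on the client side too
        return request;
    }
}
```

The request can then be sent with GetResponseAsync() exactly as in the full example further down.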

The second site, "https://www.ariva.de/apple-aktie", requires:

  • ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12
  • No server certificate validation callback is required
  • No specific User-Agent is required
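Because the second site only needs TLS 1.2, a plain WebClient should be enough once that protocol is allowed. The sketch below is one way to do it, assuming .NET Framework 4.7 or later (or .NET Core), where SecurityProtocolType.SystemDefault exists. Instead of assigning Tls12 outright, the hypothetical EnableTls12 helper ORs it into the protocols already listed, and leaves SystemDefault (let the OS decide) untouched on the assumption that the OS already offers TLS 1.2. The setting mainly matters on older .NET Framework targets, which may not offer TLS 1.2 by default.

```csharp
using System.Net;

public static class TlsSetup
{
    // Hypothetical helper: allows TLS 1.2 without dropping protocols that are already enabled.
    public static void EnableTls12()
    {
        SecurityProtocolType current = ServicePointManager.SecurityProtocol;
        // SystemDefault (0) lets the OS choose; assumption: the OS already includes TLS 1.2.
        if (current != SecurityProtocolType.SystemDefault)
        {
            ServicePointManager.SecurityProtocol = current | SecurityProtocolType.Tls12;
        }
    }

    // The second site then works with a plain WebClient: no custom headers, no certificate callback.
    public static string DownloadPage(string url)
    {
        EnableTls12();
        using (WebClient client = new WebClient())
        {
            return client.DownloadString(url);
        }
    }
}
```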

I suggest setting up an HttpWebRequest (or a corresponding HttpClient) this way
(WebClient could work, but it would probably require a custom class derived from WebClient):

using System;
using System.IO;
using System.Net;
using System.Threading.Tasks;

private async void button1_Click(object sender, EventArgs e)
{
    button1.Enabled = false;
    Uri uri = new Uri("https://www.nasdaq.com/de/symbol/aapl/dividend-history");
    string destinationFile = "[Some Local File]";
    await HTTPDownload(uri, destinationFile);
    button1.Enabled = true;
}


CookieContainer httpCookieJar = new CookieContainer();

// The User-Agent used here is the 32-bit IE11 header
public async Task HTTPDownload(Uri resourceURI, string filePath)
{
    ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12;
    // Assign (=) instead of adding (+=): otherwise every call adds another handler.
    // Note that this callback accepts every certificate, i.e. it disables certificate validation.
    ServicePointManager.ServerCertificateValidationCallback = (s, cert, ch, sec) => true;
    ServicePointManager.DefaultConnectionLimit = 50;

    HttpWebRequest httpRequest = WebRequest.CreateHttp(resourceURI);

    try
    {
        httpRequest.CookieContainer = httpCookieJar;
        httpRequest.Timeout = (int)TimeSpan.FromSeconds(15).TotalMilliseconds;
        httpRequest.AllowAutoRedirect = true;
        httpRequest.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;
        httpRequest.ServicePoint.Expect100Continue = false;
        httpRequest.UserAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko";
        httpRequest.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
        httpRequest.Headers.Add(HttpRequestHeader.AcceptEncoding, "gzip, deflate;q=0.8");
        httpRequest.Headers.Add(HttpRequestHeader.CacheControl, "no-cache");

        using (HttpWebResponse httpResponse = (HttpWebResponse)await httpRequest.GetResponseAsync())
        using (Stream responseStream = httpResponse.GetResponseStream())
        {
            if (httpResponse.StatusCode == HttpStatusCode.OK)
            {
                try
                {
                    int buffersize = 132072;
                    using (FileStream fileStream = File.Create(filePath, buffersize, FileOptions.Asynchronous))
                    {
                        int read;
                        byte[] buffer = new byte[buffersize];
                        while ((read = await responseStream.ReadAsync(buffer, 0, buffer.Length)) > 0)
                        {
                            await fileStream.WriteAsync(buffer, 0, read);
                        }
                    }
                }
                catch (DirectoryNotFoundException) { /* Log or throw */}
                catch (PathTooLongException) { /* Log or throw */}
                catch (IOException) { /* Log or throw */}
            }
        }
    }
    catch (WebException) { /* Log and message */} 
    catch (Exception) { /* Log and message */}
}

The payload returned by the first website (nasdaq.com) is 101,562 bytes long.
The payload returned by the second website (www.ariva.de) is 56,919 bytes long.
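A corresponding HttpClient setup could look like the following sketch. It moves the HttpWebRequest settings from HTTPDownload onto an HttpClientHandler (cookies, redirects, decompression) and onto the client's default headers (older User-Agent, Accept, no-cache), with a 15-second HttpClient.Timeout. The class and method names are hypothetical. On .NET Framework, TLS 1.2 still has to be enabled through ServicePointManager.SecurityProtocol as in the code above; the accept-all certificate callback is deliberately left out.

```csharp
using System;
using System.IO;
using System.Net;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading.Tasks;

public static class HttpClientDownloader
{
    // Hypothetical factory: mirrors the HttpWebRequest configuration used in HTTPDownload.
    public static HttpClient CreateClient(CookieContainer cookies)
    {
        var handler = new HttpClientHandler
        {
            CookieContainer = cookies,
            UseCookies = true,
            AllowAutoRedirect = true,
            AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate
        };

        // The client owns the handler and disposes it together with itself.
        var client = new HttpClient(handler) { Timeout = TimeSpan.FromSeconds(15) };
        client.DefaultRequestHeaders.TryAddWithoutValidation(
            "User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko");
        client.DefaultRequestHeaders.TryAddWithoutValidation(
            "Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8");
        client.DefaultRequestHeaders.CacheControl = new CacheControlHeaderValue { NoCache = true };
        return client;
    }

    // Streams the body to a file. A timeout surfaces as TaskCanceledException,
    // a non-success status code as HttpRequestException.
    public static async Task DownloadAsync(HttpClient client, Uri uri, string filePath)
    {
        using (HttpResponseMessage response =
               await client.GetAsync(uri, HttpCompletionOption.ResponseHeadersRead))
        {
            response.EnsureSuccessStatusCode();
            using (Stream input = await response.Content.ReadAsStreamAsync())
            using (FileStream output = File.Create(filePath))
            {
                await input.CopyToAsync(output);
            }
        }
    }
}
```

A single client is meant to be created once (for example as a field next to httpCookieJar) and reused for every download, rather than created per request.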

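If you would rather keep WebClient, a class derived from it could override its protected GetWebRequest method so that every request it creates gets the same settings as HTTPDownload (cookie container, 15-second timeout, redirects, decompression, older User-Agent, Accept). This is a sketch under assumptions: the class name BrowserLikeWebClient and the static ApplyBrowserSettings helper are hypothetical, and whether nasdaq.com still answers it the same way has not been verified.

```csharp
using System;
using System.Net;

// Hypothetical WebClient subclass that applies browser-like settings to each request.
public class BrowserLikeWebClient : WebClient
{
    private readonly CookieContainer cookieJar = new CookieContainer();

    // Same settings as the HttpWebRequest in HTTPDownload; static so it can be reused and tested.
    public static void ApplyBrowserSettings(HttpWebRequest request, CookieContainer cookies)
    {
        request.CookieContainer = cookies;
        request.Timeout = (int)TimeSpan.FromSeconds(15).TotalMilliseconds;
        request.AllowAutoRedirect = true;
        request.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;
        request.UserAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko";
        request.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
    }

    // WebClient calls this for every download; the base version creates the request
    // and copies the client's Headers onto it, then we adjust it.
    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest request = base.GetWebRequest(address);
        if (request is HttpWebRequest httpRequest)
        {
            ApplyBrowserSettings(httpRequest, cookieJar);
        }
        return request;
    }
}
```

It can then replace WebClient in the question's code without other changes, once TLS 1.2 is enabled.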



Answer 2:


Obviously there is a problem with downloading that link (incorrect URL, unauthorized access, ...); however, you can use the asynchronous method to solve the blocking part:

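WebClient client = new WebClient();
client.DownloadStringCompleted += (s, e) =>
{
    // Handle the downloaded page here: check e.Error first, then read e.Result
};
// DownloadStringAsync takes a Uri, not a string
client.DownloadStringAsync(new Uri(url));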
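Note that the asynchronous call only keeps the caller (for example the UI thread) from blocking; it does not make the server respond. A task-based variant can also stop waiting after a chosen delay. The sketch below uses WebClient.DownloadStringTaskAsync; the helper name and the timeout handling through Task.WhenAny plus WebClient.CancelAsync are assumptions, not part of the original answer.

```csharp
using System;
using System.Net;
using System.Threading.Tasks;

public static class WebClientAsync
{
    // Hypothetical helper: awaits the download but gives up after 'timeout'.
    public static async Task<string> DownloadStringWithTimeoutAsync(Uri uri, TimeSpan timeout)
    {
        using (WebClient client = new WebClient())
        {
            Task<string> download = client.DownloadStringTaskAsync(uri);
            Task winner = await Task.WhenAny(download, Task.Delay(timeout));
            if (winner != download)
            {
                client.CancelAsync(); // abort the pending request
                throw new TimeoutException($"No response from {uri} within {timeout}.");
            }
            // Rethrows the WebException if the download itself failed.
            return await download;
        }
    }
}
```

From an async event handler this would be called as string page = await WebClientAsync.DownloadStringWithTimeoutAsync(new Uri(url), TimeSpan.FromSeconds(15));.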


Source: https://stackoverflow.com/questions/53872461/webclient-hangs-until-timeout
