Question
I am trying to download a web page using WebClient, but it hangs until the WebClient timeout is reached and then fails with an exception.
The following code does not work:
WebClient client = new WebClient();
string url = "https://www.nasdaq.com/de/symbol/aapl/dividend-history";
string page = client.DownloadString(url);
Using a different URL, the transfer works fine. For example:
WebClient client = new WebClient();
string url = "https://www.ariva.de/apple-aktie";
string page = client.DownloadString(url);
completes very quickly and has the whole HTML in the page variable.
Using HttpClient or WebRequest/WebResponse gives the same result on the first URL: it blocks until the timeout exception.
Both URLs load fine in a browser, in roughly 2-5 seconds. Any idea what the problem is, and what solution is available?
I noticed that when using a WebBrowser control on a Windows Forms dialog, the first URL loads with 20+ JavaScript errors that need to be confirmed by clicking. The same can be observed in a browser with developer tools open when accessing the first URL.
However, WebClient does NOT act on the content it gets back. It does not run the JavaScript, and does not load referenced pictures, CSS, or other scripts, so this should not be a problem.
Thanks!
Ralf
Answer 1:
The first site, https://www.nasdaq.com/de/symbol/aapl/dividend-history, requires:
- ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12
- ServicePointManager.ServerCertificateValidationCallback
- A set User-Agent header
- A CookieContainer is, apparently, not required. It should be set anyway.
The User-Agent here is important. If a recent User-Agent is specified in WebRequest.UserAgent, the site will activate the HTTP/2 protocol and HSTS (HTTP Strict Transport Security), which are supported/understood only by recent browsers (as a reference, Firefox 56 or newer). Using a less recent browser as User-Agent is necessary, otherwise the site will expect (and wait for) a dynamic response. With an older User-Agent, the site will activate the HTTP/1.1 protocol.
The second site, https://www.ariva.de/apple-aktie, requires:
- ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12
- No server certificate validation
- No specific User-Agent
I suggest setting up a WebRequest (or a corresponding HttpClient) this way
(WebClient could work, but it would probably require a derived class):
private async void button1_Click(object sender, EventArgs e)
{
    button1.Enabled = false;
    Uri uri = new Uri("https://www.nasdaq.com/de/symbol/aapl/dividend-history");
    string destinationFile = "[Some Local File]";
    await HTTPDownload(uri, destinationFile);
    button1.Enabled = true;
}

CookieContainer httpCookieJar = new CookieContainer();

// The IE11 User-Agent string is used here
public async Task HTTPDownload(Uri resourceURI, string filePath)
{
    ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12;
    // Note: accepting any certificate is insecure; restrict this check in production code
    ServicePointManager.ServerCertificateValidationCallback += (s, cert, ch, sec) => true;
    ServicePointManager.DefaultConnectionLimit = 50;

    HttpWebRequest httpRequest = WebRequest.CreateHttp(resourceURI);
    try
    {
        httpRequest.CookieContainer = httpCookieJar;
        httpRequest.Timeout = (int)TimeSpan.FromSeconds(15).TotalMilliseconds;
        httpRequest.AllowAutoRedirect = true;
        httpRequest.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;
        httpRequest.ServicePoint.Expect100Continue = false;
        httpRequest.UserAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko";
        httpRequest.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
        httpRequest.Headers.Add(HttpRequestHeader.AcceptEncoding, "gzip, deflate;q=0.8");
        httpRequest.Headers.Add(HttpRequestHeader.CacheControl, "no-cache");

        using (HttpWebResponse httpResponse = (HttpWebResponse)await httpRequest.GetResponseAsync())
        using (Stream responseStream = httpResponse.GetResponseStream())
        {
            if (httpResponse.StatusCode == HttpStatusCode.OK)
            {
                try
                {
                    int buffersize = 131072;
                    using (FileStream fileStream = File.Create(filePath, buffersize, FileOptions.Asynchronous))
                    {
                        int read;
                        byte[] buffer = new byte[buffersize];
                        while ((read = await responseStream.ReadAsync(buffer, 0, buffer.Length)) > 0)
                        {
                            await fileStream.WriteAsync(buffer, 0, read);
                        }
                    }
                }
                catch (DirectoryNotFoundException) { /* Log or throw */ }
                catch (PathTooLongException) { /* Log or throw */ }
                catch (IOException) { /* Log or throw */ }
            }
        }
    }
    catch (WebException) { /* Log and message */ }
    catch (Exception) { /* Log and message */ }
}
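As noted above, WebClient could also work via a derived class that applies the same settings in GetWebRequest. This is a minimal sketch (the class name is illustrative, and it is untested against the live site):

```csharp
using System;
using System.Net;

// A WebClient subclass that applies TLS 1.2, cookies, a timeout, and the
// older IE11 User-Agent to every request it creates.
public class ConfiguredWebClient : WebClient
{
    private readonly CookieContainer cookieJar = new CookieContainer();

    protected override WebRequest GetWebRequest(Uri address)
    {
        ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12;
        var request = (HttpWebRequest)base.GetWebRequest(address);
        request.CookieContainer = cookieJar;
        request.Timeout = (int)TimeSpan.FromSeconds(15).TotalMilliseconds;
        request.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;
        request.UserAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko";
        return request;
    }
}
```

With this class, the original one-liner from the question should work unchanged: `string page = new ConfiguredWebClient().DownloadString(url);`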
The first site (nasdaq.com) returned a payload of 101,562 bytes.
The second site (www.ariva.de) returned a payload of 56,919 bytes.
Answer 2:
Apparently there is a problem with downloading that link (incorrect URL, unauthorized access, ...); however, you may use the async method to avoid the blocking part:
WebClient client = new WebClient();
client.DownloadStringCompleted += (s, e) =>
{
    // here deal with the downloaded content (e.Result)
};
client.DownloadStringAsync(new Uri(url));
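Note that going async alone does not fix the first URL; as Answer 1 explains, the headers matter. A corresponding HttpClient setup (mentioned but not shown in Answer 1) could look like this sketch, which combines the async style with the required TLS and User-Agent settings; it is untested against the live site:

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

class Program
{
    static async Task Main()
    {
        // TLS 1.2 is required by the site (see Answer 1)
        ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12;

        var handler = new HttpClientHandler
        {
            CookieContainer = new CookieContainer(),
            AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate
        };
        using (var client = new HttpClient(handler) { Timeout = TimeSpan.FromSeconds(15) })
        {
            // The older IE11 User-Agent keeps the site on HTTP/1.1
            client.DefaultRequestHeaders.UserAgent.ParseAdd(
                "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko");
            string page = await client.GetStringAsync(
                "https://www.nasdaq.com/de/symbol/aapl/dividend-history");
            Console.WriteLine(page.Length);
        }
    }
}
```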
Source: https://stackoverflow.com/questions/53872461/webclient-hangs-until-timeout