I am trying to use WebClient to download a file from the web in a WinForms application. However, I really only want to download HTML files. Any other type I will ignore.
I apologize for not being very clear. I wrote a wrapper class that extends WebClient. In this wrapper class, I added a cookie container and exposed the timeout property of the WebRequest.
I was using DownloadDataAsync() from this wrapper class, and I wasn't able to retrieve the content type from the WebResponse of this wrapper class. My main intention is to intercept the response and determine whether it is of type text/html. If it isn't, I will abort the request.
I managed to obtain the content type after overriding the WebClient.GetWebResponse(WebRequest, IAsyncResult) method.
The following is a sample of my wrapper class:
using System;
using System.Net;
using System.Net.Security;
using System.Security.Cryptography.X509Certificates;

public class MyWebClient : WebClient
{
    private CookieContainer _cookieContainer;
    private string _userAgent;
    private int _timeout;
    private WebRequest _request;
    private WebResponse _response;

    public MyWebClient()
    {
        this._cookieContainer = new CookieContainer();
        this.SetTimeout(60 * 1000);
    }

    // Timeout in milliseconds, applied to each HttpWebRequest.
    public MyWebClient SetTimeout(int timeout)
    {
        this._timeout = timeout;
        return this;
    }

    // The most recent response, available once GetWebResponse has run.
    public WebResponse Response
    {
        get { return this._response; }
    }

    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest request = base.GetWebRequest(address);
        HttpWebRequest httpRequest = request as HttpWebRequest;
        if (httpRequest != null)
        {
            httpRequest.CookieContainer = this._cookieContainer;
            httpRequest.UserAgent = this._userAgent;
            httpRequest.Timeout = this._timeout;
        }
        this._request = request;
        return request;
    }

    // Used by the synchronous download methods.
    protected override WebResponse GetWebResponse(WebRequest request)
    {
        this._response = base.GetWebResponse(request);
        return this._response;
    }

    // Used by the asynchronous download methods, such as DownloadDataAsync().
    protected override WebResponse GetWebResponse(WebRequest request, IAsyncResult result)
    {
        this._response = base.GetWebResponse(request, result);
        return this._response;
    }

    // Passing false accepts any server certificate. The callback is static,
    // so this affects every request that uses ServicePointManager.
    public MyWebClient ServerCertValidation(bool validate)
    {
        if (!validate)
        {
            ServicePointManager.ServerCertificateValidationCallback +=
                delegate(object sender, X509Certificate certificate, X509Chain chain, SslPolicyErrors sslPolicyErrors) { return true; };
        }
        return this;
    }
}
WebResponse is an abstract class, and the ContentType property is implemented in the inheriting classes. For instance, HttpWebResponse overrides this property to return the Content-Type header. I'm not sure what instance of WebResponse the WebClient is using. If you ONLY want HTML files, you're best off using the HttpWebRequest object directly.
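As a rough sketch of the HttpWebRequest approach, assuming an http or https URL and that a plain GET is acceptable: the headers are available as soon as GetResponse() returns, so the body only needs to be read when the content type is HTML. The names HtmlFetcher, IsHtml, and TryDownloadHtml are hypothetical and exist only for this illustration.

```csharp
using System;
using System.IO;
using System.Net;

// Hypothetical helper for illustration; not part of the original answer.
static class HtmlFetcher
{
    // True when the media type is text/html, ignoring parameters such as charset.
    public static bool IsHtml(string contentType)
    {
        if (string.IsNullOrEmpty(contentType)) return false;
        string mediaType = contentType.Split(';')[0].Trim();
        return string.Equals(mediaType, "text/html", StringComparison.OrdinalIgnoreCase);
    }

    // Returns the page text, or null when the response is not text/html.
    // Assumes url is http or https; other schemes make the casts below fail.
    public static string TryDownloadHtml(string url)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        {
            // This code never reads the body stream for non-HTML responses.
            if (!IsHtml(response.ContentType)) return null;

            using (Stream stream = response.GetResponseStream())
            using (StreamReader reader = new StreamReader(stream))
            {
                return reader.ReadToEnd();
            }
        }
    }
}
```

A call such as HtmlFetcher.TryDownloadHtml("http://www.google.com") would then return either the HTML text or null. Note that the StreamReader here uses its default encoding and does not look at the charset parameter.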
Here is a method using TCP, which HTTP is built on top of. It returns once connected or after the timeout (in milliseconds) elapses, so the timeout value may need to be changed depending on your situation.
var result = false;
try {
    using (var socket = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp)) {
        // BeginConnect expects a host name here, not a full URI string.
        var asyncResult = socket.BeginConnect(yourUri.Host, 80, null, null);
        // Wait up to 100 ms for the connection attempt to complete.
        result = asyncResult.AsyncWaitHandle.WaitOne(100, true);
        socket.Close();
    }
}
catch { }
return result;
Given your update, you can do this by changing the .Method in GetWebRequest:
using System;
using System.Net;

static class Program
{
    static void Main()
    {
        using (MyClient client = new MyClient())
        {
            client.HeadOnly = true;
            string uri = "http://www.google.com";
            byte[] body = client.DownloadData(uri); // note should be 0-length
            string type = client.ResponseHeaders["content-type"];
            client.HeadOnly = false;
            // check 'tis not binary... we'll use text/, but could
            // check for text/html
            if (type.StartsWith(@"text/"))
            {
                string text = client.DownloadString(uri);
                Console.WriteLine(text);
            }
        }
    }
}

class MyClient : WebClient
{
    public bool HeadOnly { get; set; }

    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest req = base.GetWebRequest(address);
        if (HeadOnly && req.Method == "GET")
        {
            req.Method = "HEAD";
        }
        return req;
    }
}
Alternatively, you can check the header when overriding GetWebResponse(), perhaps throwing an exception if it isn't what you wanted:
protected override WebResponse GetWebResponse(WebRequest request)
{
    WebResponse resp = base.GetWebResponse(request);
    string type = resp.Headers["content-type"];
    // do something with type
    return resp;
}
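To make the "throw an exception" idea concrete, here is a minimal sketch. It assumes that only text/html responses are wanted and that a WebException is an acceptable way to signal the rejection. HtmlOnlyWebClient and EnsureHtml are hypothetical names, not part of the answer above. Both overloads are overridden because the question reports that DownloadDataAsync() goes through the IAsyncResult one.

```csharp
using System;
using System.Net;

// Hypothetical subclass for illustration only.
class HtmlOnlyWebClient : WebClient
{
    // Reached by the synchronous download methods.
    protected override WebResponse GetWebResponse(WebRequest request)
    {
        return EnsureHtml(base.GetWebResponse(request));
    }

    // Reached by the asynchronous download methods.
    protected override WebResponse GetWebResponse(WebRequest request, IAsyncResult result)
    {
        return EnsureHtml(base.GetWebResponse(request, result));
    }

    // Passes HTML responses through and rejects everything else.
    private static WebResponse EnsureHtml(WebResponse resp)
    {
        string type = resp.Headers["Content-Type"] ?? string.Empty;
        if (!type.StartsWith("text/html", StringComparison.OrdinalIgnoreCase))
        {
            resp.Close(); // release the response before giving up
            throw new WebException("Unexpected content type: " + type);
        }
        return resp;
    }
}
```

Callers would wrap DownloadData() or DownloadString() in a try/catch for WebException, or inspect the Error property of the completed-event arguments when using the async methods.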
You could issue the first request with the HEAD verb, and check the content-type response header? [edit] It looks like you'll have to use HttpWebRequest for this, though.
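A minimal sketch of the HEAD idea with HttpWebRequest, assuming an http or https URL; HeadProbe, CreateHeadRequest, and GetContentType are hypothetical names used only for illustration. Some servers reject HEAD or answer it differently from GET, in which case GetResponse() may throw a WebException or report a different type, so treat the result as a hint.

```csharp
using System;
using System.Net;

// Hypothetical helper for illustration only.
static class HeadProbe
{
    // Builds a HEAD request; nothing is sent until GetResponse() is called.
    public static HttpWebRequest CreateHeadRequest(string url)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "HEAD";
        return request;
    }

    // Returns the Content-Type reported by the server for a HEAD request.
    public static string GetContentType(string url)
    {
        using (WebResponse response = CreateHeadRequest(url).GetResponse())
        {
            return response.ContentType;
        }
    }
}
```

If HeadProbe.GetContentType(url) starts with "text/html", a second, ordinary GET can then fetch the page itself.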
I'm not sure of the cause, but perhaps you hadn't downloaded anything yet. This is the lazy way to get the content type of a remote file/page. (I haven't checked whether this is efficient on the wire; for all I know, it may download huge chunks of content.)
Stream connection = null;
string contentType = null;
WebClient wc = new WebClient();
try
{
    connection = wc.OpenRead(current.Url);
    contentType = wc.ResponseHeaders["content-type"];
}
catch (Exception)
{
    // 404 or what have you
}
finally
{
    // OpenRead may have failed, so the stream can still be null here.
    if (connection != null)
    {
        connection.Close();
    }
    wc.Dispose();
}