Question
I'm trying to download a number of PDF files automagically, given a list of URLs.
Here's the code I have:
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
request.Method = "GET";
var encoding = new UTF8Encoding();
request.Headers.Add(HttpRequestHeader.AcceptLanguage, "en-gb,en;q=0.5");
request.Headers.Add(HttpRequestHeader.AcceptEncoding, "gzip, deflate");
request.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
request.UserAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:12.0) Gecko/20100101 Firefox/12.0";
HttpWebResponse resp = (HttpWebResponse)request.GetResponse();
BinaryReader reader = new BinaryReader(resp.GetResponseStream());
FileStream stream = new FileStream("output/" + date.ToString("yyyy-MM-dd") + ".pdf",FileMode.Create);
BinaryWriter writer = new BinaryWriter(stream);
while (reader.PeekChar() != -1)
{
    writer.Write(reader.Read());
}
writer.Flush();
writer.Close();
So, I know the first part works. I was originally fetching and reading it using a TextReader - but that gave me corrupted PDF files (since PDFs are binary files).
Right now, if I run it, reader.PeekChar() is always -1 and nothing happens - I get an empty file.
While debugging, I noticed that reader.Read() was actually returning different numbers each time I invoked it - so maybe Peek is broken.
So I tried something very dirty:
try
{
    while (true)
    {
        writer.Write(reader.Read());
    }
}
catch
{
}
writer.Flush();
writer.Close();
Now I'm getting a very tiny file with some garbage in it, but it's still not what I'm looking for.
So, can anyone point me in the right direction?
Additional information:
The headers don't suggest the response is compressed or otherwise encoded:
HTTP/1.1 200 OK
Content-Type: application/pdf
Server: Microsoft-IIS/7.5
X-Powered-By: ASP.NET
Date: Fri, 10 Aug 2012 11:15:48 GMT
Content-Length: 109809
Answer 1:
Skip the BinaryReader and BinaryWriter and just copy the input stream to the output FileStream. Briefly:
var fileName = "output/" + date.ToString("yyyy-MM-dd") + ".pdf";
using (var stream = File.Create(fileName))
resp.GetResponseStream().CopyTo(stream);
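As an aside beyond the answer's snippet, a fuller sketch of the same approach with the response and both streams disposed via using blocks. One caveat worth noting: the question's code adds an Accept-Encoding: gzip, deflate header by hand, so the server may legitimately send compressed bytes that would then be written to disk verbatim; letting the framework decompress automatically avoids that. The URL and output path here are placeholders, not from the original.

```csharp
using System;
using System.IO;
using System.Net;

class PdfDownloader
{
    static void Download(string url, string outputPath)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "GET";
        // Advertise gzip/deflate but let the framework decompress transparently,
        // instead of adding an Accept-Encoding header manually.
        request.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;

        using (var resp = (HttpWebResponse)request.GetResponse())
        using (var input = resp.GetResponseStream())
        using (var output = File.Create(outputPath))
        {
            // Raw byte-for-byte copy; no text decoding is involved,
            // which is exactly what a binary format like PDF needs.
            input.CopyTo(output);
        }
    }
}
```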
Answer 2:
Why not use the WebClient class?
using (WebClient webClient = new WebClient())
{
    webClient.DownloadFile("url", "filePath");
}
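Since the question is about a list of URLs, the same call can simply be driven in a loop. A minimal sketch, assuming a urls array and an "output" directory (both are placeholders; the file name is derived from the URL's last path segment):

```csharp
using System;
using System.IO;
using System.Net;

class BatchDownload
{
    static void Main()
    {
        // Placeholder list; in the question these come from elsewhere.
        var urls = new[] { "http://example.com/a.pdf", "http://example.com/b.pdf" };

        Directory.CreateDirectory("output");

        using (var webClient = new WebClient())
        {
            foreach (var url in urls)
            {
                // Derive a file name from the URL's last segment.
                var name = Path.GetFileName(new Uri(url).LocalPath);
                webClient.DownloadFile(url, Path.Combine("output", name));
            }
        }
    }
}
```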
Answer 3:
Your question asks about WebClient, but your code shows you using raw HTTP requests and responses. Why don't you actually use System.Net.WebClient?
using (System.Net.WebClient wc = new WebClient())
{
    wc.DownloadFile("http://www.site.com/file.pdf", "C:\\Temp\\File.pdf");
}
Answer 4:
private void Form1_Load(object sender, EventArgs e)
{
    WebClient webClient = new WebClient();
    webClient.DownloadFileCompleted += new AsyncCompletedEventHandler(Completed);
    webClient.DownloadProgressChanged += new DownloadProgressChangedEventHandler(ProgressChanged);
    // DownloadFileAsync takes only a Uri and a destination path (no FileMode).
    webClient.DownloadFileAsync(new Uri("https://www.colorado.gov/pacific/sites/default/files/Income1.pdf"), "output/" + DateTime.Now.Ticks + ".pdf");
}

private void ProgressChanged(object sender, DownloadProgressChangedEventArgs e)
{
    progressBar.Value = e.ProgressPercentage;
}

private void Completed(object sender, AsyncCompletedEventArgs e)
{
    MessageBox.Show("Download completed!");
}
Source: https://stackoverflow.com/questions/11901381/downloading-pdf-file-using-webrequests