Get .txt file instead of .jpg - using WebClient and DownloadFile();

Posted by 邮差的信 on 2021-02-11 12:09:21

Question



I'm trying to download the .jpg from this URL:

http://1.bp.blogspot.com/_pK6J3MTn5co/S6kuH3aqbeI/AAAAAAAACUY/06axvmjU91k/s1600-h/avengers02_B&W_UL.jpg

Using this code:

private void TEST_button1_Click(object sender, EventArgs e)
{
    WebClient MyDownloader = new WebClient();
    MyDownloader.DownloadFile(@"http://1.bp.blogspot.com/_pK6J3MTn5co/S6kuH3aqbeI/AAAAAAAACUY/06axvmjU91k/s1600-h/avengers02_B&W_UL.jpg", @"c:\test.jpg");
}

However, when I run this, I end up with a file called test.jpg that contains HTML markup:

<html>
<head>
<title>avengers02_B&amp;W_UL.jpg (image)</title>
<script type="text/javascript">
<!--
if (top.location != self.location) top.location = self.location;
// -->
</script>
</head>
<body bgcolor="#ffffff" text="#000000">
<img src="http://1.bp.blogspot.com/_pK6J3MTn5co/S6kuH3aqbeI/AAAAAAAACUY/06axvmjU91k/s1600/avengers02_B%26W_UL.jpg" alt="[avengers02_B&amp;W_UL.jpg]" border=0>
</body>
</html>

How can I download the actual .jpg?

Any help is greatly appreciated - thank you!


Answer 1:


There is a way to do it: first download the HTML content as a string and extract the real image URL from it, then use that URL to download the file.

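 // Requires: using System.Net; using System.Text.RegularExpressions;
 using (WebClient client = new WebClient())
 {
     var path = @"http://1.bp.blogspot.com/_pK6J3MTn5co/S6kuH3aqbeI/AAAAAAAACUY/06axvmjU91k/s1600-h/avengers02_B&W_UL.jpg";

     // The "s1600-h" URL returns an HTML wrapper page, not the image itself.
     var content = client.DownloadString(path);
     // Capture the src attribute value of the first <img> tag in the named group "url".
     Regex regex = new Regex(@"(?<=<img\s+[^>]*?src=(?<q>['""]))(?<url>.+?)(?=\k<q>)");
     var match = regex.Match(content);
     if (match.Success)
     {
         // Decode HTML entities such as &amp; before requesting the URL.
         client.DownloadFile(WebUtility.HtmlDecode(match.Groups["url"].Value), @"e:\test1.jpg");
     }
 }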



Answer 2:


If the server returns HTML in response to your request for a particular URL, there isn't much you can do to force it to return something else at that URL.

What you can do is parse the response with HtmlAgilityPack, find the URL of the actual image, and fetch it with a second request.
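
The parse-then-fetch approach described above might look roughly like the sketch below. It assumes the HtmlAgilityPack NuGet package is installed; the class and method names (ImagePageDownloader, FindFirstImageUrl, DownloadImageFromPage) are made up for illustration, and it simply takes the first img tag on the page, which fits the wrapper page in the question but may not fit other pages.

```csharp
using System;
using System.Net;
using HtmlAgilityPack; // third-party NuGet package, assumed to be installed

static class ImagePageDownloader
{
    // Hypothetical helper: returns the src of the first <img> in the HTML, or null.
    public static string FindFirstImageUrl(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        HtmlNode img = doc.DocumentNode.SelectSingleNode("//img[@src]");
        if (img == null) return null;
        // Decode entities such as &amp; in case the attribute value is returned raw.
        return WebUtility.HtmlDecode(img.GetAttributeValue("src", string.Empty));
    }

    // Hypothetical helper: the first request gets the wrapper page, the second gets the image.
    public static bool DownloadImageFromPage(string pageUrl, string targetPath)
    {
        using (var client = new WebClient())
        {
            string html = client.DownloadString(pageUrl);
            string imageUrl = FindFirstImageUrl(html);
            if (string.IsNullOrEmpty(imageUrl)) return false;
            // Resolve a relative src against the page URL; an absolute src is kept as-is.
            var absolute = new Uri(new Uri(pageUrl), imageUrl);
            client.DownloadFile(absolute, targetPath);
            return true;
        }
    }
}
```

Keeping the parsing in its own method means it can be tried on a saved copy of the HTML without touching the network.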




Answer 3:


Clicking that link triggers two downloads: first a page of HTML (mislabelled with a .jpg suffix), then the image referenced in that HTML.
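
One way to notice this kind of mislabelled response from code is to look at the Content-Type header rather than the file extension. The following is only a sketch: ResponseCheck, IsImageContentType and TryDownloadImage are hypothetical names, and it assumes the server sends an accurate Content-Type, which is not guaranteed.

```csharp
using System;
using System.Net;

static class ResponseCheck
{
    // Hypothetical helper: true when a Content-Type header value names an image type.
    public static bool IsImageContentType(string contentType)
    {
        return !string.IsNullOrEmpty(contentType)
            && contentType.TrimStart().StartsWith("image/", StringComparison.OrdinalIgnoreCase);
    }

    // Downloads the body and reports whether the server labelled it as an image.
    public static bool TryDownloadImage(string url, out byte[] data)
    {
        using (var client = new WebClient())
        {
            data = client.DownloadData(url);
            // ResponseHeaders is populated once the request has completed.
            string contentType = client.ResponseHeaders[HttpResponseHeader.ContentType];
            return IsImageContentType(contentType);
        }
    }
}
```

If TryDownloadImage returns false, the bytes are probably an HTML page like the one in the question and should not be saved as a .jpg.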

So perhaps you need to fetch the URL of the img tag in the HTML returned by the first request?

http://1.bp.blogspot.com/_pK6J3MTn5co/S6kuH3aqbeI/AAAAAAAACUY/06axvmjU91k/s1600/avengers02_B%26W_UL.jpg

I'm guessing that removing -h from the original URL (so that s1600-h becomes s1600) might point to the actual file that you're after.
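
If that guess holds, the rewrite could be sketched as below. The s1600-h to s1600 pattern is inferred only from the two URLs in this thread and is not a documented rule; BlogspotUrl and ToDirectImageUrl are hypothetical names.

```csharp
using System;
using System.Net;

static class BlogspotUrl
{
    // Hypothetical helper: turns ".../s1600-h/name.jpg" into ".../s1600/name.jpg".
    // Based only on the two URLs in this thread; URLs without "/s1600-h/" are returned unchanged.
    public static string ToDirectImageUrl(string pageUrl)
    {
        return pageUrl.Replace("/s1600-h/", "/s1600/");
    }

    // Downloads from the rewritten URL in a single request.
    public static void Download(string pageUrl, string targetPath)
    {
        using (var client = new WebClient())
        {
            client.DownloadFile(ToDirectImageUrl(pageUrl), targetPath);
        }
    }
}
```

Note that the img src in the question also encodes the & in the file name as %26. This sketch leaves the file name untouched, so the server may or may not accept the result without that encoding.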

Here's hoping that you have permission to scrape these files...



Source: https://stackoverflow.com/questions/11303101/get-txt-file-instead-of-jpg-using-webclient-and-downloadfile
